Textbooks in Mathematics
Series editors:
Al Boggess, Kenneth H. Rosen
Nonlinear Optimization
Models and Applications
William P. Fox
Linear Algebra
James R. Kirkwood, Bessie H. Kirkwood
Real Analysis
With Proof Strategies
Daniel W. Cunningham
Train Your Brain
Challenging Yet Elementary Mathematics
Bogumil Kaminski, Pawel Pralat
Contemporary Abstract Algebra, Tenth Edition
Joseph A. Gallian
Geometry and Its Applications
Walter J. Meyer
Linear Algebra
What you Need to Know
Hugo J. Woerdeman
Introduction to Real Analysis, 3rd Edition
Manfred Stoll
Discovering Dynamical Systems Through Experiment and Inquiry
Thomas LoFaro, Jeff Ford
Functional Linear Algebra
Hannah Robbins
https://www.routledge.com/Textbooks-in-Mathematics/book-series/CANDHTEXBOOMTH
Functional Linear Algebra
Hannah Robbins
Roanoke College
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Contents
1 Vectors
1.1 Vector Operations
1.2 Span
1.3 Linear Independence
2 Functions of Vectors
2.1 Linear Functions
2.2 Matrices
2.3 Matrix Operations
2.4 Matrix Vector Spaces
2.5 Kernel and Range
2.6 Row Reduction
2.7 Applications of Row Reduction
2.8 Solution Sets
2.9 Large Matrix Computations
2.10 Invertibility
2.11 The Invertible Matrix Theorem
4 Diagonalization
4.1 Eigenvalues and Eigenvectors
4.2 Determinants
4.3 Eigenspaces
4.4 Diagonalization
4.5 Change of Basis Matrices
A Appendices
A.1 Complex Numbers
A.2 Mathematica
A.3 Solutions to Odd Exercises
Bibliography
Index
Introduction for Students
Linear algebra occupies an important place in the world of math and science
because it is an extremely versatile and useful subject. It rewards those
of us who study it with powerful computational tools, lessons about how
mathematical theory is built, examples for later study in other classes, and
much more. Even if you think you know why you are studying linear algebra,
I encourage you to learn about and appreciate those aspects of the course
which don’t seem immediately related to your original motivation. You may
find that they enhance your general mathematical understanding, and you
may need them for unexpected future applications. I initially loved linear
algebra’s lessons about how to generalize outward from familiar mathematical
environments, but I have recently used tools from linear algebra to help a
biology colleague with an application to her research.
As you work your way through this book, it is important to make sure you
understand the basic ideas, definitions, and computational skills introduced.
The best way to do this is to work through enough examples and problems
to make sure you have thoroughly grasped the material. How many problems
constitute “enough” will vary from person to person, but you’ll know you’re
there when a type of problem or idea elicits boredom instead of confusion or
stress. If you work your way through all of the exercises on a particular idea
and still need more, I encourage you to look up that topic in other linear
algebra books or online to find more problems.
The answers to the odd problems are in Appendix A.3. If your answers
don’t match up, I encourage you to seek help quickly so you can get
straightened out before you move on. Math is not a subject which rewards an
“I’ll come back to that later” mentality! Realize that help is available in many
places including your teacher, classmates, any tutoring resources available at
your school, online tutorials, etc. Take the time to figure out which type of
help works best for you – not everyone’s brain responds the same way and it
is much easier to work with your learning style rather than fight it.
Most of the computational techniques used in this book can be done either
by hand or using technology. I encourage you to do enough work with each one
by hand to understand its properties, and also to learn how to do it quickly
using a calculator or computer software. This book specifically addresses how
to use Mathematica, but feel free to use whichever technological tool best suits
your needs.
Finally, welcome to linear algebra! I hope you find this book helpful and
interesting.
Introduction for Instructors
• For the vast majority of the book, I stick to vector spaces over R. Complex
vector spaces are briefly discussed in Appendix A.1.
• I freely use the label “Theorem,” but proofs are called explanations or
justifications to avoid causing students anxiety.
• More emphasis is placed on the idea of a linear function, which is used
to motivate the study of matrices and their operations. This should seem
natural to students after the central role of functions in calculus.
• Row reduction is moved further back in the semester and vector spaces
are moved earlier to avoid an artificial feeling of separation between the
computational and theoretical aspects of the course.
• Applications from a wide range of other subjects are introduced in Chapter
0 as motivation for students not intending to be math majors.
The chapters and sections are designed to be done in the order they appear;
however, there are a few places where some judicious editing is possible.
• Section 2.9 on large matrix calculations can be skipped, except for the
introduction of elementary matrices, which could be introduced where they
next appear in Section 4.2.
• Sections 3.2 and 3.3 could be skipped, since students already have m × n
matrices as an example of a vector space besides R^n. However, this will
require you to skip many later problems which include polynomial vector
spaces and C.
• Sections 5.1–5.3 depend only on the material from Chapter 1.
• Section 5.4 depends only on the material from Chapter 1 and the idea of
a basis in Rn , which could be taught there rather than in Section 3.1 if
you wanted to move Chapter 5 earlier in the course.
I hope you find, as I do, that this book fills a gap in the literature and so
is helpful to you and your students.
Acknowledgments
While I have wanted to write a linear algebra book for quite some time, I
couldn’t have completed this project without help from many people who
definitely deserve to have their contributions recognized.
First of all, David Taylor deserves many, many thanks for not only
supporting my sabbatical proposal, which gave me the time off to write, but
also providing writing advice, formatting help, and reassurance when LaTeX
was being uncooperative. Thanks also to Karin Saoub for all the lessons I
learned watching her write her first book as well as her support when I was
feeling stressed by the writing process. Roland Minton provided good feedback
on my first draft. Steve Kennedy helped deepen the book’s motivation of
definitions and encouraged me to provide a more geometric approach. Maggie
Rahmoeller taught out of a draft version of the book and provided invaluable
feedback as well as reassurance that another instructor could appreciate my
book’s approach.
As I moved outside my area of expertise, many people helped inspire and
check my motivating examples: Rachel Collins introduced me to the use of
matrices in population modeling in biology, Skip Brenzovich and Gary Hollis
explained how matrices are used to model molecular structures in chemistry,
Adam Childers helped me find applications of matrices in statistics, Chris
Santacroce talked with me about coordinate systems in machining, and Alice
Kassens and Edward Nik-khah discussed how matrices are used in economics
modeling.
Thanks to Bob Ross for helping me think about how to position the book
in the marketplace and shepherding me through the publication process.
Finally, a special thank you to my Fall 2018 linear algebra students, in
particular Gabe Umland, who provided the first field test of this book, and
found many typos and arithmetic errors.
0
Introduction and Motivation
Linear algebra is widely used across many areas of math and science. It
simultaneously provides us with a great set of tools for solving many different
types of problems and gives us a chance to practice the important art of
mathematical generalization and theory building. In this initial section, we’ll
do both of these things. First we’ll explore a variety of different problems which
will all turn out to yield to linear algebra’s methods. Next we’ll pick out some
of the common characteristics of these problems and use them to focus our
mathematical inquiries in the next chapters. As we develop our linear algebra
toolbox, we’ll return to each of these applications to tackle these problems.
A common problem type in chemistry is balancing chemical reactions.
In any given chemical reaction, an initial set of chemicals interact to
produce another set of chemicals. During the reaction, the molecules of the
input chemicals break down and their constituent atoms recombine to form
the output chemicals. Since the atoms themselves are neither created nor
destroyed during a reaction, we must have the same number of atoms present
in the input and output chemicals. To balance a chemical reaction, we need to
figure out how many molecules of each chemical (both input and output) are
present so that the number of atoms balances, i.e., is the same in the inputs
as in the outputs. This must be done for each type of atom present in the
chemical reaction, and the complication is that the quantities of molecules we
use must balance each type of atom simultaneously. To make this less abstract,
let's look at an example reaction: the combustion of propane, in which propane (C3H8) and oxygen gas (O2) react to produce carbon dioxide (CO2) and water (H2O).
There are three different types of atoms involved in this reaction: carbon,
hydrogen, and oxygen. For each molecule involved, we need to keep track of
how many carbon, hydrogen, and oxygen atoms it has. It is important that
we keep these quantities separate from each other since they record totally
different properties of a molecule. For example, each carbon dioxide molecule
contains 1 carbon atom, 0 hydrogen atoms, and 2 oxygen atoms.
If we have multiple copies of a molecule, we can calculate the number of
atoms of each type by multiplying each number of atoms by the number
of molecules present. For example, if our reaction produces 15 molecules
of carbon dioxide, we know this can also be viewed as producing 15
carbon atoms, 0 hydrogen atoms, and 30 oxygen atoms. However, we don’t
just produce carbon dioxide, we also produce water, each molecule of which
contains 0 carbon atoms, 2 hydrogen atoms, and 1 oxygen atom. If our reaction
produces 20 molecules of water, we’ve produced 0 carbon atoms, 40 hydrogen
atoms, and 20 oxygen atoms.
However, our reaction will actually produce molecules of both carbon
dioxide and water. If we produce 15 molecules of carbon dioxide and 20
molecules of water, we’ve produced 15 carbon atoms (15 from the carbon
dioxide and 0 from the water), 40 hydrogen atoms (0 from the carbon dioxide
and 40 from the water), and 50 oxygen atoms (30 from the carbon dioxide
and 20 from the water).
To balance our reaction, we’d need to figure out quantities of each of our
four different molecule types so that the number of each type of atom is the
same before and after the reaction. In our example computation above, that
would mean we’d need to find quantities of propane and oxygen molecules
which together have 15 carbon atoms, 40 hydrogen atoms, and 50 oxygen
atoms to use as our inputs since those are the quantities of atoms from our
output molecules.
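Readers who like to experiment may find it useful to see this atom-counting arithmetic on a computer. The sketch below is in Python with NumPy (the book's own technology appendix covers Mathematica, so this choice of tool is an assumption, not the author's); the balancing coefficients 5 and 25 are one choice consistent with the 15 molecules of carbon dioxide and 20 molecules of water used above.

```python
import numpy as np

# Atom-count vectors (carbon, hydrogen, oxygen) for each molecule type.
propane        = np.array([3, 8, 0])   # C3H8
oxygen_gas     = np.array([0, 0, 2])   # O2
carbon_dioxide = np.array([1, 0, 2])   # CO2
water          = np.array([0, 2, 1])   # H2O

# Output side described in the text: 15 CO2 and 20 H2O.
outputs = 15 * carbon_dioxide + 20 * water   # -> [15, 40, 50]

# One balancing choice for the inputs: 5 propane and 25 O2.
inputs = 5 * propane + 25 * oxygen_gas       # -> [15, 40, 50]

print(outputs, inputs, np.array_equal(inputs, outputs))
```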
In physics, we often need to figure out the net effect of a group of forces
acting on the same object. If we’re thinking of our object as living in three
dimensions (which is how most of us think about the world we live in), then
each force can push on the object to varying degrees along each of our three
axes. Let’s agree to call these axes up/down, North/South, and East/West.
Different forces can either reinforce each other if they are acting in the same
direction along one or more of these axes, or they can cancel each other out if
they are acting in opposite directions. The overall force acting on the object
can be found by figuring out the combined effect of the separate forces along
each of these axes. Again, let’s look at an example.
To figure out the overall force acting on this person, we need to combine
the various forces’ actions in each of our three directions. The first step is to
realize that although we are describing the forces above as if North and South
are separate, they are actually opposites. This means we need to treat each
pair of directions as one directional axis and assign one of the two directions
along that axis as positive and the other as negative. I’ll stick with typical
map conventions and assign North to be positive and South to be negative
along the North/South axis, West as positive and East as negative along the
East/West axis, and up as positive and down as negative along the up/down
axis. With these conventions, we can restate the components of our three
forces acting on this person. The person’s jump has 1000 Newtons along the
up/down axis, 650 Newtons along the North/South axis, and 0 Newtons along
the East/West axis. The force of the wind has 0 Newtons along the up/down
axis, −200 Newtons along the North/South axis, and −375 Newtons along
the East/West axis. The force of gravity has −735 Newtons along the up/down
axis, 0 Newtons along the North/South axis, and 0 Newtons along the
East/West axis.
Now we can combine the various components of each force, but we must be
careful to add them up separately for each directional axis. Thus the total force
on the person is 265 Newtons along the up/down axis (1000 from the jump, 0
from the wind, and −735 from gravity), 450 Newtons along the North/South
axis (650 from the jump, −200 from the wind, and 0 from gravity), and −375
Newtons along the East/West axis (0 from the jump, −375 from the wind,
and 0 from gravity). Eventually we’ll develop tools to do this not just along
the standard axes, but along any set of axes at right angles to each other.
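As a quick numerical check of the force computation above, here is a minimal sketch in Python with NumPy (an assumed tool, not part of the text) that adds the three force vectors component by component.

```python
import numpy as np

# Components along (up/down, North/South, East/West), in Newtons,
# using the sign conventions chosen in the text.
jump    = np.array([1000,  650,    0])
wind    = np.array([   0, -200, -375])
gravity = np.array([-735,    0,    0])

total = jump + wind + gravity
print(total)   # [ 265  450 -375], matching the totals worked out above
```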
denominator is the number of tortillas produced by that recipe (200/24 for corn
tortillas and 100/20 for flour tortillas).
Multiplying the quantities in the corn tortilla recipe by 200/24 shows us that
the factory will need 16 2/3 cups of cornmeal, 12.5 cups of water, and 4 1/6
teaspoons of salt to make 200 corn tortillas. Multiplying the quantities in the
flour tortilla recipe by 100/20 shows that the factory will need 10 cups of flour,
3.75 cups of water, 1.25 cups of oil, and 5 teaspoons of salt to make 100 flour
tortillas.
Combining the quantities of the ingredients needed for each type of tortilla,
this means that to make 100 flour tortillas and 200 corn tortillas the factory
will need 16 2/3 cups of cornmeal, 10 cups of flour, 16.25 cups of water, 1.25
cups of oil, and 9 1/6 teaspoons of salt.
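The same scale-and-add arithmetic can be checked with exact fractions. The sketch below is an illustration in Python (assumed tooling); the per-batch recipe quantities are back-calculated from the scaled amounts quoted above and are assumptions rather than data given in the text.

```python
from fractions import Fraction as F

# Per-batch ingredient vectors: (cornmeal, flour, water, oil, salt),
# inferred from the scaled quantities quoted in the text.
corn_recipe  = [F(2), F(0), F(3, 2), F(0), F(1, 2)]   # assumed batch of 24 corn tortillas
flour_recipe = [F(0), F(2), F(3, 4), F(1, 4), F(1)]   # assumed batch of 20 flour tortillas

scale_corn, scale_flour = F(200, 24), F(100, 20)
order = [scale_corn * c + scale_flour * f for c, f in zip(corn_recipe, flour_recipe)]
print(order)   # 16 2/3 cornmeal, 10 flour, 16 1/4 water, 1 1/4 oil, 9 1/6 salt
```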
everything on the screen outward at the same rate. One way to do this is to
multiply both the horizontal and vertical positions by the same number. One
such example is explored below.
[Figure: a point in the plane and the point obtained by multiplying both of its coordinates by 2, plotted on the same line through the origin.]
Both of these points are on the same line through the origin, but the point
where we multiplied the coordinates by 2 is much farther out. You can imagine
that if you did this to every point in the plane at once it would produce the
feeling of moving toward the center.
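For concreteness, here is a tiny Python sketch (assumed tooling) of this zoom effect; the particular point used is made up for illustration and is not taken from the figure.

```python
# Multiply both coordinates of a point by the same number to push it outward.
point = (-5, 10)                       # an illustrative point, not from the figure
scaled = (2 * point[0], 2 * point[1])
print(scaled)                          # (-10, 20): same line through the origin, twice as far out
```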
Example 6. The smooth coneflower is a native Virginian plant. Its life cycle
can be divided into five stages: seedling, small plant, medium plant, large
plant, and flowering plant. Data was collected for two different geographically
isolated populations of coneflowers.
For each of these two populations, there were different numbers of plants
in each of the five stages of growth. For example, in one population there were
88 large plants and in the other population there were only 16. If we want to
combine data about these two populations, we need to combine the numbers
for each life cycle stage separately. In the example above we have 104 large
plants in our two populations.
Example 7. Think about the set of all students at a particular college who
have a Facebook page. It is unlikely that all of them are Facebook friends with
everyone else in this group.
One of the least typical examples of linear algebra I’ve come across is a
scene in Neal Stephenson’s book Cryptonomicon. In it, several siblings gather
after their mother’s death to divide up their parents’ belongings. One brother
is a mathematician and he explains that each person will be allowed to rank
each item based on two quantities: their perception of its monetary value
and their emotional attachment to it. These are represented visually as a 2D
grid. After each sibling has assigned their values to every object, the objects
will be divided. Using their valuation, each sibling can now compute their
perception of the total monetary and emotional value of their share versus
the share of each other sibling. The goal is to divide up the objects so that
each person thinks that their share is worth at least as much (both monetarily
and emotionally) as everyone else’s shares.
Example 8. Suppose one person receives a table, a car, and a set of dishes
as their share. In monetary terms, they think the table is worth $450, the car
is worth $2000, and the dishes are worth $25. In emotional terms, they’ve as-
signed the table a value of 60, the car a value of 5, and the dishes a value of 100.
This person has assigned each object a pair of numbers to represent its
monetary and emotional value. The table is worth $450 monetarily and 60
emotionally, the car is worth $2000 monetarily and 5 emotionally, and the
dishes are worth $25 monetarily and 100 emotionally. This person feels their
share has a total monetary value of $450 + $2000 + $25 = $2475 and a total
emotional value of 60 + 5 + 100 = 165.
To decide whether or not this person is satisfied with this division, we’d
need to use their valuations of the objects that make up each other person’s
share to make sure that they don’t believe anyone else’s share was worth more
than $2475 or 165 in emotional value.
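Here is a short sketch of the bookkeeping in this example, in Python with NumPy (an assumed tool): each object is a (monetary, emotional) pair, and a share's value is just the sum of its objects' pairs.

```python
import numpy as np

# Each object's value to this sibling: (monetary dollars, emotional points).
table  = np.array([ 450,  60])
car    = np.array([2000,   5])
dishes = np.array([  25, 100])

share_value = table + car + dishes
print(share_value)   # [2475  165], the totals computed in the text
```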
Now that we’ve seen a variety of different examples, let’s shift our focus to
looking for common characteristics between these problems to help us decide
where to start our exploration of linear algebra in Chapter 1.
Although the examples in this section may not seem closely related, they
have certain underlying similarities. In each problem we had several different
numbers associated to a given object. These numbers meant separate things or
described separate qualities about that object, so needed to be kept separate
from each other. For example, when we looked at forces acting on a person
we kept the components of our forces in the up/down, North/South, and
East/West directions separate from each other.
In several of these problems we needed to be able to multiply all numbers
associated with a particular object by a constant. For example, if we used 15
water molecules (which are each made of hydrogen and oxygen) in a chemical
reaction then we’d multiply both the number of hydrogens and the number of
oxygens by 15.
We also needed to add the collections of numbers associated to different
objects together, but in such a way that the result was a collection of numbers
where we only combined numbers which meant the same thing. For example,
when adding a sibling’s valuation of several objects together we’d need to add
up their monetary values and emotional values separately to get a monetary
total and an emotional total.
Noticing this pattern is our first step in the process of mathematical
generalization which was mentioned at the beginning of this section. It allows
us to strip away the differences between these various example applications to
focus on the underlying similarities we need to study and understand. In the
next chapter we’ll develop a way to write down these collections of associated
numbers, figure out how to add two such collections, and how to multiply such
a collection by a constant. Once we’ve solidified our understanding of these
basic processes, we can tackle each of these example problems using the same
basic mathematical tools.
1
Vectors
To help distinguish which variables represent vectors, I’ll write them with
arrows over the top of the variable as in ~v .
Notice that this definition doesn’t tell us what the various positions inside
the vector represent. This is deliberate, because it allows us the flexibility to
assign the appropriate meanings for our current use. However, it does mean
we’ll need to get into the habit of clearly communicating the meaning of our
vector’s entries in each application we do.
Example 1. Write down the vector which records the number of carbon,
hydrogen and oxygen atoms in a molecule of carbon dioxide (CO2).
Since each molecule of carbon dioxide contains 1 carbon atom, 0 hydrogen atoms, and 2 oxygen atoms, this is the vector $\begin{pmatrix}1\\0\\2\end{pmatrix}$, where the first entry counts carbon, the second hydrogen, and the third oxygen.
Example 2. The vector $\vec{v} = \begin{pmatrix}1\\2\\3\end{pmatrix}$ can be thought of as describing a position
in 3-space.
Points in 3-space are usually described using three axes (often denoted
by x, y, and z). Our vector ~v describes the point we get to by starting at
the origin where the three axes meet and moving 1 unit along the first axis
(usually x), 2 units along the second axis (usually y), and 3 units along the
third axis (usually z). Notice here that as in Example 1, both the entries of the
vector and their positions within the vector are important. Here each entry
tells us how many units to move, and its position in the vector tells us which
axis to move along.
Geometrically, we will draw our vectors as arrows that start at the origin
and end at the point in the plane or 3-space whose coordinates match the
entries of the vector as in Figure 1.1. (In some other situations, vectors may
instead be drawn between any two points in the plane or in 3-space.)
[Figure 1.1: a vector drawn as an arrow from the origin to the point whose coordinates are its entries.]
Example 3. The set of all 2-vectors is the plane, $\mathbb{R}^2$, where $\mathbb{R}$ stands for the
set of real numbers.
We can write this more explicitly as $\mathbb{R}^2 = \left\{\begin{pmatrix}x_1\\x_2\end{pmatrix}\right\}$. Thinking of these
vectors geometrically as in Example 2, we can see that this set of vectors
corresponds to the set of points in the plane. If we rewrite our 2-vectors as
$\mathbb{R}^2 = \left\{\begin{pmatrix}x\\y\end{pmatrix}\right\}$, it becomes even clearer that this is the familiar 2D space we
use to graph one-variable functions.
Similarly, using the interpretation of the entries from Example 2, the set
of all 3-vectors is $\mathbb{R}^3$. We can expand this to the set of all vectors of any fixed
size.
Example 5. $\vec{v} = \begin{pmatrix}-1\\0\\3\end{pmatrix}$ is not equal to $\vec{w} = \begin{pmatrix}-1\\3\\0\end{pmatrix}$.
Again, these are both 3-vectors, so it’s reasonable to ask if they are equal.
Their first entries do match up, but their second and third entries do not.
(Remember that it is important not only that the vectors contain the same
numeric entries, but that they have those numbers in the same positions.)
We need to add the condition here that ~v and w ~ are both in the same
Rn to make sure that they have the same number of entries. Otherwise, it is
impossible to add their entries pairwise since some of the entries in the longer
vector won’t have corresponding entries in the shorter vector.
which means that together these two molecules contain 1 carbon atom, 2
hydrogen atoms, and 3 oxygen atoms.
In $\mathbb{R}^2$, vector addition has a striking geometric pattern. If we write our
2-vectors in the form $\begin{pmatrix}x\\y\end{pmatrix}$, we know that the x-coordinate of our sum is the
sum of the x-coordinates of the vectors being added and similarly for the y-coordinates. However, if we draw a parallelogram from the two vectors we're
adding, then their sum forms that parallelogram's diagonal as shown in Figure
1.2.
We can do something similar in $\mathbb{R}^3$, but it is too complicated to draw in a
2D book.
[Figure 1.2: two vectors in the plane and their sum, which forms the diagonal of the parallelogram they define.]
Now that we’ve explored vector addition, let’s turn our attention to
multiplying a vector by a scalar. Suppose we want to count the numbers
of the various types of atoms in 15 molecules of carbon dioxide. We can do
this by multiplying the number of carbon, hydrogen, and oxygen atoms in one
molecule by 15 and placing the results into the appropriate entries of our new
vector.
This motivates the following general definition for multiplying an n-vector
by a scalar.
Definition. If $\vec{v} = \begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix}$ is a vector in $\mathbb{R}^n$ and $r$ is a scalar, then $r\cdot\vec{v} = \begin{pmatrix}rv_1\\rv_2\\\vdots\\rv_n\end{pmatrix}$.
Example 8. Compute $15\begin{pmatrix}1\\0\\2\end{pmatrix}$.
To do this, we multiply each entry of our vector by 15, which gives us
$$15\begin{pmatrix}1\\0\\2\end{pmatrix} = \begin{pmatrix}15(1)\\15(0)\\15(2)\end{pmatrix} = \begin{pmatrix}15\\0\\30\end{pmatrix}.$$
Note that our vector is the atomic vector for carbon dioxide from Example
1. This means our computation above gives us the atomic vector for 15
molecules of carbon dioxide.
[Figure: $\vec{v}$ and $4\vec{v}$ plotted together in the plane.]
Here we can see that $4\vec{v}$ and $\vec{v}$ lie along the same line in $\mathbb{R}^2$, but $4\vec{v}$ is 4
times as long as $\vec{v}$.
Again, let's plot $\vec{v}$ and $-\frac{1}{2}\vec{v}$ together to get a visual idea of what this looks like.
[Figure: $\vec{v}$ and $-\frac{1}{2}\vec{v}$ plotted together in the plane.]
These two vectors still lie on the same line in $\mathbb{R}^2$, but they are pointing
opposite directions along that line. In addition to pointing the opposite way,
our new vector $-\frac{1}{2}\vec{v}$ is also half the length of $\vec{v}$.
From the picture in the example above, we can see that the only difference
when our scalar is negative is that our vector’s direction is switched. Putting
this together with our observations from Example 9, we can say that in R2
multiplying a vector by a scalar r multiplies the length of the vector by |r| and
switches the direction of the vector if r is negative. As with vector addition,
there is a similar pattern in R3 which you can explore on your own.
Note that at this point we don’t have a good idea of what length means
except in R2 and R3 where we can use our formulas for the distance between
two points to compute the length of our vectors. Later on in Chapter 5
we’ll create a definition of the length of a vector which doesn’t depend on
geometry and hence can be extended to higher dimensions. However, we can
still compute the product of a vector and a scalar for any size of vector even
if we can’t visualize its geometric effect.
Now that we’ve defined these two vector operations, let’s explore some of
their nice properties. You’ll notice that many of these properties are similar
to what we have in R, which means we can use our intuition about how
addition and multiplication work in the real numbers. This is an important
consideration, because whenever a newly defined operation shares its name
with a familiar operation, it is tempting to assume they behave exactly the
same. This is not always true, so it pays to check carefully! We’ll start with
properties of vector addition.
From the definition of vector addition, we know that if $\vec{v}$ and $\vec{w}$ are both in
$\mathbb{R}^n$, their sum $\vec{v}+\vec{w}$ is also in the same $\mathbb{R}^n$. We will call this property closure
of addition.
If we have $\vec{v}$ and $\vec{w}$ in $\mathbb{R}^n$, then $\vec{v}+\vec{w} = \vec{w}+\vec{v}$. To see this, we can compute
$$\vec{v}+\vec{w} = \begin{pmatrix}v_1+w_1\\v_2+w_2\\\vdots\\v_n+w_n\end{pmatrix} \qquad\text{and}\qquad \vec{w}+\vec{v} = \begin{pmatrix}w_1+v_1\\w_2+v_2\\\vdots\\w_n+v_n\end{pmatrix},$$
and these are equal because $v_i+w_i = w_i+v_i$ for real numbers. Thus vector addition is commutative.
If we have $\vec{v}$, $\vec{u}$, and $\vec{w}$ in $\mathbb{R}^n$, then $(\vec{v}+\vec{u})+\vec{w} = \vec{v}+(\vec{u}+\vec{w})$.
Since addition of real numbers is associative (i.e., where we put our parentheses
in addition doesn't matter), we know $(v_i+u_i)+w_i = v_i+(u_i+w_i)$ for
$i = 1, \ldots, n$. This means
$$(\vec{v}+\vec{u})+\vec{w} = \begin{pmatrix}(v_1+u_1)+w_1\\(v_2+u_2)+w_2\\\vdots\\(v_n+u_n)+w_n\end{pmatrix} = \begin{pmatrix}v_1+(u_1+w_1)\\v_2+(u_2+w_2)\\\vdots\\v_n+(u_n+w_n)\end{pmatrix}$$
and since
$$\begin{pmatrix}v_1+(u_1+w_1)\\v_2+(u_2+w_2)\\\vdots\\v_n+(u_n+w_n)\end{pmatrix} = \begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix} + \begin{pmatrix}u_1+w_1\\u_2+w_2\\\vdots\\u_n+w_n\end{pmatrix} = \vec{v}+(\vec{u}+\vec{w}),$$
vector addition is associative.
In the real numbers, every number $r$ has an additive inverse, $-r$, so that
$-r+r = 0$. For every $\vec{v}$ in $\mathbb{R}^n$ we have the additive inverse $-\vec{v} = (-1)\cdot\vec{v}$ so
that $-\vec{v}+\vec{v} = \vec{0}$. We can check this by computing
$$-\vec{v}+\vec{v} = \begin{pmatrix}-v_1\\-v_2\\\vdots\\-v_n\end{pmatrix} + \begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix} = \begin{pmatrix}-v_1+v_1\\-v_2+v_2\\\vdots\\-v_n+v_n\end{pmatrix} = \begin{pmatrix}0\\0\\\vdots\\0\end{pmatrix} = \vec{0}.$$
Thus vector addition has additive inverses. (Notice that we set $-\vec{v}+\vec{v}$ equal
to $\vec{0}$, because $\vec{0}$ is the additive identity of $\mathbb{R}^n$. This mirrors how $-r+r=0$ in
$\mathbb{R}$, because 0 is the additive identity of $\mathbb{R}$.)
Now that we’ve explored vector addition, let’s turn our attention to scalar
multiplication of vectors.
From our definition of scalar multiplication, we know that if ~v is in Rn
then r · ~v is also in Rn . We call this property closure of scalar multiplication.
If we have $\vec{v}$ in $\mathbb{R}^n$ and two scalars $r$ and $s$, then $r\cdot(s\cdot\vec{v}) = (rs)\cdot\vec{v}$. To
see this, we compute
$$r\cdot(s\cdot\vec{v}) = r\begin{pmatrix}sv_1\\sv_2\\\vdots\\sv_n\end{pmatrix} = \begin{pmatrix}rsv_1\\rsv_2\\\vdots\\rsv_n\end{pmatrix} = (rs)\cdot\vec{v}.$$
We also have $(r+s)\cdot\vec{v} = r\cdot\vec{v} + s\cdot\vec{v}$, since
$$(r+s)\cdot\vec{v} = \begin{pmatrix}(r+s)v_1\\(r+s)v_2\\\vdots\\(r+s)v_n\end{pmatrix} = \begin{pmatrix}rv_1+sv_1\\rv_2+sv_2\\\vdots\\rv_n+sv_n\end{pmatrix} = \begin{pmatrix}rv_1\\rv_2\\\vdots\\rv_n\end{pmatrix} + \begin{pmatrix}sv_1\\sv_2\\\vdots\\sv_n\end{pmatrix}.$$
Factoring $r$ out of each entry of the first vector and $s$ out of each entry of the
second vector gives us
$$\begin{pmatrix}rv_1\\rv_2\\\vdots\\rv_n\end{pmatrix} + \begin{pmatrix}sv_1\\sv_2\\\vdots\\sv_n\end{pmatrix} = r\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix} + s\begin{pmatrix}v_1\\v_2\\\vdots\\v_n\end{pmatrix} = r\cdot\vec{v} + s\cdot\vec{v},$$
so scalar multiplication distributes over scalar addition.
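These properties can also be spot-checked numerically. The following Python/NumPy sketch (assumed tooling, not part of the text) verifies two of them for one randomly chosen vector; of course a numerical check is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.integers(-5, 5, size=4)   # one random 4-vector with integer entries
r, s = 3, -2

# Spot-check two of the scalar multiplication properties from this section.
print(np.array_equal(r * (s * v), (r * s) * v))      # r·(s·v) = (rs)·v
print(np.array_equal((r + s) * v, r * v + s * v))    # (r+s)·v = r·v + s·v
```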
Exercises 1.1.
1. For which $n$ is the vector $\begin{pmatrix}-1\\0\\10\\-3\end{pmatrix}$ in $\mathbb{R}^n$?
2. For which $n$ is the vector $\begin{pmatrix}7\\-8\end{pmatrix}$ in $\mathbb{R}^n$?
3. Compute $-2\begin{pmatrix}1\\2\\3\end{pmatrix}$.
4. Compute $\frac{1}{3}\begin{pmatrix}12\\-9\\1\end{pmatrix}$.
5. Compute $-4\begin{pmatrix}3\\1\end{pmatrix}$.
6. Compute $-5\begin{pmatrix}-1\\2\\1\end{pmatrix}$.
7. Compute $4\begin{pmatrix}3\\-2\end{pmatrix}$.
8. Compute $-\frac{1}{2}\begin{pmatrix}-1\\6\\4\end{pmatrix}$.
9. Compute $3\begin{pmatrix}-1\\0\\10\\-3\end{pmatrix}$.
10. Is it possible to have a scalar r and a vector ~v so that r~v doesn’t
make sense?
11. Compute each sum or explain why it is impossible.
(a) $\begin{pmatrix}1\\2\\3\end{pmatrix} + \begin{pmatrix}2\\-4\end{pmatrix}$
(b) $\begin{pmatrix}2\\-4\end{pmatrix} + \begin{pmatrix}7\\5\end{pmatrix}$
(c) $\begin{pmatrix}1\\2\\3\end{pmatrix} + \begin{pmatrix}7\\5\end{pmatrix}$
12. Compute each sum or explain why it is impossible.
(a) $\begin{pmatrix}1\\0\\0\\4\end{pmatrix} + \begin{pmatrix}-3\\0\end{pmatrix}$
(b) $\begin{pmatrix}-1\\5\end{pmatrix} + \begin{pmatrix}2\\-3\end{pmatrix}$
13. Compute each sum or explain why it is impossible.
(a) $\begin{pmatrix}2\\-1\\17\end{pmatrix} + \begin{pmatrix}1\\-3\end{pmatrix}$
(b) $\begin{pmatrix}0\\-5\\10\end{pmatrix} + \begin{pmatrix}-2\\7\\-1\end{pmatrix}$
14. Compute each sum or explain why it is impossible.
(a) $\begin{pmatrix}-1\\2\\1\end{pmatrix} + \begin{pmatrix}3\\-2\end{pmatrix}$
(b) $\begin{pmatrix}-1\\2\\1\end{pmatrix} + \begin{pmatrix}11\\5\\0\end{pmatrix}$
(c) $\begin{pmatrix}3\\-2\end{pmatrix} + \begin{pmatrix}11\\5\\0\end{pmatrix}$
15. Compute each sum or explain why it is impossible.
(a) $\begin{pmatrix}-1\\2\\1\end{pmatrix} + \begin{pmatrix}3\\-2\end{pmatrix}$
(b) $\begin{pmatrix}-1\\2\\1\end{pmatrix} + \begin{pmatrix}11\\5\end{pmatrix}$
(c) $\begin{pmatrix}3\\-2\end{pmatrix} + \begin{pmatrix}11\\5\end{pmatrix}$
16. Compute $\begin{pmatrix}1\\2\\3\end{pmatrix} + \begin{pmatrix}-1\\6\\4\end{pmatrix} + \begin{pmatrix}5\\-8\\-1\end{pmatrix}$.
17. Let $\vec{v}_1 = \begin{pmatrix}-1\\0\\10\\-3\end{pmatrix}$, $\vec{v}_2 = \begin{pmatrix}2\\-3\\1\end{pmatrix}$, $\vec{v}_3 = \begin{pmatrix}4\\0\\-5\end{pmatrix}$.
(a) Which sums of these vectors make sense?
(b) Compute those sums.
18. Use the picture of $\vec{v}$ and $\vec{w}$ below to draw a picture of $\vec{v} + \vec{w}$.
[Figure for Exercise 18: two vectors $\vec{v}$ and $\vec{w}$ drawn in the plane.]
21. For each of the following scenarios, write a vector which records
the given information. Be sure to describe what each entry of your
vector represents.
(a) The number of each type of atom in a molecule of glucose,
C6H12O6. As in Example 1 from Chapter 0, C stands for
carbon, H for hydrogen, and O for oxygen.
(b) The pirate treasure can be found two paces south and seven
paces west of the only palm tree on the island.
(c) A hospital patient is 68.5 inches tall, weighs 158 pounds, has a
temperature of 98.7◦ F, and is 38 years old.
22. In the previous problem, imagine constructing the vector of a second
molecule or treasure map or hospital patient. For each scenario,
suppose you add together your two vectors. What is the practical
meaning of your vector sum?
23. Suppose we work at the paint counter of a hardware store. A
customer orders three gallons of lavender paint and half a gallon
of sage green paint. To make a gallon of lavender paint we mix 1/4
gallon of red paint, 1/4 gallon of blue paint and 1/2 gallon of white
paint. To make a gallon of sage green paint we mix 1/4 gallon of
white paint, 1/4 gallon of yellow paint, and 1/2 gallon of blue paint.
Use the method from Chapter 0’s Example 3 to find a vector which
gives the amounts of each color of paint needed to fill this order.
24. Adam and Steve are getting divorced, and they are trying to divide
up their assets: a car, a boat, a house, and a ski cabin. Each of
them has assigned each object a monetary and an emotional value.
If the first value vector entry is an object’s monetary value (in
1.2 Span
So far, we’ve seen examples of problems where we want to add vectors or
multiply a vector by a scalar. However, there are many situations where
we’ll want to see what vectors we can get by adding and scaling a whole
set of vectors – perhaps to combine existing solutions to a problem into
new solutions. For example, rather than simply adding together one molecule
of carbon dioxide and one molecule of water, we may want to combine 15
molecules of carbon dioxide with 25 molecules of water. Since this idea comes
up so often, we’ll spend this section studying the set of new vectors we can
get by adding and scaling a given set of vectors. Notice that as with the
combination of carbon dioxide and water above, we will often want to use
different scalar multiples of each vector. Let’s start by considering a single
combination of a set of vectors.
Definition. Given vectors $\vec{v}_1, \ldots, \vec{v}_k$ in $\mathbb{R}^n$ and scalars $a_1, \ldots, a_k$, the vector $a_1\vec{v}_1 + \cdots + a_k\vec{v}_k$ is called a linear combination of $\vec{v}_1, \ldots, \vec{v}_k$.
Note that if $\vec{v}_1, \ldots, \vec{v}_k$ are n-vectors, then any linear combination of them
will also be an n-vector. Also notice that to create a linear combination of a
given set of vectors we can choose whichever scalars we like, but must use the
specified vectors.
Example 1. Create a linear combination of $\vec{v}_1 = \begin{pmatrix}-2\\8\\6\end{pmatrix}$, $\vec{v}_2 = \begin{pmatrix}1\\0\\-5\end{pmatrix}$, and $\vec{v}_3 = \begin{pmatrix}3\\-1\\2\end{pmatrix}$.
To create a linear combination of these three vectors, we need to first select
three scalars. I'll use $a_1 = \frac{1}{2}$, $a_2 = 2$, and $a_3 = -1$. Our linear combination is
then
$$a_1\vec{v}_1 + a_2\vec{v}_2 + a_3\vec{v}_3 = \frac{1}{2}\begin{pmatrix}-2\\8\\6\end{pmatrix} + 2\begin{pmatrix}1\\0\\-5\end{pmatrix} + (-1)\begin{pmatrix}3\\-1\\2\end{pmatrix} = \begin{pmatrix}-1\\4\\3\end{pmatrix} + \begin{pmatrix}2\\0\\-10\end{pmatrix} + \begin{pmatrix}-3\\1\\-2\end{pmatrix} = \begin{pmatrix}-2\\5\\-9\end{pmatrix}.$$
Notice that since our original vectors were 3-vectors, our linear combination
of them is also a 3-vector.
Geometrically, we can think of building a linear combination as starting at the origin
and taking a sequence of moves in $\mathbb{R}^n$ along each of the lines defined by the $\vec{v}_i$.
Through this point of view, our linear combination is the vector corresponding
to the point where we end up at the end of our trip. Remember from Figure 1.2
in 1.1 that we can add vectors geometrically using the parallelogram pattern,
and from Examples 9 and 10 of 1.1 that which direction we travel along $\vec{v}_i$'s
line is determined by whether its scalar coefficient is positive or negative.
Example 2. Find the linear combination of $\vec{v}_1 = \begin{pmatrix}1\\-2\end{pmatrix}$ and $\vec{v}_2 = \begin{pmatrix}4\\4\end{pmatrix}$ with
$a_1 = 3$ and $a_2 = -\frac{1}{2}$.
Plugging these vectors and scalars into our definition gives us
$$a_1\vec{v}_1 + a_2\vec{v}_2 = 3\begin{pmatrix}1\\-2\end{pmatrix} - \frac{1}{2}\begin{pmatrix}4\\4\end{pmatrix} = \begin{pmatrix}3\\-6\end{pmatrix} - \begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}1\\-8\end{pmatrix}.$$
Alternately, we can construct $3\begin{pmatrix}1\\-2\end{pmatrix}$ and $-\frac{1}{2}\begin{pmatrix}4\\4\end{pmatrix}$ geometrically.
[Figure: $3\vec{v}_1$ and $-\frac{1}{2}\vec{v}_2$ drawn as arrows from the origin.]
Now we can find the sum by drawing in the parallelogram formed by these
vectors, which we can see exactly matches the result we computed algebraically
above.
[Figure: the parallelogram formed by $3\vec{v}_1$ and $-\frac{1}{2}\vec{v}_2$, whose diagonal is the sum $\begin{pmatrix}1\\-8\end{pmatrix}$.]
Example 3. A table at a diner orders 2 of breakfast special #1, 3 of special #2, and 1 of special #3, where each special is some combination of eggs, strips of bacon, sausage links, pieces of toast, and orders of hash browns. How much of each food does the cook need to make?
As with any practical example, we first have to decide how to assign vector
entries to numeric pieces of information. There are 5 different food types in
the three specials, so we can encode the information about each breakfast
special's contents as a 5-vector whose entries are (in order): eggs, strips of
bacon, sausage links, pieces of toast, and orders of hash browns. If we call
special #1's vector $\vec{s}_1$, special #2's vector $\vec{s}_2$, and special #3's vector $\vec{s}_3$, we
get
$$\vec{s}_1 = \begin{pmatrix}2\\2\\0\\2\\0\end{pmatrix}, \qquad \vec{s}_2 = \begin{pmatrix}2\\2\\2\\0\\1\end{pmatrix}, \qquad \vec{s}_3 = \begin{pmatrix}0\\0\\2\\2\\1\end{pmatrix}.$$
The table's order can now be thought of as a linear combination of these
three vectors whose coefficients $a_1$, $a_2$, and $a_3$ are the numbers of each special
ordered. Since the table ordered 2 special #1s, we set $a_1 = 2$. Similarly they
ordered 3 special #2s so $a_2 = 3$, and 1 special #3 so $a_3 = 1$. This makes the
table's order the linear combination
$$a_1\vec{s}_1 + a_2\vec{s}_2 + a_3\vec{s}_3 = 2\begin{pmatrix}2\\2\\0\\2\\0\end{pmatrix} + 3\begin{pmatrix}2\\2\\2\\0\\1\end{pmatrix} + \begin{pmatrix}0\\0\\2\\2\\1\end{pmatrix} = \begin{pmatrix}4\\4\\0\\4\\0\end{pmatrix} + \begin{pmatrix}6\\6\\6\\0\\3\end{pmatrix} + \begin{pmatrix}0\\0\\2\\2\\1\end{pmatrix} = \begin{pmatrix}10\\10\\8\\6\\4\end{pmatrix}.$$
This means the cook needs to make 10 eggs, 10 strips of bacon, 8 sausage
links, 6 pieces of toast, and 4 orders of hash browns.
While we could figure out the totals in the previous example simply by
counting, in many practical situations we have far more than 5 quantities to
track and far more than 3 combinations of those quantities contributing to
our final outcome.
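Here is how the diner computation above might look as a short Python/NumPy sketch (assumed tooling, not part of the text): the order is literally a linear combination of the three special vectors.

```python
import numpy as np

# Contents of each special: (eggs, bacon strips, sausage links, toast, hash browns).
s1 = np.array([2, 2, 0, 2, 0])
s2 = np.array([2, 2, 2, 0, 1])
s3 = np.array([0, 0, 2, 2, 1])

order = 2 * s1 + 3 * s2 + 1 * s3
print(order)   # [10 10  8  6  4], the totals the cook needs
```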
In many situations we won’t just want to look at one linear combination
of our starting vectors, but at the set of all possible linear combinations. For
example, we might want to consider all possible combinations of some number
of carbon dioxide molecules and some number of water molecules or figure out
which points in the plane we can reach using linear combinations of $\begin{pmatrix}1\\-2\end{pmatrix}$ and
$\begin{pmatrix}4\\4\end{pmatrix}$. This motivates the following definition.
Definition. Let ~v1 , . . . , ~vk be vectors in Rn . The span of these vectors is the
set of all possible linear combinations, which is written Span{~v1 , . . . , ~vk }. The
vectors ~v1 , . . . , ~vk are called the spanning set.
In other words, the span is the set of vectors we get by making all possible
choices of scalars in our linear combination of ~v1 , . . . , ~vk . Note that if ~v1 , . . . , ~vk
are n-vectors then all their linear combinations are also n-vectors, so their
span is a subset of Rn . Geometrically, we can visualize the span of a set
of vectors by thinking of our spanning vectors ~v1 , . . . , ~vk as giving possible
directions of movement in Rn . This makes the span all possible points in
Rn that we can reach by traveling in some combination of those possible
directions. Practically, you could imagine your spanning vectors as different
types of coins and bills which makes the span the possible amounts of money
you can give as change using those coins and bills. (Of course in this example
the only coefficients that make sense are positive whole numbers, but we will
learn to explore these situations in the most general case since that allows us
to solve the greatest array of problems.)
Let’s start by exploring the easiest possible case of a span: when we have
only one vector in our spanning set.
Example 4. Find the span of $\vec{v} = \begin{pmatrix}1\\-2\end{pmatrix}$.
Since we only have one spanning vector, our span is just the set of all
multiples of that vector. This means we have
$$\mathrm{Span}\left\{\begin{pmatrix}1\\-2\end{pmatrix}\right\} = \left\{a\begin{pmatrix}1\\-2\end{pmatrix}\right\} = \left\{\begin{pmatrix}a\\-2a\end{pmatrix}\right\}.$$
In other words, the span of $\begin{pmatrix}1\\-2\end{pmatrix}$ is the set of all vectors in $\mathbb{R}^2$ whose
second entries are $-2$ times their first entries.
To visualize this span geometrically, recall from Examples 9 and 10 in 1.1
that positive multiples of a vector keep the direction of that vector but scale
its length, while negative multiples of a vector scale its length and reverse its
direction. This gives us the picture below.
[Figure: the span of $\begin{pmatrix}1\\-2\end{pmatrix}$ is the line through the origin in the direction of that vector.]
Next, let’s step up one level of complexity and add a second spanning
vector.
Example 5. Find the span of $\begin{pmatrix}1\\-2\end{pmatrix}$ and $\begin{pmatrix}4\\4\end{pmatrix}$.
By definition, our span is all vectors $\vec{b}$ which can be written as
$$\vec{b} = a_1\begin{pmatrix}1\\-2\end{pmatrix} + a_2\begin{pmatrix}4\\4\end{pmatrix} = \begin{pmatrix}a_1+4a_2\\-2a_1+4a_2\end{pmatrix}.$$
[Figure: the two spanning vectors $\begin{pmatrix}1\\-2\end{pmatrix}$ and $\begin{pmatrix}4\\4\end{pmatrix}$ and the lines they define through the origin.]
We saw in 1.1 that we can visualize the sum of two vectors as the diagonal
of the parallelogram formed by the two vectors we're adding. Thus our span
consists of all points in the plane which can be thought of as the fourth corner
of a parallelogram with one corner at the origin and two sides along the two
lines defined by our two spanning vectors. This is shown for a generic point
in the plane in the picture below.
[Figure: a generic point in the plane realized as the far corner of a parallelogram with sides along the two spanning vectors' lines.]
Hopefully you can convince yourself that no matter which point in the
plane we pick, we can always draw a parallelogram to show it is an element
of the span. This means $\mathrm{Span}\left\{\begin{pmatrix}1\\-2\end{pmatrix}, \begin{pmatrix}4\\4\end{pmatrix}\right\} = \mathbb{R}^2$.
Example 6. Find the span of $\begin{pmatrix}2\\8\end{pmatrix}$ and $\begin{pmatrix}-1\\-4\end{pmatrix}$.
We know
$$\mathrm{Span}\left\{\begin{pmatrix}2\\8\end{pmatrix}, \begin{pmatrix}-1\\-4\end{pmatrix}\right\} = \left\{a_1\begin{pmatrix}2\\8\end{pmatrix} + a_2\begin{pmatrix}-1\\-4\end{pmatrix}\right\} = \left\{\begin{pmatrix}2a_1-a_2\\8a_1-4a_2\end{pmatrix}\right\}.$$
[Figure: the vectors $\begin{pmatrix}2\\8\end{pmatrix}$ and $\begin{pmatrix}-1\\-4\end{pmatrix}$ plotted in the plane.]
This picture shows us that our two vectors actually lie on the same line
through the origin! Therefore their span is simply that line, i.e.,
$$\mathrm{Span}\left\{\begin{pmatrix}2\\8\end{pmatrix}, \begin{pmatrix}-1\\-4\end{pmatrix}\right\} = \mathrm{Span}\left\{\begin{pmatrix}2\\8\end{pmatrix}\right\} = \mathrm{Span}\left\{\begin{pmatrix}-1\\-4\end{pmatrix}\right\}.$$
Notice that the previous example shows that it is possible for multiple
different spanning sets to have the same span.
While geometric descriptions are fairly straightforward in R2 , as we work
with larger vectors we may start to rely more on the algebraic description
from the definition. (This is certainly true for Rn with n > 3!)
Example 7. Describe the span of $\vec{v}_1 = \begin{pmatrix}1\\1\\1\end{pmatrix}$ and $\vec{v}_2 = \begin{pmatrix}-1\\0\\-1\end{pmatrix}$.
By definition, $\mathrm{Span}\{\vec{v}_1, \vec{v}_2\} = \{a_1\vec{v}_1 + a_2\vec{v}_2\}$ for all scalars $a_1$ and $a_2$.
Plugging in our two vectors gives us
$$\mathrm{Span}\{\vec{v}_1, \vec{v}_2\} = \left\{a_1\begin{pmatrix}1\\1\\1\end{pmatrix} + a_2\begin{pmatrix}-1\\0\\-1\end{pmatrix}\right\} = \left\{\begin{pmatrix}a_1\\a_1\\a_1\end{pmatrix} + \begin{pmatrix}-a_2\\0\\-a_2\end{pmatrix}\right\} = \left\{\begin{pmatrix}a_1-a_2\\a_1\\a_1-a_2\end{pmatrix}\right\}.$$
Since we can let $a_1$ be whatever number we like, it is clear that our middle
entry can be any real number, while the first and third entries can be anything as long as they are equal to each other. So this span is the set of all vectors in $\mathbb{R}^3$ whose first and third entries are equal.
We can also use spans to model practical problems, even those which may
not initially look like they have anything to do with linear combinations.
Example 8. Suppose we are at a barter market where, by custom, 4 lbs of turnips can be exchanged for 1 lb of bacon, 4 lbs of beans can be exchanged for 3 lbs of cornmeal, and 1 lb of turnips can be exchanged for 1 lb of beans. What overall trades does this allow us to make?
Before we can even start thinking about how spans might be relevant here,
we need to reinterpret this situation using vectors. We have four different
goods, so we will use 4-vectors whose entries give the quantity, in pounds,
of the various goods. I’ll use the first entry for bacon, the second for beans,
the third for cornmeal, and the fourth for turnips. Normally it would be very
strange to have a negative quantity of something like bacon, but here we will
view negative quantities as how much of a good we are giving to someone else
and positive quantities as how much we are getting.
Using this interpretation of positive vector entries as getting and negative
entries as giving, we can model an accepted exchange as a 4-vector. We know
that we can give someone 4 lbs of turnips in exchange for getting 1 lb of bacon,
which we can now view as the vector $\begin{pmatrix}1\\0\\0\\-4\end{pmatrix}$. (If you're worried that I chose to
view exchanging turnips and bacon as giving turnips and getting bacon rather
than the other way around, we'll get to that in the next paragraph where we
discuss the scalars in our linear combinations.) Similarly, giving someone 4
lbs of beans and getting 3 lbs of cornmeal has the vector $\begin{pmatrix}0\\-4\\3\\0\end{pmatrix}$, and giving
someone 1 lb of turnips and getting 1 lb of beans has the vector $\begin{pmatrix}0\\1\\0\\-1\end{pmatrix}$.
Multiplying an exchange vector by a negative scalar reverses who gives and who gets: for example, $-1\begin{pmatrix}1\\0\\0\\-4\end{pmatrix} = \begin{pmatrix}-1\\0\\0\\4\end{pmatrix}$ represents giving 1 lb of bacon and getting 4 lbs of turnips. Multiplying by a positive scalar instead scales the amounts traded: $3\begin{pmatrix}1\\0\\0\\-4\end{pmatrix} = \begin{pmatrix}3\\0\\0\\-12\end{pmatrix}$
means we're giving 12 lbs of turnips and getting 3 lbs of bacon, but we're
still giving 4 times as much turnips as we get bacon. Putting these two ideas
together means that our scalar multiples allow us to trade different amounts
of goods in both directions, but keeps the relative values of the goods the
same.
Using this model, we can view each linear combination of our exchange
vectors as an overall trade at the market which is made up of some combination
of our three given exchange ratios. For example, suppose we arrive at the
market with 8 lbs of beans and swap 4 lbs of beans for 3 lbs of cornmeal,
swap another 4 lbs of beans for 4 lbs of turnips, and then swap 2 lbs of our
turnips for 1/2 lb of bacon. Our first exchange is $\begin{pmatrix}0\\-4\\3\\0\end{pmatrix} = 1\begin{pmatrix}0\\-4\\3\\0\end{pmatrix}$. Our second
exchange is $\begin{pmatrix}0\\-4\\0\\4\end{pmatrix} = -4\begin{pmatrix}0\\1\\0\\-1\end{pmatrix}$. Our final exchange is $\begin{pmatrix}1/2\\0\\0\\-2\end{pmatrix} = \frac{1}{2}\begin{pmatrix}1\\0\\0\\-4\end{pmatrix}$.
This means our overall exchange vector is the linear combination
$$\frac{1}{2}\begin{pmatrix}1\\0\\0\\-4\end{pmatrix} + 1\begin{pmatrix}0\\-4\\3\\0\end{pmatrix} - 4\begin{pmatrix}0\\1\\0\\-1\end{pmatrix} = \begin{pmatrix}1/2\\-8\\3\\2\end{pmatrix}$$
which correctly tells us overall we exchanged our 8 lbs of beans for 1/2 lb of
bacon, 3 lbs of cornmeal, and 2 lbs of turnips.
If we view our possible overall trades as exchange vectors, the set of all
possible trades is therefore the span of $\begin{pmatrix}1\\0\\0\\-4\end{pmatrix}$, $\begin{pmatrix}0\\-4\\3\\0\end{pmatrix}$, and $\begin{pmatrix}0\\1\\0\\-1\end{pmatrix}$.
One basic question we can ask about spans is whether or not a given
vector ~b is in the span of a set of vectors ~v1 , . . . , ~vk . Geometrically, this can be
reinterpreted as asking whether we can get to the point in Rn corresponding to
~b via a combination of movements along the vectors ~v1 , . . . , ~vk . Practically, this
can be reinterpreted as asking whether we can create the output represented
by ~b using only the inputs represented by the ~vi ’s. If ~b is not the same size as
the ~v ’s then the answer is clearly no, since a linear combination of vectors has
the same size as the vectors in the spanning set. If ~b is the same size as the ~v ’s,
then we can answer this question by trying to find scalars a1 , . . . , ak so that
a1~v1 + · · · + ak~vk = ~b. Since we’re used to solving for x, this type of equation
is often written x1~v1 + · · · + xk~vk = ~b to help remind us which variables we’re
solving for. These equations come up a lot, so they have their own name.
Definition. An equation of the form $x_1\vec{v}_1 + \cdots + x_k\vec{v}_k = \vec{b}$, where the unknowns are the scalars $x_1, \ldots, x_k$, is called a vector equation.
Example 9. Solve the vector equation $x_1\begin{pmatrix}2\\-1\\3\end{pmatrix} + x_2\begin{pmatrix}1\\-1\\0\end{pmatrix} + x_3\begin{pmatrix}0\\2\\1\end{pmatrix} = \begin{pmatrix}6\\-9\\1\end{pmatrix}$.
We can start by combining the left-hand side of our equation into a single
vector to give us
$$\begin{pmatrix}2x_1+x_2\\-x_1-x_2+2x_3\\3x_1+x_3\end{pmatrix} = \begin{pmatrix}6\\-9\\1\end{pmatrix}.$$
Setting corresponding entries equal gives us three equations: 2x1 + x2 = 6,
−x1 − x2 + 2x3 = −9, and 3x1 + x3 = 1. We can use the first and third
equations to solve for x2 and x3 in terms of x1 , which gives us x2 = 6 − 2x1
and x3 = 1 − 3x1 . Plugging these back into our middle equation above gives
−x1 − (6 − 2x1 ) + 2(1 − 3x1 ) = −9, which simplifies to −5x1 − 4 = −9 or
x1 = 1. This means x2 = 6 − 2 = 4 and x3 = 1 − 3 = −2. Therefore the
solution to our vector equation is x1 = 1, x2 = 4, and x3 = −2.
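Vector equations like the one in Example 9 can also be solved numerically. The sketch below (Python/NumPy, an assumed tool; it quietly arranges the three vectors as the columns of an array, an idea developed properly in Chapter 2) recovers the same solution.

```python
import numpy as np

# Columns of A are the three vectors from Example 9.
A = np.column_stack(([2, -1, 3], [1, -1, 0], [0, 2, 1]))
b = np.array([6, -9, 1])

x = np.linalg.solve(A, b)
print(x)   # [ 1.  4. -2.], i.e. x1 = 1, x2 = 4, x3 = -2
```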
Example 10. Is $\vec{b} = \begin{pmatrix}1\\2\\3\end{pmatrix}$ in the span of $\vec{v}_1 = \begin{pmatrix}3\\0\\1\end{pmatrix}$ and $\vec{v}_2 = \begin{pmatrix}0\\-1\\0\end{pmatrix}$? This is the same as asking whether the vector equation $x_1\vec{v}_1 + x_2\vec{v}_2 = \vec{b}$ has a solution.
As in the previous example, we can simplify the left-hand side of this equation
to get
$$\begin{pmatrix}3x_1\\-x_2\\x_1\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}.$$
Setting corresponding entries equal gives us the three equations $3x_1 = 1$,
$-x_2 = 2$, and $x_1 = 3$. Unfortunately, the first equation says $x_1 = \frac{1}{3}$, while
the third equation says x1 = 3. Since we can’t satisfy both of these conditions
at once, there is no solution to our vector equation. Therefore ~b is not in the
span of ~v1 and ~v2 .
Notice that this vector equation version of checking whether or not a vector
is in a span doesn’t require any geometric understanding of what that span
looks like, so it is particularly useful in Rn for n > 3.
Asking whether or not a vector is in a span also comes up in practical
situations, as in the example below.
Example 11. With the same basic setup of bartering goods from Example
8, can we trade 3 lbs of cornmeal and 4 lbs of turnips for 1/2 lb of bacon and
6 lbs of beans?
Since we saw in Example 8 that the set of all possible exchanges is the
span of $\begin{pmatrix}1\\0\\0\\-4\end{pmatrix}$, $\begin{pmatrix}0\\-4\\3\\0\end{pmatrix}$, and $\begin{pmatrix}0\\1\\0\\-1\end{pmatrix}$, this is really just asking if our trade is in
this span. Giving 3 lbs of cornmeal and 4 lbs of turnips and getting 1/2 lb
of bacon and 6 lbs of beans corresponds to the vector $\begin{pmatrix}1/2\\6\\-3\\-4\end{pmatrix}$. Therefore this
question can be rephrased as asking if $\begin{pmatrix}1/2\\6\\-3\\-4\end{pmatrix}$ is in the span of $\begin{pmatrix}1\\0\\0\\-4\end{pmatrix}$, $\begin{pmatrix}0\\-4\\3\\0\end{pmatrix}$,
and $\begin{pmatrix}0\\1\\0\\-1\end{pmatrix}$.
Exercises 1.2.
1. Compute the linear combination $2\begin{pmatrix}1\\-3\end{pmatrix} + 7\begin{pmatrix}-1\\0\end{pmatrix} - \begin{pmatrix}2\\5\end{pmatrix}$.
2. Compute the linear combination $4\begin{pmatrix}1\\2\\3\end{pmatrix} - 2\begin{pmatrix}4\\-1\\6\end{pmatrix} + 5\begin{pmatrix}1\\0\\-1\end{pmatrix}$.
3. Compute the linear combination $x_1\begin{pmatrix}2\\1\end{pmatrix} + x_2\begin{pmatrix}0\\-4\end{pmatrix} + x_3\begin{pmatrix}3\\-1\end{pmatrix} + x_4\begin{pmatrix}5\\0\end{pmatrix}$.
4. Compute the linear combination $x_1\begin{pmatrix}-2\\4\\1\end{pmatrix} + x_2\begin{pmatrix}3\\-6\\0\end{pmatrix} + x_3\begin{pmatrix}1\\-1\\-2\end{pmatrix}$.
5. Find a value of $h$ so that $x_1\begin{pmatrix}-2\\h\end{pmatrix} + x_2\begin{pmatrix}4\\-3\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}$.
6. Find a value of $k$ so that $x_1\begin{pmatrix}1\\-1\\2\end{pmatrix} + x_2\begin{pmatrix}0\\k\\5\end{pmatrix} = \begin{pmatrix}4\\-5\\3\end{pmatrix}$.
7. Rust is formed by an initial reaction of iron (Fe), oxygen gas (O2),
and water (H2O) to form iron hydroxide, Fe(OH)3. As in Chapter
0's Example 1, we can write each molecule in this reaction as
a vector which counts the number of each type of atom in that
molecule. Let's agree that the first entry in our vectors will count
iron atoms (Fe), the second will count oxygen (O), and the third
will count hydrogen (H). If our reaction combines $x_1$ molecules of
iron, $x_2$ molecules of oxygen, and $x_3$ molecules of water to create $x_4$
molecules of rust, give the vector equation of this chemical reaction.
8. Sodium chlorate, NaClO3, is produced by electrolysis of sodium
chloride, NaCl, and water, H2O. The reaction also produces
hydrogen gas, H2. As in Chapter 0's Example 1, we can write
each molecule in this reaction as a vector which counts the number
of each type of atom in that molecule. Let's agree that the first
entry in our vectors will count sodium atoms (Na), the second will
count chlorine (Cl), the third will count oxygen (O), and the fourth
will count hydrogen (H). If our reaction combines $x_1$ molecules of
sodium chloride with $x_2$ molecules of water to produce $x_3$ molecules
of sodium chlorate and $x_4$ molecules of hydrogen gas, give the vector
equation of this chemical reaction.
9. Does the span of $\begin{pmatrix}1\\-1\\2\end{pmatrix}$ and $\begin{pmatrix}3\\1\\4\end{pmatrix}$ include the vector $\begin{pmatrix}1\\3\\5\end{pmatrix}$?
10. Does the span of $\begin{pmatrix}-2\\0\\5\end{pmatrix}$ and $\begin{pmatrix}6\\3\\9\end{pmatrix}$ include the vector $\begin{pmatrix}0\\1\\8\end{pmatrix}$?
11. Is $\vec{b} = \begin{pmatrix}1\\1\\1\\1\end{pmatrix}$ in $\mathrm{Span}\left\{\begin{pmatrix}1\\-3\\-1\\2\end{pmatrix}, \begin{pmatrix}2\\0\\0\\1\end{pmatrix}, \begin{pmatrix}-3\\0\\6\\-2\end{pmatrix}\right\}$?
12. Is $\vec{b} = \begin{pmatrix}3\\0\\3\end{pmatrix}$ in $\mathrm{Span}\left\{\begin{pmatrix}2\\-2\\0\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}-1\\0\\1\end{pmatrix}\right\}$?
13. Sketch a picture of the span of $\begin{pmatrix}4\\-2\end{pmatrix}$.
14. Sketch a picture of the span of $\begin{pmatrix}2\\0\end{pmatrix}$ and $\begin{pmatrix}1\\1\end{pmatrix}$.
15. Sketch a picture of the span of $\begin{pmatrix}2\\0\\0\end{pmatrix}$ and $\begin{pmatrix}1\\1\\0\end{pmatrix}$.
16. Do $\begin{pmatrix}1\\0\\-1\end{pmatrix}$, $\begin{pmatrix}0\\1\\2\end{pmatrix}$, and $\begin{pmatrix}-4\\0\\6\end{pmatrix}$ span all of $\mathbb{R}^3$? Briefly say why or why not.
17. Do $\begin{pmatrix}1\\0\\1\end{pmatrix}$, $\begin{pmatrix}2\\0\\2\end{pmatrix}$, and $\begin{pmatrix}0\\1\\1\end{pmatrix}$ span all of $\mathbb{R}^3$? Briefly say why or why not.
18. Develop a strategy to figure out whether or not a set of 4-vectors
span all of R4 .
19. Using the setup of Example 3, is it possible to order a combination
of specials so that the cook needs to make 12 eggs, 10 strips of bacon,
6 sausage links, 10 pieces of toast, and 5 orders of hash browns? If
it is possible, explain how. (Remember that the coefficients in your
solution must be positive whole numbers!)
20. Using the setup of Example 8, can you trade 3 lbs of bacon and 8
lbs of beans for 3 lbs of cornmeal and 16 lbs of turnips?
21. Is the graph of the line y = x + 1 a subspace of R2 ?
22. Is $W = \left\{\begin{pmatrix}x\\0\\z\end{pmatrix}\right\}$ a subspace of $\mathbb{R}^3$?
23. What is the smallest subspace of Rn ? What is the largest subspace
of Rn ?
24. Let S be the subset of R3 consisting of the x-axis, y-axis and z-axis.
Show that S is not a subspace of R3 .
Example 1. Find the span of $\begin{pmatrix}1\\0\\0\end{pmatrix}$, $\begin{pmatrix}1\\1\\0\end{pmatrix}$, and $\begin{pmatrix}2\\3\\0\end{pmatrix}$.
Any linear combination of these vectors is $x_1\begin{pmatrix}1\\0\\0\end{pmatrix} + x_2\begin{pmatrix}1\\1\\0\end{pmatrix} + x_3\begin{pmatrix}2\\3\\0\end{pmatrix}$,
which simplifies to
$$\begin{pmatrix}x_1+x_2+2x_3\\x_2+3x_3\\0\end{pmatrix}.$$
We can choose any values we like for $x_1$, $x_2$, and $x_3$, so we can make the first
two entries of our vector equal whatever we want. To see this, suppose we
want to get the vector $\begin{pmatrix}a\\b\\0\end{pmatrix}$. We can do this by choosing $x_1 = a-b$, $x_2 = b$,
and $x_3 = 0$ to get
$$\begin{pmatrix}x_1+x_2+2x_3\\x_2+3x_3\\0\end{pmatrix} = \begin{pmatrix}(a-b)+b+2(0)\\b+3(0)\\0\end{pmatrix} = \begin{pmatrix}a\\b\\0\end{pmatrix}$$
as desired. However, no matter what scalars we choose, we'll always get 0
as our third entry. Therefore geometrically, this span is the plane $\left\{\begin{pmatrix}a\\b\\0\end{pmatrix}\right\}$
inside $\mathbb{R}^3$. Notice that even though we have three spanning vectors, our span
is two-dimensional, not three-dimensional.
In the previous example, none of the vectors lay along the lines spanned
by the other vectors. You can check this geometrically using a 3D picture or
computationally by noticing that none of them is a multiple of one of the
other vectors. However, the third vector is in the plane spanned by the first
two vectors which we can check computationally by noticing that
$$-1\begin{pmatrix}1\\0\\0\end{pmatrix} + 3\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}2\\3\\0\end{pmatrix}.$$
Definition. The vectors $\vec{v}_1, \ldots, \vec{v}_k$ are linearly dependent if one of them is in the span of the others. Otherwise they are linearly independent.
Example 2. Show that $\vec{v}_1 = \begin{pmatrix}2\\-1\\3\end{pmatrix}$, $\vec{v}_2 = \begin{pmatrix}1\\1\\1\end{pmatrix}$, and $\vec{v}_3 = \begin{pmatrix}3\\0\\4\end{pmatrix}$ are linearly
dependent.
Here we can observe that ~v1 + ~v2 = ~v3 . This means ~v3 is an element of
Span {~v1 , ~v2 }. Since one vector is in the span of the other two, these three
vectors are linearly dependent.
Theorem 1. The vectors $\vec{v}_1, \ldots, \vec{v}_k$ in $\mathbb{R}^n$ are linearly dependent if the vector
equation $x_1\vec{v}_1 + \cdots + x_k\vec{v}_k = \vec{0}$ has a solution where at least one of the $x_i$'s is
nonzero. Otherwise $\vec{v}_1, \ldots, \vec{v}_k$ are linearly independent.
To see that this is equivalent to our definition, suppose we have ~v1 , . . . , ~vk
in Rn and that one of these vectors is in the span of the others. Let’s renumber
our vectors if necessary so that ~v1 is in Span{~v2 , . . . , ~vk }. This means that ~v1
is a linear combination of $\vec{v}_2, \ldots, \vec{v}_k$, so we can find scalars $a_2, \ldots, a_k$ for which
$\vec{v}_1 = a_2\vec{v}_2 + \cdots + a_k\vec{v}_k$. Moving $\vec{v}_1$ to the other side gives
$-\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k = \vec{0}$, which is a solution of the vector equation from
Theorem 1 in which the coefficient of $\vec{v}_1$ is $-1$, and hence nonzero.
Example 3. Are $\begin{pmatrix}1\\0\\1\end{pmatrix}$, $\begin{pmatrix}2\\3\\0\end{pmatrix}$, and $\begin{pmatrix}1\\1\\-1\end{pmatrix}$ linearly independent or linearly dependent?
To check, we solve the vector equation $x_1\begin{pmatrix}1\\0\\1\end{pmatrix} + x_2\begin{pmatrix}2\\3\\0\end{pmatrix} + x_3\begin{pmatrix}1\\1\\-1\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}$.
Setting corresponding entries equal to each other gives us the three equations
$x_1 + 2x_2 + x_3 = 0$, $3x_2 + x_3 = 0$, and $x_1 - x_3 = 0$. The third equation says
$x_1 = x_3$. Solving the second equation for $x_2$ gives us $x_2 = -\frac{1}{3}x_3$. Plugging
both of those into the first equation gives us $x_3 + 2(-\frac{1}{3}x_3) + x_3 = 0$ which
simplifies to $\frac{4}{3}x_3 = 0$. This means $x_3 = 0$, so $x_1 = x_3 = 0$ and $x_2 = -\frac{1}{3}x_3 = 0$.
Since our only solution is to have all coefficients equal to 0, our three vectors
are linearly independent.
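For larger collections of vectors, this kind of independence check is usually done on a computer. A minimal Python/NumPy sketch (assumed tooling; it uses the rank of an array whose columns are the vectors, which anticipates ideas from Chapter 2) is shown below for the three vectors just tested.

```python
import numpy as np

# The three vectors from this check, written as the columns of an array.
A = np.column_stack(([1, 0, 1], [2, 3, 0], [1, 1, -1]))

# They are independent exactly when the only solution of A x = 0 is x = 0,
# i.e. when the rank equals the number of vectors.
print(np.linalg.matrix_rank(A) == A.shape[1])   # True: linearly independent
```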
Example 4. Are $\begin{pmatrix}2\\-1\\3\end{pmatrix}$, $\begin{pmatrix}1\\1\\1\end{pmatrix}$, and $\begin{pmatrix}5\\-4\\8\end{pmatrix}$ linearly independent or linearly
dependent?
Here we can notice that $3\begin{pmatrix}2\\-1\\3\end{pmatrix} - \begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}5\\-4\\8\end{pmatrix}$.
This means our third vector is in the span of the other two, so our three
vectors are linearly dependent.
However, I am not confident that I'd notice that relationship between these
vectors immediately. In that case, we can always solve the vector equation
$$x_1\begin{pmatrix}2\\-1\\3\end{pmatrix} + x_2\begin{pmatrix}1\\1\\1\end{pmatrix} + x_3\begin{pmatrix}5\\-4\\8\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}$$
and check whether it has a solution with a nonzero coefficient.
This is a good example of why having two different but equivalent ways
to check something can be very helpful!
Going back to our original motivation for defining linear independence and
linear dependence, we can now state the following.
Theorem. If the vectors $\vec{v}_1, \ldots, \vec{v}_k$ in $\mathbb{R}^n$ are linearly independent, then $\mathrm{Span}\{\vec{v}_1, \ldots, \vec{v}_k\}$ has dimension $k$. If they are linearly dependent, their span has dimension less than $k$.
Example 5. Find the dimension of the span of $\begin{pmatrix}1\\-2\end{pmatrix}$ and $\begin{pmatrix}4\\4\end{pmatrix}$.
Checking whether these two spanning vectors are linearly independent means solving the vector equation $x_1\begin{pmatrix}1\\-2\end{pmatrix} + x_2\begin{pmatrix}4\\4\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$, which simplifies to $\begin{pmatrix}x_1+4x_2\\-2x_1+4x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$.
This gives us the equations x1 + 4x2 = 0 and −2x1 + 4x2 = 0. The first
equation tells us x1 = −4x2 . Plugging this into the second equation, we get
8x2 + 4x2 = 0 or 12x2 = 0. This means x2 = 0, so x1 = 0 as well.
Since our only solution was both coefficients equal to zero, our two span-
ning vectors are linearly independent. This means their span has dimension
2, which matches up with the dimension of the picture we drew in 1.2.
Looking at Example 6 from 1.2 where our two spanning vectors are linearly
dependent shows us Theorem 1’s other conclusion.
Example 6. Find the dimension of the span of $\begin{pmatrix}2\\8\end{pmatrix}$ and $\begin{pmatrix}-1\\-4\end{pmatrix}$.
From the theorem above, we know that if our two spanning vectors are
linearly independent, the span has dimension 2. If they are linearly dependent,
the span has dimension less than 2.
Checking linear independence or dependence means solving the vector
equation
$$x_1\begin{pmatrix}2\\8\end{pmatrix} + x_2\begin{pmatrix}-1\\-4\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$$
which can be simplified to
$$\begin{pmatrix}2x_1-x_2\\8x_1-4x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$
This gives us the equations 2x1 − x2 = 0 and 8x1 − 4x2 = 0. The first equation
tells us x2 = 2x1 . Plugging this into the second equation, we get 8x1 −8x1 = 0
or 0 = 0. This means we can have any value of x1 as long as we set x2 = 2x1 .
Since we have solutions where the coefficients don’t equal zero, our two
spanning vectors are linearly dependent. This means their span has dimension
less than 2, which again matches up with the dimension of the picture we drew
in 1.2.
Example 7. Find the dimension of the span of $\begin{pmatrix}1\\0\\1\\0\end{pmatrix}$, $\begin{pmatrix}1\\0\\1\\1\end{pmatrix}$, and $\begin{pmatrix}2\\0\\0\\2\end{pmatrix}$.
Again we have three spanning vectors, so we check whether they are linearly independent by solving the vector equation $x_1\begin{pmatrix}1\\0\\1\\0\end{pmatrix} + x_2\begin{pmatrix}1\\0\\1\\1\end{pmatrix} + x_3\begin{pmatrix}2\\0\\0\\2\end{pmatrix} = \vec{0}$. This simplifies to
$$\begin{pmatrix}x_1+x_2+2x_3\\0\\x_1+x_2\\x_2+2x_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\\0\end{pmatrix}$$
which gives us $x_1+x_2+2x_3 = 0$, $0 = 0$, $x_1+x_2 = 0$, and $x_2+2x_3 = 0$. The
third equation tells us $x_1 = -x_2$, and the fourth equation tells us $x_3 = -\frac{1}{2}x_2$.
Plugging these into the first equation gives us $-x_2 + x_2 - x_2 = 0$ so $x_2 = 0$.
This also means x1 = 0 and x3 = 0.
Since all three coefficients in our vector equation must be 0, our vectors
are linearly independent which means their span has dimension 3.
4 6 −3
2 6 0
Example 8. Find the dimension of the span of
0 ,
, and .
0 0
−1 1 2
As in the previous example we have three spanning vectors, so the
maximum dimension of this span is 3. Again, we’ll start by determining if
these vectors are linearly independent or linearly dependent using the vector
48 Vectors
equation
4 6 −3 0
2 6 0 0
x1
0 + x2 0 + x3 0 = 0 .
−1 1 2 0
This simplifies to
4x1 + 6x2 − 3x3 0
2x1 + 6x2 0
=
0 0
−x1 + x2 + 2x3 0
which gives us the equations 4x1 + 6x2 − 3x3 = 0, 2x1 + 6x2 = 0, 0 = 0,
and −x1 + x2 + 2x3 = 0. The second equation gives us x1 = −3x2 . Plugging
this into the fourth equation gives us 3x2 + x2 + 2x3 = 0 so 4x2 + 2x3 = 0
or x3 = −2x2 . Plugging x1 = −3x2 and x3 = −2x2 into any of our original
equations simplifies down to 0 = 0, so x2 can be whatever we choose. In
particular, we can choose to have x2 6= 0, so our three vectors are linearly
dependent. This means their span has dimension less than 3.
Following our algorithm, we now need to remove one of our spanning
vectors which is in the span of the other two. The clue here is that this vector
will have a nonzero coefficient in our vector equation. Since we said we could
choose any value for x2, let's remove our second vector. (Actually if x2 ≠ 0
we also have x1 and x3 nonzero, so here we could also have chosen to remove
either of the other two vectors as well.)
Since $\begin{bmatrix} 6 \\ 6 \\ 0 \\ 1 \end{bmatrix}$ is in the span of $\begin{bmatrix} 4 \\ 2 \\ 0 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} -3 \\ 0 \\ 0 \\ 2 \end{bmatrix}$, we know
$$\mathrm{Span}\left\{ \begin{bmatrix} 4 \\ 2 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 6 \\ 6 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 0 \\ 2 \end{bmatrix} \right\} = \mathrm{Span}\left\{ \begin{bmatrix} 4 \\ 2 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 0 \\ 2 \end{bmatrix} \right\}.$$
In particular, we know these two spans have the same dimension. To find
this dimension, let’s check whether our remaining two spanning vectors are
linearly independent or linearly dependent by solving the vector equation
$$x_1 \begin{bmatrix} 4 \\ 2 \\ 0 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 0 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
This simplifies to
$$\begin{bmatrix} 4x_1 - 3x_2 \\ 2x_1 \\ 0 \\ -x_1 + 2x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
which gives us the equations 4x1 −3x2 = 0, 2x1 = 0, 0 = 0, and −x1 +2x2 = 0.
The second equation tells us that x1 = 0, and plugging that back into the
fourth equation tells us x2 = 0. (This also satisfies the first equation.) Since
our only solution is to have both coefficients equal to 0, these two vectors are
linearly independent and their span has dimension 2.
Thus the span of $\begin{bmatrix} 4 \\ 2 \\ 0 \\ -1 \end{bmatrix}$, $\begin{bmatrix} 6 \\ 6 \\ 0 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} -3 \\ 0 \\ 0 \\ 2 \end{bmatrix}$ is two-dimensional. (Geometrically, this means the span is a plane inside R4, which is fun to think about even if we can't draw a good picture of it.)
Exercises 1.3.
1. Are $\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}$ linearly independent or linearly dependent?
2. Are $\begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$ linearly independent or linearly dependent?
3. Are $\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 0 \\ 2 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}$ linearly independent or linearly dependent?
4. Are $\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 0 \\ 1 \\ 3 \end{bmatrix}$, and $\begin{bmatrix} 2 \\ 4 \\ 2 \\ 6 \end{bmatrix}$ linearly independent or linearly dependent?
5. Are the two vectors pictured below linearly independent or linearly
dependent?
[Figure(s): vectors plotted in the plane for the exercises in this part.]
(a) Are ~v1 , ~v2 , ~v3 linearly dependent or linearly independent? Show
work to support your answer.
(b) What does your answer to (a) tell you about the dimension of
Span{~v1 , ~v2 , ~v3 }?
11. Find the dimension of Span$\left\{ \begin{bmatrix} -3 \\ 12 \\ 9 \end{bmatrix}, \begin{bmatrix} 1 \\ -4 \\ -3 \end{bmatrix}, \begin{bmatrix} -2 \\ 8 \\ 6 \end{bmatrix} \right\}$.
12. Find the dimension of Span$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \\ 2 \\ 6 \end{bmatrix} \right\}$.
13. Briefly explain why two 3-vectors cannot span all of R3 .
14. Is it possible to span all of R6 with five 6-vectors, i.e., five vectors
from R6 ? Briefly say why or why not.
15. Use linear independence to decide whether or not $\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} -4 \\ 0 \\ 6 \end{bmatrix}$ span all of R3.
16. Use linear independence to decide whether or not $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ span all of R3.
17. Briefly explain how you would use the idea of linear independence
to figure out whether or not a set of 4-vectors spans all of R4.
18. Using the setup of 1.2’s Example 8, are there trades that are
impossible? Explain why or why not using the idea of linear
independence. Why does your answer make sense from a practical
perspective?
2
Functions of Vectors
Unlike a first calculus course, we don’t have to use the same kind of input
and output.
Example 2. $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_2 \\ 3 \end{bmatrix}$.
In this example our inputs are 2-vectors and our outputs are 3-vectors.
We can imagine a practical interpretation of this function as taking a position
vector on a 2D table top and changing it into a position vector in a 3D dining
room using the fact that the table top has a height of 3 feet.
In other words, the set from which the function maps is its domain, and
the set it maps to is its codomain, as shown in Figure 2.1.
[Figure 2.1: a function maps its domain into its codomain.]
Example 3. Find the domain and codomain of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_4 \\ x_2 - x_3 \end{bmatrix}$.
This function’s inputs are 4-vectors and its outputs are 2-vectors, so we
could write f : R4 → R2 . Therefore f ’s domain is R4 and its codomain is R2 .
Since these vector-valued functions are still functions, we can ask all the
same sorts of questions about them that we asked about the functions in
calculus. These include things like computing f (~x) for a given input vector,
solving f (~x) = ~b for ~x, and doing basic function operations like adding two
functions, multiplying a function by a scalar, doing function composition, and
finding inverse functions. We will eventually tackle all of these questions, both
for general mathematical interest and because we will need their answers to
solve practical problems involving vectors. However, we will not discuss all
vector-valued functions, but instead restrict our attention to a special class of
functions.
Recall that calculus quickly narrows its focus to study only continuous
functions. This is because calculus relies so heavily on limits of real numbers
that it wants functions to respect and play well with limits. In our case, we
don’t care about limits, but do care about addition and scalar multiplication
of vectors. Therefore, we will restrict our attention to functions that respect
and play well with vector addition and scalar multiplication. Putting this more
precisely, we define a linear function as follows.

Definition. A function f : Rn → Rm is linear if f (~v + ~u) = f (~v ) + f (~u) and f (r · ~v ) = r · f (~v ) for all vectors ~v and ~u in Rn and all scalars r.
Intuitively, this is saying that our function f respects addition and scalar
multiplication because we’ll get the same answer whether we add or scale our
vectors before or after applying the function f . This often gives us two choices
on how to tackle a computation, which can be very helpful if one is easier than
the other.
Additionally, a linear function preserves lines in the sense that if f is linear
then the image of a line in Rn is a line in Rm . To see this, recall that one way
to write the equation of a line in Rn is t~v + ~b where ~b is any vector on that
line, ~v is any vector parallel to that line, and t is any scalar. If we apply a
linear function f , then the image is the set of all points f (t~v + ~b), which by
linearity can be broken up to give f (t~v + ~b) = tf (~v ) + f (~b). This is also a line,
in particular the line in Rm containing the vector f (~b) and parallel to f (~v ).
Let’s move from the realm of theory to the realm of computations and look
at a few examples.
Example 4. Show f : R2 → R3 by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$ is linear.
This map is one of the three standard ways to map R2 into R3 as what is
often called the xy-plane. To check that it is a linear map, we need to check
the two conditions from the definition. To do that, we’ll need to write down
two generic vectors ~v and ~u in R2 and a generic scalar r in R. I'll use $\vec{v} = \begin{bmatrix} x \\ y \end{bmatrix}$, $\vec{u} = \begin{bmatrix} z \\ w \end{bmatrix}$.
First we need to check that f (~v + ~u) = f (~v ) + f (~u).
Computing the left-hand side gives us
$$f(\vec{v} + \vec{u}) = f\left(\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} z \\ w \end{bmatrix}\right) = f\left(\begin{bmatrix} x+z \\ y+w \end{bmatrix}\right) = \begin{bmatrix} x+z \\ y+w \\ 0 \end{bmatrix}.$$
Computing the right-hand side gives us
$$f(\vec{v}) + f(\vec{u}) = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} + \begin{bmatrix} z \\ w \\ 0 \end{bmatrix} = \begin{bmatrix} x+z \\ y+w \\ 0 \end{bmatrix}.$$
Since our two answers are equal, it’s clear that f respects addition.
Next we need to check that f (r · ~v ) = r · f (~v ).
Computing the left-hand side gives us
$$f(r \cdot \vec{v}) = f\left(r \cdot \begin{bmatrix} x \\ y \end{bmatrix}\right) = f\left(\begin{bmatrix} rx \\ ry \end{bmatrix}\right) = \begin{bmatrix} rx \\ ry \\ 0 \end{bmatrix}.$$
Computing the right-hand side gives us
$$r \cdot f(\vec{v}) = r \cdot \begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} rx \\ ry \\ 0 \end{bmatrix}.$$
Again, since our answers are equal, it’s clear that f respects scalar multipli-
cation.
Since both conditions hold, f is a linear function.
Example 5. Show g : R2 → R2 by $g\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 2x - y \\ x + 3y \end{bmatrix}$ is linear.
As in Example 4, I'll use the generic vectors $\vec{v} = \begin{bmatrix} x \\ y \end{bmatrix}$ and $\vec{u} = \begin{bmatrix} z \\ w \end{bmatrix}$ and
the scalar r to show that g respects addition and scalar multiplication.
Plugging ~v and ~u into the left-hand side of g(~v + ~u) = g(~v ) + g(~u) gives us
$$g(\vec{v} + \vec{u}) = g\left(\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} z \\ w \end{bmatrix}\right) = g\left(\begin{bmatrix} x+z \\ y+w \end{bmatrix}\right) = \begin{bmatrix} 2(x+z) - (y+w) \\ (x+z) + 3(y+w) \end{bmatrix} = \begin{bmatrix} 2x + 2z - y - w \\ x + z + 3y + 3w \end{bmatrix}.$$
Computing the right-hand side gives us
$$g(\vec{v}) + g(\vec{u}) = \begin{bmatrix} 2x - y \\ x + 3y \end{bmatrix} + \begin{bmatrix} 2z - w \\ z + 3w \end{bmatrix} = \begin{bmatrix} 2x + 2z - y - w \\ x + z + 3y + 3w \end{bmatrix}.$$
Since our two answers are equal, it's clear that g respects addition.
Computing the left-hand side of g(r · ~v ) = r · g(~v ) gives us
$$g(r \cdot \vec{v}) = g\left(r \cdot \begin{bmatrix} x \\ y \end{bmatrix}\right) = g\left(\begin{bmatrix} rx \\ ry \end{bmatrix}\right) = \begin{bmatrix} 2(rx) - (ry) \\ (rx) + 3(ry) \end{bmatrix} = \begin{bmatrix} 2rx - ry \\ rx + 3ry \end{bmatrix}.$$
However,
$$h(\vec{v}) + h(\vec{u}) = h\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) + h\left(\begin{bmatrix} z \\ w \end{bmatrix}\right) = \begin{bmatrix} x^2 \\ y-1 \end{bmatrix} + \begin{bmatrix} z^2 \\ w-1 \end{bmatrix} = \begin{bmatrix} x^2 + z^2 \\ y - 1 + w - 1 \end{bmatrix} = \begin{bmatrix} x^2 + z^2 \\ y + w - 2 \end{bmatrix}.$$
These are clearly not equal (in both components!), so this function is not
linear.
If I were just concerned with checking whether or not h was linear, I’d
stop here. However, in the interest of practice, let’s check the other condition
as well using the same ~v and the scalar r.
$$h(r \cdot \vec{v}) = h\left(r \cdot \begin{bmatrix} x \\ y \end{bmatrix}\right) = h\left(\begin{bmatrix} rx \\ ry \end{bmatrix}\right) = \begin{bmatrix} (rx)^2 \\ (ry) - 1 \end{bmatrix} = \begin{bmatrix} r^2 x^2 \\ ry - 1 \end{bmatrix}.$$
Again, these are not equal (in both components), so h doesn’t respect scalar
multiplication. This is also enough, even without the check on addition, to
show h isn’t linear.
Have you noticed a pattern in the functions above? One possible pattern is
that for both of our linear functions, each component of the output vector was
a linear combination of the variables from the input vector. Our function that
wasn’t linear had both an exponent and a constant term in its output vector’s
components. This is another, probably easier way to determine if a function
from Rn to Rm is linear. Why didn’t we adopt that as our definition of a
linear map? If we were only going to talk about Rn , we could have. However,
in 2.4 and Chapter 3, we’ll want to expand our focus to other types of spaces
and our formal definition will be easier to generalize to those situations than
this second idea. In many areas of math and science you’ll see this happening;
collecting many ways of describing an idea and sorting through them later to
figure out which one ends up working best. That decision will depend heavily
on what you want to do later, so you may find yourself changing your mind.
One nice thing about studying an older mathematical subject like basic linear
algebra is that it has had time to settle down to a set of definitions that work
best for what we want to do.
Exercises 2.1.
1. Compute $f\left(\begin{bmatrix} -3 \\ 5 \end{bmatrix}\right)$ where $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 + x_2 \end{bmatrix}$.
2. Compute $f\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right)$ where $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_3 \\ x_1 - x_2 + x_3 \end{bmatrix}$.
0
8 x1 2x1 + 4x2
3. Compute f where f = 1 x1 .
−2 x2 4
x1 + x2
4. Compute $f\left(\begin{bmatrix} 6 \\ 2 \\ 0 \\ -1 \end{bmatrix}\right)$ where $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_3 + x_4 \\ -5x_2 \\ 4x_3 - x_4 \end{bmatrix}$.
5. Give the domain and codomain of $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 \\ 0 \\ -x_1 + 4x_2 \end{bmatrix}$.
6. Give the domain and codomain of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 + x_3 \\ x_1 - x_3 \end{bmatrix}$.
7. Give the domain and codomain of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}\right) = \begin{bmatrix} x_2 - x_4 \\ 5x_1 + x_5 \\ -x_3 \\ x_1 + x_3 \end{bmatrix}$.
8. Give the domain and codomain of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} 3x_2 + x_4 \\ -x_1 \end{bmatrix}$.
9. Find the formula of the function f : R2 → R2 which switches the
order of a 2-vector’s entries.
10. Find the formula of the function f : R2 → R3 which multiplies the
first entry by −1, keeps the second entry the same, and has a zero
in the third entry.
11. Show that $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 \\ x_1 + 2 \end{bmatrix}$ is not a linear map.
12. Show that $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 2y \\ x \\ 1 \end{bmatrix}$ is not a linear map.
13. Show that $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 + x_2 \end{bmatrix}$ is a linear map.
14. Show that $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 - x_2 \\ -x_1 \end{bmatrix}$ is a linear map.
15. If f : Rn → Rm is linear, explain why f (~0n ) = ~0m where ~0n is
the zero vector in Rn and ~0m is the zero vector in Rm . (This can
be restated as saying that a linear function always maps the zero
vector of the domain to the zero vector of the codomain.)
16. If f : Rn → Rm is linear, explain why f (−~v ) = −f (~v ) for every ~v
in Rn . (This can be restated as saying that a linear function always
maps the additive inverse of ~v to the additive inverse of f (~v ).)
17. Use r = 2 and $\vec{v} = \begin{bmatrix} 0 \\ 1/2 \end{bmatrix}$ to show that the function f : R2 → R2
which reflects the plane across the line y = x + 1 is not a linear
function.
2.2 Matrices
We saw in the last section that a linear map f from Rn to Rm had a certain
pattern to its vector of outputs: each entry was a linear combination of the
entries of the input vector. This means if we want to describe a particular linear
map to someone, we really only need to tell them three pieces of information:
the size of the input vectors, the size of the output vectors, and the coefficients
on each input entry that appear in each of our output entries. Instead of
writing down the whole function with all the variables, we can keep track of
all three pieces of information by writing down our coefficients in a grid of
numbers called a matrix.
Definition. A matrix is an ordered grid of real numbers written
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
A matrix with m rows and n columns is called m × n.
Note that when talking about matrices, our notation always puts row
information before column information. Thus an m × n matrix has m rows
and n columns, and aij is the entry in the ith row and jth column. As with
vectors, two matrices are equal exactly when they are the same size and all
corresponding pairs of entries are equal.
Example 1. Find the size of A and give a32 where $A = \begin{bmatrix} -3 & 0 & 1 & 8 \\ 2 & -4 & 3 & 1 \\ 5 & -1 & 0 & 7 \end{bmatrix}$.
This matrix has three rows and four columns, so it is 3 × 4. The entry a32
is in the third row and second column, so a32 = −1.
Now that we have some understanding of what a matrix is and how to write
it down, let’s explore the idea of using matrices to record the coefficients from
our linear maps. In the first part of this section we’ll use algebraic techniques
and in the second part we’ll use geometry.
Example 2. What matrix should we use to encode the sizes of input and
output vectors and the coefficients used to build the function
$$f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 6x_1 + 4x_2 \\ 5x_1 + 2x_2 \\ -3x_1 + 7x_2 \end{bmatrix}?$$
If we look at the pattern revealed in this last example, we get the following.
Notice that the dimensions of our matrix are the sizes of the vectors from
the domain and codomain of f , but in reverse order. As we saw in the previous
example, this is because each column of f ’s matrix contains the coefficients
on one variable from ~x, so the number of columns is the same as the size of
a vector from f ’s domain. Conversely, each row of f ’s matrix contains the
coefficients from one entry of f (~x), so the number of rows is the same as the
size of a vector from f ’s codomain. In fact, this is a very important idea to
keep in mind: rows of f ’s matrix correspond to output entries while columns
correspond to input entries.
Finding the matrix of a linear function may seem complicated at first, but
after a few repetitions it will quickly become routine.
Example 3. Find the matrix of the function $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_1 + x_3 \\ 4x_1 - x_2 + 3x_3 \end{bmatrix}$.
The input vector has three entries, so our matrix A must have three
columns. The output vector has two entries, so our matrix A must have two
rows. Another way of thinking about this is that f ’s domain is R3 and its
codomain is R2 . Therefore f : R3 → R2 , so A is 2 × 3.
The first column of A contains the coefficients of x1 . In the first component
of f (~x) we have −x1 , so the first entry in this column is −1. In the second
output component we have 4x1 , so the second entry in A’s first column is 4.
Similarly, the second column of A contains the coefficients on x2 . These are 0
in the first component of f (~x) and −1 in the second component. Finally, A’s
third column contains the coefficients on x3 . The first component of f (~x) has
coefficient 1 and the second has coefficient 3.
Putting this all together we get that A is the 2 × 3 matrix $\begin{bmatrix} -1 & 0 & 1 \\ 4 & -1 & 3 \end{bmatrix}$.
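As a quick sanity check of this correspondence, here is a sketch in Python with NumPy (my own assumption about tooling, not part of the text; it uses NumPy's matrix-vector product, which matches the multiplication A~x defined later in this section) confirming that the matrix reproduces f on a sample input.

```python
import numpy as np

A = np.array([[-1,  0, 1],
              [ 4, -1, 3]])

def f(x):
    x1, x2, x3 = x
    return np.array([-x1 + x3, 4 * x1 - x2 + 3 * x3])

x = np.array([2, -1, 5])
# The matrix A encodes f: A @ x and f(x) should agree.
print(A @ x)   # [ 3 24]
print(f(x))    # [ 3 24]
```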
Since the domain and codomain of f are both Rn , this matrix will be n×n.
Each variable xi appears only in the ith entry of f (~x) with coefficient 1, so
the jth column of f ’s matrix has all entries equal to zero except for a 1 in the
jth spot. Thus f has matrix
$$I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$
Since f is the identity map, its matrix In is called the identity matrix.
capable of producing multiple seedlings.) Find the matrix of the function that
takes a given year’s population vector and gives the next year’s population
vector.
Let’s start by thinking about the size of this matrix. Our input and output
vectors both have 5 entries, one for each stage in the coneflower’s life cycle.
This means we have f : R5 → R5 and our matrix is 5 × 5.
As in Chapter 0's Example 6, let's order the entries in our population vectors $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$ so that x1 counts seedlings, x2 counts small plants, x3 counts medium plants, x4 counts large plants, and x5 counts flowering plants.
The first row of our matrix is the coefficients on these xi s that make up
the first entry of f (~x), in other words, the coefficients on the current year’s
life cycle counts which tell us how many seedlings there will be next year.
The only life cycle stage that produces seedlings is flowering plants, and each
flowering plant produces 17.76 seedlings. This means that the first entry of
f (~x) is 17.76x5 .
The second row of our matrix is the coefficients that tell us how to compute
the number of small plants there will be next year. Each seedling has a 0.35
chance to become a small plant, each small plant has a 0.78 chance to stay a
small plant, each medium plant has a 0.24 chance to become a small plant,
each large plant has a 0.07 chance to become a small plant, and each flowering
plant has a 0.43 chance to become a small plant. This means the second entry
of f (~x) is 0.35x1 + 0.78x2 + 0.24x3 + 0.07x4 + 0.43x5 .
The third row of our matrix is the coefficients that tell us how to compute
the number of medium plants there will be next year. No seedlings become
medium plants, each small plant has a 0.18 chance to become a medium plant,
each medium plant has a 0.49 chance to stay a medium plant, each large plant
has a 0.21 chance to become a medium plant, and each flowering plant has a
0.28 chance to become a medium plant. This means the third entry of f (~x) is
0.18x2 + 0.49x3 + 0.21x4 + 0.28x5 .
The fourth row of our matrix is the coefficients that tell us how to compute
the number of large plants there will be next year. No seedlings become large
plants, each small plant has a 0.03 chance to become a large plant, each
medium plant has a 0.17 chance to become a large plant, each large plant has
a 0.38 chance to stay a large plant, and each flowering plant has a 0.18 chance
to become a large plant. This means the fourth entry of f (~x) is 0.03x2 +
0.17x3 + 0.38x4 + 0.18x5 .
The fifth row of our matrix is the coefficients that tell us how to compute
the number of flowering plants there will be next year. No seedlings become
flowering plants, each small plant has a 0.01 chance to become a flowering
plant, each medium plant has a 0.10 chance to become a flowering plant, each
large plant has a 0.33 chance to become a flowering plant, and each flowering
plant has a 0.11 chance to stay a flowering plant. This means the fifth entry
of f (~x) is 0.01x2 + 0.10x3 + 0.33x4 + 0.11x5 .
Putting the coefficients from each entry of f (~x) into the corresponding
rows of f ’s matrix gives us
$$\begin{bmatrix} 0 & 0 & 0 & 0 & 17.76 \\ 0.35 & 0.78 & 0.24 & 0.07 & 0.43 \\ 0 & 0.18 & 0.49 & 0.21 & 0.28 \\ 0 & 0.03 & 0.17 & 0.38 & 0.18 \\ 0 & 0.01 & 0.10 & 0.33 & 0.11 \end{bmatrix}.$$
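This matrix is exactly what you would feed to a computer to project the population forward. The sketch below (Python with NumPy, an assumption of mine; the starting population numbers are made up purely for illustration) applies the matrix once to get the next year's counts.

```python
import numpy as np

# Coneflower life-cycle matrix from the example above;
# rows/columns are seedlings, small, medium, large, flowering.
A = np.array([
    [0.00, 0.00, 0.00, 0.00, 17.76],
    [0.35, 0.78, 0.24, 0.07,  0.43],
    [0.00, 0.18, 0.49, 0.21,  0.28],
    [0.00, 0.03, 0.17, 0.38,  0.18],
    [0.00, 0.01, 0.10, 0.33,  0.11],
])

# A hypothetical current-year population vector (not from the text).
x = np.array([100, 80, 60, 40, 20])

next_year = A @ x   # next year's predicted counts, stage by stage
print(next_year)
```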
Example 6. Find the equation of the function corresponding to the matrix $A = \begin{bmatrix} 4 & 0 \\ 1 & -3 \\ 2 & 7 \end{bmatrix}$.
Our matrix A has three rows and two columns, so is 3 × 2. This means it
corresponds to a function f : R2 → R3 . This means we’re trying to fill in the
gaps in the equation
$$f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} ? \\ ? \\ ? \end{bmatrix}.$$
Each row of our matrix contains the coefficients (on x1 in the first column
and x2 in the second column) of an entry of f (~x). The first row gives us
4x1 + 0x2 , the second row gives us x1 − 3x2 , and the third row gives us
2x1 + 7x2 . Plugging this into our formula above means
$$f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 \\ x_1 - 3x_2 \\ 2x_1 + 7x_2 \end{bmatrix}.$$
Now that we understand how to ensure A~x is defined and what size vector
it is, let’s figure out how to compute its entries from those of A and ~x.
Example 9. Let $A = \begin{bmatrix} 2 & 3 \\ 4 & -2 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 5 \\ -8 \end{bmatrix}$. Use the fact that $A\vec{x} = f(\vec{x})$ to compute $A\vec{x}$.
Using the same process as Example 6, we find that A is the matrix of the function $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 3x_2 \\ 4x_1 - 2x_2 \end{bmatrix}$. Plugging in $\vec{x} = \begin{bmatrix} 5 \\ -8 \end{bmatrix}$, we get
$$f\left(\begin{bmatrix} 5 \\ -8 \end{bmatrix}\right) = \begin{bmatrix} 2(5) + 3(-8) \\ 4(5) - 2(-8) \end{bmatrix} = \begin{bmatrix} -14 \\ 36 \end{bmatrix}.$$
Notice that since aij is the coefficient on xj in the ith component of A~x,
if A is the matrix of a linear map f we have A~x = f (~x) as planned.
This definition’s notation is quite complicated, but once you have the
overall pattern down, it becomes more routine. Let’s walk through a few
examples.
Example 10. Compute $A\vec{x}$ where $A = \begin{bmatrix} -1 & 0 & 1 \\ 2 & 4 & -1 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$.
First of all, notice that A is 2×3 and ~x is a 3-vector, so this product makes
sense. It also tells us A~x is a 2-vector.
To find the first entry of A~x, we add up the pairwise products across the
first row of A and down ~x.
$$\begin{bmatrix} -1 & 0 & 1 \\ 2 & 4 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}.$$
Adding up the pairwise products along the rows of our matrix and down our
vector gives us
$$\begin{bmatrix} -1 & 0 & 1 \\ 4 & -5 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1(2) + 0(1) + 1(1) \\ 4(2) + (-5)(1) + 3(1) \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \end{bmatrix}.$$
Thus $f\left(\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 6 \end{bmatrix}$.
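The same computation is a one-liner on a computer. A minimal sketch (Python with NumPy, my assumption rather than the book's tooling) checking the product we just did by hand:

```python
import numpy as np

A = np.array([[-1,  0, 1],
              [ 4, -5, 3]])
x = np.array([2, 1, 1])

# NumPy's @ operator performs exactly the row-times-column sums above.
print(A @ x)   # [-1  6]
```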
$$A(\vec{v} + \vec{w}) = f(\vec{v} + \vec{w}) = f(\vec{v}) + f(\vec{w}) = A\vec{v} + A\vec{w}$$
and
$$A(r \cdot \vec{v}) = f(r \cdot \vec{v}) = r \cdot f(\vec{v}) = r \cdot A\vec{v}.$$
We can relate this new multiplication to the linear combinations of vectors
discussed in 1.2. There we were discussing the span of a set of vectors
~v1 , ~v2 , . . . , ~vn , which had the form x1~v1 + x2~v2 + · · · + xn~vn . If we have an
m × n matrix A and we think of its n columns as the m-vectors ~a1 , ~a2 , . . . , ~an ,
we can rewrite A~x as
$$A\vec{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} a_{11}x_1 \\ a_{21}x_1 \\ \vdots \\ a_{m1}x_1 \end{bmatrix} + \begin{bmatrix} a_{12}x_2 \\ a_{22}x_2 \\ \vdots \\ a_{m2}x_2 \end{bmatrix} + \cdots + \begin{bmatrix} a_{1n}x_n \\ a_{2n}x_n \\ \vdots \\ a_{mn}x_n \end{bmatrix}$$
$$= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n.$$
This means the vector equation x1~v1 + x2~v2 + · · · + xn~vn = ~b can also be
thought of as the matrix equation A~x = ~b or the equation f (~x) = ~b where f
is the linear function whose matrix is A. This connection shouldn’t be totally
surprising, since we constructed A so that the entries in its jth column were
the coefficients of xj in f (~x).
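A short numerical illustration of this column view (Python with NumPy, my own aside; the matrix and vector are the ones from Example 9) shows that A~x and the weighted sum of A's columns agree:

```python
import numpy as np

# The matrix and vector from Example 9.
A = np.array([[2,  3],
              [4, -2]])
x = np.array([5, -8])

direct = A @ x                                   # the usual matrix-vector product
as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]     # x1*(1st column) + x2*(2nd column)

print(direct)                              # [-14  36]
print(np.array_equal(direct, as_columns))  # True
```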
Example 14. Let f : R3 → R3 by $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 7x_1 - x_2 + 6x_3 \\ 2x_1 + 2x_2 - 4x_3 \\ 5x_1 - 3x_2 + 8x_3 \end{bmatrix}$. Find $\vec{x}$ so that $f(\vec{x}) = \vec{b}$ where $\vec{b} = \begin{bmatrix} 5 \\ -2 \\ 5 \end{bmatrix}$.
We now have three options on how to tackle this problem, because we have
three ways of writing this equation: as f (~x) = ~b, as A~x = ~b where A is f ’s
matrix, or as the vector equation x1~a1 + x2~a2 + x3~a3 = ~b where the vectors ~a1 ,
~a2 , and ~a3 are the columns of A. All of these methods will give us the same
answer for ~x, so we can choose whichever one seems easiest to us.
Solving for ~x directly from f (~x) = ~b doesn’t seem easy, so let’s explore
our two other options by rewriting this equation as both a matrix and vector
equation. In both cases, we need to find the matrix A of our function f . Since
f : R3 → R3, we know A is a 3 × 3 matrix. Picking off the coefficients from f gives us
$$A = \begin{bmatrix} 7 & -1 & 6 \\ 2 & 2 & -4 \\ 5 & -3 & 8 \end{bmatrix}.$$
Writing the problem as the vector equation $x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3 = \vec{b}$, we might notice that $\vec{b}$ is exactly the sum of the second and third columns of A, so x1 = 0, x2 = 1, x3 = 1 is a solution. If you're worried you wouldn't have noticed this relationship on your own, don't worry. We'll be discussing how to solve f (~x) = ~b in more detail soon.
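For a machine check (Python with NumPy, my assumption; NumPy's built-in solver does row-reduction-style work behind the scenes, which the text develops later), we can ask directly for the ~x with A~x = ~b:

```python
import numpy as np

A = np.array([[7, -1,  6],
              [2,  2, -4],
              [5, -3,  8]])
b = np.array([5, -2, 5])

x = np.linalg.solve(A, b)   # solves the matrix equation A x = b
print(x)                    # approximately [0. 1. 1.]
print(A @ x)                # reproduces b
```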
Let’s switch gears for a moment and explore how to use the geometric
action of a function on Rn to find its matrix. Since we don’t yet know how to
do that, we’ll start in the opposite direction by exploring the geometric effects
of a matrix in hopes that once we understand this process we can reverse it.
Example 15. Suppose f has matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. What does f do geometrically?

[Figure: the unit square and its image under f.]
Looking at this image of the unit square, we can guess that f is the function which rotates the plane clockwise by 90°. However, we also saw an interesting thing: $f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ was the first column of f's matrix and $f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$ was the second column of f's matrix.
Why should the images of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ have given us the columns of
our function’s matrix in the example above? If we think geometrically about
a vector x in Rn , the entries of x tell us that vector’s position along the n
axes of Rn . We can think of each axis as being the span of a vector of length
1 which lies along that axis in the positive direction. These are the n special
n-vectors
$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},$$
which are sometimes called the standard unit vectors.
This makes
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1\vec{e}_1 + x_2\vec{e}_2 + \cdots + x_n\vec{e}_n.$$
Recall from the discussion before Example 12 that if ~ai is the ith column
of A, then A~x = x1~a1 + x2~a2 + · · · + xn~an . Comparing this with the equation
above tells us that A~ei is the ith column of A! Thus we can interpret a matrix
geometrically by using the rule that A maps the positive unit vector along the
ith axis to the vector which is its ith column. Thus, to understand the effect
on Rn of multiplication by A, it is enough to understand the effect of A on
~e1 , ~e2 , . . . , ~en . This matches what we saw in Example 15. Next, let’s use this
idea to find the matrices of some 2D functions.
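Before moving on, here is a tiny check of the rule A~ei = (ith column of A), using the matrix from Example 15 (Python with NumPy, my own aside rather than part of the text):

```python
import numpy as np

A = np.array([[ 0, 1],
              [-1, 0]])          # the matrix from Example 15
e1 = np.array([1, 0])
e2 = np.array([0, 1])

# Multiplying A by a standard unit vector picks out the matching column of A.
print(A @ e1)   # [ 0 -1]  -- the first column
print(A @ e2)   # [ 1  0]  -- the second column
```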
We can do this by figuring out where f sends each of the positive unit
vectors along the axes and using them to create f ’s matrix.
Put more concretely, we need to use some geometry to find $f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ and $f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$.
Let's start by visualizing f's effect on the plane.

[Figure: f's effect on the plane.]

The image of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is the vector of length 1 along the line y = x. We can use the picture below to figure out its exact coordinates.
[Figure: the image of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ plotted in the plane.]
Rotating $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ doesn't change its length, so from the picture above, we can see that $x^2 + y^2 = 1$ and $x = y$. This means $x = y = \frac{1}{\sqrt{2}}$, so
$$f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}.$$
Similarly, we can use the image of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ in the picture below to figure out its exact coordinates.

[Figure: the image of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ plotted in the plane.]
and
$$f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix}.$$
Using these two images (in order) as the columns of f's matrix, we see
a linear function give us the option to find a function’s matrix whichever way
seems easier for our particular function. The option to view a problem as
f (~x) = ~b, A~x = ~b, or x1~a1 + · · · + xn~an = ~b gives us several different ways to
solve for ~x.
Exercises 2.2.
22. The picture below shows the effect of the map f on $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Use this to find f's matrix.
[Figure: the images of the standard unit vectors under f for Exercise 22.]
23. Compute $\begin{bmatrix} 2 & 1 & 4 \\ 0 & 4 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
24. Compute $\begin{bmatrix} 1 & -1 \\ 4 & -3 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix}$.
25. Compute each of the following or explain why it isn’t possible.
(a) $\begin{bmatrix} 1 & 2 & 5 \\ 3 & -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 4 \\ 1 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 2 & 5 \\ 3 & -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix}$
26. Compute each of the following or explain why it isn’t possible.
(a) $\begin{bmatrix} 1 & 2 \\ -1 & 4 \\ 0 & -5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 2 \\ -1 & 4 \\ 0 & -5 \end{bmatrix} \cdot \begin{bmatrix} -5 \\ 7 \\ 9 \end{bmatrix}$
27. Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 11 \\ 5 \end{bmatrix}$, and $A = \begin{bmatrix} -1 & 3 & 4 \\ 0 & -2 & 9 \end{bmatrix}$.
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
28. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ -4 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$, and $A = \begin{bmatrix} 1 & -2 \\ -4 & 2 \end{bmatrix}$. Compute
whichever of A~v1 , A~v2 and A~v3 are possible.
29. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 10 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ -3 \\ -5 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ -3 \end{bmatrix}$, and $A = \begin{bmatrix} 1 & 4 & 0 & 2 \\ 0 & 1 & 2 & 0 \end{bmatrix}$.
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
30. Let $\vec{v}_1 = \begin{bmatrix} 4 \\ -2 \\ 8 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 6 \\ -5 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} -1 \\ 9 \end{bmatrix}$, and $A = \begin{bmatrix} -1 & 2 \\ 4 & -3 \\ 7 & 1 \end{bmatrix}$.
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
31. Write $x_1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + x_2\begin{bmatrix} 4 \\ 2 \end{bmatrix} + x_3\begin{bmatrix} -2 \\ -1 \end{bmatrix} + x_4\begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \end{bmatrix}$ as a matrix equation.
32. Write $x_1\begin{bmatrix} 1 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} -2 \\ 3 \end{bmatrix} + x_3\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \end{bmatrix}$ as a matrix equation.
33. Write $\begin{bmatrix} -10 & 3 \\ 1 & 0 \\ 7 & -2 \end{bmatrix}\vec{x} = \begin{bmatrix} 8 \\ -2 \\ 3 \end{bmatrix}$ as a vector equation.
34. Write $\begin{bmatrix} 2 & 1/2 & 4 \\ -1 & 0 & 5 \\ 7 & 8 & -4 \end{bmatrix}\vec{x} = \begin{bmatrix} 10 \\ 0 \\ -9 \end{bmatrix}$ as a vector equation.
35. (a) Show that fθ : R2 → R2 with matrix $R_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$ rotates the plane counterclockwise by the angle θ by computing $f_\theta\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ and $f_\theta\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$ and arguing that these are the images of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ after a counterclockwise rotation by θ.
(b) We can compute the coordinates of k holes evenly spaced
around a circle of radius r centered at the origin by starting
with a single point on the perimeter of the circle and applying
fθ repeatedly, where θ is (1/k)th of a complete circle. Use this
method to find the coordinates of 12 evenly spaced holes on
the perimeter of a circle of radius 10.
36. The general formula for the matrix of the map f : R2 → R2 which reflects the plane across the line y = mx is $R_m = \begin{bmatrix} \frac{1-m^2}{1+m^2} & \frac{2m}{1+m^2} \\ \frac{2m}{1+m^2} & \frac{m^2-1}{1+m^2} \end{bmatrix}$.
2.3 Matrix Operations

The pattern from our previous example motivates the following definition.
Definition. The m × n matrices
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix}$$
have sum
$$A + B = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix}.$$
Therefore our newly defined addition for matrices corresponds to our older
notion of addition of functions.
Example 2. Compute $\begin{bmatrix} -1 & 0 & 4 \\ 10 & -5 & 2 \end{bmatrix} + \begin{bmatrix} 2 & 6 & 1 \\ -8 & 1 & 2 \end{bmatrix}$.
Since these two matrices are the same size (both are 2 × 3), we can add
them together by adding corresponding pairs of entries. This gives us
$$\begin{bmatrix} -1 & 0 & 4 \\ 10 & -5 & 2 \end{bmatrix} + \begin{bmatrix} 2 & 6 & 1 \\ -8 & 1 & 2 \end{bmatrix} = \begin{bmatrix} -1+2 & 0+6 & 4+1 \\ 10+(-8) & -5+1 & 2+2 \end{bmatrix} = \begin{bmatrix} 1 & 6 & 5 \\ 2 & -4 & 4 \end{bmatrix}.$$
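Entrywise addition is exactly what array addition does on a computer. A small sketch (Python with NumPy, my choice of tooling, not the book's) reproducing Example 2:

```python
import numpy as np

A = np.array([[-1,  0, 4],
              [10, -5, 2]])
B = np.array([[ 2, 6, 1],
              [-8, 1, 2]])

# Matrices of the same size are added entry by entry.
print(A + B)
# [[ 1  6  5]
#  [ 2 -4  4]]
```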
Example 3. Why can't we compute $\begin{bmatrix} 3 & -1 & 0 & 4 \\ 7 & 10 & -5 & 2 \end{bmatrix} + \begin{bmatrix} 6 & 1 & -2 \\ 9 & 0 & -3 \end{bmatrix}$?
Our first matrix is 2 × 4 while the second is 2 × 3. Since they don’t have
the same number of columns, they aren’t the same size. This means that if we
tried to add their corresponding entries, we’d have entries of the first matrix
without partners in the second matrix. Thus it is impossible to add these two
matrices.
The pattern we saw in the previous example holds for any matrix and
any scalar, because when we multiply a function f by a scalar r, we’re really
creating a new function r · f where (r · f )(~x) = r · f (~x). Thus to multiply a
function by a scalar we’re really just multiplying our function’s output vector
by that scalar. Since multiplying a vector by a scalar means multiplying each
entry of the vector by the scalar, we can find r · f by multiplying each entry
of f (~x) by r. Distributing this multiplication by r through each entry of f (~x)
means multiplying the coefficient of each variable in each entry of f (~x) by r,
so the coefficients for r · f are just r times the coefficients of f . We want to
define multiplication of a matrix by a scalar so that if A is the matrix of f
then r · A is the matrix of r · f . This prompts the following definition.
Definition. Let $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$ be an m × n matrix and r be any scalar. Then $r \cdot A = \begin{bmatrix} ra_{11} & ra_{12} & \cdots & ra_{1n} \\ ra_{21} & ra_{22} & \cdots & ra_{2n} \\ \vdots & & & \vdots \\ ra_{m1} & ra_{m2} & \cdots & ra_{mn} \end{bmatrix}$.
Our discussion above explains why the coefficient on xj in the ith entry
Since f is also linear, we can pull the scalar r out of the right-hand side to get
(f ◦ g)(r · ~v ) = r · f (g(~v ))
Simplifying (although not all the way because we’re looking for the pattern)
gives us
$$f(g(\vec{x})) = \begin{bmatrix} 9(4)x_1 + 9(2)x_2 + (-1)(6)x_1 + (-1)(5)x_2 \\ 3(4)x_1 + 3(2)x_2 + 7(6)x_1 + (7)(5)x_2 \end{bmatrix} = \begin{bmatrix} (9(4) + (-1)(6))x_1 + (9(2) + (-1)(5))x_2 \\ (3(4) + 7(6))x_1 + (3(2) + (7)(5))x_2 \end{bmatrix}.$$
Since AB is the matrix of this composition map, we must have
$$AB = \begin{bmatrix} 9(4) + (-1)(6) & 9(2) + (-1)(5) \\ 3(4) + 7(6) & 3(2) + (7)(5) \end{bmatrix}.$$
Comparing this to $A = \begin{bmatrix} 9 & -1 \\ 3 & 7 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 2 \\ 6 & 5 \end{bmatrix}$, we can see that the top
left entry of AB is a sum of products of corresponding entries from the top
row of A and the left column of B. This pattern of combining a row of A with
a column of B continues throughout the rest of AB’s entries.
To see that the pattern we observed in the previous example holds more
generally, suppose
$$f\left(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}\right) = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2k}x_k \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mk}x_k \end{bmatrix}$$
and
$$g\left(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\right) = \begin{bmatrix} b_{11}x_1 + b_{12}x_2 + \cdots + b_{1n}x_n \\ b_{21}x_1 + b_{22}x_2 + \cdots + b_{2n}x_n \\ \vdots \\ b_{k1}x_1 + b_{k2}x_2 + \cdots + b_{kn}x_n \end{bmatrix}.$$
To find the ith entry of f (g(~x)), we need to plug the entries of g(~x) into the ith
entry of f (~x) as x1 , . . . , xk . The ith entry of f (~x) is ai1 x1 + ai2 x2 + · · · + aik xk
so plugging in the entries of g(~x) gives us
This looks incredibly messy, but remember that to find the ijth entry of AB,
we only care about the coefficient on xj . The first term in the sum above
contains ai1 b1j xj , the second term contains ai2 b2j xj , and so on until the last
(kth) term which contains aik bkj xj . Thus the xj term in the ith entry of
f (g(~x)) is
(ai1 b1j + ai2 b2j + · · · + aik bkj )xj .
This allows us to make the following definition.
Definition. The product of the m × k matrix $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mk} \end{bmatrix}$ and the k × n matrix $B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{kn} \end{bmatrix}$ is the m × n matrix AB whose ijth entry is $a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj}$.
Notice that to find the ijth entry of AB, we add up the pairwise products
along the ith row of A and down the jth column of B. This should feel similar
to our method for multiplying a vector by a matrix. In fact, the jth column
of AB is A times the vector which is the jth column of B.
Example 9. Compute AB where $A = \begin{bmatrix} -2 & -1 & 1 \\ 0 & 6 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 3 \\ 0 & 1 \\ -4 & 2 \end{bmatrix}$.
The first matrix A is 2 × 3 and the second matrix B is 3 × 2, so the product
AB makes sense and is a 2 × 2 matrix. This means we need to compute each
of the four entries of AB.
Let’s start with the entry in the 1st row and 1st column of AB, so i = 1
and j = 1. (Here k = 3 since that is the number of columns of A and rows of
B.) We want to compute the sum of pairwise products along the 1st row of A
and 1st column of B as shown below.
$$\begin{bmatrix} -2 & -1 & 1 \\ 0 & 6 & -2 \end{bmatrix} \begin{bmatrix} 5 & 3 \\ 0 & 1 \\ -4 & 2 \end{bmatrix}$$
so
$$AB = \begin{bmatrix} -14 & -5 \\ 8 & 2 \end{bmatrix}.$$
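On a computer, the same row-times-column bookkeeping is handled by the matrix product operator. A minimal sketch (Python with NumPy, an assumption rather than anything the text prescribes) reproducing Example 9:

```python
import numpy as np

A = np.array([[-2, -1,  1],
              [ 0,  6, -2]])
B = np.array([[ 5, 3],
              [ 0, 1],
              [-4, 2]])

# A is 2x3 and B is 3x2, so the product AB is 2x2.
print(A @ B)
# [[-14  -5]
#  [  8   2]]
```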
The fact that this new operation is called matrix multiplication will tempt
you to assume that it behaves like multiplication in the real numbers. However,
that is a dangerous parallel to draw, because matrix multiplication isn’t
based on multiplication at all – it is based on function composition. With
that in mind, let’s explore two very important differences between matrix
multiplication and our usual notion of multiplication.
The second obvious issue is that even when AB and BA both exist, they
may have different dimensions.
Example 11. The two matrices A and B from Example 9 have both AB
and BA defined, but they are different sizes.
The matrix A from Example 9 is 2×3 and the matrix B is 3×2. No matter
which order we list them, the number of rows in the first matrix equals the
number of columns in the second matrix. The product AB that we computed
in Example 9 has as many rows as A and as many columns as B, so it is 2 × 2.
However, the other product BA has as many rows as B and as many columns
as A, so it is 3 × 3. Since AB and BA are different sizes, it is impossible for
them to be equal.
Even if both of these matrix products make sense and are the same size,
remember that AB is the matrix of f ◦ g while BA is the matrix of g ◦ f .
From working with function composition in calculus, you should be familiar
with the fact that f (g(~x)) and g(f (~x)) may be very different functions. This
means it makes sense that AB can easily be quite different from BA. (We will
see some special cases where AB = BA, but they are the exceptions rather
than the rule.)
Example 12. Check that $A = \begin{bmatrix} 1 & -2 \\ -5 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 1 \\ 0 & 4 \end{bmatrix}$ have AB ≠ BA.
Both A and B are 2 × 2 matrices, so AB and BA are both defined and
both have the same size (also 2 × 2). However
$$AB = \begin{bmatrix} 1 & -2 \\ -5 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 0 & 4 \end{bmatrix} = \begin{bmatrix} 1(3) - 2(0) & 1(1) - 2(4) \\ -5(3) + 1(0) & -5(1) + 1(4) \end{bmatrix} = \begin{bmatrix} 3 & -7 \\ -15 & -1 \end{bmatrix}$$
while
$$BA = \begin{bmatrix} 3 & 1 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ -5 & 1 \end{bmatrix} = \begin{bmatrix} 3(1) + 1(-5) & 3(-2) + 1(1) \\ 0(1) + 4(-5) & 0(-2) + 4(1) \end{bmatrix} = \begin{bmatrix} -2 & -5 \\ -20 & 4 \end{bmatrix}$$
so AB ≠ BA.
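This is easy to see on a machine as well. A quick sketch (Python with NumPy, my assumption) computing both orders for the matrices of Example 12:

```python
import numpy as np

A = np.array([[ 1, -2],
              [-5,  1]])
B = np.array([[3, 1],
              [0, 4]])

print(A @ B)   # [[  3  -7]
               #  [-15  -1]]
print(B @ A)   # [[ -2  -5]
               #  [-20   4]]
# The two products differ, so matrix multiplication is not commutative.
```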
The moral of the story here is to be very careful about the order of matrix
multiplication and fight against the urge to assume you can switch that order
without changing the product.
Another oddity of matrix multiplication is given by the following theorem.
Note that the “0” in this theorem is the appropriately sized zero matrix.
Example 13. Check that AB = 0 where $A = \begin{bmatrix} 2 & 6 \\ 1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 12 & 0 \\ -4 & 0 \end{bmatrix}$.
First of all, notice that neither A nor B are the zero matrix, since each of
them contains at least one nonzero entry.
This contradicts our usual intuition from the real numbers where we often
factor an equation of the form f (x) = 0 and then set each factor equal to zero
to solve. This won’t work to solve AB = 0 or even A~x = ~0. We will have to
develop new tools for this more complicated matrix situation.
One property matrix multiplication does share with multiplication of real
numbers is the existence of an identity element which mimics the way 1 acts
in R. There 1r = r1 = r for any real number r. The identity element for
matrix multiplication is the appropriately named identity matrix discussed in
Example 4 of Section 2.2, which is the matrix of the identity map. Since there
is an identity map from Rn to itself for every n, this is actually a collection of
n × n identity matrices
$$I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$
Any m × n matrix A corresponds to a map f : Rn → Rm . The composition
of f with the n×n identity map is f ◦idn which is just f , while the composition
of the m × m identity map with f is idm ◦ f which is again f . Translating this
into matrix multiplication tells us that AIn = A and Im A = A.
Exercises 2.3.
1. Compute $3\begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 3 & -2 \end{bmatrix}$.
2. Compute $4\begin{bmatrix} 10 & -5 \\ 5 & 2 \end{bmatrix}$.
3. Compute $\frac{1}{2}\begin{bmatrix} 2 & 4 \\ -6 & 0 \\ 1 & 8 \end{bmatrix}$.
4. Compute $3\begin{bmatrix} -1 & 3 \\ 2 & 4 \\ -5 & 0 \end{bmatrix}$.
5. Compute $-2\begin{bmatrix} 3 & 7 \\ 1 & -1 \end{bmatrix}$.
6. Compute $3\begin{bmatrix} 1 & -2 \\ -1 & 4 \end{bmatrix}$.
7. Find $4\begin{bmatrix} 2 & 2 \\ 1 & -4 \end{bmatrix}$.
8. Compute $\begin{bmatrix} -1 & 5 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 1 & -2 & 5 \\ 2 & 0 & 7 \end{bmatrix}$ or explain why it isn't possible.
9. Compute $\begin{bmatrix} -1 & 5 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 2 & -3 \\ 0 & 1 \end{bmatrix}$ or explain why it isn't possible.
10. Let $A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 3 & -2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & -1 \\ 3 & 4 & 0 \end{bmatrix}$, $C = \begin{bmatrix} -4 & 2 \\ 1 & -1 \\ 2 & 0 \end{bmatrix}$. Compute whichever of A + B, B + C and A + C are possible.
11. Let $A = \begin{bmatrix} 3 & 1 \\ -2 & 2 \\ 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 10 & -5 \\ 5 & 2 \end{bmatrix}$, $C = \begin{bmatrix} -1 & 3 \\ 0 & -2 \\ 4 & 9 \end{bmatrix}$. Compute whichever of A + B, A + C and B + C are possible.
12. Let $A = \begin{bmatrix} 2 & 4 \\ -6 & 0 \\ 1 & 8 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $C = \begin{bmatrix} -1 & 3 \\ 0 & 10 \\ 2 & 9 \end{bmatrix}$. Compute whichever of A + B, A + C, and B + C are possible.
13. Let $A = \begin{bmatrix} 1 & 2 & 0 \\ -2 & 3 & -2 \\ 0 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 3 \\ 2 & 4 \\ -5 & 0 \end{bmatrix}$, $C = \begin{bmatrix} 3 & 1 & 7 \\ -2 & -5 & 10 \\ 0 & 3 & 4 \end{bmatrix}$. Compute whichever of A + B, A + C and B + C are possible.
1 −4 2 5 2 −3 3 7
14. Let A = ,B= ,C= .
0 −1 3 4 0 6 1 −1
Compute whichever of A + B, A + C and B + C are possible.
1 −2 0 2 2 1 0 7
15. Let A = ,B= ,C= .
−4 2 5 1 −4 3 −2 0
Compute whichever of A + B, A + C and B + C are possible.
16. Suppose A is a 10 × 6 matrix.
(a) What size matrix B would make computing AB possible?
(b) What size matrix B would make computing BA possible?
17. Suppose A is a 5 × 9 matrix.
(a) What size matrix B would make computing AB possible?
(b) What size matrix B would make computing BA possible?
18. Suppose A is a 7 × 3 matrix.
(a) What value of n would make computing AIn possible?
(b) What value of m would make computing Im A possible?
a b
19. Let A = . Use matrix multiplication to check that I2 A = A
c d
and AI2 = A.
20. Compute $\begin{bmatrix} 1 & 2 & 5 \\ 3 & -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} -2 & 2 \\ 4 & -1 \end{bmatrix}$ or explain why it isn't possible.
21. Compute $\begin{bmatrix} -2 & 2 \\ 4 & -1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 5 \\ 3 & -1 & 1 \end{bmatrix}$ or explain why it isn't possible.
22. Let $A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 3 & -2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & -1 \\ 3 & 4 & 0 \end{bmatrix}$, $C = \begin{bmatrix} -4 & 2 \\ 1 & -1 \\ 2 & 0 \end{bmatrix}$. Compute whichever of AB, BC and AC are possible.
23. Let $A = \begin{bmatrix} 3 & 1 \\ -2 & 2 \\ 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 10 & -5 \\ 5 & 2 \end{bmatrix}$, $C = \begin{bmatrix} -1 & 3 \\ 0 & -2 \\ 4 & 9 \end{bmatrix}$. Compute whichever of AB, BC and CA are possible.
24. Let $A = \begin{bmatrix} 2 & 4 \\ -6 & 0 \\ 1 & 8 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $C = \begin{bmatrix} -1 & 3 \\ 0 & 10 \\ 2 & 9 \end{bmatrix}$. Compute whichever of AB, AC, and BC are possible.
25. Let $A = \begin{bmatrix} 1 & 2 & 0 \\ -2 & 3 & -2 \\ 0 & -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 3 \\ 2 & 4 \\ -5 & 0 \end{bmatrix}$, $C = \begin{bmatrix} 3 & 1 & 7 \\ -2 & -5 & 10 \\ 0 & 3 & 4 \end{bmatrix}$. Compute whichever of AB, BC, BA are possible.
1 −4 2 5 2 −3 3 7
26. Let A = ,B= ,C= .
0 −1 3 4 0 6 1 −1
Compute whichever of AB, BC and AC are possible.
1 −2 0 2 2 1 0 7
27. Let A = ,B= ,C= .
−4 2 5 1 −4 3 −2 0
Compute whichever of AB, BC and AC are possible.
28. Let $A = \begin{bmatrix} 7 & -3 & 5 \\ -2 & 0 & 8 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 & 1 \\ -4 & -2 & 1 \\ 0 & 3 & 6 \end{bmatrix}$. Explain why AB can be computed, but BA cannot.
29. Let $A = \begin{bmatrix} 3 & -2 \\ 6 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 1 \\ -1 & 2 \end{bmatrix}$. Show AB ≠ BA even though both AB and BA can be computed.
30. Let $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$. Find a nonzero matrix B for which AB = BA.
31. Let $A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \end{bmatrix}$. Find a nonzero matrix B for which AB = 0.
2.4 Matrix Vector Spaces
A + B = B + A.
This also jives with our intuition from calculus where x2 + 2x and 2x + x2 are
the same function. This means matrix addition is commutative.
We can use a similar line of reasoning to see that if we’re adding three
matrices, we can start by adding any pair of them and still get the same
result. This property is also inherited from vector addition, this time from the
fact that $(\vec{v} + \vec{w}) + \vec{u} = \vec{v} + (\vec{w} + \vec{u})$. This means for any three functions from
Rn to Rm we have
(A + B) + C = A + (B + C).
What about additive inverses? In other words, for each matrix A in Mmn
we want a partner matrix −A so that −A + A is the zero matrix. Here it is
easier to think computationally instead of in terms of functions. To get each
entry of this sum to be zero, we need the entries of −A to be the entries of A
just with the opposite sign. This means that
−A = (−1) · A.
and therefore
(r + s)A = rA + sA
for the m × n matrix of f .
Similarly, we know scalar multiplication of vectors distributes over vector
addition. In function terms, this means we know
rA + rB = r(A + B).
Did these properties look familiar? That’s because they’re the same as
the good properties of vector addition and multiplication of vectors by scalars
summarized in Theorem 1 of 1.1. One way to think of this is that Mmn is acting
like Rn if we think of matrices in place of the vectors and use matrix addition
and scalar multiplication instead of vector addition and scalar multiplication.
This motivates the following definition.
Note that we will usually call the elements ~v of a vector space V vectors
and put an arrow over their variables even though we know that they may not
be vectors in Rn in the sense of our definition from 1.1. If this is confusing,
you can feel free to mentally add air quotes around the word “vector” when
thinking about this privately.
We saw at the beginning of this section that matrix addition and scalar
multiplication of matrices have these properties.
This gives us tons of vector spaces to work with: an Rn for every choice of
n and an Mmn for every choice of m and n.
Before we look at some examples, let’s also generalize our subspace test
from 1.2. Otherwise we’d be forced to recheck all ten properties from our vector
space definition every time we wanted to say something was a subspace, which
is much more work than we really need to do.
I’ve labeled each condition with the same number as its corresponding
property from our definition of a vector space.
To see that this is true, let’s convince ourselves that if we have these three
properties we get the other seven from the definition of a vector space.
Notice that since V is a vector space and W ⊆ V , we automatically
get properties 2, 3, 7, 8, 9, and 10. This means that we really just need to
understand why we get property 5.
Pick any w ~ in W . Since w~ is in V , we know that it has an additive inverse
−w~ in V . If we can show that −w ~ is also in W , we’ll be done. To see that,
consider
$$\vec{0}_V = 0 \cdot \vec{w} = (-1 + 1) \cdot \vec{w} = (-1) \cdot \vec{w} + 1 \cdot \vec{w} = (-1) \cdot \vec{w} + \vec{w}.$$
Since this is in W , we’ve satisfied all three conditions of the subspace test.
Therefore, W is a subspace of M22 . This also shows that W is a vector space.
Definition. Let V with + and · be a vector space. The span of ~v1 , . . . , ~vn
in V is Span{~v1 , . . . , ~vn } = {a1~v1 + · · · + an~vn } where a1 , . . . , an are scalars.
Remember that the + and · in this definition are the operations from V .
Example 4. Find the span of $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ in M22.
From our definition, we have
$$\mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\} = \left\{ a_1\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + a_2\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \right\}.$$
$$\vec{w} = a_1\vec{v}_1 + \cdots + a_n\vec{v}_n$$
and
$$\vec{u} = b_1\vec{v}_1 + \cdots + b_n\vec{v}_n$$
for some scalars a1 , . . . , an and b1 , . . . , bn . This means their sum is
which is clearly in Span{~v1 , . . . , ~vn }. If we let all our scalars equal zero, we
have
0 · ~v1 + · · · + 0 · ~vn = ~0V
in Span{~v1, . . . , ~vn}. If $\vec{w}$ is in Span{~v1, . . . , ~vn} and r is any scalar, then
$$r \cdot \vec{w} = r \cdot (a_1\vec{v}_1 + \cdots + a_n\vec{v}_n) = r \cdot (a_1\vec{v}_1) + \cdots + r \cdot (a_n\vec{v}_n) = (ra_1)\vec{v}_1 + \cdots + (ra_n)\vec{v}_n$$
which is also in Span{~v1 , . . . , ~vn }. Since Span{~v1 , . . . , ~vn } satisfies all three
conditions of the subspace test, it is a subspace of V .
Example 5. Show $W = \left\{ \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \right\}$ is a subspace of M22.
We showed this in Example 3 using the subspace test, but it can also be
shown by writing W as a span.
Now we can factor an a out of the first matrix and a b out of the second to
get
$$W = \left\{ a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}.$$
We’re more used to seeing x1 and x2 as our scalar coefficients, but there’s no
harm in calling them a and b instead. This means
$$W = \left\{ a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} = \mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}.$$
Since
$$W = \mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\},$$
Theorem 2 tells us that W is a subspace of M22 .
Another important idea from Rn that we can generalize to any vector space
is the idea of linear independence and linear dependence. In 1.3 we gave two
equivalent definitions, so we have a choice here about which one to generalize.
I’ll generalize the vector equation version here, and you can work through
generalizing the span version and arguing that the two are still equivalent in
the exercises.
Definition. A set of elements ~v1 , . . . , ~vk from a vector space V are linearly
dependent if the equation x1 · ~v1 + · · · + xk · ~vk = ~0V has a solution where at
least one of the xi s is nonzero. Otherwise ~v1 , . . . , ~vk are linearly independent.
As with span, here + and · are the addition and scalar multiplication of
V and ~0V is V ’s zero vector.
Example 6. Are $\begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$, $\begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}$, and $\begin{bmatrix} -5 & 5 \\ 1 & -5 \end{bmatrix}$ linearly independent or linearly dependent?
If we have no solutions to this equation apart from setting all the variables
equal to zero, then our three matrices are linearly independent. If we can find
a solution where any of the variables is nonzero, our matrices are linearly
dependent.
Simplifying the left-hand side of this equation gives us
$$\begin{bmatrix} x_1 - 2x_2 - 5x_3 & -x_1 + 2x_2 + 5x_3 \\ x_2 + x_3 & 2x_1 + x_2 - 5x_3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
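As a numerical aside (Python with NumPy, my own check rather than part of the text), working with matrices as "vectors" is no harder on a computer than working with vectors in Rn. For the three matrices of Example 6, one can verify directly that they satisfy the dependence relation 3M1 − M2 + M3 = 0, so they are linearly dependent; I checked this relation by hand from the equations above.

```python
import numpy as np

M1 = np.array([[ 1, -1],
               [ 0,  2]])
M2 = np.array([[-2,  2],
               [ 1,  1]])
M3 = np.array([[-5,  5],
               [ 1, -5]])

# Every entry of 3*M1 - M2 + M3 is zero, so the three matrices
# are linearly dependent in M22.
print(3 * M1 - M2 + M3)
```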
This simplifies to
$$\begin{bmatrix} x_1 + x_2 & -x_1 - x_3 \\ x_1 - x_3 & -x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
which gives us the equations x1 + x2 = 0, −x1 − x3 = 0, x1 − x3 = 0, and
−x1 + x2 = 0. The first equation gives us x1 = −x2 , while the fourth equation
gives us x1 = x2 . This means −x2 = x2 which can only be satisfied if x2 = 0.
This also means x1 = 0. The third equation tells us x1 = x3 , so x3 = 0 as
well. Since our only solution is to have all three variables equal zero, our three
matrices are linearly independent.
Just as we did with subspaces, spans, and linear independence, we can also
generalize our idea of linear functions. We’ll use the same properties as in 2.1,
but now we’ll allow the domain and codomain of our map to be general vector
spaces instead of restricting them to be Rn and Rm .
Remember that the addition and scalar multiplication used on the left-
hand side of each of this definition’s examples are the operations from the
vector space V , while the addition and scalar multiplication used on the right-
hand sides are the operations from W . These may be different if V and W are
different types of vector spaces!
Example 8. Show that f : M23 → R3 by $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = \begin{bmatrix} a+d \\ 2b \\ c+e-f \end{bmatrix}$ is linear.
This means f satisfies the first condition. Next we need to check scalar
multiplication.
$$f\left(r\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = f\left(\begin{bmatrix} ra & rb & rc \\ rd & re & rf \end{bmatrix}\right) = \begin{bmatrix} ra+rd \\ 2rb \\ rc+re-rf \end{bmatrix} = \begin{bmatrix} r(a+d) \\ r(2b) \\ r(c+e-f) \end{bmatrix} = r\begin{bmatrix} a+d \\ 2b \\ c+e-f \end{bmatrix}.$$
To show this function isn’t linear, we only need to show that it fails one of
our two conditions. This function actually fails both, but here I’ll just show
that it fails the condition for scalar multiplication.
$$f\left(r\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = f\left(\begin{bmatrix} rx_1 \\ rx_2 \end{bmatrix}\right) = \begin{bmatrix} rx_1 & rx_2 \\ rx_1 + rx_2 & (rx_1)(rx_2) \end{bmatrix} = \begin{bmatrix} rx_1 & rx_2 \\ rx_1 + rx_2 & r^2 x_1 x_2 \end{bmatrix}$$
but
$$r \cdot f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = r\begin{bmatrix} x_1 & x_2 \\ x_1 + x_2 & x_1 x_2 \end{bmatrix} = \begin{bmatrix} rx_1 & rx_2 \\ r(x_1 + x_2) & r x_1 x_2 \end{bmatrix}.$$
Comparing the bottom right entries shows that these two options are not
equal, so f is not linear.
Exercises 2.4.
1. What is the additive inverse of $\begin{bmatrix} 3 & 17 & 0 \\ -5 & 2 & -8 \end{bmatrix}$ in M23?
2. What is the additive inverse of $\begin{bmatrix} -1 & 9 \\ 8 & 0 \end{bmatrix}$ in M22?
3. Show $W = \left\{\begin{bmatrix} a & -2a \\ 0 & a \end{bmatrix}\right\}$ with the usual matrix addition and scalar multiplication is a subspace of M22.
4. Is $W = \left\{\begin{bmatrix} a & b \\ -a & 2b \end{bmatrix}\right\}$ a subspace of M22?
5. Let W be the set of all 2 × 2 matrices of the form $\begin{bmatrix} a & b \\ c & 0 \end{bmatrix}$ where a + b + c = 1. Is W a subspace of M22?
6. The trace of an n × n matrix A, written tr(A), is the sum of A’s
diagonal entries. Is W = {A in M22 | tr(A) = 0} a subspace of M22 ?
7. Show that $V = \left\{\begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\middle|\; a, b, c, d \geq 0 \right\}$ with the usual matrix addition and scalar multiplication is NOT a vector space.
8. Explain why we can’t make the set of n × n matrices into a vector
space where the “+” is defined as matrix multiplication, in other
words, where A + B = AB.
9. Explain why we can’t make R into a vector space where “+” is
defined to be the usual multiplication of real numbers, i.e., a + b =
ab.
10. Show that $D = \left\{\begin{bmatrix} a & 0 \\ 0 & 2a \end{bmatrix}\right\}$ with a, b in R is a vector space under the usual operations of M22.
11. Fix a particular 3×3 matrix A. Let W be the set of all 3×3 matrices
which commute with A, i.e., W = {B in M33 | AB = BA}. Show
that W is a vector space.
12. Find the span of $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ in M22.
13. Find the span of $\begin{bmatrix} -1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 4 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ in M23.
14. Is $\begin{bmatrix} 1 & -2 \\ -1 & 0 \end{bmatrix}$ in the span of $\begin{bmatrix} 2 & -1 \\ 0 & 2 \end{bmatrix}$, $\begin{bmatrix} -1 & 0 \\ 3 & -1 \end{bmatrix}$, and $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$?
15. Is $\begin{bmatrix} 2 & -2 & 10 \\ -1 & 1 & 8 \end{bmatrix}$ in the span of $\begin{bmatrix} 3 & -1 & 4 \\ 2 & 0 & 5 \end{bmatrix}$ and $\begin{bmatrix} 4 & 0 & -2 \\ 5 & -1 & 2 \end{bmatrix}$?
16. Are $\vec{v}_1 = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 & 3 \\ 2 & -1 \end{bmatrix}$ and $\vec{v}_3 = \begin{bmatrix} -2 & 1 \\ 4 & 9 \end{bmatrix}$ linearly independent or linearly dependent in M22?
17. Are $\vec{v}_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ linearly independent or linearly dependent in M22?
18. Are $\vec{v}_1 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ linearly independent or linearly dependent in M22?
19. Are $\vec{v}_1 = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} -3 & 2 \\ 1 & -1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 0 & -1 \\ 1 & -7 \end{bmatrix}$ linearly independent or linearly dependent in M22?
20. In 1.3 we had two equivalent definitions for linear dependence, one
in terms of an equation and one involving spans. In this section,
we generalized the equation definition to get our definition of linear
dependence in a vector space.
(a) Generalize the definition of linear dependence involving spans
to a vector space.
(b) Generalize our explanation from 1.3 of why the two definitions
were equivalent to a vector space.
21. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 3x_2 & 0 \\ 0 & 2x_1 - x_2 \end{bmatrix}$.
(a) What is the domain of f ?
(b) What is the codomain of f ?
22. Let $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a+b \\ b-c \\ 2d \end{bmatrix}$.
2.5 Kernel and Range

We can answer this question by solving f (~x) = ~0. In our case this is
$$\begin{bmatrix} 0 \\ x_1 - 2x_2 \\ -3x_1 + 6x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
This means we need x1 − 2x2 = 0 and −3x1 + 6x2 = 0. Both of these equations simplify to x1 = 2x2, so a vector from R2 maps to ~0 in R3 exactly when it has the form $\begin{bmatrix} x_1 \\ \frac{1}{2}x_1 \end{bmatrix}$.
We can see this subset of R2 geometrically by relabeling x1 and x2 as x
and y, so our requirement to map to ~0 becomes x = 2y. Now solving for y
allows us to view this set in a familiar format as the line y = (1/2)x pictured below.

[Figure: the line y = (1/2)x in the plane.]
Since this subset of the domain is clearly important, we make the following
definition.
In other words, the kernel of a linear function is the set of all vectors in
its domain which map to the zero vector of the codomain. (Keep in mind that
we’re using “zero vector of the codomain” in the context of our vector space
definition, so if W = Mmn our zero vector would be the zero matrix.)
Example 2. Find the kernel of the function f : R3 → M22 where
$$f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_2 & x_1 - x_3 \\ 0 & x_2 \end{bmatrix}.$$
The domain of f is R3 and the codomain is M22 , so the kernel of f is the
set of 3-vectors which map to the 2 × 2 zero matrix. We can solve for those
vectors using the equation f (~x) = ~0M22 which can be rewritten as
$$\begin{bmatrix} x_2 & x_1 - x_3 \\ 0 & x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
In other words, the null space of A is all vectors ~x which satisfy A~x = ~0.
(Here our zero vector is the actual vector of zeros in Rm .) If f is the linear
function with matrix A, then A~x = ~0 is the same as f (~x) = ~0, so the null
space of A is the kernel of f .
Example 3. Find the null space of $A = \begin{bmatrix} 1 & -2 \\ -\frac{1}{2} & 1 \end{bmatrix}$.
The null space of A is all 2-vectors ~x where A~x = ~0. We can rewrite this
as
$$\begin{bmatrix} 1 & -2 \\ -\frac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Multiplying out the left-hand side gives us
$$\begin{bmatrix} x_1 - 2x_2 \\ -\frac{1}{2}x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
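Both equations reduce to x1 = 2x2, so the null space consists of all multiples of the vector (2, 1). As a quick machine check (Python with NumPy, my own aside rather than part of the text), multiplying A by such a vector really does give the zero vector:

```python
import numpy as np

A = np.array([[ 1.0, -2.0],
              [-0.5,  1.0]])

# Any multiple of (2, 1) satisfies x1 = 2*x2, so A should send it to zero.
v = 3 * np.array([2.0, 1.0])
print(A @ v)   # [0. 0.]
```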
So far we’ve seen the kernel (or null space) as a subset of the domain of
our function. However, the situation is more special than that.
Another way to say this (while practicing our new vocabulary) is that
the kernel of a linear function is a subspace of its domain, which we can
check using our subspace test. This means we need to verify that the kernel
contains the zero vector of the domain and is closed under addition and scalar
multiplication.
To show that ~0V (the zero vector of the domain V ) is in the kernel of f ,
we need to show f (~0V ) = ~0W . To do this we’ll use the fact that f is linear
along with a trick: because (−1) · ~x is the additive inverse, −~x, of ~x, we get
~0 = ~x + (−1) · ~x for any vector ~x in any vector space V. This means
$$f(\vec{0}_V) = f(\vec{x} + (-1)\cdot\vec{x}) = f(\vec{x}) + (-1)\cdot f(\vec{x}) = \vec{0}_W,$$
so ~0V is in ker(f ).
To show that the kernel of f is closed under addition, suppose we have two
vectors ~v1 and ~v2 in ker(f ). We need to show ~v1 + ~v2 is also in ker(f ). Since
~v1 and ~v2 are in ker(f ) we know f (~v1 ) = ~0W and f (~v2 ) = ~0W . Therefore we
have
f (~v1 + ~v2 ) = f (~v1 ) + f (~v2 ) = ~0W + ~0W = ~0W
so ~v1 + ~v2 is in ker(f ).
This means that we automatically know that the sets we found in Examples
1 and 3 are subspaces of R2 and the set we found in Example 2 is a subspace
of R3 .
During our subspace check above, we showed that the kernel of any linear
map contains the zero vector of its domain. One interesting question we can
ask about a function is whether its kernel contains anything besides the zero
vector. This is equivalent to asking when the only vector that maps to the
zero vector of the codomain is the zero vector of the domain. Geometrically,
this would mean that there is no collapsing as we map to ~0 since only a single
unique vector is being mapped there. We can expand this idea of a function
having no collapsing as we map to anything in the codomain, which motivates
the following definition.
Example 5. The function f : R3 → R2 by $f\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x \\ y \end{bmatrix}$ is not 1-1.
Geometrically, this function can be thought of as a projection of R3 into
the xy-plane. This projection definitely collapses multiple different vectors to
the same output vector as each line in the z direction is identified with a single
vector in R2 . Algebraically, f identifies all vectors in R3 which differ only in
the z component. From either perspective, f is not 1-1.
This means another way to view 1-1 functions is that they are the functions
with the smallest possible kernels.
To see why this theorem is true, suppose we have a linear map f : V → W
and two vectors ~x and ~y in V for which f (~x) = f (~y ). This can be rewritten as
f (~x − ~y ) = ~0W .
Now that we’ve discussed the kernel within our domain, let’s switch gears
to look at a special subset of our codomain. Since the codomain was defined
as the vector space where the outputs of the function live, it’s very natural to
ask what those outputs are.
Example 10. What are the outputs of the function f : R2 → R2 by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x - y \\ -x + y \end{bmatrix}$?
The outputs of this function are all vectors in the codomain, R2 , which
are f (~x) for some vector ~x in the domain. Looking at the formula for our
function, this means our outputs are all 2-vectors of the form $\begin{bmatrix} x - y \\ -x + y \end{bmatrix}$ for some scalars x and y. We can rewrite this output vector as $\begin{bmatrix} x - y \\ -(x - y) \end{bmatrix}$, which shows that our outputs can be thought of as any 2-vector whose second entry is the negative of the first entry. To summarize, f's outputs are all 2-vectors of the form $\begin{bmatrix} z \\ -z \end{bmatrix}$ for some scalar z. We can visualize this set as the line y = −x in the picture below.

[Figure: the line y = −x in the plane.]
In other words, the range of a function is the set of all its outputs. This is
clearly a subset of its codomain.
Example 11. Find the range of f : M22 → R3 by $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a+c \\ b \\ 2b \end{bmatrix}$.
The range of f is the set of all 3-vectors which are f(A) for some 2 × 2 matrix A. We can express this as the set of $\vec{x}$ where $\vec{x} = f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right)$, i.e., we
As with the kernel and null space, the range has another name in the
special case where our linear function f maps from Rn to Rm as in Example
10.
In other words, the column space of A is the set of all vectors ~b for which
A~x = ~b has a solution. From 1.2 we know that this set can also be expressed
as the span of A’s columns, so an alternate way of thinking about the column
space is as the span of the columns of A.
Example 12. Find the column space of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & -2 \end{bmatrix}$.
Our matrix is 3 × 2, so its column space is the set of all 3-vectors ~b where
A~x = ~b for some 2-vector ~x. We can rewrite this equation as
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
which simplifies to
$$\begin{bmatrix} x_1 + x_2 \\ 0 \\ -2x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$
(As in the previous example, we view b1 , b2 , and b3 as known quantities and
solve for x1 and x2 .) Setting corresponding entries equal gives us x1 + x2 = b1 ,
As with the kernel and domain, the range (or column space) is more than
just a subset of the codomain.
As with the kernel, we’ll use the subspace test by checking that the range
contains the zero vector of the codomain and is closed under addition and
scalar multiplication.
To show that ~0W (the zero vector of the codomain W ) is in the range, we
need to find some ~v in V with f (~v ) = ~0W . However, we’ve already shown that
f (~0V ) = ~0W , so ~0W is in the range of f .
To show that the range of f is closed under addition, suppose we have two vectors ~w1 and ~w2 in the range. We need to show ~w1 + ~w2 is also in the range. Since ~w1 and ~w2 are in the range, we must have ~v1 and ~v2 in V with f (~v1 ) = ~w1 and f (~v2 ) = ~w2 . Since f is linear, we get
$$\vec{w}_1 + \vec{w}_2 = f(\vec{v}_1) + f(\vec{v}_2) = f(\vec{v}_1 + \vec{v}_2)$$
and ~w1 + ~w2 is in the range of f .
To show that the range of f is closed under scalar multiplication, suppose we have ~w1 in the range and a scalar r. We need to show r · ~w1 is also in the range. As above, we have ~v1 with f (~v1 ) = ~w1 , so
$$r \cdot \vec{w}_1 = r \cdot f(\vec{v}_1) = f(r \cdot \vec{v}_1)$$
so r · ~w1 is in the range of f .
Therefore the range is a subspace of the codomain.
This means we get that the set from Example 10 is a subspace of R2 and
the sets from Examples 11 and 12 are subspaces of R3 .
Unlike in our discussion of 1-1 where we asked which functions had the
smallest possible kernel, we’ll ask which functions have the largest possible
range. Since the range is a subset of the codomain, the largest the range
could be is the whole codomain. Geometrically, this means that the function’s
outputs fill up the entire codomain, which prompts the following definition.
Example 14. The function $f : M_{22} \to \mathbb{R}^3$ given by
$$f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c \\ b \\ 2b \end{bmatrix}$$
is not onto.
We saw in Example 11 that $\mathrm{range}(f) = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ 2x_2 \end{bmatrix} \right\}$. Since not all 3-vectors
have x3 = 2x2 , this is strictly smaller than f ’s codomain R3 . Therefore f is
not onto.
As we did with the kernel, let’s explore onto functions in the case where
f : Rn → Rm . Here the range of f can also be thought of as the column space
of f ’s m × n matrix A. If f is onto, then Col(A) = Rm . This means Col(A)
has dimension m. Since the column space of A is the span of A’s columns,
this means that A must have m linearly independent columns. Since A has
n columns in total, A has at most n linearly independent columns. Therefore
we get a parallel to Theorem 2.
Example 15. Use Theorem 5 to check whether or not the function f with matrix $\begin{bmatrix} 7 & -2 & 1 \\ 3 & 4 & 0 \end{bmatrix}$ is onto.
This matrix is 2 × 3, so we know f : R3 → R2 . Since 2 ≯ 3, Theorem 5
can’t tell us anything about whether or not the function corresponding to this
matrix is onto.
Example 16. Use Theorem 5 to check whether or not the function f with matrix $\begin{bmatrix} 4 & 9 \\ -1 & 0 \\ 2 & 1 \end{bmatrix}$ is onto.
This matrix is 3 × 2, so we know f : R2 → R3 . Since 3 > 2 we know the
function corresponding to this matrix is not onto.
Exercises 2.5.
1. Is $\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 2 & 1 & -2 \end{bmatrix}$ in the kernel of $f\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}\right) = \begin{bmatrix} a & c \\ 2a & e + k \end{bmatrix}$?
2. Is $\begin{bmatrix} 3 & -2 \\ 6 & 2 \end{bmatrix}$ in the kernel of $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} 4a - 6d \\ 3b + d \end{bmatrix}$?
3. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 + 8x_3 \\ x_2 - x_3 + 2x_4 \end{bmatrix}$.
4. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 - 4x_2 + 2x_3 \\ -x_2 + 3x_3 \\ x_1 - 3x_2 - x_3 \end{bmatrix}$.
5. Find the kernel of $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} 2a - b \\ a + c - d \\ -3c \end{bmatrix}$.
6. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 & 0 \\ x_2 + x_3 & -x_1 - x_2 \end{bmatrix}$.
7. Find the kernel of $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} d & a - b \\ a + c & 5d \end{bmatrix}$.
8. Find the kernel of $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = \begin{bmatrix} 2f + 4b & c - 3a \\ a - 2b & -e \end{bmatrix}$.
9. Find the null space of $A = \begin{bmatrix} 1 & 3 & 0 \\ -2 & 2 & -8 \\ 0 & -1 & 1 \end{bmatrix}$.
10. Find the null space of $A = \begin{bmatrix} 2 & 0 & 4 & -2 \\ 0 & 1 & -3 & 1 \end{bmatrix}$.
11. Find the null space of $A = \begin{bmatrix} 1 & -1 & 0 & 2 & 0 \\ 0 & 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$.
12. Find the null space of $A = \begin{bmatrix} 1 & -4 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 & 5 \end{bmatrix}$.
13. Explain, without doing any computations, why the linear map whose matrix is $A = \begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$ cannot be 1-1.
14. Explain, without doing any computations, why the linear map whose matrix is $A = \begin{bmatrix} 1 & -5 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & -4 & 8 \end{bmatrix}$ cannot be 1-1.
15. Is $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 \\ 0 \\ 2x_1 + x_2 \end{bmatrix}$ a 1-1 function?
16. Is $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a - d & c \\ d & b + c \end{bmatrix}$ a 1-1 function?
17. Is $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ in the range of $f\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & k \end{bmatrix}\right) = \begin{bmatrix} a & c \\ 2a & e + k \end{bmatrix}$?
18. Is $\begin{bmatrix} -1 \\ 7 \end{bmatrix}$ in the range of the function with matrix $\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$?
19. Find the range of the linear map f whose matrix is $\begin{bmatrix} 1 & -1 \\ 4 & -3 \\ 0 & 3 \end{bmatrix}$.
20. Find the range of the linear map f whose matrix is $A = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
21. Find the range of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 & x_2 \\ x_2 + x_3 & x_3 \end{bmatrix}$.
22. Find the range of $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = \begin{bmatrix} a + 2e + f \\ d + c \\ -4b \end{bmatrix}$.
23. Find the range of $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = \begin{bmatrix} a + b & c \\ d & e + f \end{bmatrix}$.
24. Find the range of $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c & 0 & 0 \\ 0 & b + d & 0 \\ 0 & 0 & a + b + c + d \end{bmatrix}$.
25. Is $\vec{v} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$ in the column space of $A = \begin{bmatrix} 1 & -4 & 2 \\ 0 & -1 & 3 \\ 1 & -3 & -1 \end{bmatrix}$?
26. Is $\vec{v} = \begin{bmatrix} 4 \\ 10 \\ 2 \end{bmatrix}$ in the column space of $A = \begin{bmatrix} 2 & -1 & 5 \\ 10 & 3 & -1 \\ 0 & -1 & 4 \end{bmatrix}$?
27. Find the column space of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 2 & 5 \end{bmatrix}$.
28. Find the column space of $A = \begin{bmatrix} 2 & 5 \\ 4 & 10 \end{bmatrix}$.
29. Explain without doing any computations why the linear map whose matrix is $A = \begin{bmatrix} 9 & -2 \\ 3 & 1 \\ -4 & 6 \end{bmatrix}$ cannot be onto.
30. Explain without doing any computations why the linear map whose matrix is $A = \begin{bmatrix} 2 & 0 & 4 \\ -3 & 1 & 1 \\ 6 & -6 & 2 \\ 0 & -1 & 1 \end{bmatrix}$ cannot be onto.
31. Is $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 \\ -x_2 \end{bmatrix}$ an onto function?
32. Is $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_1 + x_2 \end{bmatrix}$ an onto function?
33. Let f : R3 → R2 be a linear map whose matrix is A.
(a) Give an example of an A for which f is an onto map.
(b) Give an example of an A for which f is not an onto map.
(c) Briefly explain why we can’t give an A for which f is 1-1.
34. Let f : R2 → R3 be a linear map whose matrix is A.
(a) Give an example of an A for which f is a 1-1 map.
(b) Give an example of an A for which f is not a 1-1 map.
(c) Briefly explain why we can’t give an A for which f is onto.
Example 1. Find the augmented coefficient matrix of $A\vec{x} = \vec{b}$ where
$$A = \begin{bmatrix} -3 & 4 & -1 \\ 0 & 2 & 6 \end{bmatrix} \quad\text{and}\quad \vec{b} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}.$$
The augmented coefficient matrix is the matrix A with ~b added as its new fourth column to give us
$$\left[\begin{array}{ccc|c} -3 & 4 & -1 & 5 \\ 0 & 2 & 6 & -2 \end{array}\right].$$
(Note the vertical line where we joined ~b onto A, which tells us this is an
augmented coefficient matrix.)
From 1.1 we know two vectors are equal precisely when each pair of their corresponding entries is equal, so the equation above can be broken up into m separate equations which must all hold at once. These equations are $a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i$ for each i from 1 to m.
Example 2. Write the matrix equation $A\vec{x} = \vec{b}$ as a list of linear equations, where $A = \begin{bmatrix} -3 & 4 & -1 \\ 0 & 2 & 6 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$.
Each row of A gives us the coefficients for one equation, which will equal
the corresponding entry in ~b. Since A has 2 rows and ~b has 2 entries, this means
we’ll have 2 linear equations. Each column of A corresponds to a variable, so
A’s 3 columns mean we have 3 variables which I’ll call x1 , x2 , and x3 . Putting
this all together, the first row of A and first entry of ~b give us the equation
−3x1 + 4x2 − x3 = 5. The second row of A and second entry of ~b give us
2x2 + 6x3 = −2. (Note that there is no x1 term in the second equation,
because the first entry in A’s second row is 0, which means x1 ’s coefficient in
the second equation is 0.) This means we can write A~x = ~b as
−3x1 + 4x2 − x3 = 5
2x2 + 6x3 = −2.
This matrix has a row of zeros, which is at the bottom. The first nonzero
entry in each of the other rows is a 1. These leading 1s are circled below.
$$\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
The first three columns, which contain the leading 1s, have all other entries
0. The leading 1 in the top row is above those in the second and third rows,
and it is also to their left. Similarly, the leading 1 in the second row is above
that in the third row, and it is also to its left. Thus A is in reduced echelon
form.
Note that this matrix isn’t in our ideal format because of the bottom row
of zeros. However, this row corresponds to the equation 0 = 0, so isn’t really
a problem.
Example 5. Circle all leading 1s of $A = \begin{bmatrix} 1 & 0 & 3 & 0 & -1 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 9 \end{bmatrix}$. Is A in reduced echelon form?
Our matrix doesn’t have any rows of zeros, and the leftmost nonzero entry
of each row is a 1. See the picture below where these leading 1s are circled.
$$\begin{bmatrix} 1 & 0 & 3 & 0 & -1 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 9 \end{bmatrix}$$
(Note that the 1 in the second row and third column is not a leading 1,
because it is not the leftmost nonzero entry.)
The first, second, and fourth columns with the leading 1s have all other
entries 0. The leading 1 in the top row is above those in the second and third
rows, and it is also to their left. Similarly, the leading 1 in the second row is
above that in the third row, and it is also to its left. Thus A is in reduced
echelon form.
This matrix also isn’t in our ideal format, because it doesn’t have an
equation of the form x3 = s. This is because there is no leading 1 in the third
column. We also have x3 terms in the equations defining x1 and x2 . This does
happen sometimes, and we’ll discuss how to deal with this type of outcome in
2.7 and 2.8.
Example 6. Why isn't $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -2 & 3 \end{bmatrix}$ in reduced echelon form?
The leftmost nonzero entry in the second row is −2 instead of a leading 1,
so A isn’t in reduced echelon form.
Example 7. Why isn't $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ in reduced echelon form?
Note that this isn’t an augmented coefficient matrix. That’s okay, we can
still think about whether or not it is in reduced echelon form.
The leading 1 in the second row of A is above the leading 1 in the third
row, but it is not to the left of the third row’s leading 1. This is why A isn’t
in reduced echelon form.
Example 8. Why isn't $A = \begin{bmatrix} 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$ in reduced echelon form?
The third column has a leading 1, but its other entry isn’t 0 (instead it
is 3). This means A isn’t in reduced echelon form. (The fact that the first
column of the matrix doesn’t have a leading 1 isn’t a problem.)
Now that we understand reduced echelon form, we can start developing the
tools needed to create an algorithm which transforms an augmented coefficient
matrix into an augmented coefficient matrix in reduced echelon form while at
every step keeping the solutions to our matrix equation the same. These tools
are based on methods from algebra used to manipulate sets of equations, and
since those equations correspond to rows of our augmented coefficient matrix
our algorithm’s tools are called row operations.
The first of our three row operations is to swap the order of the rows in
our augmented coefficient matrix, which corresponds to swapping the order
of equations in our list. The order in which the equations appear in a list
doesn’t matter, so it is clear that reordering the rows of our augmented
coefficient matrix doesn’t change the solutions. We’ll usually do this reordering
by swapping one pair of rows at a time. I’ll use the notation ri ↔ rj to indicate
that we’re swapping the ith and jth rows of our matrix.
Example 9. The solutions to $\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & -1 & 1 \end{array}\right]$ and $\left[\begin{array}{cc|c} 1 & -1 & 1 \\ 1 & 1 & 2 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & -1 & 1 \end{array}\right] \rightarrow_{r_1 \leftrightarrow r_2} \left[\begin{array}{cc|c} 1 & -1 & 1 \\ 1 & 1 & 2 \end{array}\right].$$
Our next row operation is to multiply one row of our augmented coefficient
matrix by a nonzero scalar. This corresponds to multiplying an entire equation
by a nonzero scalar, i.e., multiplying all coefficients and the value the equation
equals by that scalar. Since our scalar isn’t zero, this doesn’t change the
solutions of that equation. I’ll use the notation c ri to indicate that we’re
multiplying the ith row of the augmented coefficient matrix by c.
Example 10. The solutions to $\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 4 & 8 \end{array}\right]$ and $\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 1 & 2 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 4 & 8 \end{array}\right] \rightarrow_{\frac{1}{4}r_2} \left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 1 & 2 \end{array}\right].$$
The first matrix corresponds to the equations 2x1 + x2 = 6 and 4x2 = 8 while the second corresponds to the equations 2x1 + x2 = 6 and x2 = 2.
When solving the first pair of equations, we’d automatically divide the second
equation by 4 (i.e., multiply by 1/4) to get x2 = 2. This is the second equation
from our second augmented coefficient matrix, so this row operation doesn’t
change the fact that x2 = 2. The first rows of our two matrices are the same,
so we’d be plugging x2 = 2 back into the same equation 2x1 + x2 = 6 to
get x1 = 2 in both cases. Therefore multiplying our second row by 1/4 didn't
change the solution.
Our final row operation is to add a multiple of one row of our augmented
coefficient matrix to another row. This corresponds to adding a multiple of
one equation to another equation, and is commonly done to use part of one
equation to cancel out one of the variables from another equation (see the
example below). It also doesn’t change the solutions of our list of equations.
I’ll use the notation ri + c rj to indicate that we’re replacing the ith row of
our augmented coefficient matrix by the ith row plus c times the jth row.
(Note that the notation for this row operation is not symmetric, since the row
operation itself isn’t symmetric. The row listed first is the only row of the
matrix being changed.)
Example 11. The solutions to $\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 1 & -1 & 5 \end{array}\right]$ and $\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 0 & 1 & 8 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 1 & -1 & 5 \end{array}\right] \rightarrow_{r_2 + r_1} \left[\begin{array}{cc|c} -1 & 2 & 3 \\ 0 & 1 & 8 \end{array}\right].$$
Here our first matrix corresponds to the equations −x1 + 2x2 = 3 and
x1 − x2 = 5. When solving this, a common strategy is to add two equations
to cancel out one of the variables so we can solve for the other variable. In
this case, if we simply add the equations together, we’ll cancel out x1 to get
x2 = 8. Plugging this back into one of the original equations then gives us
x1 = 13. The second matrix corresponds to the equations −x1 + 2x2 = 3 and
x2 = 8. Essentially, here we’ve already done the addition of equations. In any
case, you can see that the solution is still x1 = 13 and x2 = 8.
Now that we understand the three building blocks of our algorithm, it’s
time to create the algorithm itself. Our goal is to use these three row operations
to transform our augmented coefficient matrix into its reduced echelon form.
Since row operations don’t change the solutions, we’ll be able to solve our
original matrix equation by reading off the solutions of the reduced echelon
form’s matrix equation. The row reduction algorithm has two parts. In the
first half, we start at the top left corner of our matrix and work our way
downward and to the right, creating the leading 1s and the zeros beneath
them. In the second half, we start at the bottom right corner of our matrix
and work our way up and to the left, creating the zeros above each leading
1. Since our matrices have finitely many rows and columns, this means our
algorithm will always terminate in finitely many steps. Formally, the row
reduction algorithm’s instructions are as follows:
Part 1:
• If possible, swap rows to put a nonzero entry in the top left corner of the
matrix. If not possible, skip to the last step in this part.
• Multiply the top row by a nonzero constant to make its first nonzero entry
into a leading 1.
• Add a multiple of the top row to each lower row to get zero entries below
the top row’s leading 1.
• Ignore the top row and leftmost column of the matrix. If there are any rows
remaining, go back to the top of Part 1, and repeat the process on the
remaining entries of the matrix. If there are no rows remaining after you
ignore the top row, go to Part 2.
Part 2:
• If the bottom row has a leading 1, add a multiple of the bottom row to
each higher row to get zero entries above that leading 1. If the bottom row
doesn’t have a leading 1, skip to the next step.
• Ignore the bottom row of the matrix. If there are two or more rows remaining,
go back to the top of Part 2, and repeat the process on the remaining entries
of the matrix. If you have reached the top row after you ignore the bottom
row, you’re done with the algorithm.
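If you want to experiment with these steps on a computer, here is a rough sketch in Python using SymPy (an assumption on our part; it is not the software the text relies on). It applies the three row operations to the augmented coefficient matrix that gets row reduced in the example worked below, then compares the result with SymPy's built-in reduced echelon form.

```python
from sympy import Matrix, Rational

# Augmented coefficient matrix [A | b] for the system worked in the example below.
M = Matrix([[0, 3, 3, 0],
            [2, 0, 0, -4],
            [-1, 0, 1, 6]])

M = M.elementary_row_op('n<->m', row1=0, row2=1)            # swap rows 1 and 2
M = M.elementary_row_op('n->kn', row=0, k=Rational(1, 2))    # (1/2) r1
M = M.elementary_row_op('n->n+km', row=2, k=1, row2=0)       # r3 + r1
M = M.elementary_row_op('n->kn', row=1, k=Rational(1, 3))    # (1/3) r2
M = M.elementary_row_op('n->n+km', row=1, k=-1, row2=2)      # r2 - r3
print(M)   # Matrix([[1, 0, 0, -2], [0, 1, 0, -4], [0, 0, 1, 4]])

# The built-in routine reaches the same reduced echelon form.
print(M == Matrix([[0, 3, 3, 0], [2, 0, 0, -4], [-1, 0, 1, 6]]).rref()[0])   # True
```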
(Note that we multiplied the whole row by 1/2, not just the first entry!) Next
we need to add multiples of the top row to the lower rows to get zeros below
our leading 1. The second row already has a zero as its first entry, so we just
need to tackle the third row. Its first entry is −1, so we can simply add row 1
to row 3 which gives us
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 3 & 3 & 0 \\ -1 & 0 & 1 & 6 \end{bmatrix} \rightarrow_{r_3 + r_1} \begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 3 & 3 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix}.$$
We’ve reached the end of our first repetition of Part 1 of the algorithm, so
we’ll ignore the top row and left column of our matrix and repeat Part 1 on
the remaining matrix entries shown in the picture below.
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 3 & 3 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix}$$
The top left corner of this remaining section is nonzero, so we can skip
swapping rows and go directly to creating our leading 1. Since the current
entry is 3, we'll multiply the second row of the matrix by 1/3 and get
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 3 & 3 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix} \rightarrow_{\frac{1}{3}r_2} \begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix}.$$
We already have zero underneath our new leading 1, so we can again drop the
top row and left column of our block as shown below.
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix}$$
Our remaining section’s top left corner is not only nonzero, it is already a
leading 1! There are no matrix entries below this leading 1 and we’ve reached
the bottom row of our matrix, so we’re done with Part 1 of our algorithm and
ready to start Part 2.
The bottom row of our matrix has a leading 1, so we need to add multiples
of this third row to the rows above to get zeros above that leading 1. The top
row already has a zero there, but the second row doesn’t. Since the second
row’s entry is 1, we’ll need to add −1 times the third row to the second row
to cancel out that entry. This looks like
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix} \rightarrow_{r_2 - r_3} \begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 4 \end{bmatrix}.$$
We’ve reached the end of our first repetition of Part 2, so we’ll ignore the
bottom row and repeat Part 2 on the top two rows of our matrix. This makes
our new “bottom row” the second row, which has a leading 1. Since the top
row has a zero above this leading 1, we don’t have to do anything. Ignoring
the second row leaves us at the top row of our matrix, so we’re done with the
algorithm and can see that the reduced echelon form of our original augmented coefficient matrix
$$\left[\begin{array}{ccc|c} 0 & 3 & 3 & 0 \\ 2 & 0 & 0 & -4 \\ -1 & 0 & 1 & 6 \end{array}\right] \quad\text{is}\quad \left[\begin{array}{ccc|c} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 4 \end{array}\right].$$
Now that we understand how to find the reduced echelon form of a matrix,
let’s practice using that to solve a matrix equation.
Example 13. Solve the matrix equation $A\vec{x} = \vec{b}$ where $A = \begin{bmatrix} 0 & 3 & 3 \\ 2 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 0 \\ -4 \\ 6 \end{bmatrix}$.
In Example 12, we saw that the augmented coefficient matrix of this
equation has reduced echelon form
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 4 \end{array}\right].$$
Since the original augmented coefficient matrix and the reduced echelon form have the same solutions, we can read off the solutions from the reduced echelon form. This gives us x1 = −2, x2 = −4, and x3 = 4, or $\vec{x} = \begin{bmatrix} -2 \\ -4 \\ 4 \end{bmatrix}$. If you'd like to check your work, you can multiply this vector by our original matrix A and see that your answer is ~b.
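That final check is easy to automate. Below is a small sketch using Python's NumPy (our choice of tool, not the text's) that multiplies A by the solution we read off and compares the product with ~b.

```python
import numpy as np

A = np.array([[0.0, 3.0, 3.0],
              [2.0, 0.0, 0.0],
              [-1.0, 0.0, 1.0]])
b = np.array([0.0, -4.0, 6.0])

x = np.array([-2.0, -4.0, 4.0])       # solution read off the reduced echelon form
print(A @ x)                           # [ 0. -4.  6.]
print(np.allclose(A @ x, b))           # True

# When A is invertible, np.linalg.solve finds the same answer directly.
print(np.linalg.solve(A, b))           # [-2. -4.  4.]
```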
Exercises 2.6.
1. Find the augmented coefficient matrix of the matrix equation $\begin{bmatrix} 3 & -4 & 0 \\ 4 & 1 & 17 \end{bmatrix}\vec{x} = \begin{bmatrix} -5 \\ 8 \end{bmatrix}$.
2. Find the augmented coefficient matrix of the matrix equation $\begin{bmatrix} 3 & 4 \\ 6 & 2 \\ 0 & 1 \end{bmatrix}\vec{x} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix}$.
3. Find the augmented coefficient matrix of the matrix equation $\begin{bmatrix} -1 & 3 & 5 \\ 2 & 4 & 6 \\ 0 & 9 & -2 \end{bmatrix}\vec{x} = \begin{bmatrix} 0 \\ -6 \\ 10 \end{bmatrix}$.
4. Find the augmented coefficient matrix of the matrix equation $\begin{bmatrix} 9 & 2 & 0 & -1 \\ 5 & -8 & 3 & 4 \end{bmatrix}\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.
6 −6 1 0
(a) Rewrite this augmented coefficient matrix as a matrix equation.
(b) Rewrite this augmented coefficient matrix as a list of equations.
(c) Rewrite this augmented coefficient matrix as a vector equation.
15. Decide whether or not the following matrix is in reduced echelon form. If it is, circle all its leading 1s. $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
0 0 1 2 3
20. Find the reduced echelon form of $A = \begin{bmatrix} 4 & 8 & 0 & -4 \\ 0 & -1 & 2 & 1 \\ 3 & 6 & 0 & -3 \\ 0 & 2 & -4 & 6 \end{bmatrix}$.
21. Find the reduced echelon form of $A = \begin{bmatrix} 0 & 1 & -5 & 0 \\ -1 & 2 & 0 & -2 \\ 0 & -4 & 12 & 4 \end{bmatrix}$.
22. Find the reduced echelon form of $A = \begin{bmatrix} -5 & 0 & -10 & 5 \\ 6 & 1 & 12 & 8 \\ 2 & 0 & 3 & -1 \end{bmatrix}$.
23. Find the reduced echelon form of $A = \begin{bmatrix} 4 & 0 & 8 \\ 2 & 1 & 4 \\ -3 & 0 & -7 \end{bmatrix}$.
24. Find the reduced echelon form of $A = \begin{bmatrix} 4 & -2 & 4 & 14 \\ 7 & 0 & 7 & 14 \end{bmatrix}$.
25. The reduced echelon form of $A\vec{x} = \vec{b}$'s augmentation matrix is $\left[\begin{array}{ccc|c} 1 & 0 & 0 & -7 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 6 \end{array}\right]$. Use this to solve for ~x.
26. The reduced echelon form of $A\vec{x} = \vec{b}$'s augmentation matrix is $\left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 12 \end{array}\right]$. Use this to solve for ~x.
27. Use row reduction to solve $\begin{bmatrix} 2 & 0 & 2 \\ 3 & 1 & -1 \\ 1 & -3 & 1 \end{bmatrix}\vec{x} = \begin{bmatrix} 14 \\ 0 \\ 10 \end{bmatrix}$.
28. Use row reduction to solve $\begin{bmatrix} 2 & 4 & 0 \\ 5 & 1 & -3 \\ 1 & 0 & 1 \end{bmatrix}\vec{x} = \begin{bmatrix} 2 \\ -1 \\ -7 \end{bmatrix}$.
29. Use row reduction to solve $x_1\begin{bmatrix} -2 \\ 0 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 4 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} -14 \\ 7 \\ -9 \end{bmatrix}$.
30. Use row reduction to solve $x_1\begin{bmatrix} 6 \\ 10 \end{bmatrix} + x_2\begin{bmatrix} -3 \\ 4 \end{bmatrix} = \begin{bmatrix} -12 \\ 7 \end{bmatrix}$.
31. Use row reduction to solve the following equations simultaneously:
−3x1 + x3 = −1, x1 + x2 + x3 = 0, and 2x1 − 4x2 + x3 = −5.
32. Use row reduction to solve the following equations simultaneously:
x1 + x2 + x3 = 6, −3x2 + 2x3 = 9, and −5x1 + 4x2 + x3 = 15.
Essentially we're asking which vector reflects across the line $y = \sqrt{3}\,x$ to land on $\vec{b} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. This could perhaps be worked out geometrically with 30-60 right triangles, but I think it is much easier to solve via row reduction. Our function has matrix
$$\begin{bmatrix} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix},$$
so the augmented coefficient matrix
corn required to produce more corn). Suppose we know that for each dollar of
agricultural output we need $0.30 of agricultural input, $0.10 of manufacturing
input, and $0.20 of service input. For each dollar of manufacturing output we
need $0.10 of agricultural input, $0.50 of manufacturing input, and $0.20 of
service input. For each dollar of service output we need $0 of agricultural
input, $0.10 of manufacturing input, and $0.40 of service input. If we want to
produce $150,000 of agricultural output, $100,000 of manufacturing output,
and $225,000 of service output above what is needed in the production process,
how much does each industry need to produce?
Let’s start by setting up the vectors that will model demand and
production. We have three industries to keep track of, so we’ll use 3-vectors
to model both the production and demand. I’ll let the first entry track
agriculture, the second track manufacturing, and the third track services.
This means that the net output or external demand vector is $\vec{b} = \begin{bmatrix} 150{,}000 \\ 100{,}000 \\ 225{,}000 \end{bmatrix}$
since this is what we want to produce beyond the input requirements of
these industries' production. We can also set up our overall production vector $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ where $x_i$ is the total amount, including input requirements, each
industry should produce.
Next we need to set up our input-output matrix A. We want A~x to give the
input requirements of producing ~x, which means A must be a 3×3 matrix. If we
consider the top entry of A~x, it should tell us the amount of agricultural input
needed to produce ~x, i.e., $0.30 times the amount of agricultural production
which is x1 , $0.10 times the amount of manufacturing production which is
x2 , and $0 times the amount of service production which is x3 . Therefore the
top row of our matrix contains the amounts of agricultural input needed to
produce each industry in order of our vector’s entries. If we label the rows and
columns of A in the same order as our vectors, then extending this reasoning
to the other two entries of A~x shows us that aij is the amount of industry i’s
input needed by industry j. For example, a12 was the amount of agricultural
input (and agriculture is our vector’s first entry) needed for manufacturing
(which is our vector's second entry). This means our input-output matrix is
$$A = \begin{bmatrix} 0.3 & 0.1 & 0 \\ 0.1 & 0.5 & 0.1 \\ 0.2 & 0.2 & 0.4 \end{bmatrix}.$$
Now that we have A, ~x, and ~b, we need to relate them in a matrix equation.
The total output we want to produce is the sum of the outputs needed as
inputs for our various industries, A~x, plus the external demand ~b, so our total
output is A~x + ~b. However, we plan to produce ~x, so our total output is also
~x. Setting these two versions of total output equal gives us ~x = A~x + ~b.
This is a matrix equation, but it doesn’t have the right format for row
reduction so we’ll need to do some algebra to get it there. We can start by
subtracting A~x from both sides to get ~x − A~x = ~b. We can’t factor ~x out of
the left-hand side, because we’d be left with 1 − A, which doesn’t make sense.
However, recall from Example 4 of 2.2 that the n × n identity matrix, In , has
In ~x = ~x. Here n = 3, and if we substitute in I3 ~x for ~x we get I3 ~x − A~x = ~b.
Now we can factor out ~x to get (I3 − A)~x = ~b so our matrix equation is in a
format where we can use row reduction to solve for ~x.
Since
$$I_3 - A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 0.3 & 0.1 & 0 \\ 0.1 & 0.5 & 0.1 \\ 0.2 & 0.2 & 0.4 \end{bmatrix} = \begin{bmatrix} 0.7 & -0.1 & 0 \\ -0.1 & 0.5 & -0.1 \\ -0.2 & -0.2 & 0.6 \end{bmatrix},$$
we can row reduce the augmented coefficient matrix $[\, I_3 - A \mid \vec{b} \,]$ to solve for ~x.
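Row reducing $[\, I_3 - A \mid \vec{b} \,]$ by hand works fine, but an input-output model like this is also a natural place to let software do the arithmetic. The sketch below (Python with NumPy, our assumption of tooling) solves $(I_3 - A)\vec{x} = \vec{b}$ and confirms the answer satisfies $\vec{x} = A\vec{x} + \vec{b}$.

```python
import numpy as np

# Input-output matrix and external demand vector from the example above.
A = np.array([[0.3, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.2, 0.2, 0.4]])
b = np.array([150_000.0, 100_000.0, 225_000.0])

# Solve (I - A) x = b; x holds each industry's total required production
# (agriculture, manufacturing, services) including input requirements.
x = np.linalg.solve(np.eye(3) - A, b)
print(np.round(x, 2))
print(np.allclose(x, A @ x + b))   # True: total output = internal inputs + external demand
```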
Example 3. Compute the kernel of the function $f : \mathbb{R}^3 \to \mathbb{R}^2$ where
$$f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 + x_2 + 3x_3 \\ 2x_1 - 4x_3 \end{bmatrix}.$$
The kernel is all 3-vectors ~x with f (~x) = ~0. This function has matrix
$$A = \begin{bmatrix} 4 & 1 & 3 \\ 2 & 0 & -4 \end{bmatrix}$$
so finding the kernel of f is equivalent to solving A~x = ~0. This equation has
the augmented coefficient matrix
$$\left[\begin{array}{ccc|c} 4 & 1 & 3 & 0 \\ 2 & 0 & -4 & 0 \end{array}\right],$$
whose reduced echelon form is
$$\left[\begin{array}{ccc|c} 1 & 0 & -2 & 0 \\ 0 & 1 & 11 & 0 \end{array}\right].$$
This isn’t the simplest kind of reduced echelon form we’d hoped for, but
we can still read off the equations given by its rows. The top row gives us
x1 − 2x3 = 0 and the bottom row gives us x2 + 11x3 = 0. Even though we
don’t have a unique one-number answer for each variable, these two equations
are still enough for us to find the kernel of f . The first equation x1 − 2x3 = 0
can be solved for x1 to give x1 = 2x3 , while the second equation x2 + 11x3 = 0
can be solved for x2 to give x2 = −11x3 . Once we pick a value for x3 , we’ll
get values for x1 and x2 , for example, if x3 = 2 then x1 = 4 and x2 = −22.
One way to write our overall answer is
$$\ker(f) = \left\{ \begin{bmatrix} 2x_3 \\ -11x_3 \\ x_3 \end{bmatrix} \right\}.$$
Notice that in the example above, our kernel contained more than just ~0,
so our map wasn’t 1-1. We can tell that because we ended up with a variable,
x3 , which didn’t have its own equation with a leading 1 as its coefficient. That
happened because x3 ’s column didn’t have a leading 1 in A’s reduced echelon
form. This gives us an easy-to-check criterion for a function to be 1-1.
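That criterion is also easy to test by machine: ask for a basis of the null space of f's matrix and check whether it is empty. Here is a minimal SymPy sketch (our tooling choice, not the text's) for the function from Example 3.

```python
from sympy import Matrix

A = Matrix([[4, 1, 3],
            [2, 0, -4]])    # matrix of f from Example 3

null_basis = A.nullspace()    # basis vectors for ker(f)
print(null_basis)             # [Matrix([[2], [-11], [1]])]
print(len(null_basis) == 0)   # False: the kernel is bigger than {0}, so f is not 1-1
```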
If we shift our focus from solving equations of the form f (~x) = ~b to solving
vector equations, then we can apply row reduction to answering the question of
whether ~v1 , . . . , ~vk are linearly independent or linearly dependent. Remember
from 1.3 that ~v1 , . . . , ~vk are linearly dependent if we can find a solution to
x1~v1 + · · · + xk~vk = ~0 where some xi is nonzero, otherwise ~v1 , . . . , ~vk are
linearly independent.
Example 4. Are $\begin{bmatrix} 2 \\ -1 \\ 0 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 4 \\ 0 \end{bmatrix}$, $\begin{bmatrix} -1 \\ -2 \\ 0 \\ 4 \end{bmatrix}$, and $\begin{bmatrix} 5 \\ 0 \\ 0 \\ 2 \end{bmatrix}$ linearly independent or linearly dependent?
to see whether or not we have solutions where some of the variables are
nonzero. From 2.2 we know that the vector equation x1~v1 + · · · + xk~vk = ~b
is equivalent to the matrix equation A~x = ~b where A is the matrix whose
columns are ~v1 , . . . , ~vk . This means we want to solve A~x = ~b where
$$A = \begin{bmatrix} 2 & 0 & -1 & 5 \\ -1 & 0 & -2 & 0 \\ 0 & 4 & 0 & 0 \\ 3 & 0 & 4 & 2 \end{bmatrix} \quad\text{and}\quad \vec{b} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Notice that our vector equation above had a variable, x4 , which wasn’t
necessarily zero precisely because that variable’s column didn’t have a leading
1 in the reduced echelon form of our augmented coefficient matrix. This
means that we have the following check for linear independence and linear
dependence.
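That check can also be automated: put the vectors in as the columns of a matrix, row reduce, and see whether every column gets a leading 1. A short SymPy sketch (ours, not the text's) for the vectors of Example 4:

```python
from sympy import Matrix

# Columns are the four vectors from Example 4.
A = Matrix([[2, 0, -1, 5],
            [-1, 0, -2, 0],
            [0, 4, 0, 0],
            [3, 0, 4, 2]])

rref, pivots = A.rref()
print(pivots)              # (0, 1, 2): the fourth column has no leading 1
print(A.rank() < A.cols)   # True, so the vectors are linearly dependent
```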
So far, we’ve always been in the situation where A~x = ~b had a solution.
However, we’ve seen that this isn’t always the case. How can we tell that
our matrix equation has no solution from the reduced echelon form of its
augmented coefficient matrix? The trick is to look at where our leading 1s
occur. If we have a leading 1 in the rightmost (augmentation) column, our
matrix equation has no solutions. To understand why this is true, think back
to our interpretation of a matrix equation as a list of linear equations. Since
a leading 1 is the first nonzero entry in its row, having a leading 1 in the
rightmost column means that its row looks like 0 · · · 0 | 1. This corresponds to
the equation 0x1 +· · ·+0xn = 1, i.e., 0 = 1, which is clearly impossible. Happily
such a leading 1 will be formed during Part 1 of our row reduction algorithm,
which saves us from having to do Part 2. (In fact you can see your matrix
equation has no solution as soon as you see that the first nonzero entry in any
row occurs in the last column.) It’s always important to understand when a
problem cannot be solved, because then you don’t waste time continuing to
work on it.
Example 5. The equation $A\vec{x} = \vec{b}$ with $A = \begin{bmatrix} 1 & -5 & 4 \\ 2 & -7 & 3 \\ -2 & 1 & 7 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} -3 \\ -2 \\ -1 \end{bmatrix}$ has no solution.
If we row reduce the augmented coefficient matrix, its bottom row becomes $\left[\begin{array}{ccc|c} 0 & 0 & 0 & 1 \end{array}\right]$, which gives the equation 0 = 1. Since
this is clearly a contradiction (there are no values of x1 , x2 , and x3 which will
make zero equal one), our original equation doesn’t have any solutions.
Mathematica isn’t as good about row reducing a matrix which has variable
entries, so we’ll do this one by hand (and get in a little extra row reduction
practice along the way).
We already have a 1 in the top left corner, so we can move on to adding
multiples of row 1 to rows 2 and 3 to create zeros under that leading 1. In our
row reduction notation, this is
$$\left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 2 & -7 & 3 & b_2 \\ -2 & 1 & 7 & b_3 \end{array}\right] \rightarrow_{r_2 - 2r_1} \left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 3 & -5 & b_2 - 2b_1 \\ -2 & 1 & 7 & b_3 \end{array}\right]$$
$$\rightarrow_{r_3 + 2r_1} \left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 3 & -5 & b_2 - 2b_1 \\ 0 & -9 & 15 & b_3 + 2b_1 \end{array}\right].$$
Next we ignore the top row and first column, and we multiply the second row
by 13 to create our next leading 1. This gives us
$$\left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 3 & -5 & b_2 - 2b_1 \\ 0 & -9 & 15 & b_3 + 2b_1 \end{array}\right] \rightarrow_{\frac{1}{3}r_2} \left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 1 & -\frac{5}{3} & \frac{1}{3}b_2 - \frac{2}{3}b_1 \\ 0 & -9 & 15 & b_3 + 2b_1 \end{array}\right].$$
We add a multiple of row 2 to row 3 to get a zero below our new leading 1,
which looks like
$$\left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 1 & -\frac{5}{3} & \frac{1}{3}b_2 - \frac{2}{3}b_1 \\ 0 & -9 & 15 & b_3 + 2b_1 \end{array}\right] \rightarrow_{r_3 + 9r_2} \left[\begin{array}{ccc|c} 1 & -5 & 4 & b_1 \\ 0 & 1 & -\frac{5}{3} & \frac{1}{3}b_2 - \frac{2}{3}b_1 \\ 0 & 0 & 0 & (b_3 + 2b_1) + (3b_2 - 6b_1) \end{array}\right].$$
Since the first three entries of the bottom row are zero, the bottom row will
prevent us from having any solutions (as in Example 5) unless the bottom
entry of the right column is also 0. This can be expressed as
$$(b_3 + 2b_1) + (3b_2 - 6b_1) = 0,$$
which simplifies to
$$-4b_1 + 3b_2 + b_3 = 0.$$
As long as ~b satisfies this condition, we’ll have a solution to f (~x) = ~b and ~b
will be in the range of f . Another way to write this is
$$\mathrm{range}(f) = \left\{ \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \;\middle|\; -4b_1 + 3b_2 + b_3 = 0 \right\}.$$
Notice that the range of the example function above is smaller than its
codomain, i.e., f isn’t onto. This happened because there were some vectors
~b where f (~x) = ~b didn’t have a solution. Thinking in terms of row reduction,
this happened because there was a row in the reduced echelon form of f ’s
matrix which was all zeros. This created a row in the augmented coefficient
matrix of the form 0 · · · 0 | ∗ leaving the possibility of an unsolvable equation.
As with 1-1, this gives us an easy-to-check criterion for when a function is
onto.
Our augmented coefficient matrix has a leading 1 in the right column of its reduced echelon form. This means our vector equation has no solution, and therefore $\vec{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is not in the span of the other vectors.
Example 8. Is $\vec{b} = \begin{bmatrix} 0 \\ 4 \\ 16 \end{bmatrix}$ in the span of $\begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} -4 \\ 5 \\ 16 \end{bmatrix}$?
As in the previous example, this is equivalent to asking whether we can solve
$$x_1\begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -4 \\ 5 \\ 16 \end{bmatrix} = \begin{bmatrix} 0 \\ 4 \\ 16 \end{bmatrix}.$$
This vector equation has augmented coefficient matrix
$$\left[\begin{array}{ccc|c} 2 & 1 & -4 & 0 \\ -1 & 0 & 5 & 4 \\ -2 & 1 & 16 & 16 \end{array}\right].$$
Since our augmented coefficient matrix doesn't have a leading 1 in the right column of its reduced echelon form, our vector equation has a solution. Therefore $\vec{b} = \begin{bmatrix} 0 \\ 4 \\ 16 \end{bmatrix}$ is in the span of the other vectors.
Exercises 2.7.
1. Use row reduction to solve $f(\vec{x}) = \begin{bmatrix} 1 \\ 9 \\ -1 \end{bmatrix}$ where f has matrix $A = \begin{bmatrix} -1 & -2 & -1 & 3 \\ 0 & -3 & -6 & 4 \\ -2 & -3 & 0 & 3 \end{bmatrix}$.
2. Use row reduction to solve $f(\vec{x}) = \begin{bmatrix} 18 \\ -4 \\ 17 \end{bmatrix}$ where f has matrix $A = \begin{bmatrix} 2 & -4 & 6 \\ -1 & 3 & 0 \\ 2 & -5 & 3 \end{bmatrix}$.
3. Use row reduction to solve $f(\vec{x}) = \begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix}$ where f has matrix $A = \begin{bmatrix} 2 & 8 & 4 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix}$.
4. Use row reduction to solve $f(\vec{x}) = \begin{bmatrix} 3 \\ 1 \\ 8 \end{bmatrix}$ where f has matrix $A = \begin{bmatrix} 1 & 1 & -2 \\ 3 & -2 & 4 \\ 2 & -3 & 6 \end{bmatrix}$.
5. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_1 - x_2 + 2x_3 \\ -9x_1 + 4x_2 + 5x_3 \\ 6x_1 - 2x_2 - 4x_3 \end{bmatrix}$.
6. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 6x_2 - 5x_3 \\ x_1 + 3x_2 + x_3 \end{bmatrix}$.
7. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} -x_1 - 2x_2 - x_3 + 3x_4 \\ -3x_2 - 6x_3 + 4x_4 \\ -2x_1 - 3x_2 + 3x_4 \end{bmatrix}$.
8. Find the kernel of $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 5x_1 - 2x_2 + x_3 \\ 3x_2 - 3x_3 \\ 5x_1 - 5x_2 + 4x_3 \\ -x_2 + x_3 \end{bmatrix}$.
9. Is the linear map f with matrix $A = \begin{bmatrix} 1 & 2 & -3 & 0 \\ -1 & 2 & 4 & 1 \\ 0 & 4 & 1 & -2 \end{bmatrix}$ 1-1?
10. Is the linear map f with matrix $A = \begin{bmatrix} 1 & 1 & -2 \\ 3 & -2 & 4 \\ 2 & -3 & 6 \end{bmatrix}$ 1-1?
11. Is $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -4x_1 - 4x_3 \\ 3x_1 + x_2 - x_3 \\ -x_1 + 3x_2 - x_3 \end{bmatrix}$ 1-1?
12. Is $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 5 & -10 & 9 \\ 0 & -2 & 1 \\ 0 & 6 & -3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ 1-1?
13. Are $\begin{bmatrix} 3 \\ 1 \\ 0 \\ -1 \end{bmatrix}$, $\begin{bmatrix} 5 \\ -2 \\ 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ -1 \\ 2 \end{bmatrix}$ linearly dependent or linearly independent?
14. Are $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix}$ linearly independent or linearly dependent?
15. Are $\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 0 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}$ linearly independent or linearly dependent?
16. Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ -2 \\ 2 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 5 \\ -2 \\ 6 \end{bmatrix}$.
(a) Are these vectors linearly independent or linearly dependent?
(b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
17. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 0 \\ 9 \\ 5 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix}$.
(a) Are these vectors linearly independent or linearly dependent?
(b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
18. Let $\vec{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 2 \\ -1 \\ 1 \\ 0 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}$.
(a) Are these vectors linearly independent or linearly dependent?
(b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
19. Let $\vec{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 0 \\ 2 \\ 1 \\ 3 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} -4 \\ 4 \\ 6 \\ 2 \end{bmatrix}$.
We saw in the last section that we may not have any solutions to A~x = ~b,
in which case we can say that there are no solutions or that our solution set
is empty. We also saw that if there are solutions it’s possible to have either a
single solution or a set of infinitely many different solutions determined by our
choice of values for some of the variables. Are no solutions, one solution, or
infinitely many solutions our only options? Let’s explore this question visually.
For simplicity’s sake, we’ll do our exploration in 2D using a 2 × 2 matrix A.
In the 2 × 2 case, A~x = ~b is really
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$
which we can write as the equations ax1 + bx2 = b1 and cx1 + dx2 = b2 .
For familiarity’s sake, let’s use x for x1 and y for x2 , so our equations are
ax + by = b1 and cx + dy = b2 . Geometrically, this means we’re looking at
two lines in the plane. A solution to A~x = ~b is a pair of values for x and y
which satisfy both of these equations, which geometrically means a point in
the plane (x, y) which lies on both lines. This means our solution set is the
set of intersection points of our two lines.
If our two lines are in a fairly standard configuration, they’ll intersect in
precisely one point as shown in Figure 2.2.
[Figures 2.2-2.4: pairs of lines in the plane which intersect in exactly one point, in no points, and in infinitely many points.]
(Note that the equations of these two lines look different even though
they actually describe the same line.) In this case, any point on our doubled
line is a point of intersection, so there are infinitely many different points
of intersection. This means A~x = ~b has infinitely many solutions as in 2.7’s
Example 3. As in that example, this happens when we can choose a value
for some of the variables and the other variables depend on that choice. Since
each variable can be given any real number as a value, we have infinitely many
choices for its value, and hence infinitely many different possible solutions.
Our exploration showed us that in 2D there are three categories of solution
sets: There are solution sets which are empty, i.e., there are no solutions, there
are solution sets which consist of a single vector, i.e., one unique solution,
and there are solution sets which are infinite, i.e., there are infinitely many
solutions. This was also suggested by our work using row reduction to solve
A~x = ~b in the last section. Although we’ll omit a proof, this is true in general.
Let’s do one more example of each type. We’ll start with the case where
we have no solutions.
Example 1. Find the solution set of $A\vec{x} = \vec{b}$ where $A = \begin{bmatrix} -3 & -1 & 5 \\ 1 & 2 & -2 \\ 0 & 5 & -1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 2 \\ 6 \\ 15 \end{bmatrix}$.
The last row of the reduced echelon form of this equation's augmented coefficient matrix corresponds to the equation 0 = 1, so our equation has no solutions.
Geometrically, our solution set lives in R3 because ~x is a 3-vector. The three
rows of our original matrix correspond to three planes in R3 with equations:
−3x − y + 5z = 2, x + 2y − 2z = 6, and 5y − z = 15. (In general, a single
linear equation in Rn describes an (n − 1)-dimensional object.) Since we have
no solutions, we know our three planes don’t all intersect at the same point.
Feel free to use Mathematica or another software package to plot these three
planes together to verify this visually.
Again, notice that when we have no solutions we’ll always end up with an
impossible equation of the form 0 = b where b is nonzero. (If we’re in reduced
echelon form, we’ll have b = 1.) This is easily spotted in the reduced echelon
form, because we only have an equation of this format if there is a leading 1
in the rightmost column.
Next let’s look at the case where we have one unique solution.
3
Example 2. Find the solution set of f (~x) = ~b where ~b = 4 and
1
x1 −3x1 + x3
f x2 = x1 − 2x3 .
x3 −x1 + x2 + 3x3
This function has matrix $A = \begin{bmatrix} -3 & 0 & 1 \\ 1 & 0 & -2 \\ -1 & 1 & 3 \end{bmatrix}$, so we're looking for the solution set of $A\vec{x} = \vec{b}$. This equation has augmented coefficient matrix
$$\left[\begin{array}{ccc|c} -3 & 0 & 1 & 3 \\ 1 & 0 & -2 & 4 \\ -1 & 1 & 3 & 1 \end{array}\right].$$
Notice that this case can only occur if we have no impossible equations
(so we have a solution) and each variable has its own equation where it has a
leading 1 as its coefficient. This means we need a reduced echelon form which
has a leading 1 in every column except the rightmost one.
Finally, let’s look at the case where we have infinitely many solutions.
Example 3. Find the solution set of $A\vec{x} = \vec{b}$ where $A = \begin{bmatrix} 2 & -4 & 1 \\ 1 & 0 & 1 \\ 1 & -8 & -1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 6 \\ 3 \\ 3 \end{bmatrix}$.
This equation has augmented coefficient matrix
$$\left[\begin{array}{ccc|c} 2 & -4 & 1 & 6 \\ 1 & 0 & 1 & 3 \\ 1 & -8 & -1 & 3 \end{array}\right].$$
[Flowchart: Is there a leading 1 in the rightmost column of the reduced echelon form? If yes, there are no solutions. If no, is there a leading 1 in all other columns? If yes, there is 1 solution; if no, there are infinitely many solutions.]
Now that we’ve figured out the possible options for the solution set of
A~x = ~b, we’ll move on to figuring out how to concisely describe the solutions
to a given matrix equation. We’ll start with a special case of a matrix equation
where ~b = ~0, so we’re solving A~x = ~0. (We’ve seen this before in 2.5 when we
looked at the null space of a matrix.) This case is often called the homogeneous
case (having ~b ≠ ~0 is then called the nonhomogeneous case). Although it
isn’t immediately obvious, using linear algebra to balance chemical equations
always results in matrix equations with ~b = ~0. To see how this works, let’s
reexamine our example from Chapter 0.
This may look nothing like a linear algebra problem, let alone one of the
form A~x = ~b, but in fact row reduction is one of the best ways to tackle it.
Let’s suppose we have x1 molecules of propane, x2 molecules of oxygen, x3
molecules of carbon dioxide, and x4 molecules of water in our reaction. In this
notation, our chemical reaction looks like
x1 (C3 H8 ) + x2 (O2 ) → x3 (CO2 ) + x4 (H2 O).
Since the number of a particular type of atom is the same before and after
the reaction, this actually gives us a linear equation for each type of atom
present in our various molecules. In this reaction, our molecules contain three
types of atoms: carbon (C), hydrogen (H), and oxygen (O). Looking at carbon
atoms, we have 3 from each propane molecule, 0 from each oxygen molecule,
1 from each carbon dioxide molecule, and 0 from each water molecule. This
means we have 3x1 + 0x2 before our reaction and 1x3 + 0x4 afterward. Thus
we must have 3x1 = x3 or 3x1 − x3 = 0. Looking at hydrogen atoms, by the
same method we get that 8x1 = 2x4 or 8x1 − 2x4 = 0. Similarly, from oxygen
we get 2x2 = 2x3 + x4 or 2x2 − 2x3 − x4 = 0.
This means we’re really just trying to solve the equation A~x = ~0 where
$$A = \begin{bmatrix} 3 & 0 & -1 & 0 \\ 8 & 0 & 0 & -2 \\ 0 & 2 & -2 & -1 \end{bmatrix}.$$
(This is a homogeneous equation since we’re setting A~x equal to ~0.) In fact
balancing a chemical reaction always gives a homogeneous equation, since we
never have a constant term on either side of our chemical reaction.
The reduced echelon form of our augmented coefficient matrix is
$$\left[\begin{array}{cccc|c} 1 & 0 & 0 & -\frac{1}{4} & 0 \\ 0 & 1 & 0 & -\frac{5}{4} & 0 \\ 0 & 0 & 1 & -\frac{3}{4} & 0 \end{array}\right].$$
This tells us $x_1 = \frac{1}{4}x_4$, $x_2 = \frac{5}{4}x_4$, and $x_3 = \frac{3}{4}x_4$ with $x_4$ free, so choosing $x_4 = 4$ gives the balanced reaction $C_3H_8 + 5\,O_2 \to 3\,CO_2 + 4\,H_2O$.
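The null space computation behind this balancing problem is a nice one to hand to software. The sketch below (SymPy, our choice of tool rather than the text's) finds a basis vector for the solutions of A~x = ~0 and scales it to clear the fractions.

```python
from sympy import Matrix

# Columns: propane, O2, CO2, H2O; rows count carbon, hydrogen, and oxygen atoms.
A = Matrix([[3, 0, -1, 0],
            [8, 0, 0, -2],
            [0, 2, -2, -1]])

v = A.nullspace()[0]   # one free variable, so one basis vector for the solution set
print(v.T)             # Matrix([[1/4, 5/4, 3/4, 1]])
print((4 * v).T)       # Matrix([[1, 5, 3, 4]]): C3H8 + 5 O2 -> 3 CO2 + 4 H2O
```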
There are many similar problems where, for practical reasons, we want
our solutions to have integer values. In fact, there are people who specifically
study integer-valued linear algebra, but that is outside the scope of this book.
The nice thing about solving homogeneous equations is that we always have
at least one solution, namely ~x = ~0. This means we only have two possible
options for our solution set instead of the usual three. Additionally, if there is
one unique solution, we already know it is ~x = ~0. This means we can focus our
attention on how to describe our solution set when we have infinitely many
solutions. Since listing all possible solutions would literally take forever, we’ll
instead focus on finding a finite set of vectors which span our solution set.
This is best illustrated by working through an example, which we’ll do below.
Example 5. Find a finite set of vectors which span the solution set of $A\vec{x} = \vec{0}$ where $A = \begin{bmatrix} 3 & 2 & 0 & 4 & -1 \\ 0 & 2 & 6 & 0 & -8 \\ 1 & 1 & 1 & 4 & 1 \end{bmatrix}$.
The reduced echelon form of A is
$$\begin{bmatrix} 1 & 0 & -2 & 0 & 1 \\ 0 & 1 & 3 & 0 & -4 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}.$$
remain a column of zeros in the reduced echelon form.) The third and fifth
columns of A’s reduced echelon form had no leading 1s, so x3 and x5 are free.
Solving for x1 , x2 , and x4 gives us x1 = 2x3 − x5 , x2 = −3x3 + 4x5 , and x4 = −x5 . So in vector format, our solution set is
$$\left\{ \begin{bmatrix} 2x_3 - x_5 \\ -3x_3 + 4x_5 \\ x_3 \\ -x_5 \\ x_5 \end{bmatrix} \right\}.$$
A span is a set of linear combinations of vectors, i.e., a sum of vectors which
each have their own scalar coefficient. There isn’t a single scalar coefficient we
can factor out of each entry of our solution vector, but we can split each entry
up as a sum of two terms; one with x3 and one with x5 . This gives us
$$\begin{bmatrix} 2x_3 - x_5 \\ -3x_3 + 4x_5 \\ x_3 \\ -x_5 \\ x_5 \end{bmatrix} = \begin{bmatrix} 2x_3 \\ -3x_3 \\ x_3 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -x_5 \\ 4x_5 \\ 0 \\ -x_5 \\ x_5 \end{bmatrix}.$$
Now we can pull out our scalars, since every entry of the first vector in our
sum is a multiple of x3 and every entry in the second vector is a multiple of
x5 . This means we have
$$\begin{bmatrix} 2x_3 \\ -3x_3 \\ x_3 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -x_5 \\ 4x_5 \\ 0 \\ -x_5 \\ x_5 \end{bmatrix} = x_3\begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5\begin{bmatrix} -1 \\ 4 \\ 0 \\ -1 \\ 1 \end{bmatrix}.$$
This newest format is precisely the span of the two vectors being multiplied
by x3 and x5 ! Therefore our solution set can be written as
$$\mathrm{Span}\left\{ \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 4 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}.$$
(Notice that there is a shortcut through this process: We get one spanning
vector per free variable, and that vector is precisely the vector of coefficients
on that variable in our general solution set vector.)
If we wanted to describe the solutions to this equation to someone else,
the most efficient way would be to tell them our spanning vectors. This same
procedure works for the solution set of any homogeneous equation.
As a bonus, it turns out that the spanning set we constructed in this
way conveys even more information. The spanning vectors form a linearly
independent set! To see this, focus on the entries of the vectors which
correspond to the free variables. If xi is a free variable, the spanning vectors
will all have entry 0 in the ith spot except for the spanning vector formed
from xi ’s coefficients. This means that no linear combination of these spanning
vectors equals ~0 unless all coefficients in the combination are zero, i.e., our
spanning vectors are linearly independent. As discussed in 1.3, this means the
dimension equals the number of spanning vectors. Therefore the dimension
of the null space of a matrix A is the same as the number of free variables
in the solution set of A~x = ~0. In the example above, this means A has a
two-dimensional null space.
Now that we’ve successfully figured out a good way to describe the solution
set of A~x = ~0, let's turn our attention to the situation where A~x = ~b and ~b ≠ ~0.
(Remember that ~b ≠ ~0 doesn't mean all its entries are nonzero, but rather that
~b has at least one nonzero entry.) We have no problem describing the solution
set when A~x = ~b has no solutions. Similarly if A~x = ~b has one unique solution,
we can easily describe the solution set by simply giving that solution. As with
A~x = ~0, this leaves us with the task of concisely describing our solution set
when we have infinitely many solutions.
Let’s start with a geometric exploration. As at the beginning of this section
we’ll work in R2 for ease of drawing pictures, but here let’s look at a single
line. This means $A\vec{x} = \vec{b}$ looks like $\begin{bmatrix} a & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = b$ or $ax + cy = b$, which we can rewrite more familiarly as $y = -\frac{a}{c}x + \frac{b}{c}$. If we're in the homogeneous case then b = 0, which means we're talking about a line through the origin as in Figure 2.5.
[Figure 2.5: a line through the origin in the plane.]
When we move back to the case where b ≠ 0, we're adding a nonzero constant to our equation. Specifically, this is the difference between the lines $y = -\frac{a}{c}x$ and $y = -\frac{a}{c}x + \frac{b}{c}$. Adding a constant to a function shifts that function up or down depending on the sign of the constant. (Think about how adding "+C" on the end of an indefinite integral changes the graph of the antiderivative.)
This shift is shown in Figure 2.6.
[Figure 2.6: the line through the origin and its vertically shifted copy.]
The picture above suggests that for a fixed matrix A, the solution set of
A~x = ~b is a shifted version of the solution set of A~x = ~0. In our example
above, the vector we added to create that shift is a point on the line which is
the solution set of $A\vec{x} = \vec{b}$. For example, we could have used the point $\begin{bmatrix} 0 \\ 2 \end{bmatrix}$.
This relationship also holds more generally, i.e., the solution set of A~x = ~b
is a shifted version of the solution set of A~x = ~0 and that shift can be done
by adding any particular solution to A~x = ~b. In mathematical notation, if ~v
is any solution to A~x = ~b, i.e., A~v = ~b, then the solution set to A~x = ~b is
{~x + ~v | A~x = ~0}. This may sound a bit complicated, so let’s do an example.
Example 6. Let $A = \begin{bmatrix} 3 & 2 & 0 & 4 & -1 \\ 0 & 2 & 6 & 0 & -8 \\ 1 & 1 & 1 & 4 & 1 \end{bmatrix}$. Use the fact that $\vec{v} = \begin{bmatrix} 0 \\ 3 \\ -1 \\ 1 \\ 2 \end{bmatrix}$ is a
solution to $A\vec{x} = \vec{b}$ for $\vec{b} = \begin{bmatrix} 8 \\ -16 \\ 8 \end{bmatrix}$ to describe all solutions of $A\vec{x} = \vec{b}$.
Our matrix A is the same as in Example 5, and we know from there that the solution set of $A\vec{x} = \vec{0}$ is
$$\left\{ \begin{bmatrix} 2x_3 - x_5 \\ -3x_3 + 4x_5 \\ x_3 \\ -x_5 \\ x_5 \end{bmatrix} \right\}.$$
Since the solution set of
A~x = ~b can be written as ~v plus the solution set to A~x = ~0, we get that the
solutions of A~x = ~b are
$$\begin{bmatrix} 2x_3 - x_5 \\ -3x_3 + 4x_5 \\ x_3 \\ -x_5 \\ x_5 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \\ -1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2x_3 - x_5 \\ -3x_3 + 4x_5 + 3 \\ x_3 - 1 \\ -x_5 + 1 \\ x_5 + 2 \end{bmatrix}.$$
We can factor an x3 out of the second vector and an x4 out of the third to get
$$\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 1 \\ -3 \\ 1 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} -2 \\ 3 \\ 0 \\ 1 \end{bmatrix}.$$
The second two terms are precisely the span of the second and third vectors,
so this can be written as
$$\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ -3 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Example 8. Compute the rank of $A = \begin{bmatrix} 4 & 1 & 3 \\ 2 & 3 & -1 \\ -1 & 6 & -7 \end{bmatrix}$.
We saw in the last section that we could check the linear independence or
The discussion before and during the last example showed us that the
linearly independent columns of A were precisely the ones whose columns
had a leading 1 in A’s reduced echelon form and their number tells us the
dimension of the range of A’s map. This gives the following result.
For example, this shows the column space of the matrix in Example 5 is a
2D subspace of R5 .
If we combine this new idea of the rank of a matrix with our earlier
discussion of the dimension of a matrix’s null space, we get the following
theorem, usually called the Rank-Nullity Theorem. (Some people call the
dimension of a matrix’s null space its nullity.)
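As a quick machine check of the Rank-Nullity Theorem on the matrix from Example 5, the SymPy sketch below (our choice of tool, not part of the text) compares the rank plus the null space dimension with the number of columns.

```python
from sympy import Matrix

A = Matrix([[3, 2, 0, 4, -1],
            [0, 2, 6, 0, -8],
            [1, 1, 1, 4, 1]])    # matrix from Example 5

rank = A.rank()
nullity = len(A.nullspace())      # number of free variables in A x = 0
print(rank, nullity)              # 3 2
print(rank + nullity == A.cols)   # True: rank + nullity = number of columns
```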
Exercises 2.8.
9. (a) How can you tell from the reduced echelon form of an
augmented coefficient matrix that the corresponding matrix
equation has one unique solution?
(b) Give an example of a 3 × 4 augmented coefficient matrix in
reduced echelon form whose corresponding matrix equation has
one unique solution.
10. For each option below, write down the reduced echelon form of an
augmented coefficient matrix whose matrix equation has that type
of solution set. What features of your reduced echelon form caused
you to have that type of solution set?
(a) no solution
(b) 1 solution
(c) infinitely many solutions
11. The solution set of the matrix equation A~x = ~0 is: x1 = 3x2 − 4x4 ,
x3 = 5x4 , and x2 and x4 free. Write this solution set as the span of
a set of vectors.
12. The solution set of the matrix equation A~x = ~0 is: x1 = 0,
x2 = 4x3 − 5x5 , x4 = 8x5 and x3 and x5 free. Write this solution
set as the span of a set of vectors.
13. Let $A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & -4 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}$. Write the solution set to $A\vec{x} = \vec{0}$ as the span of a set of vectors.
14. Let $A = \begin{bmatrix} 1 & 1 & 0 & -1 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Write the solution set of $A\vec{x} = \vec{0}$ as the span of a set of vectors.
15. Let $A = \begin{bmatrix} 1 & 2 & 0 & -3 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Write the solution set of $A\vec{x} = \vec{0}$ as the span of a set of vectors.
16. Let $A = \begin{bmatrix} 1 & -1 & 0 & 2 \\ 0 & 0 & 1 & -3 \end{bmatrix}$. Write the solution set of $A\vec{x} = \vec{0}$ as the span of a set of vectors.
17. Let $A = \begin{bmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & 0 & 3 \end{bmatrix}$.
(a) Write the solution set of $A\vec{x} = \vec{0}$ as the span of a set of vectors.
(b) Fix some nonzero vector ~b. If A~x = ~b has solutions, how is its solution set related to your answer to (a)?
18. Let $A = \begin{bmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.
(a) Write the solution set of A~x = ~0 as the span of a set of vectors.
(b) Briefly describe the relationship between the solution set of $A\vec{x} = \vec{0}$ and the solution set of $A\vec{x} = \vec{b}$ where $\vec{b} = \begin{bmatrix} 2 \\ -3 \\ 9 \end{bmatrix}$.
19. Suppose A~x = ~b has at least one solution. Describe the geometric
relationship between its solution set and the solution set of A~x = ~0.
20. Can a linear system with five equations in seven variables have
one unique solution? If you answer yes, give an example of such a
system. If you answer no, briefly explain why this is impossible.
21. Is it possible to have a linear system of three equations in four
variables which has one unique solution? If you answer yes, give an
example of such a system. If you answer no, briefly explain why this
is impossible.
22. Find the rank of $A = \begin{bmatrix} -3 & 4 & 0 \\ 1 & 2 & 1 \\ 2 & -6 & -1 \end{bmatrix}$.
23. Find the rank of $A = \begin{bmatrix} 2 & 5 & -1 \\ 3 & 8 & 4 \\ 1 & 9 & -2 \end{bmatrix}$.
24. Find the rank of $A = \begin{bmatrix} 1 & 2 & 4 \\ -1 & 3 & 5 \\ 0 & 1 & 0 \\ -2 & 6 & 9 \end{bmatrix}$.
25. Find the rank of $A = \begin{bmatrix} -5 & 0 & 2 & 1 \\ 2 & 0 & -4 & 6 \end{bmatrix}$.
26. Show the Rank-Nullity Theorem holds for $A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 3 & 2 \\ 0 & 1 & 1 \\ -2 & 6 & 4 \end{bmatrix}$.
27. Show the Rank-Nullity Theorem holds for $A = \begin{bmatrix} -3 & 1 & 2 \\ 4 & 2 & -6 \\ 0 & 1 & -1 \end{bmatrix}$.
28. Suppose f : R5 → R3 is a linear map whose matrix has rank 2.
What is the dimension of f ’s kernel?
29. Suppose f : R6 → R9 is a linear map whose matrix has rank 4.
What is the dimension of f ’s kernel?
Example 1. Partition the matrix $A = \begin{bmatrix} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ 1 & -1 & 6 & 8 \end{bmatrix}$ using a vertical cut between the third and fourth columns and a horizontal cut between the second and third rows.
The standard way to indicate where we are slicing our matrix to create
our partition is to draw horizontal and vertical lines through the matrix to
visually show each cut. Here we want to cut vertically between the third and
fourth columns and horizontally between the second and third rows. This
means we need to draw a vertical line between the third and fourth columns
and a horizontal line between the second and third rows. This gives us the
following visual illustration of our partitioned matrix
$$\left[\begin{array}{ccc|c} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ \hline 1 & -1 & 6 & 8 \end{array}\right].$$
Example 2. Explain why cutting $A = \begin{bmatrix} 3 & -1 & 4 \\ -2 & 1 & 5 \\ 8 & 0 & 2 \end{bmatrix}$ with a vertical line between its first and second columns and with horizontal lines that stop partway across the matrix is not a partition of A.
The vertical cut between the first and second columns is fine, but the
horizontal cuts do not go all the way across the matrix. Therefore this is not
a partition.
The submatrix we’re interested in is the top middle 2 × 2 piece. Since we’re
only interested in the corresponding part of −2A, we can ignore the rest of
the matrix and just multiply that 2 × 2 submatrix by −2 to get
$$\begin{bmatrix} -2 & 0 \\ 6 & -18 \end{bmatrix}.$$
need to match up with the horizontal cuts in our partition of B since these
correspond to our choice of matching k. The placement of the horizontal cuts
in our partition of A, which correspond to m, and the vertical cuts in our
partition of B, which correspond to n, depend on the size of the submatrix of
AB we want to compute.
Example 5. Find partitions of the matrices $A = \begin{bmatrix} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ 4 & -2 & 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{bmatrix}$ which could be used to compute the 2 × 3 submatrix in the top right corner of AB.
From our discussion above, we know that to create a 2×3 submatrix in the
top right corner of AB we’ll need to partition A with a horizontal cut between
the second and third rows and B with a vertical cut between the second and
third columns. This gives us
$$\left[\begin{array}{cccc} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ \hline 4 & -2 & 0 & 1 \end{array}\right] \quad\text{and}\quad \left[\begin{array}{cc|ccc} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{array}\right].$$
Additionally, we’ll need to partition A with a vertical cut between the kth
and k + 1st columns and B with a matching horizontal cut between the kth
and k + 1st rows. If there were any way to take advantage of blocks of zero
entries in either A or B, the choice of k could matter more. However, since
there aren’t any zero blocks, let’s divide the four rows of A and columns of
B evenly, i.e., let k = 2. This means we’re adding a vertical cut between the
second and third columns of A and a horizontal cut between the second and
third rows of B. This gives us
$$\left[\begin{array}{cc|cc} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ \hline 4 & -2 & 0 & 1 \end{array}\right] \quad\text{and}\quad \left[\begin{array}{cc|ccc} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ \hline -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{array}\right].$$
The partition on AB gets its horizontal cuts from the horizontal cuts in
our partition of A and its vertical cuts from the vertical cuts in our partition
of B. (I remember this by remembering that AB gets its number of rows
from A and its number of columns from B so it seems natural to divide AB’s
rows as A’s rows are divided and AB’s columns as B’s columns are divided.)
Therefore we have one horizontal cut between the second and third rows of
AB and one vertical cut between the second and third columns of AB. This
gives us the partition below (with * used to denote an unknown matrix entry).
$$\left[\begin{array}{cc|ccc} * & * & * & * & * \\ * & * & * & * & * \\ \hline * & * & * & * & * \end{array}\right]$$
Now that we understand how to choose partitions for A and B and the
partition this creates for AB, how can we compute the submatrices of AB’s
new partition? This ends up being easier than you might expect. In fact we
can link it back to basic matrix multiplication by replacing each submatrix
of A and B by a variable and then multiplying as if those variables were our
matrix entries. This is illustrated below.
Each of our matrices has been divided into four submatrices, which we will call
A1 , . . . , A4 , B1 , . . . , B4 , and (AB)1 , . . . , (AB)4 respectively. In this notation,
our matrix multiplication becomes
$$\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix} = \begin{bmatrix} (AB)_1 & (AB)_2 \\ (AB)_3 & (AB)_4 \end{bmatrix}.$$
Now our computation can be done as if we were multiplying two 2 × 2 matrices whose entries are our submatrices of A and B, i.e., $(AB)_1 = A_1B_1 + A_2B_3$, $(AB)_2 = A_1B_2 + A_2B_4$, $(AB)_3 = A_3B_1 + A_4B_3$, and $(AB)_4 = A_3B_2 + A_4B_4$.
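It is easy to confirm this block formula numerically. The NumPy sketch below (our tooling assumption, not the text's) slices the matrices of Example 5 into the blocks chosen above, computes the top right piece of AB from the block formula, and compares it with the corresponding slice of the full product.

```python
import numpy as np

A = np.array([[-1, 0, 2, 5],
              [3, 1, -3, 2],
              [4, -2, 0, 1]])
B = np.array([[0, 7, -2, 4, 1],
              [6, -4, 2, 0, -1],
              [-9, 5, -2, 4, 3],
              [3, 5, -1, 1, 2]])

# Blocks as in Example 5: A cut after row 2 and column 2, B cut after row 2 and column 2.
A1, A2 = A[:2, :2], A[:2, 2:]
B2, B4 = B[:2, 2:], B[2:, 2:]

top_right = A1 @ B2 + A2 @ B4                        # block formula for the top right piece
print(top_right)                                      # [[-7  9 15] [ 0  2 -3]]
print(np.array_equal(top_right, (A @ B)[:2, 2:]))     # True
```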
Plugging this into our formula above, we get that the 2 × 3 submatrix in
the top right corner of AB is
$$\begin{bmatrix} -1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} -2 & 4 & 1 \\ 2 & 0 & -1 \end{bmatrix} + \begin{bmatrix} 2 & 5 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} -2 & 4 & 3 \\ -1 & 1 & 2 \end{bmatrix}$$
$$= \begin{bmatrix} 2 & -4 & -1 \\ -4 & 12 & 2 \end{bmatrix} + \begin{bmatrix} -9 & 13 & 16 \\ 4 & -10 & -5 \end{bmatrix} = \begin{bmatrix} -7 & 9 & 15 \\ 0 & 2 & -3 \end{bmatrix}.$$
Note that the entries aij = 0 with i > j are the entries below the diagonal
of A while the entries aij = 0 with i < j are the entries above the diagonal
of A. I remember this by recognizing a triangular matrix as one which has a
triangle of zeros either above or below its diagonal. The other entries can be
freely chosen, and if these free entries are in the upper part of the matrix it
is upper triangular. If the free entries are in the lower part of the matrix it is
lower triangular. However you choose to remember this, it is important to note
that the “upper” and “lower” in the definition above refer to the arbitrary
entries rather than the entries which must be zero.
Example 8. Is $\begin{bmatrix} -4 & 7 & 2 \\ 0 & 0 & -5 \\ 0 & 0 & 1 \end{bmatrix}$ upper triangular or lower triangular?
The bottom row gives us the equation −x3 = 6, which we can quickly solve to
get x3 = −6. The next row up gives us the equation −5x2 + 3x3 = 2. Plugging
in x3 = −6, we get −5x2 − 18 = 2 so x2 = −4. The top row gives us the
equation −4x1 + 7x2 + 2x3 = 4. Plugging in x2 = −4 and x3 = −6, we get
−4x1 − 28 − 12 = 4 so x1 = −11.
The top row gives the equation 2x1 = 8, so x1 = 4. The second row gives the
equation −5x1 + 4x2 = −12. Plugging in x1 = 4 gives −20 + 4x2 = −12, so
x2 = 2. The bottom row gives the equation x1 + 6x2 + 9x3 = 7. Plugging in
x1 = 4 and x2 = 2 gives 4 + 12 + 9x3 = 7, so x3 = −1.
In either case, it was much easier to solve a matrix equation where our
matrix was triangular than it would have been for a general matrix.
Now suppose we have factored A into the product of a lower triangular
matrix L and an upper triangular matrix U , so A = LU . Instead of solving
A~x = ~b directly, we can first solve L~y = ~b and then solve U~x = ~y . Then
last row says −2y1 + 3y2 + y3 = −4. Plugging in y1 = −1 and y2 = 3 gives 2 + 9 + y3 = −4, so y3 = −15. This means $\vec{y} = \begin{bmatrix} -1 \\ 3 \\ -15 \end{bmatrix}$.
Next we need to solve the equation U~x = ~y which in our case is
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & 1 \\ 0 & 0 & 5 \end{bmatrix}\vec{x} = \begin{bmatrix} -1 \\ 3 \\ -15 \end{bmatrix}.$$
This is very similar to the process for L~y = ~b except that we’ll start at the
bottom row and work up. The bottom row says 5x3 = −15, so x3 = −3. The
second row says −3x2 + x3 = 3. Plugging in x3 = −3 gives −3x2 − 3 = 3, so
x2 = −2. The top row says x1 + 2x2 + 3x3 = −1. Plugging in x2 = −2 and
x3 = −3 gives us x1 − 4 − 9 = −1, so x1 = 12.
Thus the solution to our original matrix equation $A\vec{x} = \vec{b}$ is $\vec{x} = \begin{bmatrix} 12 \\ -2 \\ -3 \end{bmatrix}$.
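The back substitution we just did by hand is exactly what a library triangular solver automates. As a hedged sketch (using SciPy, our choice of tool, not the text's), here is the U~x = ~y step of this example; a full LU solve would first do the analogous forward substitution with L to find ~y.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Upper triangular system U x = y from the example above.
U = np.array([[1.0, 2.0, 3.0],
              [0.0, -3.0, 1.0],
              [0.0, 0.0, 5.0]])
y = np.array([-1.0, 3.0, -15.0])

x = solve_triangular(U, y, lower=False)   # back substitution, bottom row first
print(x)                                   # [12. -2. -3.]

# For the L y = b stage you would call solve_triangular(L, b, lower=True) instead.
```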
Now that we understand why factoring A into the product of an upper and
lower triangular matrix is so useful, let’s discuss how to compute L and U . If
you think about it, you’ll realize that Part 1 of our row reduction algorithm
actually produces an upper triangular matrix, because it starts from the upper
left and creates zeros below each of our leading 1s. This means that we can
find our upper triangular matrix U simply by running the first half of row
reduction on A. The trickier part will therefore be finding L so that A = LU .
To do this, we’ll need a new way to view row operations as multiplication by
a special type of matrix.
Example 13. Let n = 3. Find the elementary matrices of the row operations
r3 − 7r1, r2 ↔ r3, and (1/2)r1.
U = Ek Ek−1 · · · E2 E1 A
where Ei is the elementary matrix of the ith row operation we used. Note
that since matrix multiplication is not commutative it is important to place
E1 closest to A since this corresponds to doing its row operation first, then
E2 , etc. However, we want to write A = LU , so we need some way to move
the Ei ’s to the other side of the equation and in particular to the left of U .
Luckily each row operation has another row operation which reverses it.
If we added a multiple of one row to another row, we can subtract that same
multiple. If we multiplied a row by a constant, we can divide by that constant.
If we swapped two rows, we can swap those two rows back. Thus each Ei has
another elementary matrix which undoes it. If we call this other matrix Fi ,
then what we’re saying is that multiplication on the left by Ei and then Fi
is the same as doing nothing, i.e., Fi Ei = In . If we find the F1 , . . . , Fk which
undo E1 , . . . , Ek , then we can start moving the Ei matrices to the other side
of the equation U = Ek Ek−1 · · · E2 E1 A to get something of the form LU = A.
We start by multiplying both sides on the left by Fk to cancel off Ek . This
gives us
Fk U = Ek−1 · · · E2 E1 A.
Next we multiply on the left by Fk−1 to cancel off Ek−1, which gives
Fk−1 Fk U = Ek−2 · · · E2 E1 A.
Continuing to multiply by each Fi in turn eventually cancels off every Ei, leaving
F1 F2 · · · Fk−1 Fk U = A.
This is very close to what we want, but to finish our factorization we will need
to argue that F1 F2 · · · Fk−1 Fk is a lower triangular matrix which we can use
as our L.
The first step in seeing F1 F2 · · · Fk−1 Fk is lower triangular is to see that
each of the Fi matrices is lower triangular. Remember that each Ei was the
elementary matrix of a row operation used in the first half of our row reduction
algorithm to either create a leading 1 along the diagonal of A or use a leading
1 to create a zero below that leading 1. Thus the row operations we use only
affect entries on or below the diagonal of A. To create the elementary matrix
of a row operation, we perform our row operation on In . Since In is lower
triangular and we’re doing a row operation that changes only entries below
the diagonal, the upper section of each of our Ei ’s will still be all 0s. Thus
each Ei is lower triangular. Since each Fi is the reverse of the row operation
of Ei , it also only affects entries below the diagonal. Therefore each Fi is also
lower triangular. You will explore in the exercises that the product of lower
triangular matrices is again lower triangular, so F1 F2 · · · Fk−1 Fk is a lower
triangular matrix, which we can call L. Therefore we can use careful row
reduction and reversal of row operations to find L and U so that A = LU .
Example 14. Find an LU-factorization of A = $\begin{bmatrix}-4 & 12 & -20\\ 0 & 2 & 4\\ -1 & 6 & 0\end{bmatrix}$.
The first step of this process is to use Part 1 of our row reduction algorithm
to find our upper triangular matrix U . While we do this, we’ll need to keep
careful track of our row operations (in order) so we can use them to create
our lower triangular matrix L. I’ll use our usual notation to write down the
row operation at each step.
$$A \xrightarrow{-\frac{1}{4}r_1} \begin{bmatrix}1 & -3 & 5\\ 0 & 2 & 4\\ -1 & 6 & 0\end{bmatrix} \xrightarrow{r_3 + r_1} \begin{bmatrix}1 & -3 & 5\\ 0 & 2 & 4\\ 0 & 3 & 5\end{bmatrix} \xrightarrow{\frac{1}{2}r_2} \begin{bmatrix}1 & -3 & 5\\ 0 & 1 & 2\\ 0 & 3 & 5\end{bmatrix} \xrightarrow{r_3 - 3r_2} \begin{bmatrix}1 & -3 & 5\\ 0 & 1 & 2\\ 0 & 0 & -1\end{bmatrix} \xrightarrow{-r_3} \begin{bmatrix}1 & -3 & 5\\ 0 & 1 & 2\\ 0 & 0 & 1\end{bmatrix}.$$
Notice that to use our LU -factorization method to solve A~x = ~b, we need
to find L and U which takes a bit of work. This means LU -factorization makes
the most sense if we want to solve A~x = ~b for several different values of ~b.
In that case, we can find A = LU once and then use our LU -factorization
repeatedly to solve each matrix equation quickly. However, if we want to solve
A~x = ~b for a single value of ~b, it may be quicker to simply row reduce the
augmented coefficient matrix.
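If you have access to a numerical library, this factor-once, solve-many pattern is built in. The sketch below (a suggestion of mine, not from the text) uses SciPy's lu_factor and lu_solve on a small illustrative matrix and two right-hand sides:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 1, -1],
              [0, 1, 3],
              [-4, -1, 0]])
lu, piv = lu_factor(A)              # factor A once
for b in (np.array([1.0, 0, 0]), np.array([0.0, 2, 1])):
    x = lu_solve((lu, piv), b)      # each new b reuses the factorization
    print(x, np.allclose(A @ x, b))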
This certainly isn’t an exhaustive list of methods for dealing with very
large matrices, and the matrices used in the examples and exercises of this
book are small enough that we won’t need these techniques again. However,
given the increasing importance of large data sets stored in correspondingly
large matrices, I hope you will keep these techniques in mind in case you need
them in future applications.
Exercises 2.9.
1. Let A be a 3×3 matrix. Find the partition which isolates the entries
a21 and a31 or say that no such partition exists.
2. Let A be a 3×4 matrix. Find the partition which isolates the entries
a31 , a32 , and a34 or say that no such partition exists.
3. Let A be a 4×5 matrix. Find the partition which isolates the entries
a22 , a23 , a32 , and a33 or say that no such partition exists.
4. Let A be a 5×6 matrix. Find the partition which isolates the entries
a25 , a26 , a35 , a36 , a45 , and a46 or say that no such partition exists.
5. Let A = $\begin{bmatrix}-4 & 1 & 0 & 9\\ 3 & -2 & 5 & 1\\ 6 & 3 & -2 & 7\end{bmatrix}$. Use a partition to compute the submatrix of 5A consisting of the bottom row and middle two columns.
6. Let A = $\begin{bmatrix}5 & -4 & 2\\ -1 & 8 & 0\\ 2 & 3 & 4\\ -1 & 0 & 6\end{bmatrix}$. Use a partition to compute the submatrix of −6A consisting of the middle two rows and left column.
7. Let A = $\begin{bmatrix}12 & -7 & 4\\ 3 & -1 & 4\\ 9 & 2 & -3\end{bmatrix}$. Use a partition to compute the submatrix of 4A consisting of the bottom two rows and right two columns.
8. Let A = $\begin{bmatrix}7 & 1 & -5 & 3\\ -1 & 0 & 3 & 6\\ 5 & 2 & 3 & -1\\ 9 & 3 & 0 & 1\end{bmatrix}$. Use a partition to compute the submatrix of −2A consisting of the middle two rows and left two columns.
9. Let A = $\begin{bmatrix}1 & -1 & 3\\ 5 & -2 & 9\\ 4 & 1 & 2\end{bmatrix}$ and B = $\begin{bmatrix}12 & -7 & 4\\ 3 & -1 & 4\\ 9 & 2 & -3\end{bmatrix}$. Use a partition to compute the submatrix of A + B consisting of the middle row and right two columns.
10. Let A = $\begin{bmatrix}1 & 6 & -7\\ -3 & 4 & -10\\ 2 & -8 & 1\end{bmatrix}$ and B = $\begin{bmatrix}6 & -2 & 8\\ 0 & 1 & -1\\ 3 & 5 & 2\end{bmatrix}$. Use a partition to compute the submatrix of A + B consisting of the top two rows and middle column.
11. Let A = $\begin{bmatrix}4 & -2 & 4\\ 7 & 1 & -1\\ 1 & 1 & 5\end{bmatrix}$ and B = $\begin{bmatrix}3 & -9 & 2\\ 0 & -2 & 4\\ -5 & 1 & 3\end{bmatrix}$. Use a partition to compute the submatrix of A + B consisting of the bottom two rows and left two columns.
12. Let A = $\begin{bmatrix}1 & 0 & -1\\ 0 & 3 & -1\\ -3 & 1 & 0\\ 1 & 1 & -1\end{bmatrix}$ and B = $\begin{bmatrix}6 & 1 & 2\\ -1 & 3 & 4\\ 0 & 2 & 1\\ 5 & 2 & 3\end{bmatrix}$. Use a partition to compute the submatrix of A + B consisting of the middle two rows and right two columns.
13. Let A be a 5 × 3 matrix and B be a 3 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the top two rows and right column.
14. Let A be a 3 × 2 matrix and B be a 2 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the bottom row and left two columns.
15. Let A be a 3 × 3 matrix and B be a 3 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the middle row and middle two columns.
16. Let A be a 5 × 2 matrix and B be a 2 × 3 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the top three rows and left two columns.
17. Let A = $\begin{bmatrix}1 & -1 & 3\\ 2 & 3 & 1\\ 0 & -2 & 4\\ -1 & 2 & 1\end{bmatrix}$ and B = $\begin{bmatrix}1 & 1\\ 4 & -2\\ -1 & 2\end{bmatrix}$. Use a partition to compute the submatrix of AB consisting of the middle two rows and right column.
18. Let A = $\begin{bmatrix}6 & 1 & 2\\ -1 & 3 & 4\\ 0 & 2 & 1\\ 5 & 2 & 3\end{bmatrix}$ and B = $\begin{bmatrix}-4 & 1 & 0 & 9\\ 3 & -2 & 5 & 1\\ 6 & 3 & -2 & 7\end{bmatrix}$. Use a partition
2.10 Invertibility
So far, we’ve solved f (~x) = ~b for ~x using row reduction. However, from your
earlier study of functions, you know that some functions can be quickly undone
using another function called the inverse function. For example, f (x) = 2x
has f −1(x) = (1/2)x. On the other hand, some functions don't have an inverse,
like f(x) = x^2. In this section we'll develop a technique to simultaneously
determine whether or not a given linear function from Rn to Rm has an
inverse and to find its inverse function if it has one.
Intuitively, the reason f(x) = 2x and f −1(x) = (1/2)x are inverses is that
they undo each other. What we mean by that is that if we compose these
two functions in either order we get the identity function. In other words,
f(f −1(x)) = f((1/2)x) = 2((1/2)x) = x and f −1(f(x)) = f −1(2x) = (1/2)(2x) = x.
The general versions of these two equations give us our formal definition of
invertibility.
This definition looks a little complicated, but the first condition is saying
that f −1 undoes f while the second says that f undoes f −1 . This means we
get a bonus fact: f −1 is also invertible and has inverse function f .
For the rest of this section, we will assume that all functions have
the same domain and codomain, i.e., f : Rn → Rn and all matrices
are n × n.
(See 2.2 if you need a reminder of how to find the matrix of a linear function.)
Next we compute the reduced echelon form of A, which is
$$I_3 = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}.$$
Therefore f is invertible.
If it turns out that f and A are not invertible then we are done, since we
can’t find f −1 and A−1 . However, if f and A are invertible, we’d like a method
for finding f −1 and A−1 . To do this, we’ll start by considering the equation
AA−1 = In . We don’t know how to solve an equation like this, so we’ll have to
reduce it to a set of simpler equations of the form A~x = ~b. In 2.3, we saw that
the jth column of a matrix product AB can be found by multiplying A by
the jth column of B. In our equation B = A−1 , so we can find the n columns
of A−1 by solving the n matrix equations
A~x = ~e1, A~x = ~e2, . . . , A~x = ~en,
where ~ej is the jth column of In and so is all 0s except for a 1 as its jth
entry. (We know that each equation will have one unique solution because A’s
reduced echelon form is In .)
We could solve these equations by row reducing the n augmented coefficient
matrices
[A | ~e1 ] , [A | ~e2 ] , . . . , [A | ~en ]
however that’s a lot of redundant work since we’d have to row reduce A
repeatedly. Additionally, since A’s reduced echelon form is In , we know that
whatever ~ej becomes in the reduced echelon form of [A | ~ej ] is the answer to
A~x = ~ej . A more efficient way to find A−1 is to solve for all its columns at
once by row reducing [A | In ]. The left-hand side will become In , while each
column of the right hand side becomes a column of A−1 . Thus the reduced
echelon form of [A | In] is [In | A−1].
The best part of this is that the process for finding A−1 and the process
for deciding if A is invertible can be done at once! Simply row reduce [A | In ].
If the left half turns into In , then A is invertible and A−1 is the right half of
the reduced echelon form. If the left half doesn't turn into In, then A isn't
invertible.
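This "row reduce [A | In]" procedure is easy to carry out with a computer algebra system. Here is a small sketch using SymPy (the matrix and variable names are mine, chosen only for illustration):

import sympy as sp

A = sp.Matrix([[2, 1],
               [1, 1]])
n = A.rows
augmented = A.row_join(sp.eye(n))   # form [A | I_n]
rref, _ = augmented.rref()          # row reduce
left, right = rref[:, :n], rref[:, n:]
if left == sp.eye(n):
    print(right)                    # the right half is A inverse
else:
    print("A is not invertible")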
Example 3. Determine whether or not A = $\begin{bmatrix}3 & 7 & 0\\ 2 & 5 & 0\\ 0 & 1 & 1\end{bmatrix}$ is invertible. If it is invertible, find A−1.
This gives us
$$\left[\begin{array}{ccc|ccc}1 & 0 & 0 & 5 & -7 & 0\\ 0 & 1 & 0 & -2 & 3 & 0\\ 0 & 0 & 1 & 2 & -3 & 1\end{array}\right].$$
Since the left half of our row reduced matrix is I3, A is invertible. Its inverse is the right half of our row reduced matrix, so
$$A^{-1} = \begin{bmatrix}5 & -7 & 0\\ -2 & 3 & 0\\ 2 & -3 & 1\end{bmatrix}.$$
Example 4. Find the inverse function of $f\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}4x_1 + x_2 - x_3\\ x_2 + 3x_3\\ -4x_1 - x_2\end{bmatrix}$.
This is our function from Example 2, so we already know it is invertible
and that f corresponds to the matrix
$$A = \begin{bmatrix}4 & 1 & -1\\ 0 & 1 & 3\\ -4 & -1 & 0\end{bmatrix}.$$
To find f ’s inverse function, we’ll use the fact that f −1 has matrix A−1 . We’ll
do that as in the last example by row reducing [A|I3 ]. In our case this is
$$\left[\begin{array}{ccc|ccc}4 & 1 & -1 & 1 & 0 & 0\\ 0 & 1 & 3 & 0 & 1 & 0\\ -4 & -1 & 0 & 0 & 0 & 1\end{array}\right]$$
which row reduces to
$$\left[\begin{array}{ccc|ccc}1 & 0 & 0 & -\frac{3}{4} & -\frac{1}{4} & -1\\ 0 & 1 & 0 & 3 & 1 & 3\\ 0 & 0 & 1 & -1 & 0 & -1\end{array}\right].$$
The right half of this matrix gives us
$$A^{-1} = \begin{bmatrix}-\frac{3}{4} & -\frac{1}{4} & -1\\ 3 & 1 & 3\\ -1 & 0 & -1\end{bmatrix}.$$
This means f −1 is the function with f −1 (~x) = A−1 ~x. If we want to write f −1
in the same way we originally wrote f , we can compute
$$A^{-1}\vec{x} = \begin{bmatrix}-\frac{3}{4} & -\frac{1}{4} & -1\\ 3 & 1 & 3\\ -1 & 0 & -1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix}-\frac{3}{4}x_1 - \frac{1}{4}x_2 - x_3\\ 3x_1 + x_2 + 3x_3\\ -x_1 - x_3\end{bmatrix}$$
to get
$$f^{-1}\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}-\frac{3}{4}x_1 - \frac{1}{4}x_2 - x_3\\ 3x_1 + x_2 + 3x_3\\ -x_1 - x_3\end{bmatrix}.$$
Note that there is still one gap here: we know that the matrix we’re calling
A−1 satisfies AA−1 = In , but we haven’t shown that it satisfies A−1 A = In .
Suppose that you have a matrix A which you know is invertible, so there is
some A−1 with AA−1 = In and A−1 A = In . We’ve found a matrix which for
the moment I’ll call B so that AB = In . Multiplying both sides of this equation
on the left by A−1 gives us A−1 (AB) = A−1 (In ) which can be simplified to
B = A−1 . Therefore the matrix we’ve been solving for by row reducing [A | In ]
is actually A−1 .
Recall that in Chapter 0’s Example 4 we talked about multiple linear
regression as a search for an equation y = β0 + β1 x1 + · · · + βp xp which can
predict the value of a variable y in terms of a set of variables x1 , . . . , xp . Now
that we understand how to compute the inverse of a matrix, we can actually
solve for the βs to find these equations.
These regression models are based on data sets where we know the values
of all the variables, including y. This allows us to plug in the values of y and
x1 , . . . , xp from each data point one at a time to create a set of linear equations
in β0 , β1 , . . . , βp . (If the constant term bothers you, pretend there is a variable
x0 whose value for every data point is 1.) If we have n data points, this gives
us the equations
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad i = 1, \ldots, n,$$
where xij is the value of the jth variable and yi is the value of y for the ith
data point. (Don’t let the unusual notation here fool you, we know the values
of the xs and are going to solve for the values of the βs.)
Usually there isn’t a single value for each β that will solve this system of
equations, so we introduce an error term εi onto the end of each equation. We
In other words, the transpose is the matrix formed by reflecting A’s entries
across the diagonal.
Example 5. Find the transpose of A = $\begin{bmatrix}0 & 4\\ -1 & 0\\ -2 & -3\end{bmatrix}$.
Since A is 3×2, its transpose, A^T, will be 2×3. Using A's columns as A^T's rows (or reflecting A's entries across the diagonal) gives us $A^T = \begin{bmatrix}0 & -1 & -2\\ 4 & 0 & -3\end{bmatrix}$.
Using our new notation, the coefficients that minimize the errors in the
regression equation y = β0 + β1 x1 + · · · + βp xp are given by the formula
$$\vec{\beta} = (X^T X)^{-1} X^T \vec{y}.$$
You may worry about taking the inverse of X^T X since X wasn't a square
matrix; however, X is n × (p + 1) so X^T is (p + 1) × n, which makes X^T X a
square ((p + 1) × n) · (n × (p + 1)) = (p + 1) × (p + 1) matrix.
We want to find $\vec{\beta} = (X^T X)^{-1} X^T \vec{y}$, and we can start by finding X's transpose:
$$X^T = \begin{bmatrix}1 & 1 & 1 & 1\\ 68.5 & 45.2 & 91.3 & 47.8\\ 167 & 168 & 182 & 163\end{bmatrix}.$$
This means
$$X^T X = \begin{bmatrix}1 & 1 & 1 & 1\\ 68.5 & 45.2 & 91.3 & 47.8\\ 167 & 168 & 182 & 163\end{bmatrix}\begin{bmatrix}1 & 68.5 & 167\\ 1 & 45.2 & 168\\ 1 & 91.3 & 182\\ 1 & 47.8 & 163\end{bmatrix} = \begin{bmatrix}4 & 252.8 & 680\\ 252.8 & 17{,}355.8 & 43{,}441.1\\ 680 & 43{,}441.1 & 115{,}806\end{bmatrix}.$$
To find (X^T X)^{-1}, we row reduce [X^T X | I3], which gives us (with some rounding)
$$\left[\begin{array}{ccc|ccc}1 & 0 & 0 & 453.212 & 0.975 & -3.027\\ 0 & 1 & 0 & 0.975 & 0.003 & -0.007\\ 0 & 0 & 1 & -3.027 & -0.007 & 0.020\end{array}\right].$$
This means X^T X is invertible, and we have
$$(X^T X)^{-1} = \begin{bmatrix}453.212 & 0.975 & -3.027\\ 0.975 & 0.003 & -0.007\\ -3.027 & -0.007 & 0.020\end{bmatrix}.$$
Plugging these elements into our formula $\vec{\beta} = (X^T X)^{-1} X^T \vec{y}$ gives us
$$\begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix} = \begin{bmatrix}453.212 & 0.975 & -3.027\\ 0.975 & 0.003 & -0.007\\ -3.027 & -0.007 & 0.020\end{bmatrix}\begin{bmatrix}1 & 1 & 1 & 1\\ 68.5 & 45.2 & 91.3 & 47.8\\ 167 & 168 & 182 & 163\end{bmatrix}\begin{bmatrix}174.4\\ 164.4\\ 244.2\\ 154.6\end{bmatrix} = \begin{bmatrix}-445.48\\ 0.60\\ 3.48\end{bmatrix}.$$
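For comparison, here is a short Python sketch (mine, not from the text) that carries out the same formula β = (X^T X)^{-1} X^T ~y on the data of this example; the decimals differ very slightly from the hand computation because no intermediate rounding is done:

import numpy as np

# Each row of X is (1, x1, x2) for one data point; y holds the responses.
X = np.array([[1, 68.5, 167],
              [1, 45.2, 168],
              [1, 91.3, 182],
              [1, 47.8, 163]], dtype=float)
y = np.array([174.4, 164.4, 244.2, 154.6])

beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)  # roughly [-445.5, 0.6, 3.5]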
Exercises 2.10.
1. Let A = $\begin{bmatrix}7 & 0\\ -14 & 1\end{bmatrix}$. Verify that $A^{-1} = \begin{bmatrix}\frac{1}{7} & 0\\ 2 & 1\end{bmatrix}$.
2. Let A = $\begin{bmatrix}2 & 1 & -1\\ 3 & 1 & 4\\ -1 & 0 & -2\end{bmatrix}$. Verify that $A^{-1} = \begin{bmatrix}\frac{2}{3} & -\frac{2}{3} & -\frac{5}{3}\\ -\frac{2}{3} & \frac{5}{3} & \frac{11}{3}\\ -\frac{1}{3} & \frac{1}{3} & \frac{1}{3}\end{bmatrix}$.
3. Find the inverse of A = $\begin{bmatrix}1 & -2 & 1\\ -3 & 7 & -6\\ 2 & -3 & 0\end{bmatrix}$.
4. Find the inverse of A = $\begin{bmatrix}-3 & 4 & 0\\ 1 & 2 & 1\\ 2 & 0 & 1\end{bmatrix}$.
5. Let A = $\begin{bmatrix}12 & -4\\ -6 & 2\end{bmatrix}$. Find A−1 or show that A is not invertible.
6. Let A = $\begin{bmatrix}5 & 1\\ 4 & 1\end{bmatrix}$. Find A−1 or show that A is not invertible.
7. Let A = $\begin{bmatrix}2 & 0 & 4\\ 0 & -1 & 0\\ -2 & 6 & -3\end{bmatrix}$. Find A−1 or show that A is not invertible.
8. Let A = $\begin{bmatrix}1 & -1 & 2\\ 0 & 1 & -2\\ 2 & 0 & -1\end{bmatrix}$. Find A−1 or show that A is not invertible.
9. Let A = $\begin{bmatrix}0 & -3 & -6\\ -1 & -2 & -1\\ -2 & -3 & 0\end{bmatrix}$. Find A−1 or show that A is not invertible.
10. Let A = $\begin{bmatrix}1 & -4 & 2\\ 0 & -1 & 3\\ 0 & 1 & -2\end{bmatrix}$. Find A−1 or show that A is not invertible.
11. Let A = $\begin{bmatrix}1 & 2 & 1\\ 1 & 5 & 2\\ 2 & 0 & 1\end{bmatrix}$. Find A−1 or show that A is not invertible.
12. Let A = $\begin{bmatrix}1 & 0 & 6\\ 2 & 2 & -9\\ 1 & 1 & -3\end{bmatrix}$. Find A−1 or show that A is not invertible.
13. Let $f\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}-x_1 + 2x_2 + x_3\\ 2x_2 + x_3\\ x_1 + x_2\end{bmatrix}$. Find f −1 or show that f is not invertible.
14. Let $f\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}5x_1 + x_2 + 3x_3\\ 2x_1 - 3x_2 + 6x_3\\ 3x_1 + 4x_2 - 3x_3\end{bmatrix}$. Find f −1 or show that f is not invertible.
15. Let $f\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}-x_1 + 2x_2 + x_3\\ -2x_2 + 2x_3\\ x_2 - x_3\end{bmatrix}$. Find f −1 or show that f is not invertible.
16. Let $f\left(\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}\right) = \begin{bmatrix}x_1 + x_2 + 3x_3\\ x_1 - 4x_3\\ x_2 - x_3\end{bmatrix}$. Find f −1 or show that f is not invertible.
17. Suppose A is an invertible n × n matrix. Explain why A−1 is
invertible.
18. Why can't the function with matrix A = $\begin{bmatrix}7 & -3 & 0\\ 1 & -5 & 2\end{bmatrix}$ be invertible?
19. In Example 2 from 2.7, we solved the matrix equation (I3 − A)~x = ~b
to figure out our production levels. Use matrix inverses to solve this
equation. (You should still get the same answer!)
20. What does your answer to the previous problem say about which net
production levels, ~b, are possible to achieve for the three industries
from Example 2 in 2.7?
21. Use the method from Example 6 and the data in the table below to
find the regression equation which predicts the percentage of people
in a given city who have heart disease based on the percentage of
people in that city who bike to work and the percentage who smoke.
City Code Heart Disease Bikers Smokers
1 17 25 37
2 12 40 30
3 19 15 40
4 18 20 35
22. Use the method from Example 6 and the data in the table below to
find the regression equation which predicts a student’s percentage
on the final exam based on their percentages on the two midterms.
Student Code Final Midterm 1 Midterm 2
1 68 78 73
2 75 74 76
3 85 82 79
4 94 90 96
5 86 87 90
6 90 90 92
7 86 83 95
8 68 72 69
9 55 68 67
10 69 69 70
statements. This more efficient approach will allow us to get away with only
20 arrows, two of which are left as exercises.
The bird’s-eye view of the 18 arrows we’ll establish together is given in
Figure 2.7.
[Figure 2.7: a schematic diagram of the arrows linking the conditions of the Invertible Matrix Theorem.]
then we know from our second definition of linear independence that A~x = ~0
has only the trivial solution. This gives us 5 → 6. The equation A~x = ~0 always
has the solution ~x = ~0, so if A~x = ~0 has one unique solution, that solution
is ~x = ~0. Since the null space of A is the solution set of A~x = ~0, this gives
us 6 → 7. The corresponding map f has ker(f ) = N ul(A) = {~0}, so f is
1-1 and we get 7 → 8. Finally, if A corresponds to a 1-1 map f , we know
ker(f ) = N ul(A) = {~0}, so A~x = ~0 has only the trivial solution which means
that the columns of A are linearly independent. Thus we have 8 → 5 and have
completed our second circle.
Next we’ll create a third circle out of the next four conditions by showing
9 → 10 → 11 → 12 → 9. If the columns of A span Rn , then every n-vector ~b
can be written as a linear combination of the columns of A. Rewriting that
vector equation as a matrix equation, the coefficients on the columns of A
combine to form a solution ~x to A~x = ~b. This gives us 9 → 10. Since the
corresponding map f has f (~x) = A~x, saying A~x = ~b has a solution for every ~b
in Rn means that every ~b in Rn is an output of f . Thus f is onto, and we get
10 → 11. A function is onto if its range equals its codomain. Since the column
space of a matrix is the range of the corresponding map and the codomain
is Rn , this means we have Col(A) = Rn and 11 → 12. Finally, the column
space of a matrix can also be thought of as the span of its columns. Therefore
Col(A) = Rn means the columns of A span Rn , so we get 12 → 9 and have
completed our third circle.
Now that we’ve completed all three circles, we can start linking them
together. We'll link our first and second circles by showing 3 ↔ 7 and our first
and third circles by showing 3 ↔ 11. We saw in 2.5 that A's corresponding
function is 1-1 exactly when there is a leading 1 in every column of its reduced
echelon form. Since A has n columns, this is equivalent to saying A’s reduced
echelon form has n leading 1s, so we get 3 ↔ 7. We also saw in 2.5 that A's
function is onto exactly when there is a leading 1 in every row of its reduced
echelon form. Since A has n rows, this is equivalent to saying A’s reduced
echelon form has n leading 1s, so we get 3 ↔ 11.
At this point, we’ve established the equivalence of the conditions 1 − 12.
To finish up our explanation of this theorem, we’ll need to link in the last
two statements. I’ll provide two of those arrows here, and leave the other
two as exercises. If our matrix is invertible, it has an inverse matrix A−1
with AA−1 = In and A−1 A = In . This means we can let A−1 = C to get
1 → 13 and A−1 = D to get 1 → 14. Going back from 13 or 14 to 1 is hard.
We did something similar sounding in 2.10, but it was under the assumption
that A was invertible, which we can’t assume if we’re starting with 13 or 14.
Instead I’d suggest showing 13 → 10, and 14 → 6. These are Exercises 8 and
9 respectively. With these last two conditions connected to the rest, we are
done.
Now that we’ve established our theorem, let’s see some examples of how
it can be used.
Example 1. A = $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ is invertible.
The linear function f given by f (~x) = A~x can be thought of as reflection
about the line y = x. Geometrically, it isn’t too hard to convince yourself that
this function is onto. In fact, since doing a reflection twice brings you back
where you started, we get f (f (~x)) = ~x. In other words, for every vector ~x, we
know f (~x) maps to ~x, so f is onto.
This means A satisfies condition 11 of the Invertible Matrix Theorem, so
A is invertible.
Example 2. A = $\begin{bmatrix}-3 & 1 & 0\\ 4 & 2 & -7\\ 1 & 0 & 9\end{bmatrix}$ is invertible.
This matrix has reduced echelon form I3 , so it satisfies condition 2 of the
Invertible Matrix Theorem and is therefore invertible.
Example 3. A = $\begin{bmatrix}1 & -5 & 4\\ 2 & -7 & 3\\ -2 & 1 & 7\end{bmatrix}$ is not invertible.
In Example 5 from 2.7, we saw that A~x = ~b didn't have a solution for
$\vec{b} = \begin{bmatrix}-3\\ -2\\ -1\end{bmatrix}$. This means A~x = ~b doesn't have a solution for every ~b.
Thus A fails to satisfy condition 10 of the Invertible Matrix Theorem, and
therefore isn’t invertible.
Exercises 2.11.
1. Give a direct explanation of the arrow 1 → 6.
2. Give a direct explanation of the arrow 1 → 10.
3. Give a direct explanation of the arrow 11 → 2.
4. Give a direct explanation of the arrow 3 → 9.
5. Our explanation of the Invertible Matrix Theorem didn’t include a
direct explanation of why a matrix whose function is 1-1 has rank
n, i.e., 8 → 4. Use the explanations/arrows in our proof of this
theorem to explain how you can get from condition 8 to condition
4.
6. Use the Rank-Nullity Theorem from 2.8 to give a direct explanation
of the arrows 7 ↔ 12.
7. Let A = $\begin{bmatrix}-3 & 0 & 1 & 4\\ 1 & 0 & -5 & 8\\ 7 & 0 & -2 & 6\\ 2 & 0 & 4 & -1\end{bmatrix}$. Which of the Invertible Matrix
has solutions for every 4-vector ~b. When we discussed finding the range of a
linear function in 2.5, we realized this is equivalent to the matrix having a
leading 1 in every row of its reduced echelon form. The reduced echelon form
of
$$\begin{bmatrix}1 & 0 & 1 & 0 & 1\\ 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 1 & 1\end{bmatrix}$$
is
$$\begin{bmatrix}1 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0\\ 0 & 0 & 1 & -1 & 0\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}.$$
We do have a leading 1 in every row, so our matrix equation always has a
solution, and B1 , . . . , B5 span M22 .
Since the Bs span M22 , our matrix A must be a linear combination of
these five matrices. Notice that A = B1 + B2 + B5 , so our linear combination
has coefficients 1 on B1 , B2 , and B5 and coefficients 0 on B3 and B4 . This
means we identify A with the 5-vector ~x which has 1s in the 1st, 2nd, and 5th
spots and 0s in the 3rd and 4th spots, so
$$\vec{x} = \begin{bmatrix}1\\ 1\\ 0\\ 0\\ 1\end{bmatrix}.$$
There are two possible issues with this process of identifying a matrix with
its vector of coefficients. The first is that our vector of coefficients depends
heavily on the order in which we listed our spanning vectors. For instance, in
the example above we could swap the order of B2 and B3 . This would mean
A = B1 + B3 + B5, so we'd identify A with the 5-vector $\vec{x} = \begin{bmatrix}1\\ 0\\ 1\\ 0\\ 1\end{bmatrix}$. We can fix
this problem by being careful to list our spanning set in the particular order
we want and preserving this order throughout the process. I’ll always specify
an order in this book, and I recommend that you always specify an order for
your spanning sets when doing problems on your own.
The second possible problem occurs when we can write our matrix as a
linear combination of the spanning matrices in more than one way, i.e., using
two different sets of coefficients. This can certainly happen, and means we
don’t have a unique vector to match up with our matrix. For instance, in
the example above we can also write A = B3 + B4 + B5 which would mean
identifying A with the 5-vector $\vec{y} = \begin{bmatrix}0\\ 0\\ 1\\ 1\\ 1\end{bmatrix}$. Clearly this isn't good, because we
want to create a function which maps from M22 to Rn and this example’s
function appears to be multi-valued. Let’s explore this situation further to see
if we can figure out how to avoid it.
Suppose A is in the span of B1 , . . . , Bk with two different sets of
coefficients. This means we have two sets of scalars x1 , . . . , xk and y1 , . . . , yk
with A = x1 B1 + · · · + xk Bk and A = y1 B1 + · · · + yk Bk. If we subtract one of
these linear combinations from the other we get
(x1 − y1)B1 + (x2 − y2)B2 + · · · + (xk − yk)Bk = ~0.
Since our two sets of coefficients are different, we must have at least one place
where xi − yi ≠ 0. I'll assume for ease of notation that this happens at i = 1,
i.e., x1 − y1 ≠ 0. In that case, we can subtract (x1 − y1)B1 from both sides of
our equation to get
(x2 − y2)B2 + · · · + (xk − yk)Bk = −(x1 − y1)B1.
Dividing both sides by −(x1 − y1) writes B1 as a linear combination of the other spanning matrices, so our spanning matrices are linearly dependent.
What we’ve just seen is that the coefficients used to write a matrix as
a linear combination of our spanning set aren’t unique exactly when the
spanning matrices are linearly dependent. This means that we can avoid this
situation by requiring that our spanning set be linearly independent, which
leads to the following definition. Note that there is nothing in our exploration
above that doesn’t generalize to a spanning set in any vector space, so I’ll
state the definition in those general terms.
If you look at the 3rd and 5th entries of these vectors corresponding to the
free variables x3 and x5 , you’ll see that all vectors have a 0 in that spot except
the vector containing the coefficients of that free variable, which has a 1. This
means no linear combination of these vectors can equal ~0 unless all of its
coefficients are zero. Thus our spanning vectors are linearly independent
and so form a basis for N ul(A).
Note that there was nothing special about the matrix A and its null space
in the example above. Although we didn’t know it at the time, our method
for finding a spanning set for N ul(A) actually finds a basis.
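If you want to double-check a null space basis found by hand, a computer algebra system can produce one directly. The sketch below (mine, not from the text) uses SymPy on the matrix that reappears in Example 14 later in this section; nullspace() builds one basis vector per free variable, just as our method does:

import sympy as sp

A = sp.Matrix([[3, 2, 0, 4, -1],
               [0, 2, 6, 0, -8],
               [1, 1, 1, 4, 1]])
for v in A.nullspace():   # one basis vector per free variable
    print(v.T)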
Example 3. The matrices $\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}$, and $\begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}$ are a basis for M22.
To see that these four matrices form a basis for M22 , we need to check that
they span M22 and are linearly independent. We can check that they span in
the same way we did in Example 1, by setting a linear combination of them
We can form a similar basis for Mmn for any m and n by using mn matrices
each of which has a 1 in one entry and zeros everywhere else. This is what
is usually called the standard basis for Mmn . The only occasional point of
confusion is how to order these matrices. Most mathematicians order them
by the position of the 1 as in the example above, i.e., from left to right along
each row from the top to the bottom.
Since spanning and linear independence don’t depend on the order of the
set of matrices, this basis also gives us several more bases for M22 by simply
reordering these four matrices. However, there are also a vast array of other
bases for M22 .
Example 4. The matrices $\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}$, $\begin{bmatrix}0 & 2\\ -2 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0\\ 0 & 2\end{bmatrix}$, $\begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix}$ are also a basis for M22.
x1 = a
2x2 + 3x4 = b
−2x2 = c
−x1 + 2x3 = d
The first equation clearly tells us that x1 = a, and we can solve the third
equation for x2 to get x2 = −(1/2)c. Plugging x1 = a into the fourth equation
gives us −a + 2x3 = d. Solving for x3 now gives us x3 = (1/2)d − (1/2)a. Plugging
x2 = −(1/2)c into the second equation gives us 2(−(1/2)c) + 3x4 = b which simplifies
to −c + 3x4 = b. Solving for x4, we get x4 = (1/3)b + (1/3)c. Since these solutions
make sense no matter what values of a, b, c, d we used, our four matrices span
M22 and are therefore a basis.
As with Mmn , our standard basis is by no means the only basis for Rn .
Example 6. Another basis for R^2 is $\begin{bmatrix}1\\ 1\end{bmatrix}$ and $\begin{bmatrix}1\\ -1\end{bmatrix}$.
While we could use the Invertible Matrix Theorem to check that these two
vectors are a basis for R2 , let’s practice a more geometric approach.
If we sketch a picture of these two vectors and the lines they span in the plane, we get the figure below. [Figure: the vectors (1, 1) and (1, −1) together with the lines they span.] From this picture, it is clear that neither of these vectors lies along the line which is the span of the other. This means they are linearly independent.
To see that they span all of R^2, let's think about the dimension of
$$\operatorname{Span}\left\{\begin{bmatrix}1\\ 1\end{bmatrix}, \begin{bmatrix}1\\ -1\end{bmatrix}\right\}.$$
The two spanning vectors are linearly independent, so their span is 2D, i.e., a plane. Since R^2 is a plane, these two vectors must span R^2. Therefore $\begin{bmatrix}1\\ 1\end{bmatrix}$ and $\begin{bmatrix}1\\ -1\end{bmatrix}$ are a basis for R^2.
−1
The picture above also suggests a reason we might choose to work with
this basis instead of the standard basis for R2 : working with reflection about
204 Vector Spaces
1
line y = x. Since this line is the span of , our first basis vector is fixed
1
1
by that reflection. The other basis vector is sent to its negative by that
−1
reflection. This means that if we put a general vector in terms of this basis it
becomes very easy to see what
happens to it under reflection about y = x. In
1 1 1 1
particular, if ~v = a +b , then ~v ’s reflection is a −b .
1 −1 1 −1
Note that the size of a B-coordinate vector is the number of vectors in the
basis B.
Example 7. What is the coordinate vector of the matrix A = $\begin{bmatrix}1 & -1 & 5\\ 2 & 0 & 8\end{bmatrix}$ with respect to the standard basis $\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 0\\ 1 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}$ for M23?
Since our coordinate vector is made up of the coefficients used to write A
as a linear combination of the basis vectors, we’ll start by solving for those
coefficients. This means solving the equation
$$x_1\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix} + x_2\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 0\end{bmatrix} + x_3\begin{bmatrix}0 & 0 & 1\\ 0 & 0 & 0\end{bmatrix} + x_4\begin{bmatrix}0 & 0 & 0\\ 1 & 0 & 0\end{bmatrix} + x_5\begin{bmatrix}0 & 0 & 0\\ 0 & 1 & 0\end{bmatrix} + x_6\begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 1\end{bmatrix} = \begin{bmatrix}1 & -1 & 5\\ 2 & 0 & 8\end{bmatrix}.$$
The left side simplifies to give us
$$\begin{bmatrix}x_1 & x_2 & x_3\\ x_4 & x_5 & x_6\end{bmatrix} = \begin{bmatrix}1 & -1 & 5\\ 2 & 0 & 8\end{bmatrix}$$
If you noticed that the entries of this coordinate vector were the same as
the entries of the matrix A read off from left to right across each row starting
with the first row and working downward, you have hit on the easiest way
to compute coordinate vectors with respect to the standard basis for M23 . In
fact, this method works for any matrix in any Mmn as long as you are using
the standard basis. The ease of finding coordinate vectors with respect to the
standard basis for Mmn is why this is the basis most commonly used, and
hence called standard.
Example 8. Find the coordinate vector of the matrix A = $\begin{bmatrix}2 & 6\\ -3 & 14\end{bmatrix}$ with respect to the basis B = $\left\{\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}, \begin{bmatrix}0 & 2\\ -2 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 0 & 2\end{bmatrix}, \begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix}\right\}$ for M22.
Here we are not using the standard basis for M22 , so we can’t just read
off matrix entries and will have to use the general method of finding a linear
combination of the basis vectors which equals our matrix A. We could go back
to Example 4 and plug in our values for a, b, c, d, but I’ll do this problem from
scratch to more accurately reflect the solution process we’d use if we hadn’t
done that previous work.
Let’s start by writing our matrix A as a linear combination of our basis
via the equation
$$x_1\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix} + x_2\begin{bmatrix}0 & 2\\ -2 & 0\end{bmatrix} + x_3\begin{bmatrix}0 & 0\\ 0 & 2\end{bmatrix} + x_4\begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix} = \begin{bmatrix}2 & 6\\ -3 & 14\end{bmatrix}.$$
Equating entries gives us the equations
x1 = 2
2x2 + 3x4 = 6
−2x2 = −3
−x1 + 2x3 = 14
We can also find coordinate vectors in Rn . You can convince yourself that
if we use the standard basis for Rn discussed above then the coordinate vector
of any vector ~v is just that vector itself. However, we will sometimes want to
work in terms of another basis for Rn , especially in Chapter 4.
Example 9. If we use the basis B = $\left\{\begin{bmatrix}1\\ 1\end{bmatrix}, \begin{bmatrix}1\\ -1\end{bmatrix}\right\}$ for R^2, what is the B-coordinate vector of $\vec{v} = \begin{bmatrix}-3\\ 7\end{bmatrix}$?
To find this coordinate vector, we need to solve
$$x_1\begin{bmatrix}1\\ 1\end{bmatrix} + x_2\begin{bmatrix}1\\ -1\end{bmatrix} = \begin{bmatrix}-3\\ 7\end{bmatrix}.$$
The left side simplifies to give us
$$\begin{bmatrix}x_1 + x_2\\ x_1 - x_2\end{bmatrix} = \begin{bmatrix}-3\\ 7\end{bmatrix}$$
so we need to solve x1 + x2 = −3, x1 − x2 = 7, which has augmented coefficient matrix
$$\left[\begin{array}{cc|c}1 & 1 & -3\\ 1 & -1 & 7\end{array}\right].$$
This row reduces to
$$\left[\begin{array}{cc|c}1 & 0 & 2\\ 0 & 1 & -5\end{array}\right]$$
so our solution is x1 = 2 and x2 = −5. Thus $[\vec{v}]_B = \begin{bmatrix}2\\ -5\end{bmatrix}$.
To check our answer is correct, we can compute
$$2\begin{bmatrix}1\\ 1\end{bmatrix} - 5\begin{bmatrix}1\\ -1\end{bmatrix} = \begin{bmatrix}2 - 5\\ 2 + 5\end{bmatrix} = \begin{bmatrix}-3\\ 7\end{bmatrix}.$$
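Since finding a B-coordinate vector in R^n just means solving a linear system whose coefficient matrix has the basis vectors as its columns, we can also do Example 9 numerically; here is a short Python sketch of mine:

import numpy as np

# Columns of P are the basis vectors; solving P c = v gives the B-coordinates of v.
P = np.array([[1.0, 1],
              [1, -1]])
v = np.array([-3.0, 7])
c = np.linalg.solve(P, v)
print(c)  # [ 2. -5.]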
We came up with the idea of coordinate vectors in order to link any vector
space V to Rn by matching each vector ~v from V with its coordinate vector with
respect to some basis B for V . We can formalize this by creating a map from
V to Rn which sends each vector from V to its B-coordinate vector. Since a
coordinate vector has as many entries as there are vectors in the basis, this
map goes to Rn where n is the number of vectors in V ’s basis B.
Definition. Let B = {~b1 , . . . , ~bn } be a basis for a vector space V , then the
B-coordinate map is the function fB : V → Rn given by fB (~v ) = [~v ]B .
Example 10. Take the basis B = $\left\{\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}, \begin{bmatrix}0 & 2\\ -2 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 0 & 2\end{bmatrix}, \begin{bmatrix}0 & 3\\ 0 & 0\end{bmatrix}\right\}$ for M22. Find n so that fB : M22 → Rn, and compute $f_B\left(\begin{bmatrix}2 & 6\\ -3 & 14\end{bmatrix}\right)$.
The first part of this question is a fairly straightforward counting problem.
The B-coordinate map connects V , which in our case is M22 with Rn where n
is the number of elements in our basis B. Since B contains four matrices, we
have n = 4. Thus fB : M22 → R4 . This also provides a sanity check for our
computation of fB (A), because we now know our answer must be a 4-vector.
The B-coordinate map sends any matrix to its B-coordinate vector, i.e.,
fB (A) = [A]B . This is the same basis B and matrix A as in Example 8, so
$$f_B(A) = [A]_B = \begin{bmatrix}2\\ \frac{3}{2}\\ 8\\ 1\end{bmatrix}.$$
No matter which basis B we chose to create fB , this map always has several
nice properties. The first is that every fB is 1-1. To see this, remember that
1-1 means we cannot have ~v ≠ ~w with fB(~v) = fB(~w). In this case, that would
mean ~v ≠ ~w with [~v]B = [~w]B. If two vectors have the same B-coordinate
vector, that means they are both equal to the same linear combination of
the basis vectors and are therefore equal. Thus having fB(~v) = fB(~w) means
~v = ~w, so fB is 1-1.
The coordinate map fB is also onto for every B. For any vector ~x in Rn ,
we can find ~v in V which has ~x = [~v ]B by letting ~v = x1~b1 + · · · + xn~bn . Thus
every element of the codomain is also in the range, i.e., fB is onto.
Since fB is both 1-1 and onto, it is invertible. This means our coordinate
map has created an exact correspondence between the set of vectors in V and
the set of vectors in Rn , so it can be used to go either direction between the two
vector spaces. The only thing we might worry about is that fB doesn’t create
a similar correspondence between the operations of V and the operations of
Rn . However, life is as good as possible, i.e., fB is a linear function.
To see that fB : V → Rn is linear, we need to check that for every ~v and ~w in V and every scalar r we have
fB(~v + ~w) = fB(~v) + fB(~w) and fB(r · ~v) = r · fB(~v).
This is equivalent to checking that
[~v + ~w]B = [~v]B + [~w]B and [r · ~v]B = r · [~v]B.
(Notice that the “+” and “·” on the left-hand sides of these equations are the
operations from V , while the + and · on the right-hand sides are our usual
vector operations from Rn .)
Suppose
~v = x1~b1 + · · · + xn~bn and ~w = y1~b1 + · · · + yn~bn
so
$$[\vec{v}]_B = \begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} \quad\text{and}\quad [\vec{w}]_B = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}.$$
Then
~v + ~w = x1~b1 + · · · + xn~bn + y1~b1 + · · · + yn~bn = (x1 + y1)~b1 + · · · + (xn + yn)~bn
so
$$[\vec{v} + \vec{w}]_B = \begin{bmatrix}x_1 + y_1\\ \vdots\\ x_n + y_n\end{bmatrix}.$$
Now it is easy to see that
$$[\vec{v} + \vec{w}]_B = \begin{bmatrix}x_1 + y_1\\ \vdots\\ x_n + y_n\end{bmatrix} = \begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} + \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix} = [\vec{v}]_B + [\vec{w}]_B.$$
Similarly,
r · ~v = r(x1~b1 + · · · + xn~bn) = rx1~b1 + · · · + rxn~bn
so
$$[r \cdot \vec{v}]_B = \begin{bmatrix}rx_1\\ \vdots\\ rx_n\end{bmatrix}.$$
Therefore
$$[r \cdot \vec{v}]_B = \begin{bmatrix}rx_1\\ \vdots\\ rx_n\end{bmatrix} = r\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = r \cdot [\vec{v}]_B$$
so fB is a linear function.
We can summarize this discussion as follows.
has only the trivial solution. However, now that we have the idea of coordinate
maps, we can also answer it by translating the problem into Rn and solving
it there. I’ll show this newer method here, but feel free to go back to 2.4 to
see the older method being used.
To use a coordinate map, I first need to choose a basis for M22 . To make
things as easy as possible, I’ll use the standard basis
$$B = \left\{\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}\right\}.$$
Since we’re using the standard basis, we can find the coordinate vectors of our
three original matrices by reading left to right across each row starting at the
top. This gives us
$$[A_1]_B = \begin{bmatrix}2\\ 4\\ -1\\ 0\end{bmatrix}, \quad [A_2]_B = \begin{bmatrix}1\\ -3\\ -3\\ 1\end{bmatrix}, \quad\text{and}\quad [A_3]_B = \begin{bmatrix}1\\ 1\\ 1\\ 2\end{bmatrix}.$$
Since each column of the reduced echelon form contains a leading 1, the three
coordinate vectors and hence the three matrices A1 , A2 , and A3 are linearly
independent.
Now that we have fB : V → Rn for any basis B = {~b1 , . . . , ~bn } for V , there
is still one important question left to be answered: Is it possible to have two
different bases for V which have different numbers of vectors in them? If so,
this would link V to Rn and Rm for n 6= m. This sounds fairly strange since
the coordinate map is supposed to preserve everything about V , while we
intuitively think of Rn and Rm as different sized spaces for n 6= m. Therefore
our next goal is to show that this cannot happen.
Suppose we have a vector space V with two different bases B = {~b1 , . . . , ~bn }
and C = {~c1 , . . . , ~cm }. We know that ~b1 , . . . , ~bn are linearly independent
vectors, so their C-coordinate vectors in Rm must be linearly independent.
This means the m × n matrix with columns [~b1]C, . . . , [~bn]C must have a
reduced echelon form with leading 1s in every column, which is only possible
if n ≤ m. Similarly, ~c1 , . . . , ~cm are linearly independent vectors, so their B-
coordinate vectors in Rn must also be linearly independent. This means the
n × m matrix whose columns are [~c1 ]B , . . . , [~cm ]B must have a reduced echelon
form with leading 1s in every column. For this to be possible, we must have
m ≤ n. Therefore we must have m = n, i.e., our two bases must contain the
same number of vectors.
One way to think about this number n of basis vectors in any basis for
V is that to write down any vector in V , we can choose n different numbers
to use as coefficients in the linear combination of our basis vectors.
This means that V allows us n independent choices. This reminds me of our
discussion of the geometric idea of dimension at the start of 1.3 where we said
an n-dimensional object allowed us n different directions of motion. In either
case, we’re specifying a particular vector/point in a space which allows us n
independent choices/directions. We’ll use this connection to create our linear
algebra definition of dimension.
Since every basis contains the same number of vectors, it doesn’t matter
which basis we use to compute the dimension. The standard basis
$$\begin{bmatrix}1\\ 0\\ 0\\ \vdots\\ 0\end{bmatrix}, \begin{bmatrix}0\\ 1\\ 0\\ \vdots\\ 0\end{bmatrix}, \ldots, \begin{bmatrix}0\\ \vdots\\ 0\\ 0\\ 1\end{bmatrix}$$
contains n vectors – one for each entry in an n-vector. Thus the dimension of
Rn is n.
Example 14. Find the dimension of the null space of A = $\begin{bmatrix}3 & 2 & 0 & 4 & -1\\ 0 & 2 & 6 & 0 & -8\\ 1 & 1 & 1 & 4 & 1\end{bmatrix}$.
This is the matrix from Example 2, where we found that N ul(A) had basis
$$\left\{\begin{bmatrix}2\\ -3\\ 1\\ 0\\ 0\end{bmatrix}, \begin{bmatrix}-1\\ 4\\ 0\\ -1\\ 1\end{bmatrix}\right\}.$$
Since there are two basis vectors, the null space of A is two-dimensional.
which doesn’t have a leading 1 in every column. This means our spanning set
isn’t a basis.
In the case of a general spanning set, we’d find one of our vectors to
remove and start the process over. However, here in Rn with row reduction
at our disposal we can be more efficient. The first two columns of the reduced
echelon form are the only ones which have leading 1s in them. This means
the first two columns of A are linearly independent and the other two are not.
This allows us to skip ahead and simply pick the columns of A which produced
leading 1s in the reduced echelon form as our basis for Col(A). Thus a basis
for Col(A) is
1 −4
−4 , −2 .
3 3
This procedure works in general, so we can find a basis for the column
space of a matrix by simply finding the reduced echelon form and choosing as
our basis for Col(A) the columns of A which correspond to the columns of the
reduced echelon form containing leading 1s. Note that it is very important to
use the columns of A as your basis rather than the columns of the reduced
echelon form!
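This pivot-column recipe is easy to automate. The sketch below (mine, with a matrix made up only for illustration) uses SymPy: rref() reports the pivot column indices, and we keep those columns of the original A as the basis for Col(A):

import sympy as sp

A = sp.Matrix([[1, 2, 0, 3],
               [2, 4, 1, 1],
               [3, 6, 1, 4]])
_, pivot_cols = A.rref()            # indices of the pivot columns
basis = [A[:, j] for j in pivot_cols]
for b in basis:
    print(b.T)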
Here’s another fact about dimension that should seem reasonable.
You can see this illustrated in the example above for W = Col(A) and
V = R3 , since dim(W ) = 2 and dim(R3 ) = 3. This theorem is true because
a basis for W is linearly independent not only as a set of vectors in W , but
also as a set of vectors in V . By an argument very similar to the one where
we showed that every basis of a vector space has the same size, you can now
see that our basis for W can contain at most dim(V ) vectors.
With respect to the standard basis, our matrices from B have coordinate
vectors
$$\begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 0\\ 0\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 1\\ 0\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 1\\ 1\end{bmatrix}.$$
These coordinate vectors span R4 exactly when the matrices from B span M22 .
Four 4-vectors span R4 if and only if the matrix which has them as columns
is invertible. In our case, this would be the matrix
$$\begin{bmatrix}1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{bmatrix}.$$
This matrix has reduced echelon form I4 , so by the invertible matrix theorem
the coordinate vectors span R^4. This means our four matrices from B span
M22 and hence are a basis.
Exercises 3.1.
1. Do $\begin{bmatrix}2\\ 0\\ 1\end{bmatrix}$, $\begin{bmatrix}-1\\ 1\\ 4\end{bmatrix}$, and $\begin{bmatrix}3\\ -1\\ -3\end{bmatrix}$ form a basis for R^3?
2. Do $\begin{bmatrix}3\\ 0\\ 4\end{bmatrix}$, $\begin{bmatrix}4\\ -1\\ 5\end{bmatrix}$, and $\begin{bmatrix}1\\ 2\\ -3\end{bmatrix}$ form a basis for R^3?
3. Do $\begin{bmatrix}1\\ 2\\ 3\\ 4\end{bmatrix}$, $\begin{bmatrix}3\\ -1\\ 5\\ 0\end{bmatrix}$, $\begin{bmatrix}0\\ 4\\ -6\\ 1\end{bmatrix}$, and $\begin{bmatrix}-2\\ 0\\ 1\\ 2\end{bmatrix}$ form a basis for R^4?
4. Do $\begin{bmatrix}-6\\ 4\\ 0\\ 2\end{bmatrix}$, $\begin{bmatrix}1\\ 0\\ -5\\ 3\end{bmatrix}$, $\begin{bmatrix}-2\\ 3\\ 1\\ 1\end{bmatrix}$, and $\begin{bmatrix}0\\ 8\\ 4\\ 0\end{bmatrix}$ form a basis for R^4?
5. Use the Invertible Matrix Theorem to redo Example 6.
6. Could there be a basis for R3 consisting of 5 vectors? Briefly explain
why or why not.
7. Show that $\begin{bmatrix}1 & 1\\ 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0\\ 1 & 1\end{bmatrix}$, $\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$, and $\begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}$ form a basis for M22.
8. Show that $\begin{bmatrix}2 & -1\\ 0 & 0\end{bmatrix}$, $\begin{bmatrix}-1 & 0\\ 0 & 2\end{bmatrix}$, $\begin{bmatrix}0 & 0\\ 2 & -1\end{bmatrix}$, and $\begin{bmatrix}0 & 2\\ -1 & 0\end{bmatrix}$ form a basis for M22.
9. Do $\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$, $\begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$, $\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}$, and $\begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}$ form a basis for M22?
10. Do $\begin{bmatrix}1 & 0 & 0\\ 1 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 1 & 0\\ 0 & 1 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 1\\ 0 & 0 & 1\end{bmatrix}$, $\begin{bmatrix}1 & 1 & 1\\ 0 & 0 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 0 & 0\\ 1 & 1 & 1\end{bmatrix}$, and $\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}$ form a basis for M23?
11. Find a basis for the column space of A = $\begin{bmatrix}1 & -2 & 0 & 4 & 5\\ 0 & 0 & 2 & 1 & 3\\ 0 & 0 & 0 & -1 & -1\\ 0 & 0 & 1 & 2 & 3\end{bmatrix}$.
12. Find a basis for the column space of A = $\begin{bmatrix}1 & 3 & 0\\ -2 & 2 & -8\\ 0 & -1 & 1\end{bmatrix}$.
13. Find a basis for the column space of A = $\begin{bmatrix}2 & 0 & 4 & -2\\ 0 & 1 & -3 & 1\end{bmatrix}$.
14. Find a basis for the column space of A = $\begin{bmatrix}2 & -4 & 0 & 2\\ 0 & 1 & -1 & 2\\ 0 & -1 & 2 & -3\end{bmatrix}$.
15. Find a basis for the null space of A = $\begin{bmatrix}1 & -4 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -2 & 0 & -1\\ 0 & 0 & 0 & 0 & 1 & 5\end{bmatrix}$.
16. Find a basis for the null space of A = $\begin{bmatrix}0 & 2 & 1 & 0\\ 1 & 0 & 1 & -1\\ 0 & 0 & 0 & 9\end{bmatrix}$.
17. Find a basis for the null space of A = $\begin{bmatrix}1 & 0 & 2 & -1 & 0\\ 0 & -1 & 3 & 2 & -1\\ 0 & 2 & -6 & -3 & 4\end{bmatrix}$.
18. Find a basis for the null space of A = $\begin{bmatrix}1 & -1 & 0 & 2\\ -2 & 2 & 1 & 8\\ -1 & 1 & 1 & 10\end{bmatrix}$.
19. Do $\begin{bmatrix}2 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}$, $\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}$, and $\begin{bmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}$ form a basis for the set W = $\left\{\begin{bmatrix}a & 0 & 0\\ 0 & b & 0\\ 0 & 0 & c\end{bmatrix}\right\}$?
20. Do $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$, $\begin{bmatrix}1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 0\end{bmatrix}$, and $\begin{bmatrix}0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{bmatrix}$ form a basis for the set W = $\left\{\begin{bmatrix}a & 0 & 0\\ 0 & b & 0\\ 0 & 0 & c\end{bmatrix}\right\}$?
21. What is the dimension of M49 ?
22. What is the dimension of M25 ?
23. What is the dimension of the null space of $\begin{bmatrix}-6 & -5 & 14\\ 1 & -4 & -59\\ 4 & 1 & -32\\ -1 & 1 & -3\end{bmatrix}$?
24. What is the dimension of the column space of $\begin{bmatrix}2 & 6 & -1 & -7\\ 1 & 3 & -2 & -2\\ -4 & -12 & 0 & 16\end{bmatrix}$?
25. What is the codomain of the coordinate map fB for any basis B of
M34 ?
26. Why was it so important to us that every basis of a vector space
contain the
same number of vectors?
27. Let B = $\left\{\begin{bmatrix}5\\ 10\\ 0\end{bmatrix}, \begin{bmatrix}2\\ 1\\ 4\end{bmatrix}, \begin{bmatrix}-1\\ 0\\ 1\end{bmatrix}\right\}$ be a basis for R^3.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}3\\ -2\\ 6\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}-4\\ 0\\ -7\end{bmatrix}$ with respect to B.
28. Let B = $\left\{\begin{bmatrix}0\\ 3\\ -4\end{bmatrix}, \begin{bmatrix}1\\ -1\\ 2\end{bmatrix}, \begin{bmatrix}-2\\ 0\\ 2\end{bmatrix}\right\}$ be a basis for R^3.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}4\\ -1\\ 2\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}6\\ -7\\ 10\end{bmatrix}$ with respect to B.
29. Let B = $\left\{\begin{bmatrix}3\\ 2\\ 0\\ 1\end{bmatrix}, \begin{bmatrix}-1\\ 2\\ 5\\ 0\end{bmatrix}, \begin{bmatrix}0\\ 1\\ 7\\ -2\end{bmatrix}, \begin{bmatrix}4\\ -6\\ 0\\ 3\end{bmatrix}\right\}$ be a basis for R^4.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}3\\ 0\\ -2\\ -1\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}-12\\ 13\\ 12\\ -9\end{bmatrix}$ with respect to B.
30. Let B = $\left\{\begin{bmatrix}1\\ -1\\ 0\\ -1\end{bmatrix}, \begin{bmatrix}-1\\ 0\\ 1\\ -1\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 0\\ -1\end{bmatrix}, \begin{bmatrix}-1\\ 0\\ -1\\ -1\end{bmatrix}\right\}$ be a basis for R^4.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}-2\\ 3\\ 1\\ -4\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}-3\\ 2\\ 7\\ 13\end{bmatrix}$ with respect to B.
31. Let B = $\left\{\begin{bmatrix}2 & 0\\ 0 & 2\end{bmatrix}, \begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}, \begin{bmatrix}0 & -1\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}\right\}$ be a basis for M22.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}3\\ 2\\ -1\\ 4\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}$ with respect to B.
32. Let B = $\left\{\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}, \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}\right\}$ be a basis for M22.
(a) Find the vector ~v whose B-coordinate vector is $\begin{bmatrix}1\\ -2\\ 3\\ -1\end{bmatrix}$.
(b) Find the B-coordinate vector of $\vec{v} = \begin{bmatrix}2 & 4\\ 6 & 8\end{bmatrix}$.
33. Let B = $\left\{\begin{bmatrix}2 & -1\\ 0 & 0\end{bmatrix}, \begin{bmatrix}-1 & 0\\ 0 & 2\end{bmatrix}, \begin{bmatrix}0 & 0\\ 2 & -1\end{bmatrix}, \begin{bmatrix}0 & 2\\ -1 & 0\end{bmatrix}\right\}$ be a basis for M22.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}2\\ -1\\ 3\\ 1\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}5 & -2\\ -3 & 7\end{bmatrix}$ with respect to B.
34. Let B = $\left\{\begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}, \begin{bmatrix}1 & 1 & 1\\ 1 & 0 & 1\end{bmatrix}, \begin{bmatrix}1 & 1 & 1\\ 0 & 1 & 1\end{bmatrix}, \begin{bmatrix}1 & 1 & 0\\ 1 & 1 & 1\end{bmatrix}, \begin{bmatrix}1 & 0 & 1\\ 1 & 1 & 1\end{bmatrix}, \begin{bmatrix}0 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}\right\}$ be a basis for M23.
(a) Find the vector ~v where $[\vec{v}]_B = \begin{bmatrix}-1\\ 2\\ 0\\ 1\\ -3\\ 4\end{bmatrix}$.
(b) Find the coordinate vector of $\vec{w} = \begin{bmatrix}-3 & 3 & -4\\ 0 & -6 & 5\end{bmatrix}$ with respect to B.
35. Use a coordinate map to check whether $\vec{v}_1 = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix}4 & 3\\ 2 & -1\end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix}-2 & 1\\ 4 & 9\end{bmatrix}$ are linearly independent or linearly dependent.
36. Use a coordinate map to check whether $\vec{v}_1 = \begin{bmatrix}2 & 0\\ 0 & 2\end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ in M22 are linearly independent or linearly dependent.
37. Use a coordinate map to check whether $\vec{v}_1 = \begin{bmatrix}1 & 2\\ 3 & 4\end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix}4 & 3\\ 2 & -1\end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix}-1 & 0\\ 4 & 1\end{bmatrix}$, and $\vec{v}_4 = \begin{bmatrix}-2 & 1\\ 4 & 9\end{bmatrix}$ span M22.
38. Use a coordinate map to check whether $\vec{v}_1 = \begin{bmatrix}2 & 0\\ 0 & 2\end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix}0 & 2\\ 2 & 0\end{bmatrix}$, and $\vec{v}_4 = \begin{bmatrix}0 & 1\\ 1 & 0\end{bmatrix}$ span M22.
From earlier algebra classes, we already know how to add two polynomials
together by adding the coefficients on corresponding powers of x and how
to multiply a polynomial by a scalar by multiplying each coefficient by that
scalar. Although polynomials don’t look like matrices or vectors, we can use
these operations to view P as a vector space.
10. If we have two polynomials p(x) and q(x) and a scalar r, we know
Note that Pn also includes polynomials whose degrees are smaller than n.
If that bothers you, think of those polynomials as having coefficient 0 on the
missing powers of x.
We could check this using our original vector space definition, but a much
easier way to approach this is to show instead that Pn is a subspace of P .
After all, the operations are the same and the polynomials in a given Pn are
certainly a subset of P . This means we need to show that ~0P is in Pn and
that Pn is closed under polynomial addition and scalar multiplication.
The zero vector of P is ~0P = 0. Since 0 is a constant, its degree is 0 which
is less than any positive n. Therefore ~0P is in Pn for any n.
To show closure of addition, suppose we have two polynomials p(x) and
q(x) in Pn . We need to show p(x) + q(x) is also in Pn . The sum of two
polynomials in x is another polynomial in x. Since p(x) and q(x) are both in
Pn , their degrees are both less than or equal to n. Adding polynomials never
raises their degree, so p(x) + q(x) will have a degree at most the larger of
deg(p) and deg(q). Therefore p(x) + q(x)’s degree is at most n which means
p(x) + q(x) is in Pn .
To show closure of scalar multiplication, suppose we have a polynomial
p(x) in Pn and a scalar r. We need to show rp(x) is also in Pn . A scalar
multiple of a polynomial in x is another polynomial in x. As above, we have
deg(p) ≤ n. Multiplying a polynomial by a constant never raises its degree,
so deg(rp) ≤ deg(p) ≤ n which means rp(x) is in Pn .
Thus Pn is a subspace of P , and therefore a vector space in its own right.
Now that we know P and Pn are vector spaces, we can develop the same
ideas and tools for working with them that we have for Rn and Mmn . We’ll
start with the idea of linear independence.
Our general definition of linear independence involved the vector equation
x1~v1 + · · · + xk~vk = ~0. Here we’ve already got an x as part of our polynomials,
so I’ll call our coefficients a1 , . . . , ak to avoid confusion. This will mean that
we’ll be solving the equation a1~v1 + · · · + ak~vk = ~0 for the ai ’s not for the x
that’s part of our polynomials!
For a polynomial to equal the zero polynomial, we need each of its coefficients
to be zero. This gives us four equations: a2 = 0 from the x^3 term, a1 = 0 from
the x^2 term, −2a1 − 2a2 + 6a3 = 0 from the x term, and a1 + a2 − 3a3 = 0
from the constant term. Since the first two equations tell us a1 = a2 = 0, we
can plug that into either the third or fourth equation to see a3 = 0.
Since all coefficients in our equation must be 0, our polynomials are linearly
independent.
We already saw that Pn is a subspace of P , but we can also find many other
subspaces of P and Pn . In the following example we see one such subspace
and also explore the span of a set of polynomials.
We could check this using the subspace test along the lines of our check
that Pn is a subspace of P , but I’ll instead write W as the span of a set of
polynomials. As we saw in 2.3, spans are always automatically subspaces.
To write W as a span, we need to find a set of polynomials whose linear
combinations give us all of W . In other words, we need to rewrite our generic
element of W as a linear combination of our spanning polynomials. Notice that
there are two important pieces to any polynomial in W , the part controlled
by a and the part controlled by b. Separating our generic element of W into
the sum of those two parts gives us
We can pull an a out of the first part and a b out of the second part to get
W = Span{x^2 + 1, x + 1}
so W is automatically a subspace of P4 .
Now that we understand how to work within Pn , let’s explore the option
of mapping problems to Rn using a basis by finding standard bases for P and
Pn along with their coordinate maps.
The standard basis vectors for Pn are x^n, x^(n−1), . . . , x^2, x, 1. However, there
is no widespread consensus about whether this list should start with 1 and go
Theorem 3. dim(Pn ) = n + 1
Now that we have our standard basis B for Pn we can take advantage of
the corresponding coordinate map fB : Pn → Rn+1 . For example, fB gives us
a new way to answer the following question.
Since we have a leading 1 in each column of the reduced echelon form, our
three B-coordinate vectors and hence x^2 − 2x + 1, x^3 − 2x + 1, and 6x − 3 are
linearly independent.
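The same check can be scripted: write each polynomial's coordinate vector (with respect to the standard basis x^3, x^2, x, 1) as a column and count the leading 1s. Here is a sketch of mine in Python with SymPy for the three polynomials just mentioned:

import sympy as sp

# Columns are the coordinate vectors of x^2 - 2x + 1, x^3 - 2x + 1, and 6x - 3
# with respect to the standard basis x^3, x^2, x, 1.
M = sp.Matrix([[0, 1, 0],
               [1, 0, 0],
               [-2, -2, 6],
               [1, 1, -3]])
_, pivots = M.rref()
print(len(pivots) == M.cols)  # True: a leading 1 in every column, so independent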
As with Rn and Mmn , the standard basis is by no means the only basis
for Pn .
The matrix whose columns are the coordinate vectors of x^3 − 3, x^2 − 2, x − 1, and x^3 + x^2 + x + 1 is
$$\begin{bmatrix}1 & 0 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ -3 & -2 & -1 & 1\end{bmatrix},$$
which has reduced echelon form I4 . Since each column of the reduced echelon
form has a leading 1, our four coordinate vectors span R4 and hence our four
polynomials span P3. Therefore x^3 − 3, x^2 − 2, x − 1, and x^3 + x^2 + x + 1 form
a basis for P3 .
The situation for P is a little stranger. Since there is no upper limit on the
degree of polynomials in P we need to include x^k for every positive integer k.
You can convince yourself that this is a basis using a very similar argument
to that for Pn .
Since our basis for P has infinitely many basis vectors, we are forced to
conclude the following.
Theorem 4. dim(P ) = ∞
While we could conceivably still use a coordinate map for P , we’d have
to map to R∞ (which we haven’t defined) so we’ll just stick with working in P .
for all polynomials p(x) and q(x) and all scalars r. Let’s fix the notation
p(x) = ax^2 + bx + c and q(x) = αx^2 + βx + γ.
Our first check is that f splits up over addition.
For our function, this means solving for a, b, and c in the equation
$$\begin{bmatrix}a + c & b - 2c\\ 0 & a + c\end{bmatrix} = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}.$$
If we aren’t sure this is correct, remember we can always check our work by
computing f(−cx^2 + 2cx + c) and making sure we get the 2 × 2 zero matrix.
The range of f is the set of all 2 × 2 matrices which are outputs of our function. In other words, it is the set of all matrices $\begin{bmatrix}v & w\\ y & z\end{bmatrix}$ where
$$f(ax^2 + bx + c) = \begin{bmatrix}v & w\\ y & z\end{bmatrix}$$
for some polynomial ax^2 + bx + c. We can find these matrices by solving for a, b, and c in the equation
$$\begin{bmatrix}a + c & b - 2c\\ 0 & a + c\end{bmatrix} = \begin{bmatrix}v & w\\ y & z\end{bmatrix}.$$
It is now clear that our first two rows are not an issue, because they contain leading 1s which aren't in the last column. Our third row is a potential problem though, because its equation is 0 = z − v, which only has a solution when z = v. This means that to have a 2 × 2 matrix $\begin{bmatrix}v & w\\ y & z\end{bmatrix}$ in the range of f we need y = 0 and z = v. Thus
$$\operatorname{range}(f) = \left\{\begin{bmatrix}v & w\\ 0 & v\end{bmatrix}\right\}.$$
Note: Since the kernel is a subspace of the domain, our kernel must be a
set of polynomials, not a set of matrices. On the other hand, the range is a
subspace of the codomain, and so must be a set of 2 × 2 matrices not a set of
polynomials. It is often helpful to remind yourself of the types of objects in
the kernel and range to check your work.
Exercises 3.2.
1. Let W = {ax^2 + bx} where a is even and b is positive. (Both a and b are real numbers.) Show that W is closed under the usual polynomial addition but not under scalar multiplication.
2. Redo Example 2 using the subspace test.
3. Show that W = {ax^2} with the usual polynomial addition and scalar multiplication is a vector space.
4. Let V = {2ax^2 + 2bx + 2c} be the subset of P2 where all coefficients are even. Show that with the usual polynomial addition and scalar multiplication, V is a vector space.
5. Show that 1 + x^2, x, and 1 are a basis for P2.
6. Show that x^2 + 6x + 9, x^2 − 9, and −2x + 1 are a basis for P2.
7. Is B = {1 + x^2, 1 − x^2, 4} a basis for P2?
8. Is B = {−x^3 + x^2 − x + 1, x^3 + x, x^2 − x + 1, 2x^2 + 2} a basis for P3?
9. The set B = {2x^2, x^2 − x + 1, 3x} is a basis for P2.
(a) Find the vector ~v in V whose B-coordinate vector is $\begin{bmatrix}-1\\ 3\\ 2\end{bmatrix}$.
(b) Find the coordinate vector of 3x^2 + 10x − 1 with respect to basis B.
10. The set B = {1, 1 + x, 1 + x + x^2} is a basis for P2.
(a) Find the polynomial ~v whose coordinate vector with respect to this basis is $[\vec{v}]_B = \begin{bmatrix}1\\ 2\\ 3\end{bmatrix}$.
(b) Find the coordinate vector of ~v = 4 + 2x + 3x^2 with respect to this basis.
11. The set B = {x^2 + x + 1, 2x^2 + 2, x^2} is a basis for P2.
(a) Find the polynomial whose B-coordinate vector is $\begin{bmatrix}1\\ 2\\ 3\end{bmatrix}$.
(b) Find the B-coordinate vector of x^2 − 4x + 6.
12. The set B = {x + 2, 3x + 1} is a basis for P1.
(a) Find the vector ~v which has $[\vec{v}]_B = \begin{bmatrix}-1\\ 2\end{bmatrix}$.
(b) Find the B-coordinate vector of x − 1.
(c) Find the codomain, W, of the B-coordinate map fB : P1 → W.
13. Decide whether or not 6x + 4 is in the span of x^2 + x + 1 and x^2 − 2x − 1 without using a coordinate map.
14. Decide whether or not 2x^2 − x + 10 is in the span of x − 2 and x^2 + x + 1 without using a coordinate map.
15. Use a coordinate map to decide whether or not 6x + 4 is in the span of x^2 + x + 1 and x^2 − 2x − 1.
16. Use a coordinate map to decide whether or not 2x^2 − x + 10 is in the span of x − 2 and x^2 + x + 1.
17. Redo Example 4 in P3, i.e., without using a coordinate map.
18. Decide whether x^2 − x + 1, x^2 + 2x + 3, and 5x^2 − 2x are linearly independent or linearly dependent without using a coordinate map.
19. Use a coordinate map to see if x^3 − x^2 − 2, −3x^3 + 2x^2 + x − 1, and −x^2 + x − 7 are linearly independent or linearly dependent.
20. Use a coordinate map to see if x^2 − x + 1, x^2 + 2x + 3, and 5x^2 − 2x are linearly independent or linearly dependent.
21. Let $f\left(\begin{bmatrix}a & b\\ c & d\end{bmatrix}\right) = ax^2 + (b + c)x + a$.
(a) What is the domain of f?
(b) What is the codomain of f?
22. Let $f(ax^2 + bx + c) = \begin{bmatrix}a + b & 0\\ a + c & a + b\end{bmatrix}$.
(a) What is the domain of f?
(b) What is the codomain of f?
23. Let f : M23 → P4 by $f\left(\begin{bmatrix}a & b & c\\ d & e & f\end{bmatrix}\right) = ax^4 + (b + c)x^2 + (d - f)$.
(a) Find the kernel of f.
(b) Find the range of f.
24. Let f : P2 → M22 by $f(ax^2 + bx + c) = \begin{bmatrix}2a & 0\\ 0 & b + c\end{bmatrix}$.
(a) Find the kernel of f.
(b) Find the range of f.
25. Let f : P2 → M22 be given by $f(ax^2 + bx + c) = \begin{bmatrix}a - b & b + c\\ a + b + 2c & 0\end{bmatrix}$.
(a) Find the kernel of f.
(b) Find the range of f.
26. Let f : P2 → M33 by $f(ax^2 + bx + c) = \begin{bmatrix}a - b + c & 0 & 0\\ 0 & a - b & 0\\ 0 & 0 & 2c\end{bmatrix}$.
(a) Find the kernel of f.
(b) Find the range of f.
The first term a is typically called the real part of a complex number, while
the second term is called the imaginary part. (For a more detailed discussion
of C, see Appendix A.1.)
Geometrically, we can think of C as a plane by plotting the complex
number a + bi as the point (a, b). For this reason, many people call the x-
axis the real axis and the y-axis the imaginary axis. To illustrate this, look at
the plot of 4 + 2i in Figure 3.1. [Figure 3.1: the complex number 4 + 2i plotted as the point (4, 2) in the complex plane.]
(a + bi) + (c + di) = (a + c) + (b + d)i and r(a + bi) = (ra) + (rb)i.
Keep in mind that our scalar r is from R not C. (To learn about the rule
for multiplying a complex number by another complex number, see Appendix
A.1.)
Following our rule above, to add these two complex numbers, we add their
real and imaginary parts separately. This gives us
To multiply this complex number by −10, we multiply both the real and
imaginary parts by −10. This gives us
(a + bi) + (x + yi) = x + yi
(r + s)(a + bi) = ((r + s)a) + ((r + s)b)i = (ra + sa) + (rb + sb)i
= ((ra) + (rb)i) + ((sa) + (sb)i) = r(a + bi) + s(a + bi).
Now that we know C is a vector space, we can explore all the properties
we looked at in Rn , Mmn , and Pn . Let’s start with subspaces.
Before we run our subspace test, notice that W can be thought of as all
complex numbers a + bi where a = 0. Since these complex numbers have no
real part, they are often called purely imaginary numbers. Geometrically, this
means W is the set pictured below.

[Figure: W = {bi}, the imaginary axis in the complex plane.]

In particular, W is closed under scalar multiplication, since r(bi) = (rb)i is
again purely imaginary.
In fact, 1 and i are the standard basis for C, although their ordering isn’t
standard so be careful to specify whether you’re using the basis 1, i or the
basis i, 1.
Since C has a basis with two elements, we know:
Theorem 2. dim(C) = 2
Now that we have access to a coordinate map via our standard basis, let’s
explore linear independence in C.
Now that we’re only talking about two complex numbers, it is possible for
them to be linearly independent. Let’s check using the coordinate map with
respect to the standard basis B = {1, i}. Coordinate vectors with respect to
this basis have first entry equal to the real part of the complex number (the
part without the i) and second entry equal to the coefficient on i. This means
our two coordinate vectors are
[−2 + 14i]B = (−2, 14)   and   [10 − 6i]B = (10, −6).
As with the other vector spaces we’ve seen, we can create linear maps with
C as their domain or codomain.
Now that we’ve explored our new vector space C, let’s turn our attention
to another, more familiar, collection of mathematical objects: the set of
continuous functions from R to R, which we will call C. Notice that this set
includes many functions we haven't talked about in linear algebra so far, like
e^x and 2x + sin(x). These may not seem like things that belong in our linear
world, but as we did with polynomials, we can use them as the mathematical
objects, i.e., vectors in a new vector space. Note that as with polynomials, we
mean the "objects" part of this idea seriously. We don't get to treat e^x as a
function if we want to use it here, so as with P , resist the urge to plug things
into your continuous functions, solve for x, etc.
The other important building block of any vector space is its two operations
+ and ·. For our operations, we’ll use the usual notion of adding and scaling
functions you worked with in calculus.
As with the complex numbers, we can now explore some of our vector
space ideas. Again, let’s start with subspaces.
However, this exposes a problem with finding a basis for C and using its
coordinate map to tackle spans and linear independence. From 3.1 we know
that dim(P ) ≤ dim(C ) and dim(P ) = ∞, so we must have the following.
Theorem 4. dim(C ) = ∞
Exercises 3.3.
1. Show R is a subspace of C.
2. Is W = {1 + bi} a subspace of C?
3. Show that 1 + i and 1 − i form a basis for C.
4. Show that 4 − 2i and 1 − 4i form a basis for C.
5. Is {2 − i, 4 − 2i} a basis for C?
6. Is {2 + 2i, −3 + 3i} a basis for C?
7. Let B = {1 + i, i} be a basis for C.
(a) Find the complex number ~v with [~v]B = (−3, 5).
(b) Find the B-coordinate vector of 2 − 6i.
8. Let B = {1 + i, 1 − i} be a basis for C.
(a) Find the complex number ~v with [~v]B = (1, 7).
(b) Find the B-coordinate vector of 8 + 3i.
One way to make this less complicated is to set enough entries of A equal to
zero so that each entry of their product contains only one term rather than
being a sum of two terms. We can eliminate one term from the sum in each
matrix entry by making b and c equal zero. This would make
A = [ a 0 ]
    [ 0 d ]
and
AZ = [ ax ay ]
     [ dz dw ].
We could have set other entries of A equal to zero, but
this gives the nice additional result that each row of Z is being scaled by the
corresponding entry of A.
We could also go back and focus on the entries of Z, but instead let’s
generalize the type of matrix we created with A.
(Remember that our notation is that aij is the entry of A in the ith row
and jth column.)
Example 1. A = [ 2 0 0 ; 0 0 0 ; 0 0 −3 ] is a diagonal matrix.
All entries off the diagonal are zero, so this matrix is diagonal. (Notice
that one of our diagonal entries is also zero, which is fine because diagonal
entries are allowed to be any real number.)
Notice that the product above does indeed work out to be a diagonal
matrix whose entries are the products of the diagonal entries of our two
diagonal matrices.
Another way to view multiplication by a diagonal matrix is to decompose
our diagonal matrix into a product of elementary matrices. Each elementary
matrix will correspond to a row operation where we scale some row by a
constant. This allows us to compute the matrix product AB where A is
a diagonal matrix by scaling each row of B by A’s diagonal entry in that
row. (This is the bonus nice property we noticed in our discussion before the
definition of a diagonal matrix.)
Example 3. Use the multiplication technique described above to compute AB where
A = [ 1 0  0 ]         [ 2   7  4 ]
    [ 0 3  0 ]  and B = [ −1  5  2 ]
    [ 0 0 −2 ]          [ 6  −3  0 ].
We could do this using our usual method for matrix multiplication, but
it's easier to simply multiply each row of B by the corresponding diagonal
entry of A. For our A, this means multiplying B's first row by 1, its second
row by 3, and its third row by −2. This gives us
AB = [ 2(1)    7(1)    4(1)  ]   [  2   7  4 ]
     [ −1(3)   5(3)    2(3)  ] = [ −3  15  6 ]
     [ 6(−2)  −3(−2)   0(−2) ]   [ −12  6  0 ].
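(Editorial aside, not part of the original text: the row-scaling shortcut is easy to sanity-check numerically. The NumPy sketch below uses array names that simply mirror Example 3.)

```python
import numpy as np

A = np.diag([1, 3, -2])
B = np.array([[2, 7, 4],
              [-1, 5, 2],
              [6, -3, 0]])

# Row-scaling shortcut: scale row i of B by A's ith diagonal entry.
shortcut = np.diag(A)[:, None] * B
assert np.array_equal(A @ B, shortcut)
print(shortcut)   # [[2, 7, 4], [-3, 15, 6], [-12, 6, 0]]
```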
Notice that our cubed matrix’s diagonal entries are the cubes of the original
matrix’s diagonal entries.
We can generalize what happened here to say that we can take the kth
power of a diagonal matrix by simply taking the kth power of each of the
matrix’s diagonal entries.
Example 5. Compute
[ −1 0 0  0 ]^5
[  0 3 0  0 ]
[  0 0 2  0 ]
[  0 0 0 −2 ].
This would be annoying to do without our shortcut, but it's easy to do by
taking the 5th power of each diagonal entry. Then our computation is just
[ −1 0 0  0 ]^5   [ (−1)^5  0     0      0     ]   [ −1   0   0    0 ]
[  0 3 0  0 ]   = [  0     (3)^5  0      0     ] = [  0 243   0    0 ]
[  0 0 2  0 ]     [  0      0    (2)^5   0     ]   [  0   0  32    0 ]
[  0 0 0 −2 ]     [  0      0     0     (−2)^5 ]   [  0   0   0  −32 ].
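(Editorial aside, not part of the original text: a quick NumPy check of the kth-power shortcut, using nothing beyond the example above.)

```python
import numpy as np

entries = np.array([-1, 3, 2, -2])
D = np.diag(entries)

# The 5th power of D is the diagonal matrix of 5th powers of its entries.
assert np.array_equal(np.linalg.matrix_power(D, 5), np.diag(entries**5))
print(np.diag(entries**5))   # diagonal entries -1, 243, 32, -32
```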
Example 7. Compute
[ 2 0  0 ]^4 [  5 ]
[ 0 0  0 ]   [ 11 ]
[ 0 0 −3 ]   [ −2 ].
We can either compute [ 2 0 0 ; 0 0 0 ; 0 0 −3 ]^4 and then multiply it by (5, 11, −2), or
think of repeating the multiplication in Example 6 four times in a row. In
either case, we'll get
[ 2 0  0 ]^4 [  5 ]   [ 2^4(5)      ]   [ 16(5)  ]   [   80 ]
[ 0 0  0 ]   [ 11 ] = [ 0^4(11)     ] = [ 0(11)  ] = [    0 ]
[ 0 0 −3 ]   [ −2 ]   [ (−3)^4(−2)  ]   [ 81(−2) ]   [ −162 ].
The only way this could be easier is if all diagonal entries of our matrix were
equal. Then we could simply multiply all entries of our vector by the same
number. This is possible, and sometimes happens even when our matrix A
isn't diagonal. For example, consider
A = [ 1 2 ]  and  ~x = [ 1 ]
    [ 4 3 ]            [ 2 ].
Their product is
A~x = [ 1 2 ] [ 1 ] = [  5 ]
      [ 4 3 ] [ 2 ]   [ 10 ].
If we compare this with ~x, we see that A~x = 5~x, so
multiplication by A scales each entry of ~x by 5 even though A is not diagonal!
This is a very special situation that we’ll spend some time exploring. We start
by giving the vectors that have this special sort of relationship with A a name.
To see this, we need to check that A~y = (−1)~y . Doing this computation
gives us
[ 1 2 ] [ −1 ] = [ −1 + 2 ] = [  1 ] = [ −1(−1) ] = (−1) [ −1 ]
[ 4 3 ] [  1 ]   [ −4 + 3 ]   [ −1 ]   [ (−1)1  ]        [  1 ].
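(Editorial aside, not part of the original text: both eigenvector claims for this matrix can be checked numerically in a couple of lines.)

```python
import numpy as np

A = np.array([[1, 2],
              [4, 3]])
x = np.array([1, 2])    # claimed eigenvector with eigenvalue 5
y = np.array([-1, 1])   # claimed eigenvector with eigenvalue -1

print(A @ x, 5 * x)     # [ 5 10] [ 5 10]
print(A @ y, -1 * y)    # [ 1 -1] [ 1 -1]
```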
To figure out what happens to our population in the long run, we want
to multiply our population vector ~x by increasingly large powers of our
demographic matrix A. In other words, we want to look at the limit of Ak ~x
as k → ∞.
Since A~x = 0.75~x, we know A2 ~x = (0.75)2 ~x. Multiplying by A again gives
us
A3 ~x = A(0.75)2 ~x = (0.75)2 A~x = (0.75)2 (0.75)~x = (0.75)3 ~x.
Continuing this pattern gives us Ak ~x = (0.75)k ~x.
From calculus we know that lim_{k→∞} (0.75)^k = 0. This means that as k → ∞,
the limit of A^k ~x = (0.75)^k ~x is 0~x = ~0.
Since our population vector had a limit of ~0, our population dies out in
the long run.
Since In ~x = ~x, we must have λIn ~x = λ~x. This makes our equation A~x − λIn ~x = ~0.
Now we can factor ~x out of the left-hand side, leaving A − λIn . Since both A
and λIn are n × n matrices, this now makes sense. Our factored equation is
(A − λIn )~x = ~0 which has the familiar format of a matrix equation.
Unfortunately, we are still at a bit of a loss. The matrix in our matrix
equation above contains the unknown variable λ. This means we are trying
to solve for two unknown quantities, ~x and λ, simultaneously within the same
equation. Additionally, while we usually invoke row reduction to help solve
matrix equations, it is less effective with a variable in the matrix’s entries. We
need a new tool which we’ll develop in the next section. In 4.3, we’ll come
back to the problem of finding eigenvectors and eigenvalues from the equation
(A − λIn )~x = ~0 armed with the idea of the determinant.
Exercises 4.1.
1. Compute [ −9 0 0 ; 0 4 0 ; 0 0 7 ]^2.
2. Compute [ 2 0 0 ; 0 −3 0 ; 0 0 5 ]^3.
3. Compute [ −1 0 0 ; 0 0 0 ; 0 0 2 ]^6.
4. Compute [ 1 0 0 ; 0 −5 0 ; 0 0 6 ]^3.
5. Let B = [ 2 0 0 ; 0 −3 0 ; 0 0 5 ].
(a) Compute AB for A = [ 3 −1 6 ; 2 1 −4 ; 0 7 −2 ].
(b) Come up with a shortcut for multiplication on the right by a
diagonal matrix which is similar to the one used in Example 3
for multiplication on the left.
6. (a) Find the inverse of [ 10 0 0 ; 0 −5 0 ; 0 0 6 ] or show it is not invertible.
(b) Find the inverse of [ −1 0 0 ; 0 0 0 ; 0 0 2 ] or show it is not invertible.
(c) Give a rule which lets you easily see when a diagonal matrix is
invertible.
(d) Give a formula for the inverse of a diagonal matrix. (Your
formula should not involve row reduction.)
7. Let W be the set of n × n diagonal matrices. Show W is a subspace
of Mnn .
8. Let ~v = (−3, 0, −6) and A = [ 2 −2 0 ; −2 4 1 ; 0 1 2 ]. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
9. Let ~v = (2, 1, −1) and A = [ 2 −2 0 ; −2 4 1 ; 0 1 2 ]. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
10. Let ~v = (5, 0, −10) and A = [ 1 2 1 ; 0 −4 0 ; −2 1 −2 ]. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
11. Let ~v = (11, −2, 4) and A = [ 3 0 0 ; 0 1 −1 ; 2 −1 −3 ]. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
12. Let ~v = (1, 2, 1) be an eigenvector of A = [ 3 −2 4 ; 0 4 −2 ; 4 0 0 ] with eigenvalue λ = 3. Compute A^4 ~v.
13. Let ~v = (1, 17, 4) be an eigenvector of A = [ 1 0 −1 ; 7 −2 −6 ; 4 0 −4 ] with eigenvalue λ = −3. Compute A^2 ~v.
14. Let ~v = (2, 1, 0) be an eigenvector of A = [ 6 2 8 ; 4 −1 −9 ; 0 0 5 ] with eigenvalue λ = 7. Compute A^2 ~v.
15. Let ~v = (−4, 2, 2) be an eigenvector of A = [ 3 −2 4 ; 0 4 −2 ; −1 0 0 ] with eigenvalue λ = 2. Compute A^5 ~v.
16. Let f(x1, x2, x3) = (6x1 + 2x2 + 8x3, 4x1 − x2 − 9x3, 5x3). The vector ~v = (−30, −41, 14) is an eigenvector of f's matrix with eigenvalue λ = 5. Use this to compute f(~v).
17. Let f(x1, x2, x3) = (−6x1 + 5x2 − 4x3, 10x2 + 7x3, −x2 + 2x3). The vector ~v = (−13, −35, 5) is an eigenvector of f's matrix with eigenvalue λ = 9. Use this to compute f(~v).
18. Let f(x1, x2, x3) = (x1 − x3, 7x1 − 2x2 − 6x3, 4x1 − 4x3). The vector ~v = (0, −5, 0) is an eigenvector of f's matrix with eigenvalue λ = −2. Use this to compute f(f(~v)).
19. Let f(x1, x2, x3) = (3x1 − 2x2 + 4x3, 4x2 − 2x3, −x1). The vector ~v = (−4, 2, 1) is an eigenvector of f's matrix with eigenvalue λ = 3. Use this to compute f(f(f(~v))).
20. In Example 9, we saw that starting with a population eigenvector
of a demographic matrix with eigenvalue λ = 0.75 meant the
population would die out in the long run. What if the population
eigenvector had eigenvalue λ = 1.6?
21. Put together Example 9 and Exercise 20 to come up with a condition
on λ so that a population whose population eigenvector with
eigenvalue λ dies out in the long run. Find another condition on
λ so that the population grows in the long run.
4.2 Determinants
In this section we’ll develop a computational tool called the determinant which
assigns a number to each n × n matrix. Then we’ll discuss what that number
tells us about the matrix. We’ll start in the smallest interesting case where
n = 2. Here our definition of the determinant is a byproduct of computing
the inverse of a general 2 × 2 matrix. This will get a bit messy in the middle,
but hang in until the computational dust settles and you’ll see something
interesting happen.
Let A = [ a b ; c d ]. To find the inverse, we need to row reduce [A | I2]. This looks like
[ a b | 1 0 ]                [ 1 b/a | 1/a 0 ]                 [ 1  b/a      | 1/a   0 ]
[ c d | 0 1 ]  → (1/a · r1)  [ c  d  |  0  1 ]  → (r2 − c·r1)  [ 0  d − bc/a | −c/a  1 ].
Since d − bc/a = (ad − bc)/a, the last matrix can be rewritten as
[ 1  b/a          | 1/a   0 ]
[ 0  (ad − bc)/a  | −c/a  1 ].
Continuing the row reduction by scaling r2 by a/(ad − bc) and then performing
r1 − (b/a)·r2 turns the left half into I2. The top left entry of the right half is then
F = 1/a − (b/a)(−c/(ad − bc)) = (ad − bc)/(a(ad − bc)) + bc/(a(ad − bc)) = (ad − bc + bc)/(a(ad − bc))
  = ad/(a(ad − bc)) = d/(ad − bc).
Substituting this back into our matrix gives us
[ 1 0 |  d/(ad − bc)   −b/(ad − bc) ]
[ 0 1 | −c/(ad − bc)    a/(ad − bc) ].
Notice that ad is the product of the two diagonal entries, and bc is the
product of the two non-diagonal entries. Therefore one way to remember this
formula is to think of it as “the product of the diagonal entries minus the
product of the non-diagonal entries”.
Example 1. Compute the determinant of A = [ 3 2 ; 6 5 ].
Here a = 3, b = 2, c = 6 and d = 5. Plugging those values into the formula
above, we get
det(A) = 3(5) − 2(6) = 3.
Example 2. Find the signs of the determinants for the maps f and g
discussed above.
Before we can compute these two determinants, we need to find the matrix
of each map. Let’s call f ’s matrix A and g’s matrix B to avoid any confusion.
We saw in 2.2 that we can find the matrix of a map from a geometric
description of its action on R^2 by putting the image of (1, 0) in the first column
and the image of (0, 1) in the second column.
Our map f is reflection about the line y = x, so geometrically it sends (1, 0)
to (0, 1) and (0, 1) to (1, 0).

[Figure: the vectors (1, 0) and (0, 1) and their images under the reflection f.]

This means f's matrix has first column (0, 1) and second column (1, 0), so
A = [ 0 1 ]
    [ 1 0 ].
The other map g is counterclockwise rotation by 90°, so it sends (1, 0) to (0, 1)
and (0, 1) to (−1, 0).

[Figure: the vectors (1, 0) and (0, 1) and their images under the rotation g.]

This means g's matrix has first column (0, 1) and second column (−1, 0), so
B = [ 0 −1 ]
    [ 1  0 ].
Computing our two determinants, we get det(A) = 0(0) − 1(1) = −1 and
det(B) = 0(0) − (−1)(1) = 1. Thus, as expected, the map that flips the plane
has a negative determinant and the map that doesn't flip the plane has a
positive determinant.
Next let's see what determinants can tell us about the effect that a matrix's
map has on area. We'll explore this by looking at the effect of the map on the
unit square, i.e., the area enclosed by the unit vectors along the x and y axes
in Figure 4.1.

[Figure 4.1: The unit square]

This square has area 1, so we can compare the area of the unit square's
image under various matrix maps to 1 to see what effect those maps have on
areas.
To see this, we need to compute the image of the unit square under each
map. These are shown below.

[Figures: the images of the unit square under the maps f and g.]

From the pictures above, we can see that both of these images still have
area 1.
Example 4. What effect does the matrix A = [ 2 1 ; 1 3 ] have on areas?
To figure this out, we need to find the image of the unit square after
multiplication by A. This is shown in the following picture.
[Figure: the image of the unit square under A, a parallelogram with vertices (0, 0), (2, 1), (3, 4), and (1, 3), with base b and height h marked.]

We can compute the lengths of both b and h using the formula for the
distance between two points in the plane:
d = √((x1 − x2)^2 + (y1 − y2)^2).
The base runs from (0, 0) to (2, 1), so
b = √((2 − 0)^2 + (1 − 0)^2) = √5,
and
h = √((1 − 2)^2 + (3 − 1)^2) = √5.
Thus the area of the image of the unit square is
b · h = √5 · √5 = 5.
Since we started with a square of area 1 and ended up with an image that
has area 5, we can say A multiplies areas by 5.
To connect area with the determinant, let’s compare each of our three
example maps’ effects on area with the value of their determinants. The map
f from Examples 2 and 3 has determinant −1 and has no effect on area. (We
can also think of this as saying f multiplies area by 1.) Similarly, the map
g from Examples 2 and 3 has determinant 1 and also multiplies area by 1.
Finally, the map A from Example 4 multiplies area by 5 and has determinant
det(A) = 2(3) − 1(1) = 5. These three examples illustrate the general pattern:
the map of a 2 × 2 matrix A multiplies area by | det(A)|.
We can summarize our geometric exploration of the determinant of a 2 × 2
matrix as follows.
(Note that this is different from aij which is the entry of A in the ith row
and jth column.)
Example 5. Find A24 and A31 for
A = [ 2   0 −1  5 ]
    [ 3   1  0 −4 ]
    [ −1  2  1  1 ]
    [ 7  −3  0  1 ].
We can find the first submatrix A24 by removing A’s 2nd row and 4th
column. Crossing out this row and column looks like
[ 2   0 −1  5 ]
[ 3   1  0 −4 ]   (with the 2nd row and 4th column struck out)
[ −1  2  1  1 ]
[ 7  −3  0  1 ]
which means
A24 = [ 2   0 −1 ]
      [ −1  2  1 ]
      [ 7  −3  0 ].
Similarly, we can find the second submatrix A31 by removing A’s 3rd row
and 1st column. Crossing out this row and column looks like
[ 2   0 −1  5 ]
[ 3   1  0 −4 ]   (with the 3rd row and 1st column struck out)
[ −1  2  1  1 ]
[ 7  −3  0  1 ]
which means
A31 = [ 0  −1  5 ]
      [ 1   0 −4 ]
      [ −3  0  1 ].
Now that we’ve established our notation, we can state the following
formulas which serve as the n × n to (n − 1) × (n − 1) reduction step at
the heart of our algorithm for computing an n × n determinant. The first
formula is usually called expansion along a row, because it contains one term
for each entry along some particular row of A.
det(A) = (−1)i+1 ai1 det(Ai1 )+(−1)i+2 ai2 det(Ai2 )+· · ·+(−1)i+n ain det(Ain ).
Note that there are n terms, each of which contains three parts: a sign
(plus or minus), an entry of A, and the determinant of an (n − 1) × (n − 1)
submatrix of A.
The a2j are the entries (in order) along the second row of A, so we have
a21 = 3, a22 = 1, a23 = 0, and a24 = −4. The A2j are the 3 × 3 submatrices
of A where we’ve deleted the second row and jth column, so we have
A21 = [ 0 −1 5 ; 2 1 1 ; −3 0 1 ],   A22 = [ 2 −1 5 ; −1 1 1 ; 7 0 1 ],
A23 = [ 2 0 5 ; −1 2 1 ; 7 −3 1 ],   A24 = [ 2 0 −1 ; −1 2 1 ; 7 −3 0 ].
Plugging these back into our formula gives us
det(A) = (−1)^3 · 3 · det[ 0 −1 5 ; 2 1 1 ; −3 0 1 ] + (−1)^4 · 1 · det[ 2 −1 5 ; −1 1 1 ; 7 0 1 ]
       + (−1)^5 · 0 · det[ 2 0 5 ; −1 2 1 ; 7 −3 1 ] + (−1)^6 · (−4) · det[ 2 0 −1 ; −1 2 1 ; 7 −3 0 ].
Alternately, we can use the formula below which is often called expansion
down a column, because it contains one term for each entry down a particular
column of A.
det(A) = (−1)1+j a1j det(A1j )+(−1)2+j a2j det(A2j )+· · ·+(−1)n+j anj det(Anj ).
As with expansion along a row, we have n terms each with the same three
parts: a sign, a matrix entry, and the determinant of a submatrix.
Example 7. Implement this formula for the determinant down the first column of
A = [ 2   0 −1  5 ]
    [ 3   1  0 −4 ]
    [ −1  2  1  1 ]
    [ 7  −3  0  1 ].
Since we're expanding down the first column, we have j = 1. Plugging this
into our formula above gives us
det(A) = (−1)^{1+1} a11 det(A11) + (−1)^{2+1} a21 det(A21) + (−1)^{3+1} a31 det(A31) + (−1)^{4+1} a41 det(A41).
The ai1 are the entries (in order) down the first column, so we have a11 = 2,
a21 = 3, a31 = −1, and a41 = 7. The Ai1 are the 3 × 3 submatrices of A we
get by deleting the ith row and first column, so we have
A11 = [ 1 0 −4 ; 2 1 1 ; −3 0 1 ],   A21 = [ 0 −1 5 ; 2 1 1 ; −3 0 1 ],
A31 = [ 0 −1 5 ; 1 0 −4 ; −3 0 1 ],   A41 = [ 0 −1 5 ; 1 0 −4 ; 2 1 1 ].
Plugging these into our formula gives us
det(A) = (−1)^2 · 2 · det[ 1 0 −4 ; 2 1 1 ; −3 0 1 ] + (−1)^3 · 3 · det[ 0 −1 5 ; 2 1 1 ; −3 0 1 ]
       + (−1)^4 · (−1) · det[ 0 −1 5 ; 1 0 −4 ; −3 0 1 ] + (−1)^5 · 7 · det[ 0 −1 5 ; 1 0 −4 ; 2 1 1 ].
possible. Usually this means picking the row or column with the most zero
entries, because if aij = 0 its whole term is multiplied by zero and can be
ignored.
We can also link aij ’s position to the sign associated with its term in our
sum. The term containing aij has sign (−1)i+j which is positive if i + j is
even and negative if i + j is odd. The top left corner of any matrix is a11 .
Since 1 + 1 = 2 is even, the sign piece of a11 ’s term is positive. If we travel
along a single row, we see that the signs of the terms alternate, because i + j
is followed by i + (j + 1) = (i + j) + 1, which will have the opposite even/odd
parity. Similarly, as we travel down a single column, the signs alternate, since
i + j is followed by (i + 1) + j = (i + j) + 1 which again has the opposite
even/odd parity. We can use this to fill in a matrix with plus and minus signs
4.2 Determinants 261
We can read off the sign of aij ’s term of our determinant sum by checking
which sign is in the ijth spot in the matrix above.
If you like these ideas for finding (−1)i+j and Aij from the position of aij ,
feel free to use them. If not, feel free to use the original formulas for expansion
along a row or down a column.
Now that we understand how to rewrite the determinant of an n × n
matrix A in terms of determinants of (n − 1) × (n − 1) matrices, we can give
the procedure for computing an n × n determinant. Pick a row or column of
A, and use one of the formulas above to rewrite det(A) in terms of smaller
determinants. Pick a row or column of each smaller matrix and repeat this
process. Eventually the smaller matrices will be 2 × 2, where we can compute
their determinants using our ad − bc formula. This process is fairly tedious
for large matrices, but is easily implemented on a computer. We’ll usually
practice with n ≤ 4.
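(Editorial aside, not part of the original text: here is a minimal Python sketch of the expansion procedure just described, written as a recursive function that always expands along the first row. The function name cofactor_det is made up for this sketch.)

```python
def cofactor_det(A):
    # Determinant by cofactor expansion along the first row (A given as a list of row lists).
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]   # the ad - bc formula
    total = 0
    for j in range(n):
        # Submatrix A_{1j}: delete row 1 and column j.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * cofactor_det(sub)   # (-1)**j matches (-1)^(1+j) for 1-based j
    return total

print(cofactor_det([[3, 2], [6, 5]]))                      # 3, as in Example 1
print(cofactor_det([[3, 0, -5], [1, 1, 2], [4, -2, -1]]))  # 39; this matrix reappears in Example 10
```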
Example 10. Compute the determinant of A = [ 3 0 −5 ; 1 1 2 ; 4 −2 −1 ] using expansion along a row.
Since we are allowed to pick whichever row we like, we should pick the
1st row since it is the only one which has a zero entry. This means we’ll have
i = 1, so
det(A) = (−1)1+1 a11 det(A11 ) + (−1)1+2 a12 det(A12 ) + (−1)1+3 a13 det(A13 ).
Plugging in our matrix entries a1j along the first row and our submatrices A1j
with the first row and jth column deleted gives us
det(A) = (−1)^2 · 3 · det[ 1 2 ; −2 −1 ] + (−1)^3 · 0 · det[ 1 2 ; 4 −1 ] + (−1)^4 · (−5) · det[ 1 1 ; 4 −2 ].
Our middle term is 0, so we can remove it and simplify our signs to get
det(A) = 3 det[ 1 2 ; −2 −1 ] − 5 det[ 1 1 ; 4 −2 ].
Computing these 2 × 2 determinants gives
det[ 1 2 ; −2 −1 ] = 1(−1) − 2(−2) = −1 + 4 = 3
and
det[ 1 1 ; 4 −2 ] = 1(−2) − 1(4) = −2 − 4 = −6.
Plugging these 2 × 2 determinants back into our formula for det(A) gives us
det(A) = 3(3) − 5(−6) = 9 + 30 = 39.
Again, we can pick whichever column we like, so let’s pick the second
column since it has a zero entry. Plugging j = 2 into our formula for the
determinant gives us
det(A) = (−1)1+2 a12 det(A12 ) + (−1)2+2 a22 det(A22 ) + (−1)3+2 a32 det(A32 ).
Next we can plug in our matrix entries ai2 down the second column and our
submatrices Ai2 with the ith row and second column removed to get
det(A) = (−1)^3 · 0 · det[ 1 2 ; 4 −1 ] + (−1)^4 · 1 · det[ 3 −5 ; 4 −1 ] + (−1)^5 · (−2) · det[ 3 −5 ; 1 2 ]
       = 0 + 1(3(−1) − (−5)(4)) + 2(3(2) − (−5)(1)) = 17 + 22 = 39.
(Notice that this is the same answer we got by expanding along the 1st row!)
There are two classes of matrices where computing the determinant is easy
even when they are very large: lower triangular matrices and upper triangular
matrices. (These were discussed in 2.9 as tools for solving A~x = ~b with an
extremely large A.) If A is lower triangular, then all its entries above the
diagonal are zeros, i.e., aij = 0 if i < j. In particular, this means that the
first row of A looks like a11 0 . . . 0. If we start computing det(A) by expanding
along this top row, we'll get
det(A) = (−1)^{1+1} a11 det(A11) = a11 det(A11).
The matrix A11 is also lower triangular, as shown by the picture below.
[ a11   0    0   · · ·   0  ]
[ a21  a22   0   · · ·   0  ]
[  ⋮    ⋮    ⋱           ⋮  ]
[ an1  an2  · · ·       ann ]
The first row of A11 is a22 0 . . . 0, so expanding along this row gives
det(A11) = a22 det(B),
where B is A11 with its first row and first column removed. Plugging this back
into our formula for det(A) gives us
det(A) = a11 a22 det(B),
where B is A with its top 2 rows and leftmost 2 columns removed. We can
keep repeating this process until we get down to the 2 × 2 case where our
submatrix is
[ a(n−1)(n−1)    0  ]
[ an(n−1)      ann ],
which has determinant a(n−1)(n−1) ann. This gives us the following fact.
Example 12. Find the determinant of
A = [ −3  0 0  0 ]
    [  7  2 0  0 ]
    [  0  1 8  0 ]
    [  4 −3 5 −1 ].
Since A is lower triangular, its determinant is the product of its diagonal
entries. Thus
det(A) = −3(2)(8)(−1) = 48.
On the other hand, if A is upper triangular, then all its entries below the
diagonal are zeros, i.e., aij = 0 if i > j. This means that the nth row of A has
the form 0 . . . 0 ann . If we start computing det(A) by expanding along this
bottom row, we'll get
det(A) = ann det(Ann).
The matrix Ann is also upper triangular, as shown by the picture below.
[ a11  · · ·  · · ·  a1(n−1)       a1n     ]
[  0    ⋱                 ⋮          ⋮     ]
[  ⋮          ⋱           ⋮          ⋮     ]
[  0   · · ·   0     a(n−1)(n−1)  a(n−1)n  ]
[  0   · · ·   0         0          ann    ]
The last row of Ann is 0 . . . 0 a(n−1)(n−1), so expanding along this row gives det(Ann) = a(n−1)(n−1) det(B),
where B is Ann with its last row and last column removed. Plugging this back
into our formula for det(A) gives us
det(A) = ann a(n−1)(n−1) det(B)
where B is A with its bottom 2 rows and rightmost 2 columns removed. We
can keep repeating this process until we get down to the 2 × 2 case where our
submatrix is
[ a11  a12 ]
[  0   a22 ]
which has determinant a11 a22 . This gives us the following fact.
Example 13. Compute the determinant of
A = [ 7 0 −1  9 ]
    [ 0 3  4  0 ]
    [ 0 0  1 −1 ]
    [ 0 0  0  4 ].
Since A is upper triangular, its determinant is the product of its diagonal
entries. This means we have
det(A) = 7(3)(1)(4) = 84.
Next let’s consider the effect of swapping two rows of A. (This is how we
put a nonzero entry into the top left corner during row reduction.) Suppose
we swap the ith and i + 1st rows of A to create B. If we expand along the ith
row of B we get
However, the ith row of B is the i + 1st row of A, so bij = a(i+1)j and
Bij = A(i+1)j . Plugging this back into our computation of B’s determinant,
we get
This is almost the expansion of det(A) along the i + 1st row, but the signs
have changed. Instead of (−1)(i+1)+j we have (−1)i+j . Changing the power on
−1 by 1 means we’ve switched the sign of each term, so det(B) = − det(A).
Since we don’t always swap two adjacent rows of A, we’ll also need to
consider the more general case of swapping the ith and jth rows. For the ease
of explanation, I’ll suppose i < j and rewrite j as i + k for some positive k.
I’ll label the rows of A as r1 through rn . After swapping the ith and i + kth
rows of A, the rows are in the following order from top to bottom (with the
swapped rows in bold):
Since swapping two consecutive rows multiplies the determinant by −1, we’ll
perform consecutive swaps on the rows of A until we’ve gotten them into the
order listed above. If the total number of consecutive swaps needed is odd,
the net effect will be to multiply the determinant by −1. If it is even, the
determinant will be multiplied by 1 and so remain unchanged.
We start with the rows in their original order r1 , . . . , rn . First we’ll perform
consecutive swaps of ri with the rows below it until it is directly below ri+k .
Again I’ll put the two rows being swapped in bold. The first consecutive swap
switches ri and ri+1 to give us
Continuing in the same fashion, the kth consecutive swap switches ri and ri+k
to give us
r1 , . . . , ri−1 , ri+1 , . . . , ri+k , ri , ri+(k+1) , . . . , rn .
Now we’ll perform consecutive swaps of ri+k with the rows above it until it
is between ri−1 and ri+1 where ri was in our original matrix. Again, the two
4.2 Determinants 267
rows being switched are in bold. The first of these consecutive swaps switches
ri+k with ri+(k−1) to give us
The second of these consecutive swaps switches ri+k with ri+(k−2) to give us
This is the same as simply swapping the ith and i + kth rows directly, and
we got there via k + (k − 1) consecutive swaps. Since k + k − 1 = 2k − 1 is
always odd, this means that swapping any two rows of a matrix multiplies
the determinant by −1.
Our final row operation is adding a multiple of one row to another row. (We
use this to create the needed zeros below each leading 1 during row reduction.)
To make this precise, suppose that we add s times the kth row of A to the
ith row of A to get a new matrix B. If we compute B’s determinant along the
ith row, we get
In our new matrix B we have bij = aij + sakj and Bij = Aij , so our
determinant is really
The sum inside the first set of brackets is just det(A), and we can factor an s
out of the second set of brackets to get
det(B) = det(A) + s (−1)i+1 ak1 det(Ai1 ) + · · · + (−1)i+n akn det(Ain ) .
Now the sum inside the remaining set of brackets is the determinant of the
matrix C we’d get by replacing the ith row of A by its kth row. This means
C has identical ith and kth rows, so swapping these two rows would leave
the determinant unchanged. However, we just saw that swapping two rows
changes the sign of the determinant. This means det(C) = − det(C), so we
must have det(C) = 0. Plugging this back into our equation for det(B) gives
us
det(B) = det(A) + s det(C) = det(A).
268 Diagonalization
Therefore adding a multiple of one row to another row doesn’t change the
determinant.
We summarize these results in the following theorem.
Now that we understand how row operations change the determinant and
how to compute the determinant of an upper triangular matrix, we have an
alternative to our expansion method for finding det(A). The plan here is to
use row operations to transform A into an upper triangular matrix U . As we
go we’ll keep track of each time we swapped rows or multiplied a row by a
constant. (We don’t need to keep track of adding a multiple of one row to
another since that doesn’t have any effect on the determinant.) Then we can
compute det(U ) and then undo each of the changes our row operations made
to the determinant to get det(A). This is illustrated in the example below.
3 0 −5
Example 14. Compute the determinant of A = 1 1 2 by using row
4 −2 −1
operations to link A to an upper triangular matrix.
Our first job here is to use row operations to transform A into an upper
triangular matrix. I’ll use our usual notation for row operations since we need
to keep track of when we’ve swapped two rows or scaled a row by a constant.
[ 3  0 −5 ]              [ 1  1  2 ]              [ 1  1   2 ]
[ 1  1  2 ]  → (r1↔r2)   [ 3  0 −5 ]  → (r2−3r1)  [ 0 −3 −11 ]
[ 4 −2 −1 ]              [ 4 −2 −1 ]              [ 4 −2  −1 ]

             [ 1  1   2 ]              [ 1  1   2 ]
→ (r3−4r1)   [ 0 −3 −11 ]  → (r2↔r3)   [ 0 −6  −9 ]
             [ 0 −6  −9 ]              [ 0 −3 −11 ]

              [ 1  1    2  ]              [ 1  1    2   ]
→ (−1/6·r2)   [ 0  1   3/2 ]  → (r3+3r2)  [ 0  1   3/2  ] = U.
              [ 0 −3  −11  ]              [ 0  0  −13/2 ]

Along the way we multiplied the determinant by −1 twice (once for each row
swap) and by −1/6 once (when we scaled a row by −1/6). Thus
det(U) = (−1)(−1)(−1/6) det(A)
or
det(U) = −(1/6) det(A).
Since U is upper triangular, det(U) = 1 · 1 · (−13/2) = −13/2. Plugging in det(U) gives us
−13/2 = −(1/6) det(A),
so det(A) = 39.
(Notice that this matrix is the same one we used for Examples 10 and 11, where
we also got det(A) = 39.)
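(Editorial aside, not part of the original text: the row-operation strategy of Example 14 is easy to automate. The sketch below uses only row swaps and "add a multiple of one row" operations, so the only correction needed at the end is one sign flip per swap; np.linalg.det is used as an independent check.)

```python
import numpy as np

def det_by_row_reduction(A):
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for i in range(n):
        pivot = i + np.argmax(np.abs(U[i:, i]))
        if np.isclose(U[pivot, i], 0.0):
            return 0.0                       # no pivot in this column, so det = 0
        if pivot != i:
            U[[i, pivot]] = U[[pivot, i]]    # a row swap flips the sign of the determinant
            sign = -sign
        for k in range(i + 1, n):
            U[k] -= (U[k, i] / U[i, i]) * U[i]   # doesn't change the determinant
    return sign * np.prod(np.diag(U))        # det(A) = (+/-1) * det(U)

A = [[3, 0, -5], [1, 1, 2], [4, -2, -1]]
print(det_by_row_reduction(A), np.linalg.det(A))   # both approximately 39
```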
mean that the determinant of the reduced echelon form of A, and hence also
that of A itself, would be 0. Therefore we get the following addition to the
Invertible Matrix Theorem.
Theorem 9. Let A be an invertible matrix. Then det(A^{-1}) = 1/det(A).
Exercises 4.2.
1. Compute det [ −5 3 ; 1 6 ].
2. Compute det [ 2 7 ; −1 −8 ].
3. Compute det [ 1 4 ; 3 −2 ].
4. Compute det [ 9 0 ; −6 7 ].
5. Use the determinant to describe the geometric effect [ −1 2 ; −2 1 ]'s map has on the plane.
6. Use the determinant to describe the geometric effect [ 4 0 ; 2 3 ]'s map has on the plane.
7. Use the determinant to describe the geometric effect [ 1 3 ; 0 −1 ]'s map has on the plane.
8. Use the determinant to describe the geometric effect [ 3 5 ; 6 2 ]'s map has on the plane.
9. Compute the determinant of [ 5 0 0 ; 6 −2 0 ; 1 7 4 ].
10. Compute the determinant of [ −1 0 0 ; 4 3 0 ; 9 −2 6 ].
11. Compute the determinant of [ 2 0 7 ; 0 8 −3 ; 0 0 1 ].
12. Compute the determinant of [ −4 1 −6 ; 0 −3 −5 ; 0 0 −2 ].
13. Find A23 where A = [ 2 −1 0 ; 4 −5 2 ; 1 0 −2 ].
14. Find A12 where A = [ 1 0 8 ; 6 −3 1 ; 3 −5 0 ].
15. Find A31 where A = [ 0 −5 2 ; 8 3 1 ; 7 −4 −1 ].
16. Find A22 where A = [ −3 1 4 ; 2 0 −1 ; 9 3 6 ].
17. What effect does swapping rows 1 and 2 have on the determinant?
18. What effect does adding 3 times row 2 to row 1 have on the
determinant?
19. What effect does multiplying row 1 by −4 have on the determinant?
20. What effect does adding −2 times row 3 to row 1 have on the
determinant?
21. Compute the determinant of A = [ 1 3 4 ; 2 0 −2 ; −1 1 −1 ] three times: once by expanding along a row, once by expanding down a column, and once by using row operations to reduce it to an upper triangular matrix.
22. Compute the determinant of A = [ 1 0 6 ; 2 2 −9 ; 1 1 −3 ] three times: once by expanding along a row, once by expanding down a column, and once by using row operations to reduce it to an upper triangular matrix.
23. Compute the determinant of A = [ 4 8 10 ; 1 −1 3 ; −4 0 2 ] three times: once by expanding along a row, once by expanding down a column, and once by using row operations to reduce it to an upper triangular matrix.
24. Compute the determinant of A = [ 2 3 −1 ; 0 5 3 ; −4 −6 2 ] three times: once by expanding along a row, once by expanding down a column, and once by using row operations to reduce it to an upper triangular matrix.
25. Use the determinant of A = [ −5 2 7 ; 3 0 −2 ; 4 0 1 ] to decide whether or not A is invertible.
26. Use the determinant of A = [ −7 3 0 1 ; 4 0 0 0 ; 5 2 1 6 ; −8 −2 1 3 ] to decide whether or not A is invertible.
27. Use the determinant of A = [ 2 0 0 4 ; 2 0 3 0 ; −7 3 10 −5 ; −1 0 6 0 ] to decide whether or not A is invertible.
28. Use the determinant of A = [ −1 3 0 2 ; 0 1 0 −4 ; 7 −4 2 9 ; −2 0 0 1 ] to decide whether or not A is invertible.
29. Let A and B be n × n matrices with det(A) = 3 and det(B) = −6.
(a) Compute det(AB).
(b) Compute det(A−1 ).
30. Let A and B be 2 × 2 matrices with det(A) = 10 and det(B) = −1.
(a) Compute det(AB).
(b) Compute det(B −1 ).
31. Let A and B be 4 × 4 matrices with det(A) = 12 and det(B) = −2.
(a) Compute det(AB).
(b) Compute det(A−1 ).
32. Let A and B be 3 × 3 matrices with det(A) = −2 and det(B) = 4.
(a) Find det(A−1 ).
(b) Find det(AB).
33. Let A be a 3 × 3 matrix with det(A) = 6. What is det(2 · A)?
34. Let A be an n × n matrix with det(A) = 6. What is det(2 · A)?
35. If A is a 1 × 1 matrix, we can define its determinant to be its only entry a11. Use this definition to show that the determinant of [ a b ; c d ] is ad − bc if we compute it using our formulas for expansion along a row and down a column.
36. Let V = {A ∈ Mnn | A is diagonal and det(A) ≠ 0} with operations A "+" B = AB and r "·" A = A^r. Show V is a vector space.
4.3 Eigenspaces
After our interlude developing determinants in 4.2, let’s get back to finding
the eigenvalues and eigenvectors of an n × n matrix A. Recall from the end
of 4.1 that we’d reduced the problem of finding the eigenvalues, λ, and the
eigenvectors, ~x, of a matrix A to solving the equation (A − λIn )~x = ~0. Our
issue back then was that our equation had two unknowns and we didn’t know
how to solve for both of them at once. In this section we’ll use the determinant
to solve for λ first, and then use our value of λ to solve for ~x later.
To isolate solving for λ and solving for ~x, remember that the Invertible
Matrix Theorem from 2.11 tells us that we have a nonzero solution to A~x = ~0
exactly when A isn’t invertible. Since eigenvectors are defined to be nonzero,
we can apply this to (A − λIn )~x = ~0 and get that we’ll have eigenvectors for
A precisely when A − λIn isn’t invertible. Usually we check invertibility of
matrices by row reducing to see if their reduced echelon form is In , however
the presence of the variable λ makes this unappealing. Instead we’ll rely on
our newest addition to the Invertible Matrix Theorem: the fact that a matrix
is invertible exactly when its determinant is nonzero. In other words, whenever
det(A − λIn ) = 0 we’ll know A − λIn isn’t invertible so we’ll get eigenvectors
for A, and λ will be an eigenvalue of A.
Example 1. Find all the eigenvalues of A = [ 1 2 ; 4 3 ].
(Note that this is the matrix from 4.1's Example 8.)
We want to find all values of λ for which det(A − λIn) = 0. Our matrix is
2 × 2, so n = 2. We'll start by computing
A − λI2 = [ 1 2 ] − λ [ 1 0 ] = [ 1 2 ] − [ λ 0 ] = [ 1−λ   2  ]
          [ 4 3 ]     [ 0 1 ]   [ 4 3 ]   [ 0 λ ]   [ 4    3−λ ].
Setting det(A − λI2) = (1 − λ)(3 − λ) − 2(4) equal to zero gives
λ^2 − 4λ − 5 = 0,
which factors as
(λ − 5)(λ + 1) = 0,
so the eigenvalues of A are λ = 5 and λ = −1.
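(Editorial aside, not part of the original text: a quick numerical cross-check of this example. The characteristic polynomial λ^2 − 4λ − 5 and NumPy's eigenvalue routine agree.)

```python
import numpy as np

A = np.array([[1, 2],
              [4, 3]])
print(np.roots([1, -4, -5]))    # roots of lambda^2 - 4*lambda - 5: 5 and -1
print(np.linalg.eigvals(A))     # also 5 and -1 (possibly listed in another order)
```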
The best choice for computing the determinant is to expand down the 2nd
column since it has two zero entries. This gives us
det [ 1−λ   0     7   ]
    [ −2   2−λ   14   ]  = (−1)^{2+2} (2 − λ) det [ 1−λ    7   ]
    [  1    0   −5−λ  ]                           [  1   −5−λ  ]
= (2 − λ)((1 − λ)(−5 − λ) − 1(7)).
Since we want to set this equal to zero and solve for λ, I'll leave the factor of
(2 − λ) in the front and try to expand and factor the rest of this polynomial.
This gives us
(2 − λ)(λ^2 + 4λ − 12) = 0,
which simplifies to
(2 − λ)(λ − 2)(λ + 6) = 0,
so the eigenvalues of A are λ = 2 and λ = −6.
Example 4. Find the matrix A and solve det(A) = 0 for x for the molecule
1,3-butadiene.
Since our matrix A depends on the number and bonds between the carbon
atoms in a molecule of 1,3-butadiene, let’s start by looking at a picture of its
molecular structure. (Here Hs are hydrogen atoms and Cs are carbon.)
[Structure of 1,3-butadiene: a chain of four carbons C1=C2−C3=C4, with hydrogens filling the remaining bonds.]

From this structure we get the matrix
A = [ x 1 0 0 ]
    [ 1 x 1 0 ]
    [ 0 1 x 1 ]
    [ 0 0 1 x ].
We want to solve det(A) = 0 for x, so let's take the determinant of A by
expanding along the first row. This gives us
det(A) = x det[ x 1 0 ; 1 x 1 ; 0 1 x ] − 1 det[ 1 1 0 ; 0 x 1 ; 0 1 x ].
Expanding each of these 3 × 3 determinants gives
det[ x 1 0 ; 1 x 1 ; 0 1 x ] = x(x^2 − 1) − 1(x) = x^3 − 2x
and
det[ 1 1 0 ; 0 x 1 ; 0 1 x ] = 1 det[ x 1 ; 1 x ] = 1(x^2 − 1).
Plugging these subdeterminants back into our formula for det(A) gives us
det(A) = x(x^3 − 2x) − (x^2 − 1) = x^4 − 3x^2 + 1.
This doesn't factor nicely over the integers, but we can solve det(A) = 0 using
technological assistance to get the solutions x = (1 + √5)/2, x = (1 − √5)/2,
x = (−1 + √5)/2, and x = (−1 − √5)/2.
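(Editorial aside, not part of the original text: the "technological assistance" could be any polynomial solver; here is one way to do it numerically, assuming NumPy is available.)

```python
import numpy as np

# Roots of det(A) = x^4 - 3x^2 + 1, entered by its coefficients.
print(np.roots([1, 0, -3, 0, 1]))
# approximately 1.618, -1.618, 0.618, -0.618, i.e. (1 ± √5)/2 and (−1 ± √5)/2
```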
just as we did in 2.5 and 2.7. The reduced echelon form of this matrix is
[ 1  −1/2 ]
[ 0    0  ].
Example 6. Find the eigenspace of A = [ 1 0 7 ; −2 2 14 ; 1 0 −5 ] for λ = 2.
Mirroring the previous example, we need to find the null space of
A − 2I3 = [ −1  0   7 ]
          [ −2  0  14 ]
          [  1  0  −7 ].
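(Editorial aside, not part of the original text: the null space computation in Example 6 can be reproduced symbolically with SymPy; the basis it returns matches the one found by hand.)

```python
import sympy as sp

A = sp.Matrix([[1, 0, 7], [-2, 2, 14], [1, 0, -5]])
E2_basis = (A - 2 * sp.eye(3)).nullspace()
print(E2_basis)   # (0, 1, 0) and (7, 0, 1), a basis for the lambda = 2 eigenspace
```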
Biologists call the eigenvector of length 1 with the largest eigenvalue (aka
the population growth rate) the “stable stage distribution”. We’ll learn how
to compute the length of a vector and find vectors of given lengths within a
span in Chapter 5, but for now we can at least find a basis for the eigenspace
containing the stable stage distribution.
Example 7. Find the eigenspace containing the stable stage distribution for
the population discussed in Example 3.
As we’ve seen in the examples above, there is always at least one free
variable when we solve for ~x. If there were no free variables, we’d know the only
solution was ~x = ~0, which would mean our matrix was invertible and λ wasn’t
an eigenvalue. For me, this usually means I’ve messed up my determinant
somehow. If we have more than one free variable, as in Example 6, that just
means our eigenspace has dimension greater than 1.
We can get an upper bound on the dimension of an eigenspace Eλ by
looking at how its eigenvalue solves the polynomial det(A − λIn ). Some roots
of a polynomial occur only once, while some occur multiple times. We saw
this in Example 2, where det(A − λIn ) was (λ − 2)2 (λ + 6). In this case, −6
is a single root, because only one copy of (λ + 6) factors out of det(A − λIn ).
However, two copies of (λ−2) factor out of det(A−λIn ), so 2 is a double root.
In general, the dimension of an eigenspace is bounded above by the number
Exercises 4.3.
1. Suppose det(A − λI4) = (4 − λ)(−1 − λ)^2 (1/2 − λ). Find all the
eigenvalues of A, and say how many times each eigenvalue is a root
of det(A − λI4 ).
2. Suppose det(A−λI4 ) = (7−λ)3 (−10−λ). Find all the eigenvalues of
A, and say how many times each eigenvalue is a root of det(A−λI4 ).
3. Suppose det(A − λI3 ) = λ3 − λ2 − 6λ. Find all the eigenvalues of A,
and say how many times each eigenvalue is a root of det(A − λI3 ).
4. Suppose det(A−λI3 ) = λ3 +8λ2 +7λ. Find all the eigenvalues of A,
and say how many times each eigenvalue is a root of det(A − λI3 ).
5. Find all the eigenvalues of A = [ 1 2 1 ; 0 −4 0 ; −2 1 −2 ].
6. Find all the eigenvalues of A = [ −1 0 8 ; 6 2 −5 ; 0 0 7 ].
7. Find all the eigenvalues of A = [ 0 9 −1 ; 0 3 6 ; 0 8 −5 ].
8. Find all the eigenvalues of A = [ 1 0 −2 ; −4 0 8 ; 0 2 1 ].
9. Find the eigenspace of A = [ 1 0 −1 ; 7 −2 −6 ; 4 0 −4 ] for eigenvalue λ = −2.
10. Find the eigenspace of A = [ 6 2 8 ; 4 −1 −9 ; 0 0 5 ] for eigenvalue λ = 5.
11. Find the eigenspace of A = [ 3 −2 4 ; 0 4 −2 ; −1 0 0 ] for eigenvalue λ = 3.
12. Find the eigenspace of A = [ 0 −2 0 ; 1 6 2 ; 2 3 4 ] for eigenvalue λ = 0.
13. Compute all the eigenvalues and eigenspaces of A = [ 1 2 −11 ; 2 −2 6 ; 0 0 −9 ].
14. Compute all the eigenvalues and eigenspaces of A = [ 0 −7 0 ; 1 −6 2 ; 2 −9 4 ].
15. Compute all the eigenvalues and eigenspaces of A = [ −6 5 −4 ; 0 10 7 ; 0 −1 2 ].
16. Compute all the eigenvalues and eigenspaces of A = [ −2 0 0 ; 7 4 2 ; 1 5 1 ].
17. As we did in Example 4, set up the matrix A and solve for x for
ethylene (pictured below).
H H
C1 C2
H H
18. As we did in Example 4, set up the matrix A and solve for x for
benzene (pictured below).
H
H C1 H
C6 C2
C5 C3
H C4 H
H
19. Find the growth rate of a population whose demographic matrix is A = [ 0 1 6 ; 0.7 0.6 0 ; 0.3 0.4 0 ]. (As with many realistic examples, the numbers here do not come out clean. You will probably want to use some technology to help; see Appendix 2 for help using Mathematica.)
20. For the demographic matrix in the previous problem, find a basis
for the eigenspace which contains the stable stage distribution.
4.4 Diagonalization
Remember that we started investigating eigenvectors because we were looking
for an easier way to repeatedly multiply a vector by the same matrix. Obvi-
ously this is very easy if our vector is an eigenvector, but in our applications we
won’t always have that luxury. For example, our population vector is unlikely
to magically turn out to be an eigenvector of our demographic matrix for
every population. In this section we’ll explore a way to use eigenvectors to
make things easier for general vectors.
To start, let’s suppose ~x1 and ~x2 are eigenvectors of an n × n matrix A
with eigenvalues λ1 and λ2 respectively. If ~v = ~x1 + ~x2 , then
Similarly, if w
~ = a~x1 , then
which simplifies to
~0 = (λi − λ1)a1~v1 + · · · + (λi − λi−1)ai−1~vi−1.
Since ~v1 , . . . , ~vi−1 are linearly independent, we must have (λi − λj )aj = 0 for
j = 1, . . . , i − 1. Since the eigenvalues were different, we must have λi − λj 6= 0
for all j, which means aj = 0 for all j. Thus it wasn’t possible to have any ~vi
in the span of the other ~v s, so these eigenvectors are linearly independent.
Since eigenvectors with different eigenvalues are linearly independent, we
can combine our bases from all the eigenspaces of A to create a set of linearly
independent eigenvectors. In fact, this is the largest linearly independent
collection of eigenvectors of A. We know from 3.1’s Theorem 3 that if a linearly
independent set contains dim(V ) vectors, it is a basis for V . This means that
if we’ve collected n vectors from the bases of A’s eigenspaces, we’ve found a
basis for Rn composed of eigenvectors of A. However, if there are fewer than
n vectors in our collection, then we cannot create a basis of eigenvectors for
Rn . Because the number of vectors in a basis equals the dimension, another
way to state this is as follows.
Example 1. Find a basis for R3 which is made up of eigenvectors of
A = [  1  0   7 ]
    [ −2  2  14 ]
    [  1  0  −5 ].
This is the matrix from Examples 2 and 6 in 4.3, so we already know that
A has eigenvalues 2 and −6 and that E2 has basis {(0, 1, 0), (7, 0, 1)}. This means
we've already got two out of three basis vectors for R3.
To complete our basis for R3, we need to find another basis vector from
E−6. We can do this by solving (A − (−6)I3)~x = ~0. Plugging in A and
simplifying gives us
[  7  0   7 ]
[ −2  8  14 ] ~x = ~0.
[  1  0   1 ]
Example 2. Show that we can't find a basis for R3 made up of eigenvectors of
A = [ −6 −3  5 ]
    [  3  0 −2 ]
    [  0  0  4 ].
This isn’t a matrix we’ve discussed before, so we’ll have to take it from
the top: find the eigenvalues, find a basis for each eigenspace, and notice that
we don’t get three basis vectors from that process.
To find the eigenvalues, we need to compute
det [ −6−λ  −3    5  ]
    [  3    −λ   −2  ]
    [  0     0   4−λ ].
Let's do that by expanding along the 3rd row since it has two zero entries.
This gives us
det [ −6−λ  −3    5  ]
    [  3    −λ   −2  ]  = (−1)^{3+3} (4 − λ) det [ −6−λ  −3 ]
    [  0     0   4−λ ]                           [  3    −λ ]
= (4 − λ)((−6 − λ)(−λ) − (−3)(3)).
From the discussion at the end of 4.3, we know that the dimension of
any eigenspace satisfies 1 ≤ dim(Eλ ) ≤ k where k is the number of times
λ is a solution to the polynomial det(A − λIn ) = 0. (This is often called
the multiplicity of λ.) If λ is a so-called single root, i.e., k = 1, then clearly
dim(Eλ ) = 1. For larger values of k, we’ve seen that it’s possible to have
dim(Eλ ) < k as in Example 2 for λ = −3. If we list each repeated root the
number of times it solves a polynomial, the degree of any polynomial equals
the number of its roots. Thus our degree n polynomial, det(A − λIn ), has n
roots in total, so the sum of the values of k over all eigenvalues λ1, . . . , λℓ is
n. As a formula, this looks like n = k1 + k2 + · · · + kℓ. Since ki is the upper bound on
the number of basis vectors for Eλi , to get the sum of the dimensions of our
eigenspaces equal to n we must have dim(Eλi ) = ki for every eigenvalue
of A. We can see that this is true in Example 1 but not in Example 2.
Practically speaking, this means if you come across any eigenspace whose
dimension is smaller than the multiplicity of its eigenvalue, you already know
it is impossible to find a basis of eigenvectors for Rn , which may save you
some work. As a consequence of this discussion we also get the following fact.
For example, the matrix in Example 1 of 4.3 was 2 × 2 and had eigenvalues
5 and −1, so we know there is a basis for R2 made up of its eigenvectors.
Suppose A has enough linearly independent eigenvectors to form a basis B
for Rn . We found B to simplify multiplication by A for vectors which aren’t
eigenvectors, which we can do by rewriting those vectors in terms of our
eigenvector basis B. In other words, we want to replace vectors in Rn by
their B-coordinate vectors. We’re already working in Rn , so our coordinate
vector function fB is a map from Rn to itself. We’ll explore this special case of
coordinate functions in the next section. For now, let’s see what we’ve gained
by computing A[~v ]B instead of A~v .
Let B = {~b1 , . . . , ~bn } be a basis for Rn where ~bi is an eigenvector of A with
eigenvalue λi . Let ~v be any vector in Rn . We can write ~v = a1~b1 + · · · + an~bn
so
[~v]B = (a1, . . . , an).
Now
This may not look immediately better, but if we think in terms of B-coordinate
Example 4. The matrix A = [ 1 0 7 ; −2 2 14 ; 1 0 −5 ] is diagonalizable.
This is the matrix from Example 1, which had the basis of eigenvectors
{(−1, −2, 1), (0, 1, 0), (7, 0, 1)} for R3.
Example 5. The matrix A = [ −6 −3 5 ; 3 0 −2 ; 0 0 4 ] is not diagonalizable.
This is the matrix from Example 2, which we showed did not have a basis
of eigenvectors for R3.
Exercises 4.4.
1. How many linearly independent eigenvectors does [ 7 −1 0 ; 2 4 3 ; −4 5 8 ] need to be diagonalizable?
2. How many linearly independent eigenvectors does [ −5 0 2 1 ; 3 −4 1 9 ; 2 0 0 −4 ; 1 6 −2 5 ] need to be diagonalizable?
3. How many linearly independent eigenvectors does A = [ 9 0 ; 4 13 ] need to be diagonalizable?
4. Is a 3×3 matrix A with eigenvalues λ = 4, λ = −2, and λ = 1 where
dim(E4 ) = 1, dim(E−2 ) = 1, and dim(E1 ) = 1 diagonalizable?
5. Is a 3 × 3 matrix A with eigenvalues λ = 0 and λ = −5 where
dim(E0 ) = 1 and dim(E−5 ) = 1 diagonalizable?
6. Is a 4×4 matrix A with eigenvalues λ = 2, λ = −8, and λ = 3 where
dim(E2 ) = 1, dim(E−8 ) = 1, and dim(E3 ) = 1 diagonalizable?
7. Is a 4 × 4 matrix A with eigenvalues λ = 7, λ = 0, and λ = 10 where
dim(E7) = 1, dim(E0) = 1, and dim(E10) = 2 diagonalizable?
8. Is a 3 × 3 matrix A with eigenspaces E0 = Span{(−2, −1, 1)}, E3 = Span{(0, 4, 1)}, and E−1 = Span{(1, 0, 1)} diagonalizable?
9. Is a 3 × 3 matrix A with eigenspaces E1 = Span{(−2, 3, 1)} and E−11 = Span{(8, 1, 0), (−3, 0, 1)} diagonalizable?
10. Is a 4 × 4 matrix A with eigenspaces E2 = Span{(1, 1, 0, 1)}, E−6 = Span{(12, 0, 1, 0)}, and E5 = Span{(9, 0, −2, 1)} diagonalizable?
11. Is a 4 × 4 matrix A with eigenspaces E−2 = Span{(−3, 2, 0, 1)} and E4 = Span{(8, 1, 0, 0), (−1, 0, 1, 0), (5, 0, 0, 1)} diagonalizable?
12. Is A = [ 8 −3 ; 9 −4 ] diagonalizable?
13. Is A = [ 3 0 5 ; 2 1 4 ; 5 0 3 ] diagonalizable?
14. Is A = [ 2 3 0 ; 5 4 0 ; 6 6 −1 ] diagonalizable?
15. Is A = [ 2 5 1 ; 3 4 8 ; 0 0 7 ] diagonalizable?
16. Explain why a matrix is diagonalizable if and only if the dimension
of each eigenspace, Eλ , equals the multiplicity of λ.
4.5 Change of Basis Matrices
[Diagrams: the coordinate maps fB and fC from V to Rn, and the change of basis matrix PC←B carrying B-coordinate vectors to C-coordinate vectors along the bottom of the square.]
Plugging our PC←B and [~v]B into [~v]C = PC←B [~v]B gives us
[~v]C = [  1 −1  1  1 ] [ −1 ]   [ −2 ]
        [  0  0  0 −2 ] [  3 ] = [  4 ]
        [ −1  1  1  1 ] [  4 ]   [  6 ]
        [  1  0 −1  0 ] [ −2 ]   [ −5 ].
To check our work, we can make sure that [~v]B and [~v]C give us the same
matrix ~v. Since [~v]B = (−1, 3, 4, −2), we get
~v = (−1)[ 1 0 ; 0 1 ] + 3[ 0 1 ; 1 0 ] + 4[ 1 0 ; 0 −1 ] + (−2)[ 0 −1 ; 1 0 ] = [ 3 5 ; 1 −5 ].
Similarly, [~v]C tells us
~v = (−2)[ 1 0 ; 0 0 ] + 4[ 1 1 ; 0 0 ] + 6[ 1 1 ; 1 0 ] + (−5)[ 1 1 ; 1 1 ] = [ 3 5 ; 1 −5 ].
Since our matrices agree, our change of coordinates computation checks out.
x1 fC (~b1 ) + · · · + xn fC (~bn )
This equation identifies two different versions of a linear system: on the right
as a matrix equation and on the left as a vector equation. This gives us
Theorem 1. If B and C are bases for V, then PC←B = [ [~b1]C  · · ·  [~bn]C ].
In other words, PC←B is the matrix whose columns are the C-coordinate
vectors of the basis vectors from B. Since this formula isn’t symmetric, I
remember that I’m transforming vectors from B into vectors with respect to
C to get PC←B .
Example 2. Using M22 with basis B and C from Example 1, compute PC←B .
Therefore
[ [ 1 0 ; 0 1 ] ]C = (1, 0, −1, 1).
Similarly, when [ b1 b2 ; b3 b4 ] = [ 0 1 ; 1 0 ], our augmented coefficient matrix is
[ 1 1 1 1 | 0 ]
[ 0 1 1 1 | 1 ]
[ 0 0 1 1 | 1 ]
[ 0 0 0 1 | 0 ]
which has reduced echelon form
[ 1 0 0 0 | −1 ]
[ 0 1 0 0 |  0 ]
[ 0 0 1 0 |  1 ]
[ 0 0 0 1 |  0 ].
Therefore
[ [ 0 1 ; 1 0 ] ]C = (−1, 0, 1, 0).
When [ b1 b2 ; b3 b4 ] = [ 1 0 ; 0 −1 ], our augmented coefficient matrix is
[ 1 1 1 1 |  1 ]
[ 0 1 1 1 |  0 ]
[ 0 0 1 1 |  0 ]
[ 0 0 0 1 | −1 ]
which has reduced echelon form
[ 1 0 0 0 |  1 ]
[ 0 1 0 0 |  0 ]
[ 0 0 1 0 |  1 ]
[ 0 0 0 1 | −1 ].
Therefore
[ [ 1 0 ; 0 −1 ] ]C = (1, 0, 1, −1).
Finally, when [ b1 b2 ; b3 b4 ] = [ 0 −1 ; 1 0 ], our augmented coefficient matrix is
[ 1 1 1 1 |  0 ]
[ 0 1 1 1 | −1 ]
[ 0 0 1 1 |  1 ]
[ 0 0 0 1 |  0 ].
Therefore
[ [ 0 −1 ; 1 0 ] ]C = (1, −2, 1, 0).
Using these four C-coordinate vectors (in order) as the columns of our change
of coordinates matrix we get
PC←B = [  1 −1  1  1 ]
       [  0  0  0 −2 ]
       [ −1  1  1  1 ]
       [  1  0 −1  0 ].
Notice in our last example, that we solved four matrix equations whose
augmented coefficient matrices were the same except for the augmentation
column. This means we could have borrowed the shortcut we use to compute
matrix inverses and solved all four equations simultaneously by row reducing
the 4 × 8 matrix formed by putting the common piece of the augmented
coefficient matrices on the left and the four augmentation columns on the
right. As with matrix inverses, the left-hand side would reduce to I4 while
the right-hand side would become PC←B . If B and C are bases for Rn , these
augmentation columns are just the basis vectors from B and the common part
of the augmented coefficient matrix has columns that are the basis vectors
from C. That means we can find PC←B as the right half of the reduced echelon
form of
[ ~c1  · · ·  ~cn | ~b1  · · ·  ~bn ].
Example 3. In R3, find PC←B where B = {(2, 1, −1), (−3, 0, 2), (5, 4, 1)} and
C = {(1, −1, 1), (0, 1, −1), (0, 0, 1)}.
Since B and C are bases for R3, we can use the shortcut described above
to get the reduced echelon form
[ 1 0 0 | 2 −3 5 ]
[ 0 1 0 | 3 −3 9 ]
[ 0 0 1 | 0  2 5 ].
Thus
PC←B = [ 2 −3 5 ]
       [ 3 −3 9 ]
       [ 0  2 5 ].
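(Editorial aside, not part of the original text: row reducing [ c1 ... cn | b1 ... bn ] is the same as solving the matrix equation C X = B, so the shortcut can be carried out with one linear solve. The NumPy sketch below redoes Example 3.)

```python
import numpy as np

# Columns are the basis vectors of B and C from Example 3.
B_cols = np.array([[2, -3, 5],
                   [1, 0, 4],
                   [-1, 2, 1]], dtype=float)
C_cols = np.array([[1, 0, 0],
                   [-1, 1, 0],
                   [1, -1, 1]], dtype=float)

P = np.linalg.solve(C_cols, B_cols)   # P_{C<-B}
print(np.round(P))                    # [[2, -3, 5], [3, -3, 9], [0, 2, 5]]
```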
We can extend this trick to a more general vector space V if we are willing
to use a coordinate map to translate our problem from V to Rn . You can
redo Example 2 by using the standard basis for M22 to translate to R4 . (This
will actually give you the same augmented coefficient matrix pieces as we
computed there.)
Since B is linearly independent, the only vector with [~v ]B = ~0 is ~0V . This
means PC←B ~x = ~0 has only one solution, ~x = ~0, and therefore by the Invertible
Matrix Theorem PC←B is invertible. Its inverse, (PC←B )−1 , satisfies
so
(PC←B )−1 [~v ]C = (PC←B )−1 (PC←B [~v ]B ) = [~v ]B .
This gives us the following:
[Diagram: the coordinate maps fB and fC from V to Rn, with PC←B and PB←C converting between B-coordinates and C-coordinates.]
Example 4. Find PB←C for M22 with the same B and C as in Examples 1
and 2.
Since PB←C = (PC←B)^{-1}, we can row reduce [PC←B | I4] to get
[ 1 0 0 0 | 1/2   1/2  1/2  1 ]
[ 0 1 0 0 |  0    1/2   1   1 ]
[ 0 0 1 0 | 1/2   1/2  1/2  0 ]
[ 0 0 0 1 |  0   −1/2   0   0 ].
This means
PB←C = (PC←B)^{-1} = [ 1/2   1/2  1/2  1 ]
                     [  0    1/2   1   1 ]
                     [ 1/2   1/2  1/2  0 ]
                     [  0   −1/2   0   0 ].
There is one special case where the process of finding PC←B is much easier:
when V = Rn , and C is the standard basis for Rn . This may sound too
specific to be very useful, but we’ll want it when we’re working with a basis
of eigenvectors.
Suppose V = Rn and C is its standard basis. Our process for finding PC←B
is to find the C-coordinate vector of each basis vector from our basis B and
use them as the columns of PC←B . However, the way we write vectors in Rn
is as their coordinate vectors in terms of the standard basis. This means that
each basis vector from B is its own C-coordinate vector, which gives us the
following.
Theorem 3. Let C be the standard basis for Rn and B be any other basis.
Then PC←B = [ ~b1  · · ·  ~bn ].
In other words, PC←B is just the matrix whose columns are the basis vectors
from B.
Example 5. Find PC←B where B = {(−1, −2, 1), (0, 1, 0), (7, 0, 1)} is the basis of
eigenvectors of A = [ 1 0 7 ; −2 2 14 ; 1 0 −5 ] from Example 1 in 4.4 and C is the
standard basis for R3.
Since C is the standard basis for R3, the basis vectors from B are already
written as C-coordinate vectors. This means the columns of PC←B are simply
the basis vectors from B (in order), which means we have
PC←B = [ −1 0 7 ]
       [ −2 1 0 ]
       [  1 0 1 ].
As we saw in the previous example, this links back to our work in the
previous section, by letting B be a basis of eigenvectors for an n × n matrix
A. From 4.4, we know that A acts like an n × n diagonal matrix D when
multiplied by B-coordinate vectors. (Recall that D’s diagonal entries are the
eigenvalues of our basis vectors from B (in order).) With our new change of
coordinates matrices, we get an alternate way to compute Ak~v in three stages:
change ~v from standard C-coordinates to B-coordinates using PB←C , multiply
by Dk which is the diagonal version of Ak , and change the result back to
standard C-coordinates using PC←B . We can visualize this as follows, where
we start at the top left corner and end up at the top right corner. (Simply
multiplying ~v by Ak can be visualized as starting in the same place but simply
going straight across the top to the same endpoint.)
          A^k
  Rn  ----------->  Rn
   |                 ^
   | PB←C            | PC←B
   v                 |
  Rn  ----------->  Rn
          D^k

Figure 4.5: Visualizing diagonalization
Since our change of coordinates matrices are in the special case discussed
above, we know PC←B is just the matrix P whose columns are the eigenvectors
in our basis B and PB←C = P −1 . This means we can update Figure 4.5 to get
Figure 4.6.
          A^k
  Rn  ----------->  Rn
   |                 ^
   | P^{-1}          | P
   v                 |
  Rn  ----------->  Rn
          D^k

Figure 4.6: Simplified notation for diagonalization
or simply
Theorem 5. Ak = P Dk P −1
Example 6. Find P, P^{-1}, and D for A = [ 1 0 7 ; −2 2 14 ; 1 0 −5 ].
This is the matrix we worked with in Example 5 of this section and
Example 1 of 4.4. From 4.4's Example 1, we have the basis
B = {(−1, −2, 1), (0, 1, 0), (7, 0, 1)},
whose vectors are eigenvectors of A with eigenvalues −6, 2, and 2 (in order).
Using these basis vectors as the columns of P and their eigenvalues as the
diagonal entries of D gives
P = [ −1 0 7 ]        D = [ −6 0 0 ]
    [ −2 1 0 ]  and       [  0 2 0 ]
    [  1 0 1 ]            [  0 0 2 ].
To find P^{-1}, we can row reduce [P | I3] to get
[ 1 0 0 | −1/8  0  7/8 ]
[ 0 1 0 | −1/4  1  7/4 ]
[ 0 0 1 |  1/8  0  1/8 ]
so
P^{-1} = [ −1/8  0  7/8 ]
         [ −1/4  1  7/4 ]
         [  1/8  0  1/8 ].
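(Editorial aside, not part of the original text: a numerical check of Theorem 5 for the matrices of Example 6, assuming NumPy.)

```python
import numpy as np

A = np.array([[1, 0, 7], [-2, 2, 14], [1, 0, -5]], dtype=float)
P = np.array([[-1, 0, 7], [-2, 1, 0], [1, 0, 1]], dtype=float)   # eigenvectors as columns
D = np.diag([-6.0, 2.0, 2.0])                                    # matching eigenvalues
P_inv = np.linalg.inv(P)

print(np.allclose(A, P @ D @ P_inv))                  # True: A = P D P^{-1}
k = 4
print(np.allclose(np.linalg.matrix_power(A, k),
                  P @ np.linalg.matrix_power(D, k) @ P_inv))   # True: A^k = P D^k P^{-1}
```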
Exercises 4.5.
1. Suppose PC←B = [ 3 −4 ; −1 5 ] and [~v]B = (2, 6). Find [~v]C.
2. Suppose PC←B = [ 2 7 ; −3 1 ] and [~v]B = (−1, 4). Find [~v]C.
3. Suppose PC←B = [ 5 −2 1 ; 0 8 −2 ; 1 5 3 ] and [~v]B = (2, 0, −1). Find [~v]C.
4. Suppose PC←B = [ 4 0 −3 ; 1 6 2 ; −2 −1 0 ] and [~v]B = (3, −1, 1). Find [~v]C.
5. Suppose PC←B = [ 8 3 ; 2 5 ] and [~v]C = (3, 1). Find [~v]B.
6. Suppose PC←B = [ 4 6 ; 3 5 ] and [~v]C = (10, 2). Find [~v]B.
7. Suppose PC←B = [ −3 2 0 ; 8 −12 4 ; 1 −1 0 ] and [~v]C = (3, −4, 2). Find [~v]B.
8. Suppose PC←B = [ 1 0 1 ; −1 2 0 ; 0 −1 1 ] and [~v]C = (−3, 1, 4). Find [~v]B.
9. Let B = {(−11, −5, 2), (13, 19, 10), (1, 12, 1)} and C = {(2, 4, 2), (−3, 1, 0), (0, −1, 1)} be bases for R3. Compute PC←B.
10. Let B = {(−12, 24, −13), (−2, 8, 11), (−28, 34, −1)} and C = {(−4, 4, 1), (2, 0, 3), (−1, 5, −2)} be bases for R3. Compute PC←B.
11. Use the bases B = {[ −3 3 ; −3 −7 ], [ 6 32 ; 8 44 ], [ 4 18 ; 5 26 ], [ 8 8 ; 0 12 ]} and C = {[ 2 4 ; 2 8 ], [ 1 1 ; −1 −1 ], [ 0 0 ; 1 0 ], [ −1 3 ; 0 1 ]} for M22. Compute PC←B.
12. Use the bases B = {[ −10 3 ; −6 12 ], [ −12 3 ; 0 1 ], [ 0 5 ; 18 2 ], [ 0 11 ; 12 15 ]} and C = {[ −1 2 ; 0 4 ], [ 2 1 ; 6 1 ], [ 0 −1 ; 0 3 ], [ 5 0 ; 0 5 ]} for M22. Compute PC←B.
13. Use the bases B = {5x^2 − 3x + 5, −12x^2 − 44x, 7x^2 + 31x − 1} and C = {3x^2 + 7x + 1, −2x^2 + 2x − 2, −8x + 2} for P2. Compute PC←B.
14. Use the bases B = {14x^2 − 9x + 1, 19x^2 − 3x − 4, 9x^2 + x − 10} and C = {x^2 − x + 4, 5x^2 − 3, −2x^2 + x + 3} for P2. Compute PC←B.
15. Let B = {(3, 7, −1), (1, 0, −3), (5, 9, 1)} be a basis for R3 and C be the standard basis. Compute P = PC←B.
16. Let B = {(10, −6, 0), (−3, 4, 7), (9, 2, −5)} be a basis for R3 and C be the standard basis. Compute P = PC←B.
17. Let B = {(8, −5, 9, 1), (−11, 3, 3, −3), (4, 0, −1, 2), (−2, 6, 4, 11)} be a basis for R4 and C be the standard basis. Compute P = PC←B.
18. Let B = {(4, −1, 5, 0), (−2, 7, −1, 3), (0, 3, −6, 5), (9, 0, 2, 1)} be a basis for R4 and C be the standard basis. Compute P = PC←B.
19. Let B = {(2, −4, 2), (−3, 1, 0), (0, −1, 1)} be a basis for R3 and C be the standard basis. Compute P^{-1} = PB←C.
20. Let B = {(−2, 0, 1), (6, 4, 2), (2, 0, 1)} be a basis for R3 and C be the standard basis. Compute P^{-1} = PB←C.
21. Let B = {(3, 0, −2, 1), (2, 1, 0, 0), (−1, 1, 1, 0), (0, 0, 1, 1)} be a basis for R4 and C be the standard basis. Compute P^{-1} = PB←C.
22. Let B = {(1, 0, 1, 0), (−2, −1, 0, 1), (0, 1, 2, 0), (1, 2, 0, 1)} be a basis for R4 and C be the standard basis. Compute P^{-1} = PB←C.
23. Let B = {(4, 1, 0), (−1, 0, 6), (3, −4, 2)} be a basis for R3 made up of eigenvectors of A and C be the standard basis. If the eigenvalues of the basis vectors of B are (in order) λ1 = −7, λ2 = 2, and λ3 = −1, compute the matrices P and D used in our formula A = PDP^{-1}.
24. Let B = {(5, 1, 0), (−2, 6, 1), (0, −4, 1)} be a basis for R3 made up of eigenvectors of A and C be the standard basis. If the eigenvalues of the basis vectors of B are (in order) λ1 = −4, λ2 = 5, and λ3 = 0, compute the matrices P and D used in our formula A = PDP^{-1}.
−13
4 0 −2
10 −1 3 1
25. Let B =
, , , be a basis for R4 made up
0 2 5
0
1 2 0 −3
5.1 Length
In this chapter, we’ll return to Rn and start developing a set of tools that will
allow us to easily compute geometric quantities without needing to visualize
them first. This is most obviously useful in dealing with Rn for n > 3, but
can also be easier than drawing a picture even in complicated situations in
R2 or R3 . (It is also appreciated by less than stellar artists like me!) The two
basic geometric quantities we’ll work with are length and angle. These are
both scalar quantities, so we’ll need a way to create scalars from vectors. Our
basic tool is the following.
Definition. Let ~x = (x1 , x2 , . . . , xn ) and ~y = (y1 , y2 , . . . , yn ) be n-vectors. Their dot product is
~x · ~y = x1 y1 + x2 y2 + · · · + xn yn .
Note that we cannot take the dot product of two vectors unless they are
both the same size.
Example 1. Compute (−2, 1, 6) · (3, 7, −1).
Since both of our vectors are from R3 , this dot product makes sense. To compute it, we multiply corresponding entries of the two vectors and add up those products to get
(−2, 1, 6) · (3, 7, −1) = −2(3) + 1(7) + 6(−1) = −5.
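For readers computing along in Mathematica (see Appendix A.2), the same dot product can be checked with the Dot command or the infix period; this little sketch is mine rather than part of the example:

x = {-2, 1, 6};
y = {3, 7, -1};
x . y        (* gives -5, matching the hand computation above *)
Dot[x, y]    (* the same computation written as a command *)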
As with any other new vector operation, we want to explore its properties,
including how it interacts with addition and scalar multiplication.
The first nice property to notice about the dot product is that unlike
matrix multiplication it is commutative, i.e., ~x · ~y = ~y · ~x. This is because
~x · ~y = x1 y1 + x2 y2 + · · · + xn yn
which equals
y1 x1 + y2 x2 + · · · + yn xn = ~y · ~x
since multiplication of real numbers commutes.
Next, let’s see how the dot product interacts with vector addition. Suppose
~x, ~v , and w ~ are in Rn . If we take the dot product of one vector with the sum
of the other two vectors we get
~x · (~v + ~w) = (x1 , . . . , xn ) · (v1 + w1 , . . . , vn + wn )
= x1 (v1 + w1 ) + x2 (v2 + w2 ) + · · · + xn (vn + wn )
= x1 v1 + x1 w1 + x2 v2 + x2 w2 + · · · + xn vn + xn wn
= (x1 v1 + x2 v2 + · · · + xn vn ) + (x1 w1 + x2 w2 + · · · + xn wn )
= ~x · ~v + ~x · ~w
so ~x · (~v + ~w) = ~x · ~v + ~x · ~w. Thus the dot product distributes over vector addition.
Example 2. Check that ~x · (~v + ~w) = ~x · ~v + ~x · ~w where ~x = (3, −2), ~v = (8, 5), and ~w = (−6, 1).
The left-hand side of this equation is ~x · (~v + ~w), which is
(3, −2) · ((8, 5) + (−6, 1)) = (3, −2) · (2, 6) = 3(2) + (−2)(6) = −6.
The right-hand side is ~x · ~v + ~x · ~w = (3(8) + (−2)(5)) + (3(−6) + (−2)(1)) = 14 + (−20) = −6, so the two sides match.
Finally, let’s explore how dot products interact with scalar multiplication.
Suppose ~x and ~y are in Rn and r is a scalar. Multiplying the dot product of
~x and ~y by r gives us
r(~x · ~y ) = r(x1 y1 + x2 y2 + · · · + xn yn ) = rx1 y1 + rx2 y2 + · · · + rxn yn .
This is interesting, because we can split this up one of two ways: as
rx1 y1 + rx2 y2 + · · · + rxn yn = (rx1 )y1 + (rx2 )y2 + · · · + (rxn )yn = (r~x) · ~y
or as
rx1 y1 + rx2 y2 + · · · + rxn yn = x1 (ry1 ) + x2 (ry2 ) + · · · + xn (ryn ) = ~x · (r~y ).
This means that scalar multiplication can be thought of as halfway distributing over the dot product, in that multiplying a dot product by a scalar is the same as multiplying one of the vectors in the dot product by that scalar, i.e.,
r(~x · ~y ) = (r~x) · ~y = ~x · (r~y ).
Example 3. Check that r(~x · ~y ) = (r~x) · ~y = ~x · (r~y ) holds for ~x = (2, −3, 4), ~y = (5, 1, 2), and r = 10.
As in the previous example, we'll compute each part of the equation to check that they match. The left piece is r(~x · ~y ), which in our case is
10((2, −3, 4) · (5, 1, 2)) = 10(2(5) + (−3)(1) + 4(2)) = 150.
The middle piece is (r~x) · ~y , which is
(20, −30, 40) · (5, 1, 2) = 20(5) + (−30)(1) + 40(2) = 150.
The right piece is ~x · (r~y ) = (2, −3, 4) · (50, 10, 20) = 2(50) + (−3)(10) + 4(20) = 150, so all three pieces agree.
Now that we understand dot products, we can start using them to explore
a computational version of geometry in Rn .
In R2 , we have a nice formula for the length of a vector because we can
view our vector as the hypotenuse of a right triangle.
[Figure: a vector ~x in R2 drawn as the hypotenuse of a right triangle whose legs are its entries x1 and x2 , so its length is √(x1^2 + x2^2 ).]
Definition. The norm of an n-vector ~x is ||~x|| = √(~x · ~x).
This could also be called the length of ~x, but we’ll follow the usual
conventions and use norm. Notice that this definition does not require us
to have any picture of ~x, but is purely computational. As mentioned at the
start of this section, this is extremely exciting if we want to talk about the
length of vectors in Rn for n > 3.
Example 4. Compute ||(−2, 1, 0, 4)||.
From the definition above, we know
||(−2, 1, 0, 4)|| = √((−2, 1, 0, 4) · (−2, 1, 0, 4)) = √((−2)^2 + 1^2 + 0^2 + 4^2 ).
Therefore
||(−2, 1, 0, 4)|| = √21.
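If you would rather let Mathematica take the square root, the built-in Norm command computes exactly this quantity; the sketch below (with my own variable name x) assumes the definition ||~x|| = √(~x · ~x) given above:

x = {-2, 1, 0, 4};
Sqrt[x . x]    (* the definition of the norm: gives Sqrt[21] *)
Norm[x]        (* Mathematica's built-in norm gives the same answer *)
N[Norm[x]]     (* a decimal approximation, about 4.58 *)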
Geometrically, we can look at this in Figure 5.2. Notice that the distance
between ~x and ~y , labeled d, is exactly the same length as the vector ~x − ~y
geometrically constructed via our parallelogram rule as the sum of ~x and −~y .
[Figure 5.2: the vectors ~x and ~y , the distance d between their tips, and the vector ~x − ~y constructed via the parallelogram rule as the sum of ~x and −~y ; the length of ~x − ~y equals d.]
There is nothing special here about R2 . In fact we get the following general way to compute distances between vectors in any Rn .
Theorem 1. The distance between two vectors ~x and ~y in Rn is ||~x − ~y ||.
Example 5. Find the distance between ~x = (−2, 0, 6, 1) and ~y = (−3, 1, 4, 0).
Theorem 1's formula tells us the distance between ~x and ~y is
||~x − ~y || = ||(−2, 0, 6, 1) − (−3, 1, 4, 0)|| = ||(1, −1, 2, 1)|| = √(1^2 + (−1)^2 + 2^2 + 1^2 ) = √7 ≈ 2.65.
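In Mathematica the same distance computation is a one-liner; this is my own sketch, not part of the example:

x = {-2, 0, 6, 1};
y = {-3, 1, 4, 0};
Norm[x - y]      (* gives Sqrt[7] *)
N[Norm[x - y]]   (* about 2.65 *)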
As we did with the dot product, we next want to explore the basic
properties of the norm.
One of the first major things to notice about the norm is that it is always
the square root of a sum of squares. This means that no matter what the
entries of ~x are, ||~x|| can never be negative. Additionally, since the sum of
squares doesn’t allow for any cancelation within the square root, the only way
we can have ||~x|| = 0 is to have all entries of ~x be zero. In other words:
Theorem 2. ||~x|| ≥ 0 for any ~x, and ||~x|| = 0 if and only if ~x = ~0.
This result also makes sense geometrically, since the length of a vector
can’t be negative and the only way to get a vector ~x to have length 0 is to
make ~x the point at the origin, i.e., ~0.
To see how the norm interacts with scalar multiplication, let’s consider ~x
in Rn and a scalar r. Then
||r~x|| = ||(rx1 , rx2 , . . . , rxn )|| = √((rx1 )^2 + (rx2 )^2 + · · · + (rxn )^2 )
= √(r^2 (x1 )^2 + r^2 (x2 )^2 + · · · + r^2 (xn )^2 ) = √(r^2 ((x1 )^2 + (x2 )^2 + · · · + (xn )^2 ))
= √(r^2 ) √((x1 )^2 + (x2 )^2 + · · · + (xn )^2 ) = |r| √((x1 )^2 + (x2 )^2 + · · · + (xn )^2 )
= |r|(||~x||).
Thus multiplying a vector by a scalar r multiplies its norm by |r|: multiplication by a positive scalar can be done before or after the norm without changing the answer, while multiplication by a negative scalar multiplies the norm by the absolute value of that scalar. Geometrically this makes sense, because scalar multiplication scales the length of the vector by |r| and reverses its direction if r is negative. Since reversing the direction of a vector does nothing to its length, only the magnitude of r matters when figuring out the effect on the length of the vector.
Example 6. Verify ||r~x|| = |r|(||~x||) when ~x = (3, 4) and r = −2.
The left-hand side is ||r~x||, which in our case is
||−2(3, 4)|| = ||(−6, −8)|| = √((−6)^2 + (−8)^2 ) = √100 = 10.
The right-hand side is |r|(||~x||) = |−2| √(3^2 + 4^2 ) = 2(5) = 10, so the two sides match.
The interaction between vector addition and the norm is more complicated.
This shouldn’t surprise us, since we know from 2D geometry that the length
of the sum of two vectors is unlikely to be the sum of their lengths. However,
we are all familiar with the idea that for two vectors ~x and ~y in R2 at right
angles we can use the Pythagorean theorem to say that
||~x||2 + ||~y ||2 = ||~x + ~y ||2
[Figure: two perpendicular vectors ~x and ~y in R2 together with their sum ~x + ~y , which forms the hypotenuse of the corresponding right triangle.]
In the next section we’ll generalize our idea of right angle as we have our
idea of length, after which we can tackle a generalization of the Pythagorean
Theorem. For now, let’s finish up this section by introducing and exploring
the idea of normalizing a vector.
The main idea here is that many computations are easier if the length of
your vector is 1. This is why if you ask a mathematician to embed a square
in R2 , they’ll usually choose to make each of its sides have length 1. Vectors
of length 1 are usually called unit vectors. If we aren’t lucky enough to start
out with ||~x|| = 1, we can fix that by multiplying ~x by the scalar 1/||~x||. This is
called normalizing ~x. Geometrically, normalizing a vector changes its length
but not its direction, which we can see because the normalized vector lies
along the same line as ~x since it is in ~x’s span. This is illustrated in Figure
5.4.
[Figure 5.4: a vector ~x and its normalization, which lies along the same line through the origin but has length 1.]
Therefore the unit vector in the same direction as (−4, 0, 3) is (−4/5, 0, 3/5).
Example 8. Find the stable stage distribution of the population from 4.3’s
Example 7.
If we are dealing with more than one vector, we can’t normalize each of
them without changing the relationship between their lengths. However, we
can preserve the basics of the situation while normalizing one of the vectors,
say ~x, by multiplying all our vectors by the scalar needed to normalize ~x.
Example 9. Suppose we are looking at the triangle formed by the two vectors ~v1 = (4, 5) and ~v2 = (2, 1), but we would really prefer to look at the similar triangle whose longest side has length 1. Find the two vectors which define that similar triangle.
To give ourselves a better idea what's going on here, let's look at our original triangle.
[Figure: the triangle formed by ~v1 and ~v2 .]
The vector ~v1 is clearly the longest side, so we can solve this problem by finding the scalar needed to normalize ~v1 and then multiplying both vectors by it. Since
||~v1 || = √(4^2 + 5^2 ) = √41,
that scalar is 1/√41. Multiplying both vectors by it gives
(1/√41)(4, 5) = (4/√41, 5/√41) ≈ (.62, .78)
and
(1/√41)(2, 1) = (2/√41, 1/√41) ≈ (.31, .16).
These two vectors define the similar triangle we wanted, whose longest side now has length 1.
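A quick way to reproduce these decimal approximations is to let Mathematica do the scaling; the sketch below is mine (Normalize[v1] would give the rescaled longest side directly):

v1 = {4, 5};
v2 = {2, 1};
N[v1/Norm[v1]]   (* approximately {0.625, 0.781} *)
N[v2/Norm[v1]]   (* approximately {0.312, 0.156}; both vectors are scaled by 1/||v1|| *)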
Exercises 5.1.
−3 2
1. Compute 1 · 7 or say it is impossible.
5 −2
−1
2. Compute 1 · 4 or say it is impossible.
6
13
2 1
3. Compute −1 · 2 or say it is impossible.
8 0
5 −2
4. Compute · or say it is impossible.
7 1
5. Suppose ~v · ~u = 8. Find ~v · (4~u).
6. Suppose ~v · ~u = 4, ~v · w
~ = −9, and ~u · w
~ = 12. Find ~v · (~u + w).
~
4
7. Compute −1 .
2
−1
1
8. Compute 1 .
−1
0
9. Compute 5 .
7
−2
10. Compute .
6
7 3
11. Find the distance between and .
−2 −1
2 3
12. Find the distance between 8 and 9.
−3 1
4 7
13. Find the distance between −5 and −2.
0 4
5 4
−2 0
14. Find the distance between
0 and −1.
6 8
15. Suppose k~v k = 18. Find − 12 ~v .
5
16. Normalize .
12
6
17. Normalize 0 .
−6
1
0
18. Normalize 1 .
−1
−2
19. Normalize 1 .
0
20. Give an example of a situation in which it is convenient to be able to compute length without a picture.
5.2 Orthogonality
In the last section we explored a way to compute the length of a vector in a
non-geometric way. Toward the end of the section we wanted to generalize the
Pythagorean theorem, but didn’t have a good notion of how to tell vectors
are at right angles without using a picture. In this section we’ll develop a
computational test for that and use it to explore several related ideas.
We’ve already developed a quick algebraic way to compute the dot product
of two vectors, but it turns out there is also a geometric formula for ~x ·~y which
computes the dot product in terms of the lengths of the vectors and the angle
between them.
In other words, the dot product of two vectors can also be found by
multiplying together their lengths and the cosine of the angle between them.
While this formula holds for any n, I'll provide an explanation here in R2 so we can draw pictures. The first step in our explanation is to rewrite our vectors ~x and ~y in polar coordinates. If we let αx be the angle between ~x and the x-axis as shown in Figure 5.5, then we have ~x = (||~x|| cos(αx ), ||~x|| sin(αx )).
[Figure 5.5: the vector ~x making angle αx with the x-axis; its horizontal component is ||~x|| cos(αx ) and its vertical component is ||~x|| sin(αx ).]
Using the trigonometric identity cos(β) cos(γ) + sin(β) sin(γ) = cos(γ − β),
this gives us
~x · ~y = ||~x||||~y || cos(αy − αx ).
Now we need to relate the difference between the angles associated with our two vectors to the angle between them.
[Figure: the vectors ~x and ~y making angles αx and αy with the x-axis; the angle between ~x and ~y is αy − αx .]
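One way to see this formula in action is to solve it for the angle; the following sketch is mine and uses Mathematica's built-in VectorAngle as a cross-check:

x = {3, 1};
y = {1, 2};
ArcCos[x . y/(Norm[x] Norm[y])]   (* the angle between x and y; here it is Pi/4 *)
VectorAngle[x, y]                 (* Mathematica's built-in version gives the same angle *)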
With this analog for right angles, we can state the following theorem.
[Figure: vectors ~v and ~w together with their sum ~v + ~w .]
Since the two sides of the equation are equal, the theorem is satisfied.
Example 4. Check that Theorem 2 does not hold for ~v = (1, 2, 3) and ~w = (1, 1, −5).
These are our vectors from Example 2, so we know they aren't orthogonal. As in the previous example, we can tackle this by computing each side of the theorem's equation. For our vectors,
||~v ||^2 + ||~w||^2 = ||(1, 2, 3)||^2 + ||(1, 1, −5)||^2 = (1^2 + 2^2 + 3^2 ) + (1^2 + 1^2 + (−5)^2 ) = 14 + 27 = 41,
while
||~v + ~w||^2 = ||(2, 3, −2)||^2 = 2^2 + 3^2 + (−2)^2 = 17.
Since the two sides of the equation are different, the theorem doesn't hold.
[Figure: the spans Span{~v } and Span{~w} of two vectors ~v and ~w in R2 .]
Example 6. Show ~v = (1, 2, 3) is not in W ⊥ for W = {(5x1 + x2 , −x1 + 2x2 , −x1 − x2 )}.
A vector is in W ⊥ only if it is orthogonal to everything in W . Therefore to show ~v is not in W ⊥ , we simply need to find some ~w in W which is not orthogonal to ~v .
If we let x1 = 0 and x2 = 1, we get ~w = (1, 2, −1). Taking the dot product with ~v gives us
(1, 2, 3) · (1, 2, −1) = 1(1) + 2(2) + 3(−1) = 2,
which is not zero, so ~v is not in W ⊥ .
and
(1, 2, 3) · (7, −2, −1) = 1(7) + 2(−2) + 3(−1) = 0.
Since both of these dot products are zero, ~v is orthogonal to all spanning vectors of W and hence by Theorem 2 we know ~v is in W ⊥ .
Example 8. Find W ⊥ where W = Span{(4, −2, 1, 0), (−1, 5, 2, 9)}.
To be in W ⊥ , a vector ~x = (x1 , x2 , x3 , x4 ) needs to satisfy the equations
~w1 · ~x = 4x1 − 2x2 + x3 = 0
~w2 · ~x = −x1 + 5x2 + 2x3 + 9x4 = 0.
This means we are really solving the linear system whose augmented coefficient matrix is
[[4, −2, 1, 0 | 0], [−1, 5, 2, 9 | 0]].
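Since W ⊥ is exactly the null space of the matrix whose rows are the spanning vectors of W , Mathematica's NullSpace command will also produce a spanning set for it; the sketch below is mine, not the text's way of finishing this example:

w1 = {4, -2, 1, 0};
w2 = {-1, 5, 2, 9};
perp = NullSpace[{w1, w2}]   (* a list of vectors spanning W-perp *)
perp . w1                    (* {0, 0}: each vector in perp is orthogonal to w1 *)
perp . w2                    (* {0, 0}: and to w2 *)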
To see that this is true, we can use the subspace test. Since ~0 is orthogonal to everything (~v · ~0 = 0 for every ~v in Rn ), we clearly have ~0 in W ⊥ . If ~x and ~y are in W ⊥ , then we have ~x · ~w = 0 and ~y · ~w = 0 for all ~w in W . Therefore
(~x + ~y ) · ~w = (~x · ~w) + (~y · ~w) = 0 + 0 = 0
so ~x + ~y is in W ⊥ . Finally, suppose ~x is in W ⊥ , so ~x · ~w = 0 for every ~w in W , and r is any scalar. Then
(r~x) · ~w = r(~x · ~w) = r(0) = 0
so r~x is in W ⊥ . Since W ⊥ satisfies all three conditions of our subspace check, it is a subspace of Rn .
Exercises 5.2.
2 1
1. Are −4 and 1 orthogonal?
1 2
−3 1
2. Are 0 and 5 orthogonal?
6 −1
8 1
−1 1
3. Are
3
and orthogonal?
−2
2 1
4 1
−4 1
4. Are
1
and orthogonal?
1
0 1
1 7
5. Does Theorem 2 hold for ~v = 0 and w ~ = 12?
−1 7
3 −6
6. Does Theorem 2 hold for ~v = and w
~= ?
4 3
−5
7. Find a nonzero vector which is orthogonal to 5 .
1
1
8. Find a nonzero vector which is orthogonal to 2 .
−9
−2 2x1 − 4x2
9. Is in W ⊥ for W = ?
1 4x1 + x2
3 ⊥ 2x1 − 4x2
10. Is in W for W = ?
−1 −6x1 + 12x2
1 x1 + x2
11. Is 1 in W ⊥ for W = x1 − 6x2 ?
−1 2x1 − x2
−3 5x1 + x2
12. Is 4 in W ⊥ for W = 2x1 − x2 ?
1 7x1 + 7x2
−1 3 2
13. Is −1 in W ⊥ for W = Span 5 , 0 ?
2 4 1
0 −3 4
14. Is 1 in W ⊥ for W = Span 3 , −3 ?
1 4 3
4 0 2
15. Is −2 in W ⊥ for W = Span 1 , 3 ?
−1 2 5
1 −1 2
16. Is 3 in W ⊥ for W = Span 2 , −1 ?
1 −5 1
−1 3
17. Compute W ⊥ for W = Span 2 , −1 ?
1 0
4 −3
18. Compute W ⊥ for W = Span 2 , 1 ?
−1 2
6 −3
−2 , 2 ?
19. Compute W ⊥ for W = Span
−1 0
1 1
1 0
0 , 1 ?
20. Compute W ⊥ for W = Span
−2 5
5 −6
2 −1 4
−2 2 −4
21. Compute W ⊥ for W = Span , , ?
4 −5
0
6 1 2
22. (a) Let W = Span{w ~ 1, w~ 2 } be a subspace of Rn , and ~v be any
n-vector. Show that if ~v · w ~ 1 = 0 and ~v · w~ 2 = 0 then ~v is in
W ⊥.
(b) Let W = Span{w ~ k } be a subspace of Rn , and let ~v be
~ 1, . . . , w
an n-vector. Show that if ~v · w ~ i = 0 for w ~ k then ~v is in
~ 1, . . . , w
W ⊥.
[Figure: the vector ~x written as ~x = ~w + ~v , where ~w lies along Span{~y } and ~v is orthogonal to ~y .]
We know the values of ~x and ~y , and want to solve for r (and hence w)
~ and
~v . We can use the first two equations to solve for ~v by plugging w~ = r~y into
~x = w
~ + ~v to get
~x = r~y + ~v .
Solving this for ~v gives us
~v = ~x − r~y .
It may feel as if we are stuck, but remember we have one more equation we
haven’t used yet: ~v · ~y = 0. Plugging ~v = ~x − r~y into this equation gives us
(~x − r~y ) · ~y = 0.
Using the properties of the dot product explored in 5.1 to expand the left-hand
side of this equation gives us
Example 1. Find the orthogonal projection of (2, 9) onto (3, 1) and the component of (2, 9) orthogonal to (3, 1).
To use our formula, we need the dot products
~x · ~y = (2, 9) · (3, 1) = 2(3) + 9(1) = 15
and
~y · ~y = (3, 1) · (3, 1) = 3^2 + 1^2 = 10.
Plugging these dot products into our formula for the orthogonal projection gives us
(15/10)(3, 1) = (1.5)(3, 1) = (4.5, 1.5).
To find the component of (2, 9) orthogonal to (3, 1), we just need to subtract the orthogonal projection from (2, 9). This gives us
(2, 9) − (4.5, 1.5) = (−2.5, 7.5).
Therefore, the orthogonal projection of (2, 9) onto (3, 1) is (4.5, 1.5), and the component of (2, 9) orthogonal to (3, 1) is (−2.5, 7.5).
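Mathematica has a built-in command for exactly this computation: Projection projects its first argument onto its second. The sketch below is mine, not the text's:

x = {2, 9};
y = {3, 1};
Projection[x, y]       (* {9/2, 3/2}, the orthogonal projection of x onto y *)
x - Projection[x, y]   (* {-5/2, 15/2}, the component of x orthogonal to y *)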
Geometrically, you can think of finding the orthogonal projection of ~x
onto ~y as shining a light perpendicularly down onto the line spanned by ~y and
looking at the shadow cast by ~x. This is shown in the picture below.
[Figure: the orthogonal projection of ~x onto the line Span{~y }, drawn as the shadow cast by ~x when light shines perpendicularly down onto that line.]
We’d like this sum to be zero, but unfortunately we don’t know much
about the dot products of our spanning vectors with each other. However,
this does suggest that we consider the special case where our spanning vectors
are orthogonal.
before we got stuck. Every term of this sum has as its rightmost factor a dot product of the form ~wj · ~wi with j ≠ i, and now all such dot products are zero because the ~wi s are an orthogonal set. Therefore we get ~v · ~wi = 0, so ~v = ~x − ~w is in W ⊥ . This allows us to make the following definition.
In order to use the formulas in the previous definition, we first need to check
that the spanning vectors of W are an orthogonal set. Taking their dot product
gives us
(−3, 2, 1) · (5, 6, 3) = −3(5) + 2(6) + 1(3) = 0
so ~w1 and ~w2 are orthogonal and we can proceed as planned.
The formulas for the orthogonal projection of ~x onto W and component of ~x orthogonal to W involve the dot products ~w1 · ~x, ~w2 · ~x, ~w1 · ~w1 , and ~w2 · ~w2 . For our vectors, we get
~w1 · ~x = (−3, 2, 1) · (1, 1, 8) = (−3)(1) + 2(1) + 1(8) = 7,
~w2 · ~x = (5, 6, 3) · (1, 1, 8) = 5(1) + 6(1) + 3(8) = 35,
~w1 · ~w1 = (−3, 2, 1) · (−3, 2, 1) = (−3)^2 + 2^2 + 1^2 = 14,
and
~w2 · ~w2 = (5, 6, 3) · (5, 6, 3) = 5^2 + 6^2 + 3^2 = 70.
Plugging these dot products into the formulas from our definition, we get that the orthogonal projection of ~x onto W is
(~x · ~w1 / ~w1 · ~w1 ) ~w1 + (~x · ~w2 / ~w2 · ~w2 ) ~w2 = (7/14)(−3, 2, 1) + (35/70)(5, 6, 3) = (1, 4, 2)
and the component of ~x orthogonal to W is
~x − [(~x · ~w1 / ~w1 · ~w1 ) ~w1 + (~x · ~w2 / ~w2 · ~w2 ) ~w2 ] = (1, 1, 8) − (1, 4, 2) = (0, −3, 6).
If we want to check that these are correct, we can check that the component orthogonal to W is in W ⊥ . To do this we need to check its dot products with ~w1 and ~w2 . These are
(0, −3, 6) · (−3, 2, 1) = 0(−3) + (−3)(2) + 6(1) = 0
and
(0, −3, 6) · (5, 6, 3) = 0(5) + (−3)(6) + 6(3) = 0.
Since they are both zero, (0, −3, 6) is in W ⊥ as required.
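Because ~w1 and ~w2 are orthogonal, the projection onto W is just the sum of the two one-dimensional projections, so the whole example can be checked with a short Mathematica sketch of mine:

w1 = {-3, 2, 1};
w2 = {5, 6, 3};
x = {1, 1, 8};
proj = Projection[x, w1] + Projection[x, w2]   (* {1, 4, 2}; adding works because w1 and w2 are orthogonal *)
x - proj                                       (* {0, -3, 6}, the component of x orthogonal to W *)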
Let’s wrap up this section by doing a physics problem of the type discussed
at the beginning of the section.
Example 5. A 1.5 kg metal sled is sliding down a hill with a flat slope of 45◦
downward. The only force acting on the sled is gravity, which pulls it straight
downward with a force of −14.7 N. Find the component of the gravitational
force which is pushing the sled along the slope of the hill.
Often in examples like this it can be helpful to sketch the situation. Ours
looks like this:
[Figure: the sled on the 45° slope, with the gravitational force drawn pointing straight down.]
and
~y · ~y = (1, −1) · (1, −1) = 1^2 + (−1)^2 = 2.
Plugging these into our orthogonal projection formula we get
(~x · ~y / ~y · ~y ) ~y = (14.7/2)(1, −1) = (7.35, −7.35),
a force whose magnitude is √(7.35^2 + 7.35^2 ) ≈ 10.4 N. This lets us conclude that a little more than 2/3 of the magnitude of our gravitational force is moving the sled along the hill.
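The same numbers come out of Mathematica's Projection command; this sketch (with my own variable names) treats the downhill direction as the vector (1, −1):

gravity = {0, -14.7};
hill = {1, -1};                  (* a vector pointing down the 45-degree slope *)
Projection[gravity, hill]        (* {7.35, -7.35}, the force along the hill *)
Norm[Projection[gravity, hill]]  (* about 10.4 N *)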
Exercises 5.3.
−3
1. If we want to compute the orthogonal projection of 4 onto
−4
0
2 , which vector is ~x in our formula and which is ~y ?
−2
15 4
2. If we want to compute the orthogonal projection of onto ,
−2 9
which vector is ~x in our formula and which is ~y ?
8 2
3. Compute the orthogonal projection of onto and the
−1 1
8 2
component of orthogonal to .
−1 1
−1 1
4. Compute the orthogonal projection of onto and the
7 −1
−1 1
component of orthogonal to .
7 −1
3 1
5. Compute the orthogonal projection of −1 onto −1 and the
2 1
3 1
component of −1 orthogonal to −1.
2 1
6 1
6. Compute the orthogonal projection of −5 onto 0 and the
8 −1
6 1
component of −5 orthogonal to 0 .
8 −1
8 0 1
7. Is −4 , 0 , 2 an orthogonal set?
0 1 0
−4 −1 5
8. Is 6 , 3 , 2 an orthogonal set?
2 −11 4
3 2 4
0 4 13
9. Is , , an orthogonal set?
−5 3 1
−1 −9 7
6 −2 4
1 , , 3 an orthogonal set?
6
10. Is 2 −3 −7
3 4 −5
1 0 3
11. Let W = Span 0 , −2 and ~x = 8. Compute the
−1 0 5
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
1 1 3
12. Let W = Span −2 , 1 and ~x = −4. Compute the
1 1 −5
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
1 −2 10
3
−1 0
13. Let W = Span
2 , 1 and ~x = 4 . Compute the
0 −1 −6
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
−1 0 2
−2
−1 1
14. Let W = Span
1 , −1 and ~x = −3. Compute the
0 2 1
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
2
15. What is the shortest distance between and the line spanned by
9
1
?
2
−2
16. What is the shortest distance between 3 and the plane spanned
−1
−1 2
by 1 and 0?
2 1
17. A man is dragging a sled up a hill by pulling on a rope attached to
the sled. The line formed by the rope has a slope of 3, and the hill
has a slope of 1. The man exerts 225 N of force in the direction of
the rope.
(a) What is the vector of the man’s force?
(b) What set of vectors represents the hill?
(c) How much of that force is exerted along the hill in the plane of
motion of the sled?
5.4 Orthogonal Basis
Theorem 1. Any orthogonal set of nonzero vectors in Rn is linearly independent.
To see why this is true, suppose we have nonzero ~v1 , . . . , ~vk in Rn which are
an orthogonal set. We want to show that they are also linearly independent,
i.e., none of these vectors is in the span of the others. We’ll do this by assuming
one of the ~v s is in the span of the others, and showing that assumption
produces an impossible consequence. For notational ease I’ll assume ~v1 is in
the span of ~v2 , . . . , ~vk . This means ~v1 = a2~v2 + · · · + ak~vk for some scalars a2 , . . . , ak . Taking the dot product of both sides of this equation with ~v2 gives
~v1 · ~v2 = a2 (~v2 · ~v2 ) + a3 (~v3 · ~v2 ) + · · · + ak (~vk · ~v2 ).
The fact that the ~v s are an orthogonal set means ~vi · ~v2 = 0 for i ≠ 2, so our equation reduces to
equation reduces to
0 = a2 (~v2 · ~v2 ).
Since ~v2 6= ~0, we know ~v2 · ~v2 6= 0. Therefore we must have a2 = 0. A similar
argument shows that each of our scalars a2 , . . . , ak must be zero. But this
means ~v1 = ~0, which is impossible since we required our orthogonal set to be
made up of nonzero vectors! Therefore we cannot have any of the ~v s in the
span of the others, and our orthogonal set must also be linearly independent.
While this theorem is great, we don’t want to get carried away and try
to reverse it. There are plenty of linearly independent sets which aren’t
orthogonal. Geometrically, we can see this by thinking of any two vectors
in R2 which aren’t on the same line but don’t lie at right angles to each other.
Since a basis is defined as a linearly independent spanning set, Theorem 1
tells us that having an orthogonal set gets us halfway to having a basis. This
leads to the following special case of a basis.
Definition. ~b1 , . . . , ~bn are an orthogonal basis for Rn if they are a basis for
Rn and an orthogonal set.
We could check that ~b1 , . . . , ~bn form an orthogonal basis by checking that
they are linearly independent and span Rn (and hence are a basis) and
then checking that they are an orthogonal set. However, Theorem 1 provides
a shortcut since orthogonality automatically implies linear independence.
Therefore we can check that ~b1 , . . . , ~bn form an orthogonal basis by checking
that they span Rn and are an orthogonal set. If we like, we can do even
less work by using Theorem 3 from 3.1. That theorem said that any linearly
independent set of n vectors was a basis for Rn . Combining this with Theorem 1 from this section we get the following:
Theorem 2. Any orthogonal set of n nonzero vectors in Rn is an orthogonal basis for Rn .
Example 2. Show that (0, −2, 4), (−3, 2, 1), and (5, 6, 3) form an orthogonal basis for R3 .
These are the three vectors from 5.3’s Example 2, so we’ve seen that they
form an orthogonal set. Since there are three of them and we are in R3 ,
Theorem 2 tells us that our three vectors are an orthogonal basis for R3 .
Example 3. Show that (1, 1) and (1, −2) do not form an orthogonal basis for R2 .
A set of vectors can fail to be an orthogonal basis either by failing to be
orthogonal or failing to be a basis. It’s easier to check whether or not these
two vectors are orthogonal, so let’s start there. Since
(1, 1) · (1, −2) = 1(1) + 1(−2) = −1 ≠ 0
these vectors are not orthogonal which means they do not form an orthogonal
basis for R2 .
(You can check if you like that they do form a basis for R2 , perhaps the
easiest way to see this is to show that the matrix with those two columns has
nonzero determinant.)
and from Example 4 we know ~b3 · ~b3 = 70. Therefore the third entry of [~v ]B is
(~v · ~b3 )/(~b3 · ~b3 ) = 105/70 = 1.5.
the denominators in our coordinate vectors are the squares of the norms of
our basis vectors. This suggests an easy way to simplify this computation:
normalize each basis vector so that its length is 1. Then we’d have
~bi · ~bi = ||~bi ||2 = 1
An orthogonal basis which has all basis vectors of length 1 is often called an
orthonormal basis.
It turns out that we can take any basis ~v1 , . . . , ~vn for Rn and transform
it into an orthonormal basis using an algorithm called the Gram-Schmidt
process. Like row reduction, it has two stages. The first stage transforms our
original basis into an orthogonal basis ~b1 , . . . , ~bn , and the second normalizes
each of the orthogonal basis vectors produced in the first stage to produce the
orthonormal basis ~u1 , . . . , ~un . Formally, this can be stated as follows:
Gram-Schmidt Process:
Part 1:
• Let ~b1 = ~v1 .
• Starting with i = 2 and repeating with each successive i until you reach
i = n, let
~bi = ~vi − [ (~vi · ~b1 / ~b1 · ~b1 ) ~b1 + · · · + (~vi · ~bi−1 / ~bi−1 · ~bi−1 ) ~bi−1 ].
(This means ~bi is the component of ~vi orthogonal to the span of ~b1 , . . . , ~bi−1 .)
Part 2:
• Let ~ui = (1/||~bi ||) ~bi .
To do this, we need the dot products ~v2 · ~b1 and ~b1 · ~b1 . Plugging in our ~v2 and ~b1 gives us
~v2 · ~b1 = (−3, 4, 1) · (1, 0, −1) = (−3)(1) + 4(0) + 1(−1) = −4
and
~b1 · ~b1 = (1, 0, −1) · (1, 0, −1) = 1^2 + 0^2 + (−1)^2 = 2.
Thus
~b2 = ~v2 − (~v2 · ~b1 / ~b1 · ~b1 ) ~b1 = (−3, 4, 1) − (−4/2)(1, 0, −1) = (−1, 4, −1).
(We can check our work by computing ~b1 · ~b2 = 0 to see that ~b1 and ~b2 are orthogonal.)
To complete part 1, we need to compute
~b3 = ~v3 − [ (~v3 · ~b1 / ~b1 · ~b1 ) ~b1 + (~v3 · ~b2 / ~b2 · ~b2 ) ~b2 ].
Here we need the dot products ~v3 · ~b1 , ~b1 · ~b1 , ~v3 · ~b2 , and ~b2 · ~b2 . From the previous step we know ~b1 · ~b1 = 2, and we can compute
~v3 · ~b1 = (−1, 7, −7) · (1, 0, −1) = (−1)(1) + 7(0) + (−7)(−1) = 6,
~v3 · ~b2 = (−1, 7, −7) · (−1, 4, −1) = (−1)(−1) + 7(4) + (−7)(−1) = 36,
and
~b2 · ~b2 = (−1, 4, −1) · (−1, 4, −1) = (−1)^2 + 4^2 + (−1)^2 = 18.
Plugging these into our formula for ~b3 gives us
~b3 = ~v3 − [ (6/2) ~b1 + (36/18) ~b2 ] = (−1, 7, −7) − [ 3(1, 0, −1) + 2(−1, 4, −1) ] = (−1, 7, −7) − (1, 8, −5) = (−2, −1, −2).
(Again, we can check our work by computing ~b1 · ~b3 = 0 and ~b2 · ~b3 = 0 to see that ~b1 , ~b2 , ~b3 are an orthogonal set.)
We are now done with Part 1 of the Gram-Schmidt process and have created the orthogonal basis
~b1 = (1, 0, −1), ~b2 = (−1, 4, −1), ~b3 = (−2, −1, −2)
for R3 .
Moving on to part 2 of the Gram-Schmidt process, we want to normalize each of our orthogonal basis vectors using the formula ~ui = (1/||~bi ||) ~bi to create our orthonormal basis ~u1 , ~u2 , ~u3 for R3 . To do this, we need to compute the norm of each ~bi . These are
||~b1 || = √(1^2 + 0^2 + (−1)^2 ) = √2,
||~b2 || = √((−1)^2 + 4^2 + (−1)^2 ) = √18 = 3√2,
and
||~b3 || = √((−2)^2 + (−1)^2 + (−2)^2 ) = √9 = 3.
Plugging these norms into our formula for ~ui gives us
~u1 = (1/||~b1 ||) ~b1 = (1/√2)(1, 0, −1) = (1/√2, 0, −1/√2),
~u2 = (1/||~b2 ||) ~b2 = (1/(3√2))(−1, 4, −1) = (−1/(3√2), 4/(3√2), −1/(3√2)),
and
~u3 = (1/||~b3 ||) ~b3 = (1/3)(−2, −1, −2) = (−2/3, −1/3, −2/3).
(You can check on your own that each of these vectors has norm 1.)
This means our orthonormal basis for R3 is
~u1 = (1/√2, 0, −1/√2), ~u2 = (−1/(3√2), 4/(3√2), −1/(3√2)), ~u3 = (−2/3, −1/3, −2/3).
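Mathematica's Orthogonalize command runs the whole Gram-Schmidt process at once; the sketch below is mine, and the Method option is included so that the classical Gram-Schmidt steps carried out above are the ones Mathematica uses:

Orthogonalize[{{1, 0, -1}, {-3, 4, 1}, {-1, 7, -7}}, Method -> "GramSchmidt"]
(* should reproduce the orthonormal basis u1, u2, u3 computed above *)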
Our newfound ability to create orthogonal bases also allows us to show the
following interesting fact.
Since our reduced echelon form has two leading 1s, we know dim(W ) = 2.
To compute the dimension of W ⊥ , notice that it is written as the solution
set of a matrix equation of the form A~x = ~0. This means we can find a basis
for W ⊥ which has one basis vector corresponding to each of the free variables
x3 and x4 which appear in the entries of the vectors in W ⊥ . (See Example 2
of 3.1 for more discussion of this idea.) Since we have two free variables, we
have dim(W ⊥ ) = 2.
Thus
dim(W ) + dim(W ⊥ ) = 2 + 2 = 4
as claimed.
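This dimension count is easy to check in Mathematica for any subspace given by a spanning set; the sketch below uses a sample subspace of R4 of my own choosing, not the W from this example:

w = {{1, 0, 2, -1}, {0, 1, 1, 3}};   (* rows span a subspace W of R^4; this W is my own example *)
MatrixRank[w]                        (* dim(W) = 2 *)
Length[NullSpace[w]]                 (* dim of W-perp is 2, and 2 + 2 = 4 *)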
Exercises 5.4.
3 2
1. Is −1 , 10 an orthogonal basis for R3 ?
4 1
2 41 3
2. Is 10 , −5 , −1 an orthogonal basis for R3 ?
1 32 4
0
−1 9 41
6 1 0 14
3. Is , , ,
−2 3 3 9
an orthogonal basis for R4 ?
1 0 6 −66
−4 1 3 0
0 −2
, , , 5 an orthogonal basis for R4 ?
4
4. Is
1 −1 6 −1
1 −1 6 1
−3 1
5. Let B = , be an orthogonal basis for R2 . Compute the
1 3
4
first component of [~v ]B for ~v = .
6
1 1
6. Let B = , be an orthogonal basis for R2 . Compute the
1 −1
8
second component of [~v ]B for ~v = .
−2
−3 1 −6
7. Let B = 4 , 0 , −5 be an orthogonal basis for R3 .
1 3 2
4
Compute the second component of [~v ]B for ~v = 7.
6
1 4 1
8. Let B = 1 , −2 , 3 be an orthogonal basis for R3 .
−2 1 2
7
Compute the first component of [~v ]B for ~v = −2 .
2
0 −1 2
9. Let B = 1 , 1 , 1 be an orthogonal basis for R3 .
−1 1 1
4
Compute [~v ]B for ~v = −2.
3
1 −2 −4
10. Let B = 1 , −1 , 5 be an orthogonal basis for R3 .
1 3 −1
2
Compute [~v ]B for ~v = 1.
4
1 4 1
11. Let B = 1 , −2 , 3 be an orthogonal basis for R3 .
−2 1 2
2
Compute [~v ]B for ~v = 1.
1
−1 1 −1
12. Let B = 0 , 2 , 1 be an orthogonal basis for R3 .
1 1 −1
3
Compute [~v ]B for ~v = 3.
1
1 9
13. Use Part 1 of the Gram-Schmidt process on B = , to
3 7
create an orthogonal basis for R2 .
4 10 −8
14. Use Part 1 of the Gram-Schmidt process on B = 2 , 10 , 6
0 −1 −2
to create an orthogonal basis for R3 .
−1 −4
15. Use the Gram-Schmidt process on B = , to create an
2 3
orthonormal basis for R2 .
−1 1 4
16. Use the Gram-Schmidt process on B = 0 , −1 , 3 to
1 3 2
create an orthonormal basis for R3 .
3 2
17. Check that Theorem 2 holds for W = Span −1 , 0 .
5 4
2 −8
−1 , 3 .
18. Check that Theorem 2 holds for W = Span
0 2
1 5
19. Suppose W is a subset of R4 with dim(W ) = 1. What is dim(W ⊥ )?
20. Suppose W is a subset of R3 with dim(W ) = 2. What is dim(W ⊥ )?
21. Suppose W is a subset of R8 with dim(W ) = 2. What is dim(W ⊥ )?
22. Suppose W is a subset of R5 with dim(W ) = 3. What is dim(W ⊥ )?
23. Show that the standard basis for Rn is an orthogonal basis.
24. Can you find an orthogonal set of 3-vectors which is not a basis for
R3 ?
A
Appendices
The real part is the piece without a factor of i, which is 14. The imaginary
part is the piece with a factor of i, which is −7i.
Since we saw in 3.3 that the complex numbers are a 2-dimensional vector
space over R, it shouldn’t be surprising to learn that C is often identified with
the plane R2 . This is usually
done by identifying a complex number a + bi
a
with its coordinate vector with respect to the standard basis {1, i} for C.
b
In other words, we use the real part a as the x-coordinate and coefficient b on
i in the imaginary part bi as the y-coordinate. This sometimes leads people to
refer to the x-axis as the real axis and the y-axis as the imaginary axis. This
is illustrated by the following figure.
[Figure: the complex number −3 + 2i plotted as the point (−3, 2) in the complex plane, with the real axis horizontal and the imaginary axis vertical.]
This makes sense geometrically as well, since two complex numbers with
equal real and imaginary parts occupy the same point in the complex plane.
In 3.3, we mostly treated i as a placeholder variable. However, it is really
the key to C’s good algebraic properties, but to explore C more deeply we
need to connect it to R as follows.
√
Definition. The complex number i is defined as i = −1.
We can state this in words by saying that i is the positive square root of
−1. (The negative square root of −1 is therefore −i.)
We’ll use the same definition for addition of complex numbers as we did
in 3.3.
Note that this says we add component-wise, i.e., the real part of the sum
is the sum of the real parts and similarly for the imaginary parts.
and additive inverses, which are explained below. (We also showed these
properties in 3.3 as part of our check that C is a vector space.)
The order in which we add complex numbers doesn’t matter since
Now we can write this in our standard complex number format by grouping
the terms without a factor of i to form the real part and the terms with a
factor of i to form the imaginary part. This gives us
We can either find the values of a, b, x, y and plug them into the formula
above, or we can expand out as we would for (1 − 3x)(4 + 2x) and remember
that i2 = −1. I prefer the latter option so I don’t have to memorize a formula,
but feel free to do this from the formula if you prefer.
Expanding (1 − 3i)(4 + 2i) gives us
This means that unlike polynomials with real coefficients, our complex
polynomials factor completely into linear factors. In fact, since R is a subset
of C, if we view a real polynomial like x2 + 1 which doesn’t factor completely
over R as a complex polynomial, it will factor completely over C. (In this case
some of the roots will be complex numbers.)
x2 + 1 = (x + i)(x − i).
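Mathematica handles complex arithmetic directly (I is its built-in square root of −1); the following sketch is mine and simply repeats the computations from this appendix:

(1 - 3 I) (4 + 2 I)                         (* 10 - 10 I *)
Re[14 - 7 I]                                (* 14, the real part *)
Im[14 - 7 I]                                (* -7, the coefficient of i *)
Factor[x^2 + 1, GaussianIntegers -> True]   (* factors x^2 + 1 as (x - I)(x + I) *)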
A.2 Mathematica
The vast majority of calculations from this book have been implemented in
Mathematica. Of course, this is also true of many graphing calculators and
other mathematical software packages. I’ve chosen to discuss Mathematica
over other software packages because it is the software package available at
the school where I teach, and I’ve chosen not to discuss graphing calculators
because there are so many different makes and models that having a unified
discussion is too difficult. If you prefer to use another software package or a
graphing calculator, my advice is to google the linear algebra computation
and the name of your technology of choice. In general, the resulting resources
will be fairly good and accurate.
I’ll start our tour of Mathematica by saying that its help section is
usually excellent. In my version, this is accessed through the Help menu
under Wolfram Documentation where you can then search by topic. Usually
examples are provided to walk you through a sample problem. However, a little
bit of knowledge starting out can help you avoid some of my past frustrations
with this program.
Before we get into linear algebra topics, here are a few basic tips for using
Mathematica.
• Mathematica documents are organized into cells which are each evaluated
as one unit. You can tell where each cell starts and ends by looking at the
right edge of the document where a bracket will indicate what is in that cell.
To leave a cell and start a new cell, simply click below that cell. Just start
typing to default to a Wolfram Language Input which is the basic mode for
executing mathematical commands, or click on the tab with the + at the
left of the page to choose which type of new cell you want. I often use Plain
Text cells for times when I’m trying to write up my work nicely and want to
access the formatting options we commonly associate with a program like
Microsoft Word.
• All commands start with a capital letter. (Not knowing this made me cry
as a student, but hopefully I’ve spared you!)
• All inputs to commands are put in square brackets rather than parentheses, i.e., Factor[x^2 − 1] instead of Factor(x^2 − 1).
• To execute the contents of a cell, press Shift and Enter at the same time.
The output(s) will appear in another cell below the one you executed. Just
pressing Enter will simply take you down to the next line without executing
anything. If you don’t want your commands to execute all at once, put each
of them in a separate cell. If you want to execute a command without seeing
its output, put a semicolon (;) after that command.
• You can copy and paste commands either within the same cell or into a new
cell. This is very useful when you’re using the same type of command several
times. You can also copy and paste an entire cell, which is especially useful
if you’ve customized that cell’s format. To select a cell, click on its bracket
on the right-hand side of the document.
• While you can type in most things manually, many of the most common
commands can be entered using buttons on the Classroom Assistant which
is accessed through the Palettes menu. Of particular use are the Calculator
and Basic Commands sections.
Now that you’ve had a whirlwind tour of the basics of Mathematica, let’s
discuss some topics of particular relevance to our study of linear algebra.
• One of the first things you’ll want to do if you’re using Mathematica for
linear algebra computations is to enter vectors and matrices.
Mathematica doesn’t care whether you write your vector as a column of
numbers (which we’ve been doing) or as a row of numbers. Either way,
Mathematica treats a vector as an ordered list of numbers. This means the
easiest way to enter a vector is as a list. Mathematica denotes lists using
curly braces to contain the list and commas to separate the entries as in
{1,2,3}.
The default format used by Mathematica is to write a matrix as a “list
of lists.” Mathematica treats each row of the matrix as a list and then lists
those row lists in order as in {{1,2,3},{3,2,1}}. Each row is listed from left to
right and the rows are listed from top to bottom, basically mirroring the way
English is read. This means that the 3 × 3 identity matrix would be written
{{1,0,0},{0,1,0},{0,0,1}}. Personally I find this presentation of a matrix
less digestible than our “grid of numbers” approach, but it is important
to understand what it means if Mathematica gives you something in this
format.
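As a concrete illustration (my own, not from the text), here is how the matrix mentioned above would be entered and displayed in an input cell:

m = {{1, 2, 3}, {3, 2, 1}}   (* the matrix whose rows are (1, 2, 3) and (3, 2, 1) *)
MatrixForm[m]                (* displays m as the usual grid of numbers *)
IdentityMatrix[3]            (* the 3 x 3 identity matrix {{1,0,0},{0,1,0},{0,0,1}} *)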
If you prefer to enter your matrices as grids of numbers, you can use the
piece of the Classroom Assistant devoted to matrices. This is under the
Basic Commands section and is accessed by clicking on the button which is
highlighted blue in the picture below.
… for example when finding the range of a linear function. If A is m × n with m ≤ n, then RowReduce handles this just fine. However, if m > n, then RowReduce will give numeric results that aren't useful. In this case, you can use the more complicated command Reduce. To solve A~x = ~b where the entries of ~x are x1 , . . . , xn and the entries of ~b are b1 , . . . , bm , your Mathematica input would be

Reduce[A.{x1 , . . . , xn } == {b1 , . . . , bm }, {x1 , . . . , xn }]

The output will list any conditions on the entries of ~b needed to ensure that there is a solution, and then the values of x1 , . . . , xn in terms of b1 , . . . , bm .

In[ ]:= Reduce[{{1, 2}, {-1, 5}, {0, -1}}.{x1, x2} == {b1, b2, b3}, {x1, x2}]
Out[ ]= b1 == -b2 - 7 b3 && x1 == -b2 - 5 b3 && x2 == -b3

In the example above, we need b1 = −b2 − 7b3 in order to have the solution x1 = −b2 − 5b3 and x2 = −b3 .
If no condition on the entries of ~b is needed to ensure a solution, then Mathematica will simply return the solution itself.

In[ ]:= Reduce[{{1, 2, 3}, {0, 1, 0}, {1, 0, 1}}.{x1, x2, x3} == {b1, b2, b3}, {x1, x2, x3}]
Out[ ]= x1 == -b1/2 + b2 + 3 b3/2 && x2 == b2 && x3 == b1/2 - b2 - b3/2

• To find eigenvalues and eigenvectors, use the command Eigensystem. Its output lists the eigenvalues of the matrix followed by basis eigenvectors for the corresponding eigenspaces.

In[ ]:= Eigensystem[{{1, 2}, {4, 3}}]
Out[ ]= {{5, -1}, {{1, 2}, {-1, 1}}}

In the example above, the matrix has eigenvalues 5 and −1 and eigenspaces E5 = Span{(1, 2)} and E−1 = Span{(−1, 1)}.
An eigenvalue which has multiplicity greater than one, i.e., is a multiple root of det(A − λIn ), will be listed more than once. In fact, it will be listed as many times as it is a root of det(A − λIn ). If the eigenspace for the multiple eigenvalue has dimension large enough to match the multiplicity of the eigenvalue, then Mathematica will list the basis eigenvectors of that eigenspace.

In[ ]:= Eigensystem[{{5, 0}, {0, 5}}]
Out[ ]= {{5, 5}, {{0, 1}, {1, 0}}}

In the previous example, the matrix has the eigenvalue 5, which is a double root of det(A − λIn ), and eigenspace E5 = Span{(0, 1), (1, 0)}.
If the eigenspace for the multiple eigenvalue doesn't have enough basis vectors, Mathematica will put in zero vectors as placeholders.

In[ ]:= Eigensystem[{{5, 1}, {0, 5}}]
Out[ ]= {{5, 5}, {{1, 0}, {0, 0}}}

In the previous example, the matrix has the eigenvalue 5, which is a double root of det(A − λIn ), and eigenspace E5 = Span{(1, 0)}.
� 2
-4 -2 2 4 6
-2
-� �
-4
-6
19. -8
height 68.5
C 6
South 2 weight 158
21. (a) H = 12, (b) = , (c)
temp. = 98.7
W est 7
O 6
age 38
A.3 Solutions to Odd Exercises 369
0.75
1
23.
1.625 where the entries are in gallons and the vector has the
0.125
red
blue
format white
yellow
25. Rotates 180◦ and doubles distance from the origin.
1.2:
−7
1.
−11
2x1 + 3x3 + 5x4
3.
x1 − 4x2 − x3
5. 2
1 0 0 1
7. x1 0 + x2 2 + x3 2 = x4 3
0 0 1 3
9. No
11. No
13. The line y = − 12 x.
15. The xy plane in R3 .
17. No
19. No
21. No n o
23. Smallest is ~0 , largest is Rn .
1.3:
1. linearly dependent
3. linearly independent
3
5. linearly dependent
4
2
1 -2 -1 1 2 3
-2 -1 1 2 3 4 -1
7. (a) (b)
-1
-2
-2
370 Appendices
6
5. Domain is R2 , codomain is R3
7. Domain is R5 , codomain is R4
x1 x
9. f = 2
x2 x1
x 4x1 − 2x2 4x1 − 2x2 x1
11. f 2 1 = 6= = 2f
x2 2x1 + 2 2x1 + 4 x2
x1 y1 x2 + y2 x1 y1
13. f + = =f +f
x2 y 2 x 1 + x2 + y1 + y2 x 2 y2
x1 rx2 x1
and f r = = rf
x2 rx1 + rx2 x2
15. f (~0n ) = f (~v − ~v ) = f (~v ) − f (~v ) = ~0m
17. Show f (2~v ) 6= 2f (~v ).
2.2:
1. (a) 6-vector, (b) 4-vector
3. (a) 4-vector, (b) 5-vector
5. (a) R4 , (b) R2
7. (a) R2 , (b) R3
9. (a) R3 , (b) R2 , (c) 2 × 3
11. (a) R4 , (b) R3 , (c) 3 × 4
2 −1 1
13.
−1 0 10
1 1 0
15. 2 1 0
0 2 −1
A.3 Solutions to Odd Exercises 371
3 0 −7 1
17. 1 0 1 1
−1 0 8 0
0 1
19.
−1 0
0 −1
21.
−2 0
16
23.
5
11
25. (a) , (b) impossible
−9
11
27. A~v1 =
5
−7
29. A~v1 =
20
2 4 −2 4 −1
31. ~x =
1 2 −1 0 9
−10 3 8
33. (a) x1 1 + x2 0 = −2
7 −2 3
√
1 cos(θ) 0 − sin(θ) 10 5 3
35. fθ = and fθ = , (b) , ,
0 sin(θ) 1 cos(θ) 0 5
√ √
5
√ 0 −5 −5 3 −10 −5 3 −5 0
, , √ , , , , √ , ,
5 3 10 5 3 5 0 −5 −5 3 −10
√
5√ 5 3
,
−5 3 −5
2.3:
3 −3
1. 6 0
9 −6
1 2
3. −3 0
1
2 4
−6 −14
5.
−2 2
8 8
7.
4 −16
1 2
9.
2 4
372 Appendices
2 4
11. A + C = −2 0
4 8
4 3 7
13. A + C = −4 −2 8
0 2 8
2 −2 7
15. A + C =
−1 0 5
17. (a) 9 × n, (b) m × 5
1 0 a b a b a b 1 0 a b
19. · = , · =
0 1 c d c d c d 0 1 c d
4 −6 −8
21.
1 9 19
35 −13
23. AB = −10 14
−5 −2
3 11
25. AB = 18 6
−22 −4
8 −4 14
27. BC =
−11 8 7
14 −1 18 −7
29. AB = 6= = BA
23 8 9 4
1 1
31. B=
1 1
2.4:
−3 −17 0
1.
5 −2 8
3. Use the subspace test.
5. Yes
7. Not closed under multiplication by r < 0.
9. 0 doesn’t have an “additive” inverse
11. Show W is a subspace of M33 .
−a1 4a2 a1
13.
0 a1 − a2 a2
15. Yes
15. linearly independent
17. linearly independent
A.3 Solutions to Odd Exercises 373
25. No
x
27. 0
z
29. A has more rows than columns.
31. Yes
1 0 0 1 0 0
33. (a) , (b) , (c) A has more columns than rows
0 0 1 2 0 0
2.6:
3 −4 0 −5
1.
4 1 17 8
−1 3 5 0
3. 2 4 6 −6
0 9 −2 10
2 1 −11 5
5. −5 0 3 −1/2
10 −9 2 0
−4 0 −1 0 7
7. 0 1 1 2 9
1 0 0 5 13
2 4 −2 4 −1
9.
1 2 −1 0 9
3 −2 0
0 1 −5
11.
−6 8 1
7 4 2
0 −1 4 2
13. (a) ~x = , (b) −x2 + 4x3 = 2, 6x1 − 5x3 = 11, (c)
6 0 −5 11
0 −1 4 2
x1 + x2 + x3 =
6 0 −5 11
1 −3 0 0
15. Yes, 0 0 1 0
0 0 0 1
" #
1 0 1 4
17. Yes,
0 1 0 −3
1 −2 0 0 1
0 0 1 0 1
19.
0 0 0 1 1
0 0 0 0 0
A.3 Solutions to Odd Exercises 375
1 0 0 −3
5
21. 0 1 0 − 2
0 0 1 − 12
1 0 0
23. 0 1 0
0 0 1
−7
25. 2
6
2
27. −1
5
5
29. 3
−1
0
31. 1
−1
2.7:
5 + 3x3
−3 − 2x3
1.
x3
0
11
3. −4
3
x3
5. x3
x3
3x3
7. −2x3
x3
0
9. No
11. Yes
13. linearly independent
15. linearly independent
17. (a) linearly dependent, (b) 2
376 Appendices
1 0 −4
0 1 2
19. (a) Their reduced echelon form is
0 0
. (b) ~v3 = −4~v1 +2~v2 ,
0
0 0 0
(c) dim = 2
x1
21. x2
−8x1 + 4x2
23. R2
25. No
27. Yes
1 0 0
0 1 0
29.
0
0 1
0 0 0
31. No
33. Yes
35. Yes
37. See if their reduced echelon form had a leading 1 in every row.
39. $284,639 of agriculture, $492,470 of manufacturing, $893,072 of
service
2.8:
1. One unique solution
3. No solutions
5
5. 3
−1
7. x1 = 4k, x2 = 3k, x3 = 6k, x4 = 4k for some positive integer k
9. (a) No leading1 in rightmost
column and leading 1’s in all variable’s
1 0 4
columns, (b) 0 1 −1
0 0 0
3 −4
1, 0
11. Span 0 5
0 1
A.3 Solutions to Odd Exercises 377
2 −3
0
1
13. Span 0, 4
0 −1
0 1
−2 3
1 , 0
15. Span 0 1
0 1
−2
1
0 −3
17. (a) Span , , (b) It is the span from (a) plus any
1 0
0 1
solution to A~x = ~b.
19. The solution set of A~x = ~b is a shifted version of the solution set to
A~x = ~0.
21. No, because the matrix will have more columns than rows.
23. 3
25. 2
27. n = 3, rk(A) = 2, dim(Nul(A)) = 1, and 2 + 1 = 3
29. 2
2.9:
1. Horizontal cut between rows 1 and 2, vertical cut between columns
1 and 2
3. Horizontal cuts between rows 1 and 2 and rows 3 and 4, vertical
cuts between columns 1 and 2 and columns 3 and 4
5. 15 −10
−4 16
7.
8 −12
9. −3 13
7 −1
11.
−4 2
13. A: Horizontal cut between rows 2 and 3, B: vertical cut between
columns 3 and 4
15. A: Horizontal cuts between rows 1 and 2 and rows 2 and 3, B:
vertical cuts between columns 1 and 2 and columns 3 and 4
6
17.
4
378 Appendices
20 17
19.
25 9
4
21. −1
2
1
23. −2
−1
1 0 a b a b
25. =
0 −2 c d −2c −2d
1 0
27.
6 1
0 0 1
29. 0 1 0
1 0 0
1
0
31. 4
0 1
1 0 0
33. 0 0 1
0 1 0
−1 0 0 1 −1 −3
35. −5 3 0 0 1 8
1 2 −10 0 0 1
1 0 0 1 −2 4
37. 0 2 0 0 1 −1
1 3 4 0 0 1
a 0 0 x 0 0 ax 0 0
39. (a) b c 0 y z 0 = bx + cw cz 0 , (b)
d e f w u v dx + ey + f w ez + f u fv
Use the associative property of matrix multiplication.
2.10:
1. AA−1 = A−1 A = I2
−18 −3 5
3. A−1 = −12 −2 3
−5 −1 1
5. Not invertible
3
− 2 −12 −2
7. A−1 = 0 −1 0
1 6 1
A.3 Solutions to Odd Exercises 379
9. Not invertible
5 −2 −1
11. A−1 = 3 −1 −1
−10 4 3
x1 −x1 + x2
13. f −1 x2 = x1 − x2 + x3
x3 −2x1 + 3x2 − 2x3
15. Not invertible
17. (A−1 )−1 = A
19. ~x = (I3 − A)−1~b
21. y = 22.124 − 0.274x1 + 0.034x2 where x1 is bikers and x2 is smokers
2.11:
13. Yes
15. No
4.5:
−18
1.
28
9
3. 2
−1
3
5.
−7
−7
7. −9
−14
−1 5 2
9. 3 −1 1
4 0 −3
−1 5 3 2
1 0 0 3
11.
0 −2 −1 −1
2 4 2 −1
1 −4 3
13. −1 0 1
1 2 −1
3 1 5
15. 7 0 9
−1 −3 1
8 −11 4 −2
0 6 −5 3
17.
9 3 −1 4
1 −3 2 11
1
− 4 − 34 −4 3
1
19. − 2 − 12 − 12
1 3 3
2 2 2
1 1 1
−6 3 − 12 2
1 0 1
− 12
2 2
21. 1 1
− 2 1 − 12
1
6 − 13 1
2
1
2
A.3 Solutions to Odd Exercises 385
4 −1 3 −7 0 0
23. P = 1 0 −4, D = 0 2 0
0 6 2 0 0 −1
−13 4 0 −1 −1 0 0 0
10 −1 3 1 0 −1 0 0
25. P = 0
, D = 1
0 2 5 0 0 2 0
1 2 0 −3 0 0 0 6
27. Repeated multiplication by the same matrix of many vectors.
5.1:
1. −9
3. 0
5. 32
√
7. 21
√
9. 74
√
11. 17
√
13. 34
15. 9
√1
2
17. 0
− √12
2
− √5
1
19. √
5
0
5.2:
1. Yes
3. No
5. Yes
1
7. 1
0
9. No
11. No
13. Yes
15. No
1
− 5 x3
17. − 3 x
5 3
x3
386 Appendices
1
x − 2x
3 3 3 4
1 x − 3 x
19. 2 3 2 4
x3
x4
x4
4x4
21. −3x4
x4
5.3:
−3 0
1. ~x = 4 , ~y = 2
−4 −2
6 2
3. orthogonal projection: , component orthogonal:
1 −2
2 1
5. orthogonal projection: −2, component orthogonal: 1
2 0
7. Yes
9. Yes
−1 4
11. orthogonal projection: 8 , component orthogonal: 0
1 4
7 3
−3 6
13. orthogonal projection: , component orthogonal:
4 0
2 −8
√
15. 5
" 225 #
√ √
10 1
17. (a) 675 , (b) Span , (c) 90 5
√ 1
10
5.4:
1. No
3. Yes
5. − 35
11
7. 5
5
−2
9. −1
3
2
A.3 Solutions to Odd Exercises 387
1
5
11. 13
1
2
1 6
13. ,
3 −2
(" 1 # " 2 #)
− √5 − √5
15. ,
√2 − √15
5
Index
of Mmn , 201
of Rn , 202
of P , 226
of Pn , 223
subspace, 38, 98, 100
subspace test, 98
transpose, 187
vector, 9
in Mathematica, 363
vector equation, 35
vector space, 97
W ⊥ , 324
zero vector
of C, 233
of C , 237
of Mmn , 95
of Rn , 17
of P , 221