
Functional Linear Algebra
Textbooks in Mathematics
Series editors:
Al Boggess, Kenneth H. Rosen

Nonlinear Optimization
Models and Applications
William P. Fox
Linear Algebra
James R. Kirkwood, Bessie H. Kirkwood
Real Analysis
With Proof Strategies
Daniel W. Cunningham
Train Your Brain
Challenging Yet Elementary Mathematics
Bogumil Kaminski, Pawel Pralat
Contemporary Abstract Algebra, Tenth Edition
Joseph A. Gallian
Geometry and Its Applications
Walter J. Meyer
Linear Algebra
What you Need to Know
Hugo J. Woerdeman
Introduction to Real Analysis, 3rd Edition
Manfred Stoll
Discovering Dynamical Systems Through Experiment and Inquiry
Thomas LoFaro, Jeff Ford
Functional Linear Algebra
Hannah Robbins

https://www.routledge.com/Textbooks-in-Mathematics/book-series/CANDHTEXBOOMTH

Functional Linear Algebra

Hannah Robbins
Roanoke College
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their
use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form
has not been obtained. If any copyright material has not been acknowledged please write and let us
know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Names: Robbins, Hannah, 1980- author.


Title: Functional linear algebra / Hannah Robbins.
Description: First edition. | Boca Raton, FL : CRC Press, 2021. | Includes
bibliographical references and index.
Identifiers: LCCN 2020046750 (print) | LCCN 2020046751 (ebook) | ISBN
9780367486877 (hardback) | ISBN 9781003042280 (ebook)
Subjects: LCSH: Algebras, Linear. | Functional analysis.
Classification: LCC QA184.2 .R6215 2021 (print) | LCC QA184.2 (ebook) |
DDC 512/.5--dc23
LC record available at https://lccn.loc.gov/2020046750
LC ebook record available at https://lccn.loc.gov/2020046751

ISBN: 978-0-367-48687-7 (hbk)


ISBN: 978-1-003-04228-0 (ebk)
To all my linear algebra students: past, present, and future

and

To my linear algebra teacher: David Perkinson


Contents

Introduction for Students ix

Introduction for Instructors xi

0 Introduction and Motivation 1

1 Vectors 9
1.1 Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . 41

2 Functions of Vectors 53
2.1 Linear Functions . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . 80
2.4 Matrix Vector Spaces . . . . . . . . . . . . . . . . . . . . . . 95
2.5 Kernel and Range . . . . . . . . . . . . . . . . . . . . . . . . 107
2.6 Row Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.7 Applications of Row Reduction . . . . . . . . . . . . . . . . . 134
2.8 Solution Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
2.9 Large Matrix Computations . . . . . . . . . . . . . . . . . . 166
2.10 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
2.11 The Invertible Matrix Theorem . . . . . . . . . . . . . . . . 192

3 Vector Spaces 197


3.1 Basis and Coordinates . . . . . . . . . . . . . . . . . . . . . . 197
3.2 Polynomial Vector Spaces . . . . . . . . . . . . . . . . . . . . 220
3.3 Other Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . 231

4 Diagonalization 241
4.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . 241
4.2 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
4.3 Eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
4.4 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . 284
4.5 Change of Basis Matrices . . . . . . . . . . . . . . . . . . . . 293


5 Computational Vector Geometry 307


5.1 Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
5.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 320
5.3 Orthogonal Projection . . . . . . . . . . . . . . . . . . . . . . 330
5.4 Orthogonal Basis . . . . . . . . . . . . . . . . . . . . . . . . 342

A Appendices 355
A.1 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . 355
A.2 Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
A.3 Solutions to Odd Exercises . . . . . . . . . . . . . . . . . . . 368

Bibliography 389

Index 391
Introduction for Students

Linear algebra occupies an important place in the world of math and science
because it is an extremely versatile and useful subject. It rewards those
of us who study it with powerful computational tools, lessons about how
mathematical theory is built, examples for later study in other classes, and
much more. Even if you think you know why you are studying linear algebra,
I encourage you to learn about and appreciate those aspects of the course
which don’t seem immediately related to your original motivation. You may
find that they enhance your general mathematical understanding, and you
may need them for unexpected future applications. I initially loved linear
algebra’s lessons about how to generalize outward from familiar mathematical
environments, but I have recently used tools from linear algebra to help a
biology colleague with an application to her research.
As you work your way through this book, it is important to make sure you
understand the basic ideas, definitions, and computational skills introduced.
The best way to do this is to work through enough examples and problems
to make sure you have thoroughly grasped the material. How many problems
constitute “enough” will vary from person to person, but you’ll know you’re
there when a type of problem or idea elicits boredom instead of confusion or
stress. If you work your way through all of the exercises on a particular idea
and still need more, I encourage you to look up that topic in other linear
algebra books or online to find more problems.
The answers to the odd problems are in Appendix A.3. If your answers
don’t match up, I encourage you to seek help quickly so you can get
straightened out before you move on. Math is not a subject which rewards an
“I’ll come back to that later” mentality! Realize that help is available in many
places including your teacher, classmates, any tutoring resources available at
your school, online tutorials, etc. Take the time to figure out which type of
help works best for you – not everyone’s brain responds the same way and it
is much easier to work with your learning style rather than fight it.
Most of the computational techniques used in this book can be done either
by hand or using technology. I encourage you to do enough work with each one
by hand to understand its properties, and also to learn how to do it quickly
using a calculator or computer software. This book specifically addresses how
to use Mathematica, but feel free to use whichever technological tool best suits
your needs.
Finally, welcome to linear algebra! I hope you find this book helpful and
interesting.

Introduction for Instructors

A first course in linear algebra is fairly unique in that it may appear


in different places within the mathematics curriculum depending on the
department’s planned progression of classes. This results in a wide array of
first linear algebra courses, each of which caters to a different set of student
circumstances. This book is (entirely selfishly) written for the needs of the linear
algebra class as it is taught at Roanoke College, where we teach a one-
semester linear algebra course whose only prerequisite is first-year calculus. In
particular, our students may not have taken an introduction to proofs class.
With that student background in mind, I’ve made the following choices:

• For the vast majority of the book, I stick to vector spaces over R. Complex
vector spaces are briefly discussed in Appendix A.1.
• I freely use the label “Theorem,” but proofs are called explanations or
justifications to avoid causing students anxiety.
• More emphasis is placed on the idea of a linear function, which is used
to motivate the study of matrices and their operations. This should seem
natural to students after the central role of functions in calculus.
• Row reduction is moved further back in the semester and vector spaces
are moved earlier to avoid an artificial feeling of separation between the
computational and theoretical aspects of the course.
• Applications from a wide range of other subjects are introduced in Chapter
0 as motivation for students not intending to be math majors.

The chapters and sections are designed to be done in the order they appear;
however, there are a few places where some judicious editing is possible.

• Section 2.9 on large matrix calculations can be skipped, except for the
introduction of elementary matrices, which could be introduced where they
next appear in Section 4.2.
• Sections 3.2 and 3.3 could be skipped, since students already have m × n
matrices as an example of a vector space besides Rn . However, this will
require you to skip many later problems which include polynomial vector
spaces and C.
• Sections 5.1–5.3 depend only on the material from Chapter 1.


• Section 5.4 depends only on the material from Chapter 1 and the idea of
a basis in Rn , which could be taught there rather than in Section 3.1 if
you wanted to move Chapter 5 earlier in the course.

I hope you find, as I do, that this book fills a gap in the literature and so
is helpful to you and your students.

Acknowledgments
While I have wanted to write a linear algebra book for quite some time, I
couldn’t have completed this project without help from many people who
definitely deserve to have their contributions recognized.
First of all, David Taylor deserves many, many thanks for not only
supporting my sabbatical proposal, which gave me the time off to write, but
also providing writing advice, formatting help, and reassurance when LaTeX
was being uncooperative. Thanks also to Karin Saoub for all the lessons I
learned watching her write her first book as well as her support when I was
feeling stressed by the writing process. Roland Minton provided good feedback
on my first draft. Steve Kennedy helped deepen the book’s motivation of
definitions and encouraged me to provide a more geometric approach. Maggie
Rahmoeller taught out of a draft version of the book and provided invaluable
feedback as well as reassurance that another instructor could appreciate my
book’s approach.
As I moved outside my area of expertise, many people helped inspire and
check my motivating examples: Rachel Collins introduced me to the use of
matrices in population modeling in biology, Skip Brenzovich and Gary Hollis
explained how matrices are used to model molecular structures in chemistry,
Adam Childers helped me find applications of matrices in statistics, Chris
Santacroce talked with me about coordinate systems in machining, and Alice
Kassens and Edward Nik-khah discussed how matrices are used in economics
modeling.
Thanks to Bob Ross for helping me think about how to position the book
in the marketplace and shepherding me through the publication process.
Finally, a special thank you to my Fall 2018 linear algebra students, in
particular Gabe Umland, who provided the first field test of this book, and
found many typos and arithmetic errors.
0
Introduction and Motivation

Linear algebra is widely used across many areas of math and science. It
simultaneously provides us with a great set of tools for solving many different
types of problems and gives us a chance to practice the important art of
mathematical generalization and theory building. In this initial section, we’ll
do both of these things. First we’ll explore a variety of different problems which
will all turn out to yield to linear algebra’s methods. Next we’ll pick out some
of the common characteristics of these problems and use them to focus our
mathematical inquiries in the next chapters. As we develop our linear algebra
toolbox, we’ll return to each of these applications to tackle these problems.
A common problem type in chemistry is balancing chemical reactions.
In any given chemical reaction, an initial set of chemicals interact to
produce another set of chemicals. During the reaction, the molecules of the
input chemicals break down and their constituent atoms recombine to form
the output chemicals. Since the atoms themselves are neither created nor
destroyed during a reaction, we must have the same number of atoms present
in the input and output chemicals. To balance a chemical reaction, we need to
figure out how many molecules of each chemical (both input and output) are
present so that the number of atoms balances, i.e., is the same in the inputs
as in the outputs. This must be done for each type of atom present in the
chemical reaction, and the complication is that the quantities of molecules we
use must balance each type of atom simultaneously. To make this less abstract,
let’s look at an example reaction.

Example 1. When propane is burned, it combines with oxygen to form


carbon dioxide and water. Propane molecules are composed of three carbon
atoms and eight hydrogen atoms, so they are written as C3 H8 . Oxygen
molecules each contain two atoms of oxygen, so they are written as O2 . Carbon
dioxide molecules each contain one carbon atom and two oxygen atoms, so
are written CO2 . Water molecules are made up of two hydrogen atoms and
one oxygen atom (which is why some aspiring chemistry comedians call water
dihydrogen monoxide), so are written H2 O. Using this notation, our chemical
reaction can be written C3 H8 + O2 → CO2 + H2 O.

There are three different types of atoms involved in this reaction: carbon,
hydrogen, and oxygen. For each molecule involved, we need to keep track of
how many carbon, hydrogen, and oxygen atoms it has. It is important that


we keep these quantities separate from each other since they record totally
different properties of a molecule. For example, each carbon dioxide molecule
contains 1 carbon atom, 0 hydrogen atoms, and 2 oxygen atoms.
If we have multiple copies of a molecule, we can calculate the number of
atoms of each type by multiplying each number of atoms by the number
of molecules present. For example, if our reaction produces 15 molecules
of carbon dioxide, we know this can also be viewed as producing with 15
carbon atoms, 0 hydrogen atoms, and 30 oxygen atoms. However, we don’t
just produce carbon dioxide, we also produce water, each molecule of which
contains 0 carbon atoms, 2 hydrogen atoms, and 1 oxygen atom. If our reaction
produces 20 molecules of water, we’ve produced 0 carbon atoms, 40 hydrogen
atoms, and 20 oxygen atoms.
However, our reaction will actually produce molecules of both carbon
dioxide and water. If we produce 15 molecules of carbon dioxide and 20
molecules of water, we’ve produced 15 carbon atoms (15 from the carbon
dioxide and 0 from the water), 40 hydrogen atoms (0 from the carbon dioxide
and 40 from the water), and 50 oxygen atoms (30 from the carbon dioxide
and 20 from the water).
To balance our reaction, we’d need to figure out quantities of each of our
four different molecule types so that the number of each type of atom is the
same before and after the reaction. In our example computation above, that
would mean we’d need to find quantities of propane and oxygen molecules
which together have 15 carbon atoms, 40 hydrogen atoms, and 50 oxygen
atoms to use as our inputs since those are the quantities of atoms from our
output molecules.
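If you like to check this kind of bookkeeping with a computer, here is a minimal sketch in Python (any computational tool, including the Mathematica discussed in the appendix, would work equally well). The function names, and the choice of 5 propane and 25 oxygen molecules as inputs, are mine rather than the text's; they are one set of quantities that balances the outputs described above.

```python
# Each molecule is recorded as a (carbon, hydrogen, oxygen) triple.
propane = (3, 8, 0)        # C3H8
oxygen = (0, 0, 2)         # O2
carbon_dioxide = (1, 0, 2) # CO2
water = (0, 2, 1)          # H2O

def atoms(count, molecule):
    """Total atoms of each type in `count` copies of a molecule."""
    return tuple(count * a for a in molecule)

def combine(*groups):
    """Add up atom counts type by type, keeping the types separate."""
    return tuple(map(sum, zip(*groups)))

outputs = combine(atoms(15, carbon_dioxide), atoms(20, water))
inputs = combine(atoms(5, propane), atoms(25, oxygen))
print(outputs)            # (15, 40, 50)
print(inputs == outputs)  # True, so these quantities balance the reaction
```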

In physics, we often need to figure out the net effect of a group of forces
acting on the same object. If we’re thinking of our object as living in three
dimensions (which is how most of us think about the world we live in), then
each force can push on the object to varying degrees along each of our three
axes. Let’s agree to call these axes up/down, North/South, and East/West.
Different forces can either reinforce each other if they are acting in the same
direction along one or more of these axes, or they can cancel each other out if
they are acting in opposite directions. The overall force acting on the object
can be found by figuring out the combined effect of the separate forces along
each of these axes. Again, let’s look at an example.

Example 2. A person jumps upward and Northward against a wind which is


pushing them Southward and Eastward while gravity pulls them downward.
The force of their jump is 1000 Newtons up and 650 Newtons North, the
wind’s force is 200 Newtons South and 375 Newtons East, and the force of
gravity is 735 Newtons down.

To figure out the overall force acting on this person, we need to combine
the various forces’ actions in each of our three directions. The first step is to

realize that although we are describing the forces above as if North and South
are separate, they are actually opposites. This means we need to treat each
pair of directions as one directional axis and assign one of the two directions
along that axis as positive and the other as negative. I’ll stick with typical
map conventions and assign North to be positive and South to be negative
along the North/South axis, West as positive and East as negative along the
East/West axis, and up as positive and down as negative along the up/down
axis. With these conventions, we can restate the components of our three
forces acting on this person. The person’s jump has 1000 Newtons along the
up/down axis, 650 Newtons along the North/South axis, and 0 Newtons along
the East/West axis. The force of the wind has 0 Newtons along the up/down
axis, −200 Newtons along the North/South axis, and −375 Newtons along
the East/West axis. The force of gravity has −735 Newtons along the up/down
axis, 0 Newtons along the North/South axis, and 0 Newtons along the
East/West axis.
Now we can combine the various components of each force, but we must be
careful to add them up separately for each directional axis. Thus the total force
on the person is 265 Newtons along the up/down axis (1000 from the jump, 0
from the wind, and −735 from gravity), 450 Newtons along the North/South
axis (650 from the jump, −200 from the wind, and 0 from gravity), and −375
Newtons along the East/West axis (0 from the jump, −375 from the wind,
and 0 from gravity). Eventually we’ll develop tools to do this not just along
the standard axes, but along any set of axes at right angles to each other.
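Here is a quick sketch in Python of the same component-by-component bookkeeping; the variable names are mine, not the text's.

```python
# Components ordered (up/down, North/South, East/West), with up, North,
# and West taken as positive, matching the sign conventions in Example 2.
jump = (1000, 650, 0)
wind = (0, -200, -375)    # 200 N South, 375 N East
gravity = (-735, 0, 0)    # 735 N down

# Add the forces separately along each directional axis.
total_force = tuple(sum(parts) for parts in zip(jump, wind, gravity))
print(total_force)  # (265, 450, -375)
```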

In economics, it is important to model the relationships between different


types of resources in the production process. To manufacture or produce a
given product requires different amounts of various resources, possibly even
including some of the same product being produced. (Think of the seed corn
required to grow the next corn crop.) In order to produce certain quantities
of a variety of different products, we will need to calculate how much of each
of the input resources are required for our production process. An example of
this is given below.

Example 3. A tortilla factory manufactures both corn and flour tortillas. It


takes 2 cups of cornmeal, 1.5 cups of water, and 0.5 teaspoons of salt to make
24 corn tortillas, and 2 cups of flour, 0.75 cups of water, 0.25 cups of oil, and
1 teaspoon of salt to make 20 flour tortillas.
If the factory wants to make 100 flour tortillas and 200 corn tortillas,
they will need to calculate the amount of each ingredient to order. One
way to do this is to divide the ingredient quantities in each recipe by the
number of tortillas that recipe produces (24 for corn tortillas and 20 for flour
tortillas) and then multiply by the number of tortillas to be produced (200
for corn tortillas and 100 for flour tortillas). This can also be done in one step
using a fraction whose numerator is the desired number of tortillas and whose
denominator is the number of tortillas produced by that recipe (200/24 for corn
tortillas and 100/20 for flour tortillas).
Multiplying the quantities in the corn tortilla recipe by 200/24 shows us that
the factory will need 16 2/3 cups of cornmeal, 12.5 cups of water, and 4 1/6
teaspoons of salt to make 200 corn tortillas. Multiplying the quantities in the
flour tortilla recipe by 100/20 shows that the factory will need 10 cups of flour,
3.75 cups of water, 1.25 cups of oil, and 5 teaspoons of salt to make 100 flour
tortillas.
Combining the quantities of the ingredients needed for each type of tortilla,
this means that to make 100 flour tortillas and 200 corn tortillas the factory
will need 16 2/3 cups of cornmeal, 10 cups of flour, 16.25 cups of water, 1.25
cups of oil, and 9 1/6 teaspoons of salt.
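The scaling and combining above can be checked with a short computation. Below is a sketch in Python using exact fractions; the ingredient ordering and names are mine, chosen only for this illustration.

```python
from fractions import Fraction

# Ingredient order: (cornmeal, flour, water, oil, salt).
corn_recipe = (Fraction(2), Fraction(0), Fraction(3, 2), Fraction(0), Fraction(1, 2))   # makes 24
flour_recipe = (Fraction(0), Fraction(2), Fraction(3, 4), Fraction(1, 4), Fraction(1))  # makes 20

def scale(recipe, desired, batch):
    """Multiply every ingredient by desired/batch."""
    factor = Fraction(desired, batch)
    return tuple(factor * amount for amount in recipe)

order = tuple(c + f for c, f in zip(scale(corn_recipe, 200, 24),
                                    scale(flour_recipe, 100, 20)))
print(order)  # 50/3 cornmeal, 10 flour, 65/4 water, 5/4 oil, 55/6 salt
              # i.e. 16 2/3, 10, 16.25, 1.25, and 9 1/6, as computed above
```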

In statistics, a tool called regression is often used to model the relationship


between variables so that the value of one variable, y, (often called the response
variable) can be predicted based on the values of a set of other variables
x1 , . . . , xp . In multiple linear regression, we want to use a data set to find a
set of constant coefficients β0, β1, . . . , βp so that y = β0 + β1x1 + · · · + βpxp.
The number βi describes the effect of xi on y if the other x variables are held
constant. The constant term β0 is the value of y when all the other variables
equal zero. The goal of this model is to choose the values of the β’s which give
us the most accurate predictions for the value of y.

Example 4. A group of scientists wanted to predict people’s VO2 max (a


measure of peak oxygen consumption during exercise) based on their age,
weight, heart rate (during low-intensity exercise), and gender. Since they
wanted to predict VO2 max, that is their response variable y. The other
variables are the xi ’s, so we can assign x1 to be age, x2 to be weight, x3
to be heart rate, and x4 to be gender.
The values of all these variables were recorded for 100 people. The scientists
found the following regression equation based on their data.

y = 87.83 − 0.165x1 − 0.385x2 − 0.118x3 + 13.208x4

Examining the first coefficient β1 = −0.165, we can say that if a person’s


weight, heart rate, and gender remain the same but they are a year older, then their
VO2 max decreases by 0.165.
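To see how the equation is used for prediction, here is a small Python sketch that plugs in values for one hypothetical person. The person's numbers and the gender coding below are made up purely for illustration; they are not taken from the study.

```python
# Evaluate y = beta_0 + beta_1*x_1 + ... + beta_4*x_4 for one made-up person.
beta = [87.83, -0.165, -0.385, -0.118, 13.208]  # beta_0 through beta_4
x = [1, 30, 70, 110, 1]  # leading 1 multiplies beta_0; then age, weight,
                         # heart rate, and gender (coding assumed)
predicted_vo2_max = sum(b * xi for b, xi in zip(beta, x))
print(round(predicted_vo2_max, 3))  # 56.158
```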

In computer graphics, the world is currently restricted to our two-dimensional screen. The geometry of this 2D space is usually mapped out
in terms of a point’s horizontal and vertical position. In many computer
games the screen gives our point of view as we move through the world. If
we move straight ahead, this means the screen needs to give us the feeling of
zooming into the center point of the screen to produce the impression that
we’re traveling toward that point. This can be done by moving the position of

everything on the screen outward at the same rate. One way to do this is to
multiply both the horizontal and vertical positions by the same number. One
such example is explored below.

Example 5. Suppose we have an object on the computer screen whose


horizontal position is −5 and whose vertical position is 12.

If we multiply both the horizontal and vertical positions by 2, we get a


horizontal position of −10 and a vertical position of 24. To see how this has
moved our point, let’s look at a picture.
(Figure: the points (−5, 12) and (−10, 24) plotted in the plane; both lie on the same line through the origin, with (−10, 24) farther out.)

Both of these points are on the same line through the origin, but the point
where we multiplied the coordinates by 2 is much farther out. You can imagine
that if you did this to every point in the plane at once it would produce the
feeling of moving toward the center.
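Here is a minimal Python sketch of that zoom operation; the function name is mine, not the text's.

```python
# Scaling both coordinates of a point by the same factor slides it outward
# (or inward) along its line through the origin.
def zoom(point, factor):
    return tuple(factor * coordinate for coordinate in point)

print(zoom((-5, 12), 2))    # (-10, 24)
print(zoom((-5, 12), 0.5))  # (-2.5, 6.0): factors below 1 pull points inward
```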

In biology there is a class of problems where we want to model a population


of organisms over time to see whether that population is growing or declining.
This is usually done by dividing the life cycle of an organism up into different
stages and observing both the number of organisms present in each stage and
the probability that an organism moves from a given stage into each other
stage. This data is then used to extrapolate the population over long stretches
of time to see if it is sustainable and survives or is not sustainable and shrinks
away to extinction.

Example 6. The smooth coneflower is a native Virginian plant. Its life cycle
can be divided into five stages: seedling, small plant, medium plant, large
plant, and flowering plant. Data was collected for two different geographically
isolated populations of coneflowers.

For each of these two populations, there were different numbers of plants
in each of the five stages of growth. For example, in one population there were
88 large plants and in the other population there were only 16. If we want to
combine data about these two populations, we need to combine the numbers
for each life cycle stage separately. In the example above we have 104 large
plants in our two populations.

There is a whole class of applications where we want to record connections


between a collection of objects. These could be atomic bonds between the
atoms in a molecule, routes between subway stations, or Facebook friendships
within a group of people. In these applications, we pick one of the objects and
record whether or not it is connected to each of the other objects. Typically we
mirror computer code and use 1 to represent a connection and 0 to represent
no connection.

Example 7. Think about the set of all students at a particular college who
have a Facebook page. It is unlikely that all of them are Facebook friends with
everyone else in this group.

To describe the connections between these students, we could start by


putting their names in alphabetical order. Next we could pick each of them
one at a time and run down our list putting a 1 next to each person that
student is Facebook friends with, and a 0 next to each person they are not.
Looking at the collection of these individual lists of 1s and 0s, would give us
a sense of the social network of Facebook on campus.
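One simple way to store such lists on a computer is sketched below in Python; the student names and friendships are invented purely for illustration.

```python
# Position j in a person's list is 1 exactly when they are friends with students[j].
students = ["Avery", "Blake", "Casey", "Devon"]  # alphabetical order
friendships = {
    "Avery": [0, 1, 1, 0],  # friends with Blake and Casey
    "Blake": [1, 0, 0, 1],  # friends with Avery and Devon
    "Casey": [1, 0, 0, 0],  # friends with Avery only
    "Devon": [0, 1, 0, 0],  # friends with Blake only
}
print(friendships["Avery"])  # [0, 1, 1, 0]
```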

One of the least typical examples of linear algebra I’ve come across is a
scene in Neal Stephenson’s book Cryptonomicon. In it, several siblings gather
after their mother’s death to divide up their parents’ belongings. One brother
is a mathematician and he explains that each person will be allowed to rank
each item based on two quantities: their perception of its monetary value
and their emotional attachment to it. These are represented visually as a 2D
grid. After each sibling has assigned their values to every object, the objects
will be divided. Using their valuation, each sibling can now compute their
perception of the total monetary and emotional value of their share versus
the share of each other sibling. The goal is to divide up the objects so that
each person thinks that their share is worth at least as much (both monetarily
and emotionally) as everyone else’s shares.

Example 8. Suppose one person receives a table, a car, and a set of dishes
as their share. In monetary terms, they think the table is worth $450, the car
is worth $2000, and the dishes are worth $25. In emotional terms, they’ve as-
signed the table a value of 60, the car a value of 5, and the dishes a value of 100.
This person has assigned each object a pair of numbers to represent its
monetary and emotional value. The table is worth $450 monetarily and 60
emotionally, the car is worth $2000 monetarily and 5 emotionally, and the
dishes are worth $25 monetarily and 100 emotionally. This person feels their
share has a total monetary value of $450 + $2000 + $25 = $2475 and a total
emotional value of 60 + 5 + 100 = 165.
To decide whether or not this person is satisfied with this division, we’d
need to use their valuations of the objects that make up each other person’s
share to make sure that they don’t believe anyone else’s share was worth more
than $2475 or 165 in emotional value.
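Here is a short Python sketch of that bookkeeping, with each object stored as a (monetary, emotional) pair; the variable names are mine.

```python
# Total one sibling's share, keeping monetary and emotional values separate.
table, car, dishes = (450, 60), (2000, 5), (25, 100)
share_total = tuple(map(sum, zip(table, car, dishes)))
print(share_total)  # (2475, 165)
```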
Now that we’ve seen a variety of different examples, let’s shift our focus to
looking for common characteristics between these problems to help us decide
where to start our exploration of linear algebra in Chapter 1.
Although the examples in this section may not seem closely related, they
have certain underlying similarities. In each problem we had several different
numbers associated to a given object. These numbers meant separate things or
described separate qualities about that object, so needed to be kept separate
from each other. For example, when we looked at forces acting on a person
we kept the components of our forces in the up/down, North/South, and
East/West directions separate from each other.
In several of these problems we needed to be able to multiply all numbers
associated with a particular object by a constant. For example, if we used 15
water molecules (which are each made of hydrogen and oxygen) in a chemical
reaction then we’d multiply both the number of hydrogens and the number of
oxygens by 15.
We also needed to add the collections of numbers associated to different
objects together, but in such a way that the result was a collection of numbers
where we only combined numbers which meant the same thing. For example,
when adding a sibling’s valuation of several objects together we’d need to add
up their monetary values and emotional values separately to get a monetary
total and an emotional total.
Noticing this pattern is our first step in the process of mathematical
generalization which was mentioned at the beginning of this section. It allows
us to strip away the differences between these various example applications to
focus on the underlying similarities we need to study and understand. In the
next chapter we’ll develop a way to write down these collections of associated
numbers, figure out how to add two such collections, and how to multiply such
a collection by a constant. Once we’ve solidified our understanding of these
basic processes, we can tackle each of these example problems using the same
basic mathematical tools.
1
Vectors

1.1 Vector Operations


As we saw in Chapter 0, we often have several separate numeric values
associated with the same data point. Vectors provide us with a way to
simultaneously record all of those quantities in a compact way. Whether we are
looking at an object’s emotional and monetary value or its position in different
directions, a vector quickly sums things up by providing multiple descriptive
numbers and using the position of those numbers within the vector to tell us
what each one means. We will start with the most basic definition of a vector
below.

Definition. A vector is an ordered column of real numbers written $\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$.

To help distinguish which variables represent vectors, I’ll write them with
arrows over the top of the variable as in ~v .
Notice that this definition doesn’t tell us what the various positions inside
the vector represent. This is deliberate, because it allows us the flexibility to
assign the appropriate meanings for our current use. However, it does mean
we’ll need to get into the habit of clearly communicating the meaning of our
vector’s entries in each application we do.

Example 1. Write down the vector which records the number of carbon,
hydrogen and oxygen atoms in a molecule of carbon dioxide (CO2 ).

We saw in Example 1 from Chapter 0 that a carbon dioxide molecule


contains 1 carbon atom, 0 hydrogen atoms, and 2 oxygen atoms. There are
several possible ways to put these three numbers into a vector, depending on
the order of the entries. However, if we standardize our notation so that the
first entry counts carbon atoms, the second hydrogen, and the third oxygen,

we get that carbon dioxide's atomic vector is
$$\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}.$$

Example 2. The vector $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ can be thought of as describing a position in 3-space.

Points in 3-space are usually described using three axes (often denoted by x, y, and z). Our vector ~v describes the point we get to by starting at the origin where the three axes meet and moving 1 unit along the first axis (usually x), 2 units along the second axis (usually y), and 3 units along the third axis (usually z). Notice here that as in Example 1, both the entries of the vector and their positions within the vector are important. Here each entry tells us how many units to move, and its position in the vector tells us which axis to move along.

Geometrically, we will draw our vectors as arrows that start at the origin and end at the point in the plane or 3-space whose coordinates match the entries of the vector as in Figure 1.1. (In some other situations, vectors may instead be drawn between any two points in the plane or in 3-space.)

Figure 1.1: A geometric representation of a vector
This often gives rise to the geometric description of a vector as having both
a length (or magnitude) and a direction. This interpretation of vectors also
shows up in physics where they are used to model forces acting on objects. We
can expand this geometric perspective from a single vector to a set of vectors.
Example 3. The set of all 2-vectors is the plane, R2, where R stands for the set of real numbers.

We can write this more explicitly as $\mathbb{R}^2 = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right\}$. Thinking of these vectors geometrically as in Example 2, we can see that this set of vectors corresponds to the set of points in the plane. If we rewrite our 2-vectors as $\mathbb{R}^2 = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \right\}$, it becomes even clearer that this is the familiar 2D space we use to graph one variable functions.

Similarly, using the interpretation of the entries from Example 2, the set
of all 3-vectors is R3 . We can expand this to the set of all vectors of any fixed
size.

Definition. The set of all vectors with n entries is Rn .

I will often refer to a vector with n entries as an “n-vector”. The dash


makes it clear that we’re talking about the number of entries in the vector
rather than the number of vectors involved. For example, saying “suppose we
have two 3-vectors” means that we have two vectors which each have three
entries, i.e., two vectors from R3 .
Because both the numbers in a vector and the positions of those numbers
matter greatly for the vector’s meaning, two vectors are considered equal only
when each corresponding pair of entries is equal. Since vectors from Rn and
Rm with n ≠ m don't have the same number of entries, they are automatically not equal.

Example 4. $\vec{v} = \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix}$ equals $\vec{w} = \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix}$.

Both ~v and ~w are in R3, so it is reasonable to ask whether they are equal. Starting at the top and moving down ~v and ~w, we can see that each pair of their corresponding entries matches up. Therefore ~v = ~w.

Example 5. $\vec{v} = \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix}$ is not equal to $\vec{w} = \begin{bmatrix} -1 \\ 3 \\ 0 \end{bmatrix}$.

Again, these are both 3-vectors, so it's reasonable to ask if they are equal. Their first entries do match up, but their second and third entries do not. (Remember that it is important not only that the vectors contain the same numeric entries, but that they have those numbers in the same positions.)

Our motivating examples in Chapter 0 involved more than just vectors, since in Examples 1 and 5 we wanted to multiply a vector by some real number. To distinguish between vectors and real numbers, we make the following definition.

Definition. A number r in R is also called a scalar.

Thus −5 is a scalar while $\begin{bmatrix} -5 \\ 0 \end{bmatrix}$ is a vector.
Scalars will be represented by variables without arrows over them, so a is
a scalar while ~a is a vector.
In Chapter 0 we saw several problems where we were interested in adding
vectors or multiplying vectors by scalars, so let’s work out how this type
of addition and multiplication should work. As we define these operations,
we need to keep in mind the properties we want them to have. The biggest
advantage of using vectors to express information is that different entries
can describe different numeric properties. This means we want our vector
operations to respect the position of a vector’s entries. We also only want to
combine like entries together, so that the resulting vector’s information can
be interpreted in the same way as the original vectors’ entries.
Let’s start with vector addition. Suppose we want to use vector addition
to find the number of atoms of carbon, hydrogen, and oxygen contained in
a molecule of carbon dioxide and a molecule of water. Let’s use the same
notation as in Example 1 so that our vector’s entries will record the number
of carbon, hydrogen, and oxygen atoms in that order. We saw in Example 1 that carbon dioxide's atomic vector was $\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$. Water is H2O, so each water molecule contains 0 carbon atoms, 2 hydrogen atoms, and 1 oxygen atom, giving us the atomic vector $\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$. Our notion of vector addition should add together the pair of numbers representing carbon and place that sum into the entry which represents carbon, and similarly for hydrogen and oxygen.
Applying this pattern to the general case of two n-vectors, we get the following definition.

Definition. Let $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}$ be vectors in Rn. We define
$$\vec{v} + \vec{w} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix}.$$

We need to add the condition here that ~v and w ~ are both in the same
Rn to make sure that they have the same number of entries. Otherwise, it is
impossible to add their entries pairwise since some of the entries in the longer
vector won’t have corresponding entries in the shorter vector.

Example 6. Use vector addition to find the atomic vector of a molecule of


carbon dioxide and a molecule of water.

From the discussion before our definition of vector addition, we know carbon dioxide's atomic vector is $\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$ and water's atomic vector is $\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$, where our vector entries represent carbon, hydrogen, and oxygen (in that order). The combined atomic vector of these two molecules can be found by adding their individual atomic vectors. This gives us
$$\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 + 0 \\ 0 + 2 \\ 2 + 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

which means that together these two molecules contain 1 carbon atom, 2
hydrogen atoms, and 3 oxygen atoms.

Notice that our definition of addition did exactly what we asked: it


combined only pairs of numbers recording the same type of atom and placed
each sum into the appropriate spot in the vector for that atom.

Example 7. Explain why it is impossible to compute $\vec{v} + \vec{w}$ where $\vec{v} = \begin{bmatrix} 2 \\ -10 \\ 13 \end{bmatrix}$ and $\vec{w} = \begin{bmatrix} -5 \\ 8 \end{bmatrix}$.

Since ~v is a 3-vector and ~w is a 2-vector we can't do this addition, because we don't have ~v and ~w both in Rn for a single n. Another way to think about this is that the third entry of ~v has no corresponding entry to be added to in ~w.

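If you want to experiment with vector addition on a computer, here is a minimal Python sketch of the definition above (the book itself points you toward Mathematica; this is only an illustration, and the function name is mine). It also refuses to add vectors of different sizes, matching Example 7.

```python
# Entrywise vector addition, with vectors stored as Python lists.
def vector_add(v, w):
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of entries")
    return [vi + wi for vi, wi in zip(v, w)]

print(vector_add([1, 0, 2], [0, 2, 1]))  # [1, 2, 3], matching Example 6
# vector_add([2, -10, 13], [-5, 8]) raises ValueError, matching Example 7
```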
In R2, vector addition has a striking geometric pattern. If we write our 2-vectors in the form $\begin{bmatrix} x \\ y \end{bmatrix}$, we know that the x-coordinate of our sum is the sum of the x-coordinates of the vectors being added and similarly for the y-coordinates. However, if we draw a parallelogram from the two vectors we're adding, then their sum forms that parallelogram's diagonal as shown in Figure 1.2. We can do something similar in R3, but it is too complicated to draw in a 2D book.

Figure 1.2: Adding vectors geometrically

Now that we’ve explored vector addition, let’s turn our attention to
multiplying a vector by a scalar. Suppose we want to count the numbers
of the various types of atoms in 15 molecules of carbon dioxide. We can do
this by multiplying the number of carbon, hydrogen, and oxygen atoms in one
molecule by 15 and placing the results into the appropriate entries of our new
vector.
This motivates the following general definition for multiplying an n-vector
by a scalar.

Definition. If $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ is a vector in Rn and r is a scalar, then $r \cdot \vec{v} = \begin{bmatrix} r v_1 \\ r v_2 \\ \vdots \\ r v_n \end{bmatrix}$.

Example 8. Compute $15 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$.

To do this, we multiply each entry of our vector by 15 which gives us
$$15 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 15(1) \\ 15(0) \\ 15(2) \end{bmatrix} = \begin{bmatrix} 15 \\ 0 \\ 30 \end{bmatrix}.$$
Note that our vector is the atomic vector for carbon dioxide from Example 1. This means our computation above gives us the atomic vector for 15
molecules of carbon dioxide.
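A matching sketch of scalar multiplication in Python (again, only an illustration; the function name is mine):

```python
# Multiply every entry of a vector by the scalar r.
def scalar_multiply(r, v):
    return [r * vi for vi in v]

print(scalar_multiply(15, [1, 0, 2]))  # [15, 0, 30], matching Example 8
```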

As with vector addition, there is a nice geometric interpretation of scalar


multiplication in R2 . Since the pattern of scalar multiplication is slightly more
complicated to describe than the pattern of addition, we’ll start by exploring
its pattern in the next two examples.
Example 9. Compute 4~v and explain geometrically what this scalar multiplication did to ~v, where $\vec{v} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$.

Computing 4~v means multiplying each entry of ~v by 4. This gives us
$$4\vec{v} = \begin{bmatrix} 4(3) \\ 4(1) \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix}.$$
To see what this looks like geometrically, let's plot ~v and 4~v together.

(Figure: ~v and 4~v drawn as arrows from the origin; both lie along the same line, with 4~v four times as long.)

Here we can see that 4~v and ~v lie along the same line in R2 , but 4~v is 4
times as long as ~v .

As we saw in the previous example, multiplying a vector by a positive


scalar doesn’t change the direction it points, but does change its length. In
fact I’ve often wondered if scalars got their name because multiplying a vector
by a positive real number scales the length of the vector by that number. On
the other hand if our scalar is negative, we get a slightly different picture.

Example 10. Compute $-\frac{1}{2}\vec{v}$ and explain geometrically what this scalar multiplication did to ~v, where $\vec{v} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$.

As in Example 9, computing $-\frac{1}{2}\vec{v}$ means multiplying each entry of ~v by $-\frac{1}{2}$. This means
$$-\frac{1}{2}\vec{v} = \begin{bmatrix} -\frac{1}{2}(3) \\ -\frac{1}{2}(1) \end{bmatrix} = \begin{bmatrix} -\frac{3}{2} \\ -\frac{1}{2} \end{bmatrix}.$$
Again, let's plot ~v and $-\frac{1}{2}\vec{v}$ together to get a visual idea of what this looks like.

(Figure: ~v and $-\frac{1}{2}\vec{v}$ drawn as arrows from the origin; they lie along the same line but point in opposite directions.)

These two vectors still lie on the same line in R2, but they are pointing opposite directions along that line. In addition to pointing the opposite way, our new vector $-\frac{1}{2}\vec{v}$ is also half the length of ~v.

From the picture in the example above, we can see that the only difference
when our scalar is negative is that our vector’s direction is switched. Putting
this together with our observations from Example 9, we can say that in R2
multiplying a vector by a scalar r multiplies the length of the vector by |r| and
switches the direction of the vector if r is negative. As with vector addition,
there is a similar pattern in R3 which you can explore on your own.
Note that at this point we don’t have a good idea of what length means
except in R2 and R3 where we can use our formulas for the distance between
two points to compute the length of our vectors. Later on in Chapter 5
we’ll create a definition of the length of a vector which doesn’t depend on
geometry and hence can be extended to higher dimensions. However, we can
still compute the product of a vector and a scalar for any size of vector even
if we can’t visualize its geometric effect.
Now that we’ve defined these two vector operations, let’s explore some of
their nice properties. You’ll notice that many of these properties are similar
to what we have in R, which means we can use our intuition about how
addition and multiplication work in the real numbers. This is an important
consideration, because whenever a newly defined operation shares its name
with a familiar operation, it is tempting to assume they behave exactly the
same. This is not always true, so it pays to check carefully! We’ll start with
properties of vector addition.
From the definition of vector addition, we know that if ~v and ~w are both in Rn, their sum ~v + ~w is also in the same Rn. We will call this property closure of addition.
If we have ~v and ~w in Rn, then ~v + ~w = ~w + ~v. To see this, we can compute
$$\vec{v} + \vec{w} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix}.$$

Since addition of real numbers is commutative (i.e., order doesn’t matter), we


know vi + wi = wi + vi for i = 1, . . . , n. This means
$$\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix} = \begin{bmatrix} w_1 + v_1 \\ w_2 + v_2 \\ \vdots \\ w_n + v_n \end{bmatrix}.$$
Since the right-hand side is ~w + ~v, we have ~v + ~w = ~w + ~v. This means vector addition is commutative.
If we have ~v, ~u, and ~w in Rn, then (~v + ~u) + ~w = ~v + (~u + ~w). To see this, we compute
$$(\vec{v} + \vec{u}) + \vec{w} = \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} (v_1 + u_1) + w_1 \\ (v_2 + u_2) + w_2 \\ \vdots \\ (v_n + u_n) + w_n \end{bmatrix}.$$
Since addition of real numbers is associative (i.e., where we put our parentheses in addition doesn't matter), we know (vi + ui) + wi = vi + (ui + wi) for i = 1, . . . , n. This means
$$\begin{bmatrix} (v_1 + u_1) + w_1 \\ (v_2 + u_2) + w_2 \\ \vdots \\ (v_n + u_n) + w_n \end{bmatrix} = \begin{bmatrix} v_1 + (u_1 + w_1) \\ v_2 + (u_2 + w_2) \\ \vdots \\ v_n + (u_n + w_n) \end{bmatrix}$$
and since
$$\begin{bmatrix} v_1 + (u_1 + w_1) \\ v_2 + (u_2 + w_2) \\ \vdots \\ v_n + (u_n + w_n) \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} u_1 + w_1 \\ u_2 + w_2 \\ \vdots \\ u_n + w_n \end{bmatrix} = \vec{v} + (\vec{u} + \vec{w})$$
we get (~v + ~u) + ~w = ~v + (~u + ~w) as claimed. Therefore vector addition is associative.
In the real numbers, we have 0 as our additive identity, i.e., 0 + r = r for
any real number r. In Rn , our additive identity is the n-vector whose entries
are all 0, which we'll write as ~0. For any ~v in Rn, we have
$$\vec{v} + \vec{0} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} v_1 + 0 \\ v_2 + 0 \\ \vdots \\ v_n + 0 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \vec{v}.$$
This shows ~0 is the additive identity of vector addition.
In the real numbers, every number r has an additive inverse, −r, so that −r + r = 0. For every ~v in Rn we have the additive inverse −~v = (−1) · ~v so that −~v + ~v = ~0. We can check this by computing
$$-\vec{v} + \vec{v} = \begin{bmatrix} -v_1 \\ -v_2 \\ \vdots \\ -v_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} -v_1 + v_1 \\ -v_2 + v_2 \\ \vdots \\ -v_n + v_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \vec{0}.$$

Thus vector addition has additive inverses. (Notice that we set −~v + ~v equal
to ~0, because ~0 is the additive identity of Rn . This mirrors how −r + r = 0 in
R, because 0 is the additive identity of R.)
Now that we’ve explored vector addition, let’s turn our attention to scalar
multiplication of vectors.
From our definition of scalar multiplication, we know that if ~v is in Rn
then r · ~v is also in Rn . We call this property closure of scalar multiplication.
If we have ~v in Rn and two scalars r and s, then r · (s · ~v ) = (rs) · ~v . To
see this, we compute
$$r \cdot (s \cdot \vec{v}) = r \begin{bmatrix} s v_1 \\ s v_2 \\ \vdots \\ s v_n \end{bmatrix} = \begin{bmatrix} r s v_1 \\ r s v_2 \\ \vdots \\ r s v_n \end{bmatrix}.$$
Pulling a factor of rs out of each entry on the right-hand side gives us
$$\begin{bmatrix} r s v_1 \\ r s v_2 \\ \vdots \\ r s v_n \end{bmatrix} = (rs) \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = (rs) \cdot \vec{v}.$$

This means scalar multiplication of vectors is associative.


In the real numbers, our multiplicative identity is 1, because for any real
number r we have 1 · r = r. This is also true for multiplication of vectors by
scalars, since for any ~v in Rn and scalar r we have
$$1 \cdot \vec{v} = \begin{bmatrix} 1 \cdot v_1 \\ 1 \cdot v_2 \\ \vdots \\ 1 \cdot v_n \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \vec{v}.$$

Thus 1 is the identity for scalar multiplication of vectors.


To finish up our exploration of the properties of addition and scalar
multiplication of vectors, let’s see how these two operations interact with each
other.
If ~v is in Rn and r and s are scalars, then (r + s) · ~v = r · ~v + s · ~v. To see this we compute
$$(r + s) \cdot \vec{v} = \begin{bmatrix} (r+s)v_1 \\ (r+s)v_2 \\ \vdots \\ (r+s)v_n \end{bmatrix} = \begin{bmatrix} rv_1 + sv_1 \\ rv_2 + sv_2 \\ \vdots \\ rv_n + sv_n \end{bmatrix} = \begin{bmatrix} rv_1 \\ rv_2 \\ \vdots \\ rv_n \end{bmatrix} + \begin{bmatrix} sv_1 \\ sv_2 \\ \vdots \\ sv_n \end{bmatrix}.$$

Factoring r out of each entry of the first vector and s out of each entry of the
second vector gives us
$$\begin{bmatrix} rv_1 \\ rv_2 \\ \vdots \\ rv_n \end{bmatrix} + \begin{bmatrix} sv_1 \\ sv_2 \\ \vdots \\ sv_n \end{bmatrix} = r \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + s \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = r \cdot \vec{v} + s \cdot \vec{v}.$$

Therefore scalar multiplication distributes over addition of scalars.


If ~v and ~u are in Rn and r is a scalar, then r · (~v + ~u) = r · ~v + r · ~u. To see
this we compute
$$r \cdot (\vec{v} + \vec{u}) = r \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} = \begin{bmatrix} r(v_1 + u_1) \\ r(v_2 + u_2) \\ \vdots \\ r(v_n + u_n) \end{bmatrix} = \begin{bmatrix} rv_1 + ru_1 \\ rv_2 + ru_2 \\ \vdots \\ rv_n + ru_n \end{bmatrix} = \begin{bmatrix} rv_1 \\ rv_2 \\ \vdots \\ rv_n \end{bmatrix} + \begin{bmatrix} ru_1 \\ ru_2 \\ \vdots \\ ru_n \end{bmatrix}.$$

Factoring an r out of both vectors gives us
$$\begin{bmatrix} rv_1 \\ rv_2 \\ \vdots \\ rv_n \end{bmatrix} + \begin{bmatrix} ru_1 \\ ru_2 \\ \vdots \\ ru_n \end{bmatrix} = r \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + r \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = r \cdot \vec{v} + r \cdot \vec{u}.$$

This means scalar multiplication distributes over vector addition.


To summarize these properties, we can say that vector addition is closed,
commutative, associative, has an identity element and inverses, that scalar
multiplication of vectors is closed, associative, and has an identity element,
and that vector addition and scalar multiplication of vectors distribute nicely.
These properties are expressed below as equations.
Theorem 1. For every ~v, ~w, ~u in Rn and any scalars r and s, we have

~v + ~u is in Rn
~v + ~w = ~w + ~v
(~v + ~u) + ~w = ~v + (~u + ~w)
~v + ~0 = ~v
−~v + ~v = ~0 for some −~v in Rn
r · ~v is in Rn
r · (s · ~v) = (rs) · ~v
1 · ~v = ~v
(r + s) · ~v = r · ~v + s · ~v
r · (~v + ~u) = r · ~v + r · ~u

In short, with these two operations, each Rn behaves a lot like R.
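These properties can also be spot-checked numerically. The Python sketch below tests a few of them for one particular choice of vectors and scalars (the names are mine); of course, a check like this is not a proof, which is why the entry-by-entry explanations above are needed.

```python
# Spot check a few properties from Theorem 1 for one choice of v, w, r, s.
def add(v, w):
    return [vi + wi for vi, wi in zip(v, w)]

def scale(r, v):
    return [r * vi for vi in v]

v, w = [1, -2, 3], [4, 0, -1]
r, s = 2, -3

print(add(v, w) == add(w, v))                                # commutativity
print(scale(r, scale(s, v)) == scale(r * s, v))              # associativity of scaling
print(scale(r, add(v, w)) == add(scale(r, v), scale(r, w)))  # distributivity
```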

Exercises 1.1.

1. For which n is the vector $\begin{bmatrix} -1 \\ 0 \\ 10 \\ -3 \end{bmatrix}$ in Rn?
2. For which n is the vector $\begin{bmatrix} 7 \\ -8 \end{bmatrix}$ in Rn?
3. Compute $-2\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
4. Compute $\frac{1}{3}\begin{bmatrix} 12 \\ -9 \\ 1 \end{bmatrix}$.
5. Compute $-4\begin{bmatrix} 3 \\ 1 \end{bmatrix}$.
6. Compute $-5\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.
7. Compute $4\begin{bmatrix} 3 \\ -2 \end{bmatrix}$.
8. Compute $-\frac{1}{2}\begin{bmatrix} -1 \\ 6 \\ 4 \end{bmatrix}$.
9. Compute $3\begin{bmatrix} -1 \\ 0 \\ 10 \\ -3 \end{bmatrix}$.
10. Is it possible to have a scalar r and a vector ~v so that r~v doesn’t
make sense?
11. Compute each sum or explain why it is impossible.
(a) $\begin{bmatrix} 2 \\ -4 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
(b) $\begin{bmatrix} 2 \\ -4 \end{bmatrix} + \begin{bmatrix} 7 \\ 5 \end{bmatrix}$
(c) $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 7 \\ 5 \end{bmatrix}$
12. Compute each sum or explain why it is impossible.
(a) $\begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix} + \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix}$
(b) $\begin{bmatrix} -1 \\ 5 \end{bmatrix} + \begin{bmatrix} 2 \\ -3 \end{bmatrix}$
13. Compute each sum or explain why it is impossible.
(a) $\begin{bmatrix} 2 \\ -1 \\ 17 \end{bmatrix} + \begin{bmatrix} 1 \\ -3 \end{bmatrix}$
(b) $\begin{bmatrix} 0 \\ -5 \\ 10 \end{bmatrix} + \begin{bmatrix} -2 \\ 7 \\ -1 \end{bmatrix}$
14. Compute each sum or explain why it is impossible.
(a) $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \end{bmatrix}$
(b) $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 11 \\ 5 \\ 0 \end{bmatrix}$
(c) $\begin{bmatrix} 3 \\ -2 \end{bmatrix} + \begin{bmatrix} 11 \\ 5 \\ 0 \end{bmatrix}$
15. Compute each sum or explain why it is impossible.
(a) $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \end{bmatrix}$
(b) $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 11 \\ 5 \end{bmatrix}$
(c) $\begin{bmatrix} 3 \\ -2 \end{bmatrix} + \begin{bmatrix} 11 \\ 5 \end{bmatrix}$
16. Compute $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} -1 \\ 6 \\ 4 \end{bmatrix} + \begin{bmatrix} 5 \\ -8 \\ -1 \end{bmatrix}$.
17. Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 0 \\ 10 \\ -3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 4 \\ 0 \\ -5 \end{bmatrix}$.
(a) Which sums of these vectors make sense?
(b) Compute those sums.
18. Use the picture of ~v and ~w below to draw a picture of ~v + ~w.
(Figure: ~v and ~w drawn as arrows from the origin.)
19. Use the picture of ~v below to draw a picture of −2~v.
(Figure: ~v drawn as an arrow from the origin.)
20. Use the picture of ~v and ~w below to draw a picture of ~v − ~w.
(Figure: ~v and ~w drawn as arrows from the origin.)
21. For each of the following scenarios, write a vector which records
the given information. Be sure to describe what each entry of your
vector represents.
(a) The number of each type of atom in a molecule of glucose,
C6 H12 O6 . As in Example 1 from Chapter 0, C stands for
carbon, H for hydrogen, and O for oxygen.
(b) The pirate treasure can be found two paces south and seven
paces west of the only palm tree on the island.
(c) A hospital patient is 68.5 inches tall, weighs 158 pounds, has a
temperature of 98.7◦ F, and is 38 years old.
22. In the previous problem, imagine constructing the vector of a second
molecule or treasure map or hospital patient. For each scenario,
suppose you add together your two vectors. What is the practical
meaning of your vector sum?
23. Suppose we work at the paint counter of a hardware store. A
customer orders three gallons of lavender paint and half a gallon
of sage green paint. To make a gallon of lavender paint we mix 1/4
gallon of red paint, 1/4 gallon of blue paint and 1/2 gallon of white
paint. To make a gallon of sage green paint we mix 1/4 gallon of
white paint, 1/4 gallon of yellow paint, and 1/2 gallon of blue paint.
Use the method from Chapter 0’s Example 3 to find a vector which
gives the amounts of each color of paint needed to fill this order.
24. Adam and Steve are getting divorced, and they are trying to divide
up their assets: a car, a boat, a house, and a ski cabin. Each of
them has assigned each object a monetary and an emotional value.
If the first value vector entry is an object's monetary value (in thousands of dollars) and the second is its emotional value (on a 0 to 10 scale), Steve's value vectors are as follows: $\vec{v}_{car} = \begin{bmatrix} 12 \\ 3 \end{bmatrix}$, $\vec{v}_{boat} = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$, $\vec{v}_{house} = \begin{bmatrix} 100 \\ 5 \end{bmatrix}$, $\vec{v}_{cabin} = \begin{bmatrix} 75 \\ 10 \end{bmatrix}$.
(a) What is the vector which represents Steve’s total value of all
four assets?
(b) What does Steve consider to be half the monetary value of these
four assets? What does he consider to be half the emotional
value?
(c) If Steve is only thinking about monetary value, is it possible
for him to approve a division where he doesn’t get the house?
(d) If Steve is only thinking about emotional value, is it possible for
him to approve a division where he doesn’t get the ski cabin?
25. In Example 5 from Chapter 0, we discussed the visual effect of
multiplying both coordinates of a point in R2 by 2. What is the
visual effect of multiplying all points in R2 by −2? (You may want
to start by computing −2~v for several specific vectors ~v . Remember
to try ~v s from each quadrant of the plane.)
26. Can you think of a property of our familiar addition and multi-
plication in R which we didn’t discuss in Rn ? Does Rn have that
property?
27. Draw some examples of 3D vector addition and come up with your
own description of how the sum relates to the shape formed by the
two vectors. (You may want to use Mathematica.)

1.2 Span
So far, we’ve seen examples of problems where we want to add vectors or
multiply a vector by a scalar. However, there are many situations where
we’ll want to see what vectors we can get by adding and scaling a whole
set of vectors – perhaps to combine existing solutions to a problem into
new solutions. For example, rather than simply adding together one molecule
of carbon dioxide and one molecule of water, we may want to combine 15
molecules of carbon dioxide with 25 molecules of water. Since this idea comes
up so often, we’ll spend this section studying the set of new vectors we can
get by adding and scaling a given set of vectors. Notice that as with the
combination of carbon dioxide and water above, we will often want to use
different scalar multiples of each vector. Let’s start by considering a single
combination of a set of vectors.

Definition. Let ~v1 , . . . , ~vk be vectors in Rn . A linear combination of these


vectors is a vector of the form a1~v1 + · · · + ak~vk where a1 , . . . , ak are scalars.

Note that if ~v1 , . . . , ~vk are n-vectors, then any linear combination of them
will also be an n-vector. Also notice that to create a linear combination of a
given set of vectors we can choose whichever scalars we like, but must use the
specified vectors.
   
Example 1. Create a linear combination of ~v1 = (−2, 8, 6), ~v2 = (1, 0, −5), and ~v3 = (3, −1, 2).
To create a linear combination of these three vectors, we need to first select
three scalars. I’ll use a1 = 12 , a2 = 2, and a3 = −1. Our linear combination is
then
     
−2 1 3
1 
a1~v1 + a2~v2 + a3~v3 = 8 + 2  0  + (−1) −1
2
6 −5 2
       
−1 2 −3 −2
=  4  +  0  +  1  =  5 .
3 −10 −2 −9

Notice that since our original vectors were 3-vectors, our linear combination
of them is also a 3-vector.
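If you'd like to check a computation like this one on a computer, the sketch below does so in Python using the NumPy library. Nothing in this text assumes NumPy; this is just an optional way to verify the arithmetic.

    import numpy as np

    # The three vectors from Example 1.
    v1 = np.array([-2, 8, 6])
    v2 = np.array([1, 0, -5])
    v3 = np.array([3, -1, 2])

    # The scalars chosen in the example.
    a1, a2, a3 = 1/2, 2, -1

    # A linear combination is just scaling each vector and adding the results.
    print(a1*v1 + a2*v2 + a3*v3)   # [-2.  5. -9.]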

Geometrically, we can think of a linear combination as starting at the origin


and taking a sequence of moves in Rn along each of the lines defined by ~vi .
Through this point of view, our linear combination is the vector corresponding
to the point where we end up at the end of our trip. Remember from Figure 1.2
in 1.1 that we can add vectors geometrically using the parallelogram pattern,
and from Examples 9 and 10 of 1.1 that which direction we travel along ~vi ’s
line is determined by whether its scalar coefficient is positive or negative.
   
Example 2. Find the linear combination of ~v1 = (1, −2) and ~v2 = (4, 4) with a1 = 3 and a2 = −1/2.
Plugging these vectors and scalars into our definition gives us
         
a1~v1 + a2~v2 = 3(1, −2) − (1/2)(4, 4) = (3, −6) − (2, 2) = (1, −8).
Alternately, we can construct 3(1, −2) and −(1/2)(4, 4) geometrically.

[Figure: the vectors 3~v1 and −(1/2)~v2 drawn from the origin in the plane.]

Now we can find the sum by drawing in the parallelogram formed by these
vectors, which we can see exactly matches the result we computed algebraically
above.
[Figure: the parallelogram formed by 3~v1 and −(1/2)~v2, whose far corner is the sum (1, −8).]

In a more practical sense, we can use linear combinations to create an


ordered total amount of a variety of different quantities contributed by several
vectors.

Example 3. Suppose there are three lunch specials at a restaurant. Special


#1 is 2 eggs, 2 strips of bacon, and 2 pieces of toast, special #2 is 2 eggs, 2
strips of bacon, 2 sausage links, and an order of hash browns, and special #3
is 2 sausage links, an order of hash browns, and 2 pieces of toast. If a table of
6 people orders 2 special #1s, 3 special #2s, and 1 special #3, what should
the cook make?

As with any practical example, we first have to decide how to assign vector
entries to numeric pieces of information. There are 5 different food types in
the three specials, so we can encode the information about each breakfast
special’s contents as a 5-vector whose entries are (in order): eggs, strips of
bacon, sausage links, pieces of toast, and orders of hash browns. If we call
special #1’s
 vector ~
s1 ,special #2’svector
 ~s2 , and special #3’s vector ~s3 , we
2 2 0
2 2 0
     
get ~s1 = 0, ~s2 = 2, and ~s3 = 2.
     
2 0 2
0 1 1
The table’s order can now be thought of as a linear combination of these
three vectors whose coefficients a1 , a2 , and a3 , are the numbers of each special
ordered. Since the table ordered 2 special #1s, we set a1 = 2. Similarly they
ordered 3 special #2s so a2 = 3, and 1 special #3 so a3 = 1. This makes the
table's order the linear combination
a1~s1 + a2~s2 + a3~s3 = 2(2, 2, 0, 2, 0) + 3(2, 2, 2, 0, 1) + 1(0, 0, 2, 2, 1)
                      = (4, 4, 0, 4, 0) + (6, 6, 6, 0, 3) + (0, 0, 2, 2, 1) = (10, 10, 8, 6, 4).

This means the cook needs to make 10 eggs, 10 strips of bacon, 8 sausage
links, 6 pieces of toast, and 4 orders of hash browns.

While we could figure out the totals in the previous example simply by
counting, in many practical situations we have far more than 5 quantities to
track and far more than 3 combinations of those quantities contributing to
our final outcome.
In many situations we won’t just want to look at one linear combination
of our starting vectors, but at the set of all possible linear combinations. For
example, we might want to consider all possible combinations of some number
of carbon dioxide molecules and some number of water molecules or figure out which points in the plane we can reach using linear combinations of (1, −2) and (4, 4). This motivates the following definition.

Definition. Let ~v1 , . . . , ~vk be vectors in Rn . The span of these vectors is the
set of all possible linear combinations, which is written Span{~v1 , . . . , ~vk }. The
vectors ~v1 , . . . , ~vk are called the spanning set.

In other words, the span is the set of vectors we get by making all possible
choices of scalars in our linear combination of ~v1 , . . . , ~vk . Note that if ~v1 , . . . , ~vk
are n-vectors then all their linear combinations are also n-vectors, so their
span is a subset of Rn . Geometrically, we can visualize the span of a set
of vectors by thinking of our spanning vectors ~v1 , . . . , ~vk as giving possible
directions of movement in Rn . This makes the span all possible points in
Rn that we can reach by traveling in some combination of those possible
directions. Practically, you could imagine your spanning vectors as different
types of coins and bills which makes the span the possible amounts of money
you can give as change using those coins and bills. (Of course in this example
the only coefficients that make sense are positive whole numbers, but we will
learn to explore these situations in the most general case since that allows us
to solve the greatest array of problems.)
Let’s start by exploring the easiest possible case of a span: when we have
only one vector in our spanning set.
Example 4. Find the span of ~v = (1, −2).
Since we only have one spanning vector, our span is just the set of all multiples of that vector. This means we have
Span{(1, −2)} = {a(1, −2)} = {(a, −2a)}.
In other words, the span of (1, −2) is the set of all vectors in R2 whose second entries are −2 times their first entries.
To visualize this span geometrically, recall from Examples 9 and 10 in 1.1
that positive multiples of a vector keep the direction of that vector but scale
its length, while negative multiples of a vector scale its length and reverse its
direction. This gives us the picture below.

[Figure: the line through the origin in the direction of (1, −2).]

This shows that the span of this vector is the entire line through the origin which points in the same direction as (1, −2).

Next, let’s step up one level of complexity and add a second spanning
vector.
Example 5. Find the span of (1, −2) and (4, 4).
By definition, our span is all vectors ~b which can be written as
~b = a1(1, −2) + a2(4, 4) = (a1 + 4a2, −2a1 + 4a2)
for some a1 and a2.


This algebraic description of the span is harder to interpret than the
previous example, so let’s turn our attention to what the span looks like
geometrically.
From our discussion in Example 4, we know that vectors of the form a1(1, −2) all lie along the line through the origin defined by (1, −2) and that vectors of the form a2(4, 4) all lie along the line through the origin defined by (4, 4). Therefore any vector in the span is the sum of a vector along (1, −2)'s line and a vector along (4, 4)'s line.
Drawing those two lines gives us the following picture.


[Figure: the lines through the origin along (1, −2) and (4, 4).]

We saw in 1.1 that we can visualize the sum of two vectors as the diagonal
of the parallelogram formed by the two vectors we’re adding. Thus our span
consists of all points in the plane which can be thought of as the fourth corner
of a parallelogram with one corner at the origin and two sides along the two
lines defined by our two spanning vectors. This is shown for a generic point
in the plane in the picture below.

[Figure: a generic point in the plane realized as the fourth corner of such a parallelogram.]

Hopefully you can convince yourself that no matter which point in the plane we pick, we can always draw a parallelogram to show it is an element of the span. This means Span{(1, −2), (4, 4)} = R2.

   
Example 6. Find the span of (2, 8) and (−1, −4).
We know
Span{(2, 8), (−1, −4)} = {a1(2, 8) + a2(−1, −4)} = {(2a1 − a2, 8a1 − 4a2)}.

As in the previous example, this algebraic description is not very illuminating, so we turn to a geometric interpretation. We can visualize all vectors of the form a1(2, 8) as the line through the origin along (2, 8) and all vectors of the form a2(−1, −4) as the line through the origin along (−1, −4).
[Figure: the vectors (2, 8) and (−1, −4) drawn in the plane; they lie on the same line through the origin.]

This picture shows us that our two vectors actually lie on the same line
through the origin! Therefore their span is simply that line, i.e.,
       
Span{(2, 8), (−1, −4)} = Span{(2, 8)} = Span{(−1, −4)}.

Notice that the previous example shows that it is possible for multiple
different spanning sets to have the same span.
While geometric descriptions are fairly straightforward in R2 , as we work
with larger vectors we may start to rely more on the algebraic description
from the definition. (This is certainly true for Rn with n > 3!)
   
Example 7. Describe the span of ~v1 = (1, 1, 1) and ~v2 = (−1, 0, −1).
By definition, Span{~v1, ~v2} = {a1~v1 + a2~v2} for all scalars a1 and a2. Plugging in our two vectors gives us
Span{~v1, ~v2} = {a1(1, 1, 1) + a2(−1, 0, −1)} = {(a1, a1, a1) + (−a2, 0, −a2)} = {(a1 − a2, a1, a1 − a2)}.

Since we can let a1 be whatever number we like, it is clear that our middle entry can be anything. We also get a free choice of a2, so we can make a1 − a2 be anything we want as well. This means our span is any 3-vector whose first and last entries are the same, i.e.,
Span{(1, 1, 1), (−1, 0, −1)} = {(x, y, x)},

which can be visualized geometrically as the plane z = x. Notice that since our spanning vectors were from R3, their span is a subset of R3.

We can also use spans to model practical problems, even those which may
not initially look like they have anything to do with linear combinations.

Example 8. Suppose we want to swap produce at a farmer’s market.


There are four goods available: bacon, beans, cornmeal, and turnips. We can
exchange 4 lbs of turnips for 1 lb of bacon, 4 lbs of beans for 3 lbs of cornmeal,
and 1 lb of turnips for 1 lb of beans. What possible trades can we make?

Before we can even start thinking about how spans might be relevant here,
we need to reinterpret this situation using vectors. We have four different
goods, so we will use 4-vectors whose entries give the quantity, in pounds,
of the various goods. I’ll use the first entry for bacon, the second for beans,
the third for cornmeal, and the fourth for turnips. Normally it would be very
strange to have a negative quantity of something like bacon, but here we will
view negative quantities as how much of a good we are giving to someone else
and positive quantities as how much we are getting.
Using this interpretation of positive vector entries as getting and negative
entries as giving, we can model an accepted exchange as a 4-vector. We know
that we can give someone 4 lbs of turnips in exchange for getting 1 lb of bacon, which we can now view as the vector (1, 0, 0, −4). (If you're worried that I chose to view exchanging turnips and bacon as giving turnips and getting bacon rather than the other way around, we'll get to that in the next paragraph where we discuss the scalars in our linear combinations.) Similarly, giving someone 4 lbs of beans and getting 3 lbs of cornmeal has the vector (0, −4, 3, 0), and giving someone 1 lb of turnips and getting 1 lb of beans has the vector (0, 1, 0, −1).
Multiplying one of these exchange vectors by −1 simply changes the signs of all the entries, i.e., which goods are being given or gotten. In other words, multiplying by −1 changes the direction of the exchange. For example, the vector −1(1, 0, 0, −4) = (−1, 0, 0, 4) means you're giving someone 1 lb of bacon and getting 4 lbs of turnips, the opposite direction of our initial model of exchanging turnips for bacon. Multiplying an exchange vector by a positive scalar changes the amounts of the goods involved, but keeps the ratio of their quantities, their “exchange rate”, the same. For example, 3(1, 0, 0, −4) = (3, 0, 0, −12) means we're giving 12 lbs of turnips and getting 3 lbs of bacon, but we're still giving 4 times as much turnips as we get bacon. Putting these two ideas together means that our scalar multiples allow us to trade different amounts of goods in both directions, but keeps the relative values of the goods the same.
Using this model, we can view each linear combination of our exchange
vectors as an overall trade at the market which is made up of some combination
of our three given exchange ratios. For example, suppose we arrive at the
market with 8 lbs of beans and swap 4 lbs of beans for 3 lbs of cornmeal,
swap another 4 lbs of beans for 4 lbs of turnips, and then swap 2 lbs of our turnips for 1/2 lb of bacon. Our first exchange is (0, −4, 3, 0) = 1(0, −4, 3, 0). Our second exchange is (0, −4, 0, 4) = −4(0, 1, 0, −1). Our final exchange is (1/2, 0, 0, −2) = (1/2)(1, 0, 0, −4).
This means our overall exchange vector is the linear combination
       
(1/2)(1, 0, 0, −4) + 1(0, −4, 3, 0) − 4(0, 1, 0, −1) = (1/2, −8, 3, 2)
which correctly tells us overall we exchanged our 8 lbs of beans for 1/2 lb of
bacon, 3 lbs of cornmeal, and 2 lbs of turnips.
If we view our possible overall trades as exchange vectors, the set of all possible trades is therefore the span of (1, 0, 0, −4), (0, −4, 3, 0), and (0, 1, 0, −1).
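As an optional check of the arithmetic in this example, here is a short Python/NumPy sketch (again, nothing in the text assumes NumPy) that recomputes the overall exchange vector as a linear combination of the three exchange vectors.

    import numpy as np

    # Exchange vectors in the order (bacon, beans, cornmeal, turnips);
    # positive entries are goods received, negative entries are goods given.
    turnips_for_bacon  = np.array([1, 0, 0, -4])
    beans_for_cornmeal = np.array([0, -4, 3, 0])
    turnips_for_beans  = np.array([0, 1, 0, -1])

    # The overall trade described above: coefficients 1/2, 1, and -4.
    overall = 0.5*turnips_for_bacon + 1*beans_for_cornmeal - 4*turnips_for_beans
    print(overall)   # [ 0.5 -8.   3.   2. ]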
One basic question we can ask about spans is whether or not a given
vector ~b is in the span of a set of vectors ~v1 , . . . , ~vk . Geometrically, this can be
reinterpreted as asking whether we can get to the point in Rn corresponding to
~b via a combination of movements along the vectors ~v1 , . . . , ~vk . Practically, this
can be reinterpreted as asking whether we can create the output represented
by ~b using only the inputs represented by the ~vi ’s. If ~b is not the same size as
the ~v ’s then the answer is clearly no, since a linear combination of vectors has
the same size as the vectors in the spanning set. If ~b is the same size as the ~v ’s,
then we can answer this question by trying to find scalars a1 , . . . , ak so that
a1~v1 + · · · + ak~vk = ~b. Since we’re used to solving for x, this type of equation
is often written x1~v1 + · · · + xk~vk = ~b to help remind us which variables we’re
solving for. These equations come up a lot, so they have their own name.

Definition. A vector equation is an equation of the form x1~v1 +· · ·+xk~vk = ~b


where ~v1 , . . . , ~vk , ~b are n-vectors and we are solving for scalars x1 , . . . , xk .

 
     
Example 9. Solve the vector equation x1(2, −1, 3) + x2(1, −1, 0) + x3(0, 2, 1) = (6, −9, 1).
We can start by combining the left-hand side of our equation into a single vector to give us
(2x1 + x2, −x1 − x2 + 2x3, 3x1 + x3) = (6, −9, 1).
Setting corresponding entries equal gives us three equations: 2x1 + x2 = 6,
−x1 − x2 + 2x3 = −9, and 3x1 + x3 = 1. We can use the first and third
equations to solve for x2 and x3 in terms of x1 , which gives us x2 = 6 − 2x1
and x3 = 1 − 3x1 . Plugging these back into our middle equation above gives
−x1 − (6 − 2x1 ) + 2(1 − 3x1 ) = −9, which simplifies to −5x1 − 4 = −9 or
x1 = 1. This means x2 = 6 − 2 = 4 and x3 = 1 − 3 = −2. Therefore the
solution to our vector equation is x1 = 1, x2 = 4, and x3 = −2.
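If you want to let a computer do this kind of bookkeeping, one option (a sketch only; nothing in the text relies on it) is to stack the spanning vectors as the columns of a NumPy array and ask NumPy to solve the resulting system. This anticipates the matrices of Chapter 2, but it is solving exactly the same three equations we just solved by hand.

    import numpy as np

    # Columns are ~v1, ~v2, ~v3 from Example 9; b is the right-hand side.
    A = np.array([[ 2,  1, 0],
                  [-1, -1, 2],
                  [ 3,  0, 1]])
    b = np.array([6, -9, 1])

    # Solving A x = b gives the coefficients x1, x2, x3 of the vector equation.
    print(np.linalg.solve(A, b))   # [ 1.  4. -2.]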

Since ~b is in Span{~v1 , . . . , ~vk } exactly when we can find scalars a1 , . . . , ak


so that a1~v1 + · · · + ak~vk = ~b, figuring out whether or not ~b is in the span is
equivalent to deciding whether or not the vector equation x1~v1 +· · ·+xk~vk = ~b
has a solution.
     
Example 10. Is ~b = (1, 2, 3) in the span of ~v1 = (3, 0, 1) and ~v2 = (0, −1, 0)?
We can answer this question by trying to solve the vector equation
x1(3, 0, 1) + x2(0, −1, 0) = (1, 2, 3).
As in the previous example, we can simplify the left-hand side of this equation to get
(3x1, −x2, x1) = (1, 2, 3).
Setting corresponding entries equal gives us the three equations 3x1 = 1, −x2 = 2, and x1 = 3. Unfortunately, the first equation says x1 = 1/3, while the third equation says x1 = 3. Since we can't satisfy both of these conditions at once, there is no solution to our vector equation. Therefore ~b is not in the span of ~v1 and ~v2.
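A computer can also detect this "no solution" case. One hedged NumPy sketch: a least-squares routine finds the best possible coefficients, and a nonzero residual tells us that no exact solution exists, so ~b is not in the span.

    import numpy as np

    # Columns are ~v1 and ~v2 from Example 10; b is the vector being tested.
    A = np.array([[3,  0],
                  [0, -1],
                  [1,  0]])
    b = np.array([1, 2, 3])

    # lstsq returns the best approximate coefficients and the squared residual.
    x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(residual)   # about 6.4, not 0, so b is not in the span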

Notice that this vector equation version of checking whether or not a vector
is in a span doesn’t require any geometric understanding of what that span
looks like, so it is particularly useful in Rn for n > 3.
Asking whether or not a vector is in a span also comes up in practical
situations, as in the example below.

Example 11. With the same basic setup of bartering goods from Example
8, can we trade 3 lbs of cornmeal and 4 lbs of turnips for 1/2 lb of bacon and
6 lbs of beans?

Sincewe saw
 in Example
 8that the set of all possible exchanges is the
1 0 0
 0  −4 1
span of      
 0 ,  3 , and  0 , this is really just asking if our trade is in
−4 0 −1
this span. Giving 3 lbs of cornmeal and 4 lbs of turnips and getting 1/2 lb
1/2
 6 
of bacon and 6 lbs of beans corresponds to the vector  
 −3 . Therefore this
−4
     
1/2 1 0
 6   0  −4
question can be rephrased as asking if      
 −3  is in the span of  0 ,  3 ,
  −4 −4 0
0
1
and  
 0 .
−1
We can figure this out using the vector equation
x1(1, 0, 0, −4) + x2(0, −4, 3, 0) + x3(0, 1, 0, −1) = (1/2, 6, −3, −4).
Simplifying the left-hand side gives us
(x1, −4x2 + x3, 3x2, −4x1 − x3) = (1/2, 6, −3, −4).

Setting corresponding entries equal gives us the four equations x1 = 1/2,


−4x2 +x3 = 6, 3x2 = −3, and −4x1 −x3 = −4. The first equation gives us the
value of x1 , and we can use the third equation to see that x2 = −1. Plugging
x2 = −1 into the second equation and solving for x3 gives us 4 + x3 = 6, so
x3 = 2. Finally, we can check that these values satisfy the fourth equation.
Since we have a solution to our vector equation, our vector is in the span
of possible exchanges. This means that it is possible to trade 3 lbs of cornmeal
and 4 lbs of turnips for 1/2 lb bacon and 6 lbs of beans.

Now that we have an initial understanding of spans, let’s look at a few


of their nice properties. These will often be quite helpful, so we’ll happily
generalize them in later chapters. The first of these properties is that the zero
vector is always in any span. To see this, just choose zero as the coefficient
on every spanning vector. The second property is that spans are closed under
addition. This means that if we take any two vectors from a span and add
them together, that sum will also be in our span. To see this, suppose our
spanning vectors are ~v1, . . . , ~vk. Our two vectors in the span of the ~v s can therefore be written ~w = a1~v1 + · · · + ak~vk and ~u = b1~v1 + · · · + bk~vk for some scalars a1, . . . , ak and b1, . . . , bk. Their sum is
~w + ~u = a1~v1 + · · · + ak~vk + b1~v1 + · · · + bk~vk
       = (a1~v1 + b1~v1) + · · · + (ak~vk + bk~vk)
       = (a1 + b1)~v1 + · · · + (ak + bk)~vk
which is clearly in the span of ~v1 , . . . , ~vk . Finally, the third property is that
spans are closed under scalar multiplication. This means that if we take any
vector from our span and multiply it by any scalar we’ll get a vector which is
also in our span. To see this, again suppose our spanning vectors are ~v1 , . . . , ~vk .
Our vector in the span must have the form ~w = a1~v1 + · · · + ak~vk for some scalars a1, . . . , ak. If we multiply this vector by a scalar r, we get
r ~w = r(a1~v1 + · · · + ak~vk) = ra1~v1 + · · · + rak~vk
which is clearly still in the span of ~v1 , . . . , ~vk .
These three properties are important characteristics of Rn , i.e., n-


dimensional space, so we’ll use them to make the following definition.

Definition. A subset of Rn is a subspace if it contains ~0, is closed under


addition, and is closed under scalar multiplication.

Example 12. The span of any set of vectors in Rn is a subspace of Rn .

We showed in the previous paragraph that any span in Rn satisfies the


three properties to be a subspace.

Exercises 1.2.
    
1. Compute the linear combination 2(1, −3) + 7(−1, 0) − (2, 5).
     
2. Compute the linear combination 4(1, 2, 3) − 2(4, −1, 6) + 5(1, 0, −1).
3. Compute the linear combination x1(2, 1) + x2(0, −4) + x3(3, −1) + x4(5, 0).
     
4. Compute the linear combination x1(−2, 4, 1) + x2(3, −6, 0) + x3(1, −1, −2).
     
5. Find a value of h so that x1(−2, h) + x2(4, −3) = (0, 1).
     
6. Find a value of k so that x1(1, −1, 2) + x2(0, k, 5) = (4, −5, 3).
7. Rust is formed by an initial reaction of iron, F e, oxygen gas, O2 ,
and water, H2 O to form iron hydroxide, F e(OH)3 . As in Chapter
0’s Example 1, we can write each molecule in this reaction as
a vector which counts the number of each type of atom in that
molecule. Let’s agree that the first entry in our vectors will count
iron atoms (F e), the second will count oxygen (O), and the third
will count hydrogen (H). If our reaction combines x1 molecules of
iron, x2 molecules of oxygen, and x3 molecules of water to create x4
molecules of rust, give the vector equation of this chemical reaction.
8. Sodium chlorate, N aClO3 , is produced by electrolysis of sodium
chloride, N aCl, and water, H2 O. The reaction also produces
hydrogen gas, H2 . As in Chapter 0’s Example 1, we can write
each molecule in this reaction as a vector which counts the number
of each type of atom in that molecule. Let’s agree that the first
entry in our vectors will count sodium atoms (N a), the second will
count chlorine (Cl), the third will count oxygen (O), and the fourth
will count hydrogen (H). If our reaction combines x1 molecules of
sodium chloride with x2 molecules of water to produce x3 molecules
of sodium chlorate and x4 molecules of hydrogen gas, give the vector
equation of this chemical reaction.
     
9. Does the span of (1, −1, 2) and (3, 1, 4) include the vector (1, 3, 5)?
     
10. Does the span of (−2, 0, 5) and (6, 3, 9) include the vector (0, 1, 8)?
       
11. Is ~b = (1, 1, 1, 1) in Span{(1, −3, −1, 2), (2, 0, 0, 1), (−3, 0, 6, −2)}?
       
3  2 1 −1 
12. Is ~b = 0 in Span −2 , 1 ,  0  ?
 
3 0 1 1
 
13. Sketch a picture of the span of (4, −2).
   
14. Sketch a picture of the span of (2, 0) and (1, 1).
   
15. Sketch a picture of the span of (2, 0, 0) and (1, 1, 0).
     
16. Do (1, 0, −1), (0, 1, 2), and (−4, 0, 6) span all of R3? Briefly say why or why not.
     
17. Do (1, 0, 1), (2, 0, 2), and (0, 1, 1) span all of R3? Briefly say why or why not.
18. Develop a strategy to figure out whether or not a set of 4-vectors
span all of R4 .
19. Using the setup of Example 3, is it possible to order a combination
of specials so that the cook needs to make 12 eggs, 10 strips of bacon,
6 sausage links, 10 pieces of toast, and 5 orders of hash browns? If
it is possible, explain how. (Remember that the coefficients in your
solution must be positive whole numbers!)
20. Using the setup of Example 8, can you trade 3 lbs of bacon and 8
lbs of beans for 3 lbs of cornmeal and 16 lbs of turnips?
21. Is the graph of the line y = x + 1 a subspace of R2 ?
 
22. Is W = {(x, 0, z)} a subspace of R3?
23. What is the smallest subspace of Rn ? What is the largest subspace
of Rn ?
24. Let S be the subset of R3 consisting of the x-axis, y-axis and z-axis.
Show that S is not a subspace of R3 .
1.3 Linear Independence


In this section we’ll develop a way to figure out how big the span of a given
set of vectors is. Since we’ll answer that question by giving the dimension of
the span, we first need to discuss the idea of dimension. Most of us have a
geometric idea of dimension as long as we stay in R2 (the plane) or R3 (3D
space). This tells us that a line has dimension 1, a plane has dimension 2, and
a three-dimensional object like a solid ball has dimension 3.
One way of articulating this geometric notion of dimension is to say that a
k-dimensional object is one that contains k possible independent directions of
travel from any given point where by “independent” we mean the different
directions are all at right angles to each other. (Think of up/down and
right/left in a plane or up/down, right/left, forward/back in 3-space.) We’ll
develop a more linear algebra flavored definition of dimension which can
be applied much more broadly (including in dimensions greater than 3) in
Chapter 3.
If we are working in R2 or R3 , this visual idea of dimension allows us to
figure out the dimension of a span by computing the span and drawing a
picture. However, in Rn for n > 3 this approach won’t work, so we’d like to
develop a way to figure out the dimension of a span without using pictures.
This alternate strategy will also help with problems in R2 or R3 where drawing
a picture becomes overly complicated.
We saw in Examples 4 and 5 of 1.2 that the dimension of a span sometimes
equals the number of vectors in our spanning set. However, Example 6 shows
us that this isn’t always true. It shouldn’t be too hard to convince yourself
that the dimension of a span can’t be larger than the number of spanning
vectors, i.e., the dimension of Span{~v1 , . . . , ~vk } is at most k. If we think of
our spanning vectors as giving possible directions of travel in Rn , we can see
that there are at most k possible independent directions included in the span.
Thus dim(Span{~v1 , . . . , ~vk }) ≤ k.
This reduces our problem of finding the dimension of a span to figuring out
how to tell when the dimension is less than the number of spanning vectors.
Since we saw that in Example 6 of 1.2, let’s think about what happened there
to see if we can identify and generalize what happened in that situation. In
that example, our two vectors lay along the same line through the origin. In
other words, the second vector lay along the line spanned by the first vector.
This kept their span as a one-dimensional line instead of expanding it to a
two-dimensional plane. We explore a similar situation with more vectors in
the next example.
     
Example 1. Give a geometric description of the span of (1, 0, 0), (1, 1, 0), and (2, 3, 0).
From the definition of the span we get


            
 1 1 2   1 1 2 
Span 0 , 1 , 3 = x1 0 + x2 1 + x3 3
   
0 0 0 0 0 0

which simplifies to  
 x1 + x2 + 2x3 
 x2 + 3x3  .
 
0
We can choose any values we like for x1, x2, and x3, so we can make the first two entries of our vector equal whatever we want. To see this, suppose we want to get the vector (a, b, 0). We can do this by choosing x1 = a − b, x2 = b, and x3 = 0 to get
(x1 + x2 + 2x3, x2 + 3x3, 0) = ((a − b) + b + 2(0), b + 3(0), 0) = (a, b, 0)

as desired. However, no matter what scalars we choose, we'll always get 0 as our third entry. Therefore geometrically, this span is the plane {(a, b, 0)} inside R3. Notice that even though we have three spanning vectors, our span is two-dimensional, not three-dimensional.

In the previous example, none of the vectors lay along the lines spanned
by the other vectors. You can check this geometrically using a 3D picture or
computationally by noticing that none of them is a multiple of one of the
other vectors. However, the third vector is in the plane spanned by the first
two vectors which we can check computationally by noticing that
     
−1(1, 0, 0) + 3(1, 1, 0) = (2, 3, 0).

This made the span two-dimensional instead of three-dimensional. We can


generalize this idea by saying that the span of ~v1 , . . . , ~vk has dimension less
than k only if some of the spanning vectors don’t add anything to the overall
span because they are already in the span of the other vectors. This motivates
the following definition.
Definition. The vectors ~v1 , . . . , ~vk in Rn are linearly dependent if one of


the vectors is in the span of the others. Otherwise ~v1 , . . . , ~vk are linearly
independent.

    
Example 2. Show that ~v1 = (2, −1, 3), ~v2 = (1, 1, 1), and ~v3 = (3, 0, 4) are linearly
dependent.

Here we can observe that ~v1 + ~v2 = ~v3 . This means ~v3 is an element of
Span {~v1 , ~v2 }. Since one vector is in the span of the other two, these three
vectors are linearly dependent.

This definition is great for showing a set of vectors is linearly dependent,


but can be harder to use if we want to show they are linearly independent.
To give ourselves a more computational option to check linear independence
or dependence, we use the following theorem.

Theorem 1. The vectors ~v1 , . . . , ~vk in Rn are linearly dependent if the vector
equation x1~v1 + · · · + xk~vk = ~0 has a solution where at least one of the xi s is
nonzero. Otherwise ~v1 , . . . , ~vk are linearly independent.

To see that this is equivalent to our definition, suppose we have ~v1 , . . . , ~vk
in Rn and that one of these vectors is in the span of the others. Let’s renumber
our vectors if necessary so that ~v1 is in Span{~v2 , . . . , ~vk }. This means that ~v1
is a linear combination of ~v2 , . . . , ~vk , so we can find scalars a2 , . . . , ak for which

~v1 = a2~v2 + · · · + ak~vk .

Subtracting ~v1 from both sides of this vector equation gives us


~0 = −~v1 + a2~v2 + · · · + ak~vk .

The coefficient on ~v1 is −1, so we have a solution to the vector equation


x1~v1 + · · · + xk~vk = ~0 where at least one coefficient is nonzero. Therefore if
~v1 , . . . , ~vk are linearly dependent according to our first definition, they are
linearly dependent according to our second definition.
Now suppose that we know the vector equation x1~v1 + · · · + xk~vk = ~0 has
a solution with at least one nonzero coefficient. This means we have scalars
a1 , . . . , ak with
a1~v1 + · · · + ak~vk = ~0
and at least one ai nonzero. Again, let’s renumber our vectors if necessary so
that a1 6= 0. Subtracting a1~v1 from both sides of our equation gives us

a2~v2 + · · · + ak~vk = −a1~v1 .


Multiplying both sides of this equation by −1/a1 gives us
−(a2/a1)~v2 − · · · − (ak/a1)~vk = ~v1.
This shows us that ~v1 is in the span of the other vectors. Therefore if ~v1 , . . . , ~vk
are linearly dependent according to our second definition, they are linearly
dependent according to our first definition.
We’ve now shown that our two definitions of linear dependence are the
same. Since linear independence was defined to be “not linearly dependent”,
this means that our two definitions of linear independence are also the same.
     
Example 3. Show that (1, 0, 1), (2, 3, 0), and (1, 1, −1) are linearly independent.
Let’s show this using our second definition, which says our vectors are
linearly independent if the vector equation
       
1 2 1 0
x1 0 + x2 3 + x3  1  = 0
1 0 −1 0

can only be solved by letting x1 = x2 = x3 = 0.


We can simplify the left-hand side of this equation to get
   
x1 + 2x2 + x3 0
 3x2 + x3  = 0 .
x1 − x3 0

Setting corresponding entries equal to each other gives us the three equations
x1 + 2x2 + x3 = 0, 3x2 + x3 = 0, and x1 − x3 = 0. The third equation says
x1 = x3. Solving the second equation for x2 gives us x2 = −(1/3)x3. Plugging both of those into the first equation gives us x3 + 2(−(1/3)x3) + x3 = 0 which simplifies to (4/3)x3 = 0. This means x3 = 0, so x1 = x3 = 0 and x2 = −(1/3)x3 = 0.
Since our only solution is to have all coefficients equal to 0, our three vectors
are linearly independent.

    
Example 4. Are (2, −1, 3), (1, 1, 1), and (5, −4, 8) linearly independent or linearly
dependent?

If you have a particularly clever moment, you might observe that


     
3(2, −1, 3) + (−1)(1, 1, 1) = (5, −4, 8).
This means our third vector is in the span of the other two, so our three
vectors are linearly dependent.
However, I am not confident that I’d notice that relationship between these
vectors immediately. In that case, we can always solve the vector equation
       
x1(2, −1, 3) + x2(1, 1, 1) + x3(5, −4, 8) = (0, 0, 0)
to see if there is a solution with at least one nonzero coefficient.
This equation can be simplified to
(2x1 + x2 + 5x3, −x1 + x2 − 4x3, 3x1 + x2 + 8x3) = (0, 0, 0)

which gives us three equations: 2x1 + x2 + 5x3 = 0, −x1 + x2 − 4x3 = 0, and


3x1 + x2 + 8x3 = 0. Subtracting our first equation from our third equation
gives us x1 + 3x3 = 0, so x1 = −3x3 . Plugging that into our second equation
gives us 3x3 + x2 − 4x3 = 0, which simplifies to x2 − x3 = 0, so x2 = x3 .
However, no matter which of our three original equations we plug x1 = −3x3
and x2 = x3 into, the terms all cancel out. This means any value of x3 will
work. (In particular we can have x3 = −1 so x2 = −1 and x1 = 3 to get the
linear combination at the start of this example.) Since it is possible to have
x3 ≠ 0, these three vectors are linearly dependent.

This is a good example of why having two different but equivalent ways
to check something can be very helpful!
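A third, purely computational check is available if you're willing to borrow a tool we haven't developed yet: the rank of the matrix whose columns are our vectors counts how many of them are genuinely independent, so the vectors are linearly independent exactly when the rank equals the number of vectors. The sketch below is an optional Python/NumPy illustration for the vectors of Example 4; it is not the method used in this text.

    import numpy as np

    # Columns are the three vectors from Example 4.
    A = np.array([[ 2, 1,  5],
                  [-1, 1, -4],
                  [ 3, 1,  8]])

    # Rank 3 would mean linearly independent; here the rank is 2,
    # so the three vectors are linearly dependent.
    print(np.linalg.matrix_rank(A))   # 2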
Going back to our original motivation for defining linear independence and
linear dependence, we can now state the following.

Theorem 2. Let ~v1 , . . . , ~vk be in Rn . Span{~v1 , . . . , ~vk } is k-dimensional if


and only if ~v1 , . . . , ~vk are linearly independent.

If our spanning set is linearly independent, this allows us to find the


dimension of the span by simply counting the number of spanning vectors.
This doesn’t require pictures, so it can be used in Rn for any n. This provides
an alternate way to compute the dimension of the span from Example 5 in
1.2.
   
Example 5. Find the dimension of the span of (1, −2) and (4, 4).
From the theorem above, we know that if our two spanning vectors are
linearly independent, the span has dimension 2. If they are linearly dependent,
the span has dimension less than 2.
Checking linear independence or dependence means solving the vector equation
x1(1, −2) + x2(4, 4) = (0, 0)
which can be simplified to
(x1 + 4x2, −2x1 + 4x2) = (0, 0).

This gives us the equations x1 + 4x2 = 0 and −2x1 + 4x2 = 0. The first
equation tells us x1 = −4x2 . Plugging this into the second equation, we get
8x2 + 4x2 = 0 or 12x2 = 0. This means x2 = 0, so x1 = 0 as well.
Since our only solution was both coefficients equal to zero, our two span-
ning vectors are linearly independent. This means their span has dimension
2, which matches up with the dimension of the picture we drew in 1.2.

Looking at Example 6 from 1.2 where our two spanning vectors are linearly
dependent shows us Theorem 2's other conclusion.
   
Example 6. Find the dimension of the span of (2, 8) and (−1, −4).
From the theorem above, we know that if our two spanning vectors are
linearly independent, the span has dimension 2. If they are linearly dependent,
the span has dimension less than 2.
Checking linear independence or dependence means solving the vector equation
x1(2, 8) + x2(−1, −4) = (0, 0)
which can be simplified to
(2x1 − x2, 8x1 − 4x2) = (0, 0).

This gives us the equations 2x1 − x2 = 0 and 8x1 − 4x2 = 0. The first equation
tells us x2 = 2x1 . Plugging this into the second equation, we get 8x1 −8x1 = 0
or 0 = 0. This means we can have any value of x1 as long as we set x2 = 2x1 .
Since we have solutions where the coefficients don’t equal zero, our two
spanning vectors are linearly dependent. This means their span has dimension
less than 2, which again matches up with the dimension of the picture we drew
in 1.2.

If our set of vectors is linearly dependent, we can still use Theorem 1


to find the dimension of their span. During the check that our vectors are
linearly dependent, we will have identified at least one which is in the span of
the others (or has a nonzero coefficient in our other definition). That vector
doesn’t contribute anything new to the overall span, so we can remove it


from our spanning set without changing either the span or its dimension. If
our new smaller spanning set is now linearly independent, we can find the
dimension by counting our new set of spanning vectors. If not, we repeat our
identification and removal of a redundant spanning vector. Since we started
with finitely many vectors, this process will eventually stop, and the dimension
of the original span will be the number of remaining spanning vectors.
     
Example 7. Find the dimension of the span of (1, 0, 1, 0), (1, 0, 1, 1), and (2, 0, 0, 2).
We have three spanning vectors, so the dimension of their span is at most
3. If our vectors are linearly independent, their span is three-dimensional. If
they are linearly dependent, their span has dimension less than 3, and we can
use the algorithm outlined above to find the dimension. Let’s start by checking
whether our vectors are linearly independent or linearly dependent.
Since our spanning vectors are in R4 , we can’t tackle this problem
geometrically, so let’s look at the vector equation
       
x1(1, 0, 1, 0) + x2(1, 0, 1, 1) + x3(2, 0, 0, 2) = (0, 0, 0, 0).
This simplifies to
(x1 + x2 + 2x3, 0, x1 + x2, x2 + 2x3) = (0, 0, 0, 0)
which gives us x1 + x2 + 2x3 = 0, 0 = 0, x1 + x2 = 0, and x2 + 2x3 = 0. The
third equation tells us x1 = −x2, and the fourth equation tells us x3 = −(1/2)x2.
Plugging these into the first equation gives us −x2 + x2 − x2 = 0 so x2 = 0.
This also means x1 = 0 and x3 = 0.
Since all three coefficients in our vector equation must be 0, our vectors
are linearly independent which means their span has dimension 3.

     
Example 8. Find the dimension of the span of (4, 2, 0, −1), (6, 6, 0, 1), and (−3, 0, 0, 2).
As in the previous example we have three spanning vectors, so the
maximum dimension of this span is 3. Again, we’ll start by determining if
these vectors are linearly independent or linearly dependent using the vector
equation
x1(4, 2, 0, −1) + x2(6, 6, 0, 1) + x3(−3, 0, 0, 2) = (0, 0, 0, 0).
This simplifies to
(4x1 + 6x2 − 3x3, 2x1 + 6x2, 0, −x1 + x2 + 2x3) = (0, 0, 0, 0)
which gives us the equations 4x1 + 6x2 − 3x3 = 0, 2x1 + 6x2 = 0, 0 = 0,
and −x1 + x2 + 2x3 = 0. The second equation gives us x1 = −3x2 . Plugging
this into the fourth equation gives us 3x2 + x2 + 2x3 = 0 so 4x2 + 2x3 = 0
or x3 = −2x2 . Plugging x1 = −3x2 and x3 = −2x2 into any of our original
equations simplifies down to 0 = 0, so x2 can be whatever we choose. In
particular, we can choose to have x2 ≠ 0, so our three vectors are linearly
dependent. This means their span has dimension less than 3.
Following our algorithm, we now need to remove one of our spanning
vectors which is in the span of the other two. The clue here is that this vector
will have a nonzero coefficient in our vector equation. Since we said we could
choose any value for x2, let's remove our second vector. (Actually if x2 ≠ 0 we also have x1 and x3 nonzero, so here we could also have chosen to remove either of the other two vectors as well.)
  
Since (6, 6, 0, 1) is in the span of (4, 2, 0, −1) and (−3, 0, 0, 2), we know
Span{(4, 2, 0, −1), (6, 6, 0, 1), (−3, 0, 0, 2)} = Span{(4, 2, 0, −1), (−3, 0, 0, 2)}.
In particular, we know these two spans have the same dimension. To find
this dimension, let’s check whether our remaining two spanning vectors are
linearly independent or linearly dependent by solving the vector equation
     
x1(4, 2, 0, −1) + x2(−3, 0, 0, 2) = (0, 0, 0, 0).
This simplifies to
(4x1 − 3x2, 2x1, 0, −x1 + 2x2) = (0, 0, 0, 0)
which gives us the equations 4x1 −3x2 = 0, 2x1 = 0, 0 = 0, and −x1 +2x2 = 0.
The second equation tells us that x1 = 0, and plugging that back into the
fourth equation tells us x2 = 0. (This also satisfies the first equation.) Since
our only solution is to have both coefficients equal to 0, these two vectors are
linearly independent and that their span has dimension 2.
Thus the span of (4, 2, 0, −1), (6, 6, 0, 1), and (−3, 0, 0, 2) is two-dimensional. (Geometri-
cally, this means the span is a plane inside R4 , which is fun to think about
even if we can’t draw a good picture of it.)

Exercises 1.3.
     
1. Are (1, −1, 1), (3, 0, 6), and (1, 2, 4) linearly independent or linearly dependent?
     
2. Are (2, 2, 2), (1, 0, −1), and (3, 2, 0) linearly independent or linearly dependent?
     
3. Are (2, 1, 0, 1), (3, 0, 2, 1), and (1, 0, 2, 0) linearly independent or linearly dependent?
     
4. Are (−1, 1, 0, 0), (3, 0, 1, 3), and (2, 4, 2, 6) linearly independent or linearly dependent?
5. Are the two vectors pictured below linearly independent or linearly
dependent?
[Figure for Exercise 5: two vectors drawn in the plane.]

6. Are the two vectors pictured below linearly independent or linearly


dependent?

[Figure for Exercise 6: two vectors drawn in the plane.]

7. (a) Sketch two vectors in R2 which are linearly independent.


(b) Sketch two vectors in R2 which are linearly dependent.
8. (a) Sketch three vectors in R3 which are linearly independent.
(b) Sketch three vectors in R3 which are linearly dependent.
     
9. Let ~v1 = (3, 1, 0, −1), ~v2 = (5, −2, 1, 2), and ~v3 = (4, 5, −1, 2).
(a) Are ~v1 , ~v2 , ~v3 linearly dependent or linearly independent? Show
work to support your answer.
(b) What does your answer to (a) tell you about the dimension of
Span{~v1 , ~v2 , ~v3 }?
     
10. Let ~v1 = (1, −1, 0, 0), ~v2 = (2, −1, 1, 0), and ~v3 = (0, −1, 1, 0).
(a) Are ~v1 , ~v2 , ~v3 linearly dependent or linearly independent? Show
work to support your answer.
(b) What does your answer to (a) tell you about the dimension of
Span{~v1 , ~v2 , ~v3 }?
     
 −3 1 −2 
11. Find the dimension of Span  12  , −4 ,  8  .
 
9 −3 6
     

 −1 3 2 
      
1  ,   , 4 .
0
12. Find the dimension of Span   0  1 2

 
 
0 3 6
13. Briefly explain why two 3-vectors cannot span all of R3 .
14. Is it possible to span all of R6 with five 6-vectors, i.e., five vectors
from R6 ? Briefly say why or why not.
   
15. Use linear independence to decide whether or not (1, 0, −1), (0, 1, 2), and (−4, 0, 6) span all of R3.
   
16. Use linear independence to decide whether or not (1, 0, 1), (2, 0, 2), and (0, 1, 1) span all of R3.
17. Briefly explain how you would use the idea of linear independence
to figure out whether or not a set of 4-vectors span all of R4 .
18. Using the setup of 1.2’s Example 8, are there trades that are
impossible? Explain why or why not using the idea of linear
independence. Why does your answer make sense from a practical
perspective?
2
Functions of Vectors

2.1 Linear Functions


Now that we’ve explored vectors in Rn in Chapter 1, let’s explore functions of
vectors. Think of this as a parallel to a first calculus course’s study of functions
from R to R after earlier mathematical explorations of the real numbers. By
functions of vectors, we simply mean functions whose inputs and outputs
are vectors instead of real numbers. Since both their inputs and outputs are
vectors, these are often called vector-valued functions. For example, we might
consider the following map from R2 to itself.
   1 
Example 1. f(x1, x2) = ((1/2)x1, 2x2).
This function acts on R2 by multiplying the first component of each vector
by 1/2 and the second component by 2.

Unlike a first calculus course, we don’t have to use the same kind of input
and output.
 
Example 2. f(x1, x2) = (x1, x2, 3).
In this example our inputs are 2-vectors and our outputs are 3-vectors.
We can imagine a practical interpretation of this function as taking a position
vector on a 2D table top and changing it into a position vector in a 3D dining
room using the fact that the table top has a height of 3 feet.

In the previous example, as in many applications, we needed to use input


and output vectors of different sizes. This doesn’t really make our lives too
much more complicated, but it does mean we’ll need to be careful to think
about whether a vector makes sense as an input or an output (or neither) of a
given function f . To help keep this straight, we have special names for these
two spaces of vectors.


Definition. Suppose f : Rn → Rm . The domain of f is Rn where n is the


size of the input vectors, and the codomain of f is Rm where m is the size of
the output vectors.

In other words, the set from which the function maps is its domain, and
the set it maps to is its codomain, as shown in Figure 2.1.

[Figure 2.1: Visualizing a function — an arrow from its domain to its codomain.]

 
Example 3. Find the domain and codomain of f(x1, x2, x3, x4) = (x1 + x4, x2 − x3).
This function’s inputs are 4-vectors and its outputs are 2-vectors, so we
could write f : R4 → R2 . Therefore f ’s domain is R4 and its codomain is R2 .

Since these vector-valued functions are still functions, we can ask all the
same sorts of questions about them that we asked about the functions in
calculus. These include things like computing f (~x) for a given input vector,
solving f (~x) = ~b for ~x, and doing basic function operations like adding two
functions, multiplying a function by a scalar, doing function composition, and
finding inverse functions. We will eventually tackle all of these questions, both
for general mathematical interest and because we will need their answers to
solve practical problems involving vectors. However, we will not discuss all
vector-valued functions, but instead restrict our attention to a special class of
functions.
Recall that calculus quickly narrows its focus to study only continuous
functions. This is because calculus relies so heavily on limits of real numbers
that it wants functions to respect and play well with limits. In our case, we
don’t care about limits, but do care about addition and scalar multiplication
of vectors. Therefore, we will restrict our attention to functions that respect
and play well with vector addition and scalar multiplication. Putting this more
precisely, we define a linear function as follows.

Definition. A function f : Rn → Rm is linear if f (~v + ~u) = f (~v ) + f (~u) and


f (r · ~v ) = r · f (~v ) for all vectors ~v , ~u in Rn and all scalars r.
Intuitively, this is saying that our function f respects addition and scalar
multiplication because we’ll get the same answer whether we add or scale our
vectors before or after applying the function f . This often gives us two choices
on how to tackle a computation, which can be very helpful if one is easier than
the other.
Additionally, a linear function preserves lines in the sense that if f is linear
then the image of a line in Rn is a line in Rm . To see this, recall that one way
to write the equation of a line in Rn is t~v + ~b where ~b is any vector on that
line, ~v is any vector parallel to that line, and t is any scalar. If we apply a
linear function f , then the image is the set of all points f (t~v + ~b), which by
linearity can be broken up to give f (t~v + ~b) = tf (~v ) + f (~b). This is also a line,
in particular the line in Rm containing the vector f (~b) and parallel to f (~v ).
Let’s move from the realm of theory to the realm of computations and look
at a few examples.
 
Example 4. Show f : R2 → R3 by f(x, y) = (x, y, 0) is linear.
This map is one of the three standard ways to map R2 into R3 as what is often called the xy-plane. To check that it is a linear map, we need to check the two conditions from the definition. To do that, we'll need to write down two generic vectors ~v and ~u in R2 and a generic scalar r in R. I'll use ~v = (x, y), ~u = (z, w).
First we need to check that f (~v + ~u) = f (~v ) + f (~u).
Computing the left-hand side gives us
f(~v + ~u) = f((x, y) + (z, w)) = f(x + z, y + w) = (x + z, y + w, 0).

Computing the right-hand side gives us
f(~v) + f(~u) = f(x, y) + f(z, w) = (x, y, 0) + (z, w, 0) = (x + z, y + w, 0).

Since our two answers are equal, it’s clear that f respects addition.
Next we need to check that f (r · ~v ) = r · f (~v ).
Computing the left-hand side gives us
f(r · ~v) = f(r · (x, y)) = f(rx, ry) = (rx, ry, 0).
Computing the right-hand side gives us
r · f(~v) = r · f(x, y) = r · (x, y, 0) = (rx, ry, 0).

Again, since our answers are equal, it’s clear that f respects scalar multipli-
cation.
Since both conditions hold, f is a linear function.

   
Example 5. Show g : R2 → R2 by g(x, y) = (2x − y, x + 3y) is linear.
   
As in Example 4, I'll use the generic vectors ~v = (x, y) and ~u = (z, w) and the scalar r to show that g respects addition and scalar multiplication.
Plugging ~v and ~u into the left-hand side of g(~v + ~u) = g(~v) + g(~u) gives us
g(~v + ~u) = g((x, y) + (z, w)) = g(x + z, y + w)
          = (2(x + z) − (y + w), (x + z) + 3(y + w)) = (2x + 2z − y − w, x + z + 3y + 3w).

Computing the right-hand side gives us
g(~v) + g(~u) = g(x, y) + g(z, w) = (2x − y, x + 3y) + (2z − w, z + 3w)
            = (2x − y + 2z − w, x + 3y + z + 3w) = (2x + 2z − y − w, x + z + 3y + 3w).

Since our two answers are equal, it’s clear that g respects addition.
Computing the left-hand side of g(r · ~v) = r · g(~v) gives us
g(r · ~v) = g(r · (x, y)) = g(rx, ry) = (2(rx) − (ry), (rx) + 3(ry)) = (2rx − ry, rx + 3ry).

Computing the right-hand side gives us
r · g(~v) = r · g(x, y) = r · (2x − y, x + 3y) = (r(2x − y), r(x + 3y)) = (2rx − ry, rx + 3ry).

Again, since our answers are equal, g respects scalar multiplication.


Both our conditions hold, so g is a linear function.
Example 6. Show h : R2 → R2 by h(x, y) = (x^2, y − 1) is not linear.
As in the previous two examples, we start by checking whether or not h(~v + ~u) = h(~v) + h(~u) using ~v = (x, y) and ~u = (z, w).
h(~v + ~u) = h((x, y) + (z, w)) = h(x + z, y + w)
          = ((x + z)^2, (y + w) − 1) = (x^2 + 2xz + z^2, y + w − 1).

However,
h(~v) + h(~u) = h(x, y) + h(z, w) = (x^2, y − 1) + (z^2, w − 1)
            = (x^2 + z^2, y − 1 + w − 1) = (x^2 + z^2, y + w − 2).

These are clearly not equal (in both components!), so this function is not
linear.
If I were just concerned with checking whether or not h was linear, I’d
stop here. However, in the interest of practice, let’s check the other condition
as well using the same ~v and the scalar r.
        2 2 
h(r · ~v) = h(r · (x, y)) = h(rx, ry) = ((rx)^2, (ry) − 1) = (r^2 x^2, ry − 1).

Computing the right-hand side gives us
r · h(~v) = r · h(x, y) = r · (x^2, y − 1) = (rx^2, r(y − 1)) = (rx^2, ry − r).

Again, these are not equal (in both components), so h doesn’t respect scalar
multiplication. This is also enough, even without the check on addition, to
show h isn’t linear.

Have you noticed a pattern in the functions above? One possible pattern is
that for both of our linear functions, each component of the output vector was
a linear combination of the variables from the input vector. Our function that
wasn’t linear had both an exponent and a constant term in its output vector’s
components. This is another, probably easier way to determine if a function
from Rn to Rm is linear. Why didn’t we adopt that as our definition of a
linear map? If we were only going to talk about Rn , we could have. However,
in 2.4 and Chapter 3, we’ll want to expand our focus to other types of spaces
and our formal definition will be easier to generalize to those situations than
58 Functions of Vectors

this second idea. In many areas of math and science you'll see this happening:
collecting many ways of describing an idea and sorting through them later to
figure out which one ends up working best. That decision will depend heavily
on what you want to do later, so you may find yourself changing your mind.
One nice thing about studying an older mathematical subject like basic linear
algebra is that it has had time to settle down to a set of definitions that work
best for what we want to do.

Exercises 2.1.
     
1. Compute f(−3, 5) where f(x1, x2) = (x2, x1 + x2).
   
2. Compute f(1, 2, 3) where f(x1, x2, x3) = (−x3, x1 − x2 + x3).
 
3. Compute f(8, −2) where f(x1, x2) = (0, 2x1 + 4x2, (1/4)x1, x1 + x2).
   
4. Compute f(6, 2, 0, −1) where f(x1, x2, x3, x4) = (x1 + x3 + x4, −5x2, 4x3 − x4).
 
5. Give the domain and codomain of f(x1, x2) = (2x1 − x2, 0, −x1 + 4x2).
 
6. Give the domain and codomain of f(x1, x2, x3) = (x1 + x2 + x3, x1 − x3).
 
7. Give the domain and codomain of f(x1, x2, x3, x4, x5) = (x2 − x4, 5x1 + x5, −x3, x1 + x3).
 
8. Give the domain and codomain of f(x1, x2, x3, x4) = (3x2 + x4, −x1).
9. Find the formula of the function f : R2 → R2 which switches the
order of a 2-vector’s entries.
10. Find the formula of the function f : R2 → R3 which multiplies the
first entry by −1, keeps the second entry the same, and has a zero
in the third entry.
   
11. Show that f(x1, x2) = (2x1 − x2, x1 + 2) is not a linear map.
 
12. Show that f(x, y) = (2y, x, 1) is not a linear map.
   
13. Show that f(x1, x2) = (x2, x1 + x2) is a linear map.
 
14. Show that f(x1, x2) = (x2, x1 − x2, −x1) is a linear map.
15. If f : Rn → Rm is linear, explain why f (~0n ) = ~0m where ~0n is
the zero vector in Rn and ~0m is the zero vector in Rm . (This can
be restated as saying that a linear function always maps the zero
vector of the domain to the zero vector of the codomain.)
16. If f : Rn → Rm is linear, explain why f (−~v ) = −f (~v ) for every ~v
in Rn . (This can be restated as saying that a linear function always
maps the additive inverse of ~v to the additive inverse of f (~v ).)
 
17. Use r = 2 and ~v = (0, 1/2) to show that the function f : R2 → R2
which reflects the plane across the line y = x + 1 is not a linear
function.
2.2 Matrices
We saw in the last section that a linear map f from Rn to Rm had a certain
pattern to its vector of outputs: each entry was a linear combination of the
entries of the input vector. This means if we want to describe a particular linear
map to someone, we really only need to tell them three pieces of information:
the size of the input vectors, the size of the output vectors, and the coefficients
on each input entry that appear in each of our output entries. Instead of
writing down the whole function with all the variables, we can keep track of
of all three pieces of information by writing down our coefficients in a grid of
numbers called a matrix.

Definition. A matrix is an ordered grid of real numbers written
[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[  ..   ..       .. ]
[ am1 am2 · · · amn ]
A matrix with m rows and n columns is called m × n.

Note that when talking about matrices, our notation always puts row
information before column information. Thus an m × n matrix has m rows
and n columns, and aij is the entry in the ith row and jth column. As with
vectors, two matrices are equal exactly when they are the same size and all
corresponding pairs of entries are equal.
 
Example 1. Find the size of A and give a32 where
A = [ −3  0  1  8 ]
    [  2 −4  3  1 ]
    [  5 −1  0  7 ].
This matrix has three rows and four columns, so it is 3 × 4. The entry a32
is in the third row and second column, so a32 = −1.

Now that we have some understanding of what a matrix is and how to write
it down, let’s explore the idea of using matrices to record the coefficients from
our linear maps. In the first part of this section we’ll use algebraic techniques
and in the second part we’ll use geometry.

Example 2. What matrix should we use to encode the sizes of input and
output vectors and the coefficients used to build the function
 
f(x1, x2) = (6x1 + 4x2, 5x1 + 2x2, −3x1 + 7x2)?
Let’s start by focusing on the coefficients on the variables in each entry


of our output vector. Those numbers already look like they form a grid if
we ignore the variables and plus signs. The rows of this grid pattern are the
entries of our output vector, and the columns are for our input variables x1
and x2 which are the entries of our input vector. If we write down just these coefficients in this grid pattern, we get the matrix
[  6  4 ]
[  5  2 ]
[ −3  7 ].
This matrix definitely tells us everything we need to know about the
coefficients (as long as we remember that the first column contains coefficients
of x1 and the second column contains coefficients of x2 ). As a bonus, it also
tells us the size of the input and output vectors. Each column came from
one of our variables, and those variables were the entries of our input vector.
Since our matrix has two columns, this means our inputs must be 2-vectors.
Similarly each row of our matrix came from the coefficients in one entry of
the output vector, so we can tell our output vectors are 3-vectors because our
matrix has three rows.

If we look at the pattern revealed in this last example, we get the following.

Definition. The matrix A corresponding to a linear map f : Rn → Rm is


the m × n matrix where aij is the coefficient of xj in the ith entry of f (~x).

Notice that the dimensions of our matrix are the sizes of the vectors from
the domain and codomain of f , but in reverse order. As we saw in the previous
example, this is because each column of f ’s matrix contains the coefficients
on one variable from ~x, so the number of columns is the same as the size of
a vector from f ’s domain. Conversely, each row of f ’s matrix contains the
coefficients from one entry of f (~x), so the number of rows is the same as the
size of a vector from f ’s codomain. In fact, this is a very important idea to
keep in mind: rows of f ’s matrix correspond to output entries while columns
correspond to input entries.
Finding the matrix of a linear function may seem complicated at first, but
after a few repetitions it will quickly become routine.
 
Example 3. Find the matrix of the function

      ( x1 )   [    −x1 + x3    ]
    f ( x2 ) = [ 4x1 − x2 + 3x3 ] .
      ( x3 )
The input vector has three entries, so our matrix A must have three
columns. The output vector has two entries, so our matrix A must have two
rows. Another way of thinking about this is that f ’s domain is R3 and its
codomain is R2 . Therefore f : R3 → R2 , so A is 2 × 3.
The first column of A contains the coefficients of x1 . In the first component
of f (~x) we have −x1 , so the first entry in this column is −1. In the second

output component we have 4x1 , so the second entry in A’s first column is 4.
Similarly, the second column of A contains the coefficients on x2 . These are 0
in the first component of f (~x) and −1 in the second component. Finally, A’s
third column contains the coefficients on x3 . The first component of f (~x) has
coefficient 1 and the second has coefficient 3.  
Putting this all together we get that A is the 2 × 3 matrix

    [ −1   0  1 ]
    [  4  −1  3 ] .

Example 4. Find the matrix of the identity function f : Rn → Rn with


f (~x) = ~x.

Since the domain and codomain of f are both Rn , this matrix will be n×n.
Each variable xi appears only in the ith entry of f (~x) with coefficient 1, so
the jth column of f ’s matrix has all entries equal to zero except for a 1 in the
jth spot. Thus f has matrix
 
         [ 1  0  ···  0 ]
         [ 0  1  ···  0 ]
    In = [ :   :       : ]
         [ 0  0  ···  1 ] .

Since f is the identity map, its matrix In is called the identity matrix.

In biology, linear functions are often used to model population dynamics.


The matrices of these functions are called demographic matrices.

Example 5. In Example 6 from Chapter 0, we discussed setting up a vector


to model the five stages (seedling, small plant, medium plant, large plant, and
flowering) of the life cycle of the smooth coneflower. A recent study measured
how plants moved between these stages after a year of growth. It found that
seedlings have a 0.35 probability of becoming a small plant. Small plants have a
0.78 probability of staying small plants, a 0.18 probability of becoming medium
plants, a 0.03 probability of becoming large plants, and a 0.01 probability of
flowering. Medium plants have a 0.24 probability of becoming small plants, a
0.49 probability of staying medium plants, a 0.17 probability of becoming large
plants, and a 0.10 probability of flowering. Large plants have a 0.07 probability
of becoming small plants, a 0.21 probability of becoming medium plants, a
0.38 probability of staying large plants, and a 0.33 probability of flowering.
Flowering plants have a 0.43 probability of becoming small plants, a 0.28
probability of becoming medium plants, a 0.18 probability of becoming large
plants, and a 0.11 probability of flowering. Flowering plants also have a 17.76
probability of producing seedlings. (Normally this wouldn’t be an acceptable
number for a probability because it is greater than 1, but one flowering plant is

capable of producing multiple seedlings.) Find the matrix of the function that
takes a given year’s population vector and gives the next year’s population
vector.

Let’s start by thinking about the size of this matrix. Our input and output
vectors both have 5 entries, one for each stage in the coneflower’s life cycle.
This means we have f : R5 → R5 and our matrix is 5 × 5.
As in Chapter
 0’s Example 6, let’s order the entries in our population
x1
x2 
 
vectors  
x3  so that x1 counts seedlings, x2 counts small plants, x3 counts
x4 
x5
medium plants, x4 counts large plants, and x5 counts flowering plants.
The first row of our matrix is the coefficients on these xi s that make up
the first entry of f (~x), in other words, the coefficients on the current year’s
life cycle counts which tell us how many seedlings there will be next year.
The only life cycle stage that produces seedlings is flowering plants, and each
flowering plant produces 17.76 seedlings. This means that the first entry of
f (~x) is 17.76x5 .
The second row of our matrix is the coefficients that tell us how to compute
the number of small plants there will be next year. Each seedling has a 0.35
chance to become a small plant, each small plant has a 0.78 chance to stay a
small plant, each medium plant has a 0.24 chance to become a small plant,
each large plant has a 0.07 chance to become a small plant, and each flowering
plant has a 0.43 chance to become a small plant. This means the second entry
of f (~x) is 0.35x1 + 0.78x2 + 0.24x3 + 0.07x4 + 0.43x5 .
The third row of our matrix is the coefficients that tell us how to compute
the number of medium plants there will be next year. No seedlings become
medium plants, each small plant has a 0.18 chance to become a medium plant,
each medium plant has a 0.49 chance to stay a medium plant, each large plant
has a 0.21 chance to become a medium plant, and each flowering plant has a
0.28 chance to become a medium plant. This means the third entry of f (~x) is
0.18x2 + 0.49x3 + 0.21x4 + 0.28x5 .
The fourth row of our matrix is the coefficients that tell us how to compute
the number of large plants there will be next year. No seedlings become large
plants, each small plant has a 0.03 chance to become a large plant, each
medium plant has a 0.17 chance to become a large plant, each large plant has
a 0.38 chance to stay a large plant, and each flowering plant has a 0.18 chance
to become a large plant. This means the fourth entry of f (~x) is 0.03x2 +
0.17x3 + 0.38x4 + 0.18x5 .
The fifth row of our matrix is the coefficients that tell us how to compute
the number of flowering plants there will be next year. No seedlings become
flowering plants, each small plant has a 0.01 chance to become a flowering

plant, each medium plant has a 0.10 chance to become a flowering plant, each
large plant has a 0.33 chance to become a flowering plant, and each flowering
plant has a 0.11 chance to stay a flowering plant. This means the fifth entry
of f (~x) is 0.01x2 + 0.10x3 + 0.33x4 + 0.11x5 .
Putting the coefficients from each entry of f (~x) into the corresponding
rows of f ’s matrix gives us
 
    [  0     0     0     0    17.76 ]
    [ 0.35  0.78  0.24  0.07  0.43  ]
    [  0    0.18  0.49  0.21  0.28  ]
    [  0    0.03  0.17  0.38  0.18  ]
    [  0    0.01  0.10  0.33  0.11  ] .
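If you would like to experiment with this model on a computer, the following is a minimal sketch (mine, not the study's or the text's) assuming Python with NumPy; the starting population vector is invented purely for illustration.

    import numpy as np

    # The demographic matrix built in Example 5.
    A = np.array([[0,    0,    0,    0,    17.76],
                  [0.35, 0.78, 0.24, 0.07, 0.43],
                  [0,    0.18, 0.49, 0.21, 0.28],
                  [0,    0.03, 0.17, 0.38, 0.18],
                  [0,    0.01, 0.10, 0.33, 0.11]])

    # A hypothetical current-year population: seedlings, small, medium,
    # large, and flowering plants.
    x = np.array([100, 50, 30, 20, 10])

    print(A @ x)  # the model's counts for next year's population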

The idea of building a matrix whose entries are probabilities of going


from one stage in a process to another is part of a type of modeling called a
discrete Markov chain. These are used in many other applications where we’re
transitioning from one state to another. For example, in game theory the states
may be squares on a game board and the probabilities in the associated matrix
will encode how likely a player is to move between a given pair of squares.
Now that we understand how to find the matrix of a given function, we
can turn this process around to find the function of a given matrix.

Example 6. Find the equation of the function corresponding to the matrix

        [ 4   0 ]
    A = [ 1  −3 ] .
        [ 2   7 ]
Our matrix A has three rows and two columns, so is 3 × 2. This means it
corresponds to a function f : R2 → R3 . This means we’re trying to fill in the
gaps in the equation f([x1, x2]) = [ __ , __ , __ ].

Each row of our matrix contains the coefficients (on x1 in the first column
and x2 in the second column) of an entry of f (~x). The first row gives us
4x1 + 0x2 , the second row gives us x1 − 3x2 , and the third row gives us
2x1 + 7x2 . Plugging this into our formula above means
 
      ( x1 )   [    4x1    ]
    f ( x2 ) = [ x1 − 3x2  ] .
               [ 2x1 + 7x2 ]

Suppose we are given the matrix A of a linear function f : Rn → Rm , and


we want to find f (~x) for some vector ~x from Rn . We could find the function’s
equation as in Example 6 and then plug in ~x, but that seems a bit complicated

to do every time – especially if we want to compute f (~x) for multiple vectors!


Instead, let’s figure out a way to go directly from A and ~x to f (~x). We’ll write
this as the product of a matrix and a vector, so A~x = f (~x).
Before we figure out how to compute the entries of the vector A~x, we need
to talk about how to make sure the sizes of A and ~x are compatible. For f (~x)
to make sense, we need ~x to be in the domain of f . We can connect this to A
by remembering that the size of f ’s input vectors is the same as the number
of columns of A. Therefore A~x only makes sense if A has n columns and ~x is
an n-vector. The product A~x is f (~x), which means it is in the codomain of f .
Since the size of f ’s codomain is the number of rows of A, this means A~x is a
vector with as many entries as A has rows.
Personally, I’ve always remembered this by restating it as “the adjacent
numbers match”, i.e., (m × n matrix)(n-vector), “and cancelling the matching
numbers leaves the size of the product”, i.e., removing the n’s leaves m.

Example 7. What are the sizes of ~x and A~x if A is a 5 × 8 matrix?

Because A is 5 × 8, we know f : R8 → R5 . Since ~x is an input of f , this


means it must be an 8-vector. Similarly, since A~x is an output of f , it must
be a 5-vector.
Alternately we need (5 × 8 matrix)(8-vector) to get the adjacent numbers
to match. The leftover number 5 is the size of the vector A~x.
 
Example 8. Explain why we can’t multiply the matrix

    A = [ −1  0   1 ]
        [  2  4  −1 ]

by the vector ~x = [0, 1].
Here our matrix A is 2 × 3 but ~x is a 2-vector. Since the size of ~x doesn’t
match the number of columns of A, we know ~x isn’t in the domain of A’s
function. Therefore this product isn’t defined.

Now that we understand how to ensure A~x is defined and what size vector
it is, let’s figure out how to compute its entries from those of A and ~x.
   
Example 9. Let

    A = [ 2   3 ]      and      ~x = [  5 ] .
        [ 4  −2 ]                    [ −8 ]

Use the fact that A~x = f (~x) to compute A~x.

Using the same process as Example 6, we find that A is the matrix of the
function f([x1, x2]) = [2x1 + 3x2, 4x1 − 2x2]. Plugging in ~x = [5, −8], we get
   
      (  5 )   [ 2(5) + 3(−8) ]
    f ( −8 ) = [ 4(5) − 2(−8) ]

Since A~x = f (~x), this means we want

    [ 2   3 ] [  5 ] = [ 2(5) + 3(−8) ] .
    [ 4  −2 ] [ −8 ]   [ 4(5) − 2(−8) ]

Looking at this last equation, we can start to see a pattern emerge. To


get the top entry of A~x, we added together the first entry in the top row of
A times the first entry of ~x and the second entry in the top row of A times
the second entry of ~x. To get the bottom entry of A~x, we added together the
first entry in the bottom row of A times the first entry of ~x and the second
entry in the bottom row of A times the second entry of ~x. This should seem
reasonable, since it means we’re using the entries of A as coefficients on the
entries of ~x.

We can generalize the pattern we observed in the previous example as


follows: to find the ith entry of A~x we combine the entries from the ith row
of the matrix A with the entries of ~x. Each matrix row is combined with ~x by
adding together the pairwise products of their corresponding entries. Pairing
up the entries along a row with the entries of ~x in this way should make sense,
because the jth entry of each row of A is a coefficient on the jth entry, xj , of
~x.
This motivates the following definition.
 
Definition. The m × n matrix

        [ a11  a12  ···  a1n ]
    A = [ a21  a22  ···  a2n ]
        [  :    :          : ]
        [ am1  am2  ···  amn ]

and n-vector ~x = [x1, x2, . . . , xn] have m-vector product

          [ a11 x1 + a12 x2 + · · · + a1n xn ]
    A~x = [ a21 x1 + a22 x2 + · · · + a2n xn ]
          [               :                  ]
          [ am1 x1 + am2 x2 + · · · + amn xn ] .

Notice that since aij is the coefficient on xj in the ith component of A~x,
if A is the matrix of a linear map f we have A~x = f (~x) as planned.
This definition’s notation is quite complicated, but once you have the
overall pattern down, it becomes more routine. Let’s walk through a few
examples.
 
Example 10. Compute A~x where

    A = [ −1  0   1 ]      and      ~x = [  2 ]
        [  2  4  −1 ]                    [ −1 ]
                                         [  3 ] .
First of all, notice that A is 2×3 and ~x is a 3-vector, so this product makes
sense. It also tells us A~x is a 2-vector.

To find the first entry of A~x, we add up the pairwise products across the
first row of A and down ~x.
 
    [ −1  0   1 ] [  2 ]
    [  2  4  −1 ] [ −1 ]
                  [  3 ]

This gives us (−1)(2) + (0)(−1) + (1)(3) = 1. The second entry of A~x is
the same format, but uses the second row of A.

    [ −1  0   1 ] [  2 ]
    [  2  4  −1 ] [ −1 ]
                  [  3 ]

This gives us (2)(2) + (4)(−1) + (−1)(3) = −3. Therefore

    A~x = [  1 ]
          [ −3 ] .
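If you want to check this sort of arithmetic on a computer, here is a minimal sketch (mine, not the text's) assuming Python with NumPy.

    import numpy as np

    A = np.array([[-1, 0, 1],
                  [2, 4, -1]])
    x = np.array([2, -1, 3])

    # NumPy's @ operator performs exactly this matrix-vector product.
    print(A @ x)  # [ 1 -3], matching the hand computation above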

Now that we understand how to multiply a vector by a matrix, we can use


our new skill to compute f (~x).
 
Example 11. Compute f([2, 1, 1]) where f has matrix

    A = [ −1   0  1 ]
        [  4  −5  3 ] .
By our construction of matrix-vector multiplication, we know
   
      ( 2 )   [ −1   0  1 ] [ 2 ]
    f ( 1 ) = [  4  −5  3 ] [ 1 ] .
      ( 1 )                 [ 1 ]

Adding up the pairwise products along the rows of our matrix and down our
vector gives us

    [ −1   0  1 ] [ 2 ]   [ −1(2) + 0(1) + 1(1)   ]   [ −1 ]
    [  4  −5  3 ] [ 1 ] = [ 4(2) + (−5)(1) + 3(1) ] = [  6 ] .
                  [ 1 ]

Thus f([2, 1, 1]) = [−1, 6].

The fact that multiplying ~x by a matrix A is the same as plugging ~x into


the corresponding linear function f means that this type of multiplication is
linear in the sense that A(~v + ~w) = A~v + A~w and A(r · ~v ) = r · A~v since

    A(~v + ~w) = f (~v + ~w) = f (~v ) + f (~w) = A~v + A~w

and
A(r · ~v ) = f (r · ~v ) = r · f (~v ) = r · A~v .
We can relate this new multiplication to the linear combinations of vectors
discussed in 1.2. There we were discussing the span of a set of vectors
~v1 , ~v2 , . . . , ~vn , which had the form x1~v1 + x2~v2 + · · · + xn~vn . If we have an
m × n matrix A and we think of its n columns as the m-vectors ~a1 , ~a2 , . . . , ~an ,
we can rewrite A~x as
 
          [ a11 x1 + a12 x2 + · · · + a1n xn ]
    A~x = [ a21 x1 + a22 x2 + · · · + a2n xn ]
          [               :                  ]
          [ am1 x1 + am2 x2 + · · · + amn xn ]

          [ a11 x1 ]   [ a12 x2 ]             [ a1n xn ]
        = [ a21 x1 ] + [ a22 x2 ] + · · · +   [ a2n xn ]
          [    :   ]   [    :   ]             [    :   ]
          [ am1 x1 ]   [ am2 x2 ]             [ amn xn ]

             [ a11 ]      [ a12 ]               [ a1n ]
        = x1 [ a21 ] + x2 [ a22 ] + · · · + xn  [ a2n ]
             [  :  ]      [  :  ]               [  :  ]
             [ am1 ]      [ am2 ]               [ amn ]

        = x1~a1 + x2~a2 + · · · + xn~an .

This means the vector equation x1~v1 + x2~v2 + · · · + xn~vn = ~b can also be
thought of as the matrix equation A~x = ~b or the equation f (~x) = ~b where f
is the linear function whose matrix is A. This connection shouldn’t be totally
surprising, since we constructed A so that the entries in its jth column were
the coefficients of xj in f (~x).

Example 12. Rewrite the matrix-vector product A~x as a linear combination
of vectors where

    A = [ −1  0   1 ]
        [  2  4  −1 ] .

To do this, we need to split A up into column vectors and multiply each
column by the appropriate variable. Since A has three columns, ~x is a 3-vector
so our variables are x1 , x2 , and x3 . Thus

          [ −1  0   1 ] [ x1 ]      [ −1 ]      [ 0 ]      [  1 ]
    A~x = [  2  4  −1 ] [ x2 ] = x1 [  2 ] + x2 [ 4 ] + x3 [ −1 ] .
                        [ x3 ]
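As a quick numerical illustration of this identity (a sketch of mine, assuming NumPy; the particular vector ~x is arbitrary):

    import numpy as np

    A = np.array([[-1, 0, 1],
                  [2, 4, -1]])
    x = np.array([5.0, -2.0, 3.0])  # any 3-vector will do

    # A~x computed directly, and as a linear combination of A's columns
    # weighted by the entries of x; the two results agree.
    direct = A @ x
    combination = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
    print(direct, combination)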
     
Example 13. Write

       [  3 ]      [ −1 ]   [  1 ]
    x1 [ −2 ] + x2 [  4 ] = [  5 ]
       [  1 ]      [  2 ]   [  0 ]
       [  0 ]      [  6 ]   [ −3 ]

as a matrix equation.

To write this as a matrix equation, we simply need to rewrite the left-
hand side as A~x for some matrix A. This is the opposite of what we did in
the previous example, so A is the matrix whose columns are the vectors on
the left of our equation. This means we can rewrite our vector equation as

    [  3  −1 ]        [  1 ]
    [ −2   4 ]  ~x  = [  5 ]
    [  1   2 ]        [  0 ]
    [  0   6 ]        [ −3 ] .

   
Example 14. Let f : R3 → R3 by

      ( x1 )   [ 7x1 − x2 + 6x3  ]
    f ( x2 ) = [ 2x1 + 2x2 − 4x3 ] .
      ( x3 )   [ 5x1 − 3x2 + 8x3 ]

Find ~x so that f (~x) = ~b where ~b = [5, −2, 5].
We now have three options on how to tackle this problem, because we have
three ways of writing this equation: as f (~x) = ~b, as A~x = ~b where A is f ’s
matrix, or as the vector equation x1~a1 + x2~a2 + x3~a3 = ~b where the vectors ~a1 ,
~a2 , and ~a3 are the columns of A. All of these methods will give us the same
answer for ~x, so we can choose whichever one seems easiest to us.
Solving for ~x directly from f (~x) = ~b doesn’t seem easy, so let’s explore
our two other options by rewriting this equation as both a matrix and vector
equation. In both cases, we need to find the matrix A of our function f . Since
f : R3 → R3 , we know A is a 3 × 3 matrix. Picking off the coefficients from f ,
gives us

        [ 7  −1   6 ]
    A = [ 2   2  −4 ] .
        [ 5  −3   8 ]

This means our matrix equation A~x = ~b is

    [ 7  −1   6 ] [ x1 ]   [  5 ]
    [ 2   2  −4 ] [ x2 ] = [ −2 ] .
    [ 5  −3   8 ] [ x3 ]   [  5 ]

Unfortunately, this doesn’t seem easier, so let’s try the vector equation version.
The three columns of A are ~a1 = [7, 2, 5], ~a2 = [−1, 2, −3], and ~a3 = [6, −4, 8]. This
means our vector equation x1~a1 + x2~a2 + x3~a3 = ~b is

       [ 7 ]      [ −1 ]      [  6 ]   [  5 ]
    x1 [ 2 ] + x2 [  2 ] + x3 [ −4 ] = [ −2 ] .
       [ 5 ]      [ −3 ]      [  8 ]   [  5 ]

In this format it is easier to realize that ~b = ~a2 + ~a3 . This means our solution
is x1 = 0, x2 = 1, and x3 = 1, or ~x = [0, 1, 1]. (You can go back and check that
this vector satisfies the other two formats as well.)

If you’re worried you wouldn’t have noticed this relationship on your own,
don’t worry. We’ll be discussing how to solve f (~x) = ~b in more detail soon.
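In the meantime, if you want to experiment numerically, a linear-system solver will find this ~x for you. The sketch below is mine (it is not the method the text will develop) and assumes NumPy.

    import numpy as np

    A = np.array([[7, -1, 6],
                  [2, 2, -4],
                  [5, -3, 8]])
    b = np.array([5, -2, 5])

    # Solve A x = b numerically; this particular A has a unique solution.
    x = np.linalg.solve(A, b)
    print(x)  # approximately [0, 1, 1]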
Let’s switch gears for a moment and explore how to use the geometric
action of a function on Rn to find its matrix. Since we don’t yet know how to
do that, we’ll start in the opposite direction by exploring the geometric effects
of a matrix in hopes that once we understand this process we can reverse it.
 
Example 15. Suppose f has matrix

    A = [  0  1 ]
        [ −1  0 ] .

What does f do geometrically?
geometrically?

The first thing to notice here is that since A is 2 × 2 we know f : R2 → R2 .


This means we’re looking at the geometric effect f has on the plane.
One way to explore the impact of f geometrically is to start with a picture
of a shape in the plane and compare that starting shape with its image after
we apply our map f . Let’s start with the unit square pictured below.

[Figure: the unit square with corners (0, 0), (1, 0), (1, 1), and (0, 1).]

If we plug each of the corners of our unit square into f , we get


      ( 0 )   [  0  1 ] [ 0 ]   [ 0 ]
    f ( 0 ) = [ −1  0 ] [ 0 ] = [ 0 ] ,

      ( 1 )   [  0  1 ] [ 1 ]   [  0 ]
    f ( 0 ) = [ −1  0 ] [ 0 ] = [ −1 ] ,
      
      ( 1 )   [  0  1 ] [ 1 ]   [  1 ]
    f ( 1 ) = [ −1  0 ] [ 1 ] = [ −1 ] ,

and

      ( 0 )   [  0  1 ] [ 0 ]   [ 1 ]
    f ( 1 ) = [ −1  0 ] [ 1 ] = [ 0 ] .
These image points outline a new square as shown below. (The dotted outline
is the original unit square.)

[Figure: the image square with corners (0, 0), (0, −1), (1, −1), and (1, 0), with the original unit square shown dotted.]
Looking at this image of the unit square, we can guess that f is the function
which rotates the plane clockwise by 90◦ . However, we also saw an interesting
thing: f([1, 0]) was the first column of f ’s matrix and f([0, 1]) was the second
column of f ’s matrix.
column of f ’s matrix.
   
1 0
Why should the images of the and have given us the columns of
0 1
our function’s matrix in the example above? If we think geometrically about
a vector x in Rn , the entries of x tell us that vector’s position along the n
axes of Rn . We can think of each axis as being the span of a vector of length
1 which lies along that axis in the positive direction. These are the n special
n-vectors      
1 0 0
 0  1  0
     
     
e1 = 0 , e2 = 0 , . . . , en =  ...  ,
 ..   ..   
. .  0
0 0 1
which are sometimes called the standard unit vectors.

This makes

         [ x1 ]
    ~x = [ x2 ] = x1~e1 + x2~e2 + · · · + xn~en .
         [  :  ]
         [ xn ]

Multiplying a vector by a matrix splits up over linear combinations, so

A~x = A (x1~e1 + x2~e2 + · · · + xn~en )


= x1 A~e1 + x2 A~e2 + · · · + xn A~en .

Recall from the discussion before Example 12 that if ~ai is the ith column
of A, then A~x = x1~a1 + x2~a2 + · · · + xn~an . Comparing this with the equation
above tells us that A~ei is the ith column of A! Thus we can interpret a matrix
geometrically by using the rule that A maps the positive unit vector along the
ith axis to the vector which is its ith column. Thus, to understand the effect
on Rn of multiplication by A, it is enough to understand the effect of A on
~e1 , ~e2 , . . . , ~en . This matches what we saw in Example 15. Next, let’s use this
idea to find the matrices of some 2D functions.
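Before doing that, here is a quick numerical check of the column rule using the matrix from Example 15 (a sketch of mine, assuming NumPy).

    import numpy as np

    A = np.array([[0, 1],
                  [-1, 0]])   # the matrix from Example 15
    e1 = np.array([1, 0])
    e2 = np.array([0, 1])

    # Multiplying by A sends e1 and e2 to the columns of A.
    print(A @ e1)  # [ 0 -1], the first column of A
    print(A @ e2)  # [ 1  0], the second column of A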

Example 16. Find the function f : R2 → R2 which rotates the plane


counterclockwise by 45◦ .

We can do this by figuring out where f sends each of the positive unit
vectors along the axes and using them to create f ’s matrix. Put more
concretely, we need to use some geometry to find f([1, 0]) and f([0, 1]).
Let’s start by visualizing f ’s effect on the plane.

[Figure: the vectors [1, 0] and [0, 1] and their images after a 45◦ counterclockwise rotation.]

The image of [1, 0] is the vector of length 1 along the line y = x. We can
use the Pythagorean Theorem to figure out its coordinates.

[Figure: the image of [1, 0], still of length 1, with coordinates x and y along the line y = x.]

Rotating [1, 0] doesn’t change its length, so from the picture above, we can
see that x² + y² = 1 and x = y. This means x = y = 1/√2, so

    f([1, 0]) = [ 1/√2 ]
                [ 1/√2 ] .
 
Similarly, we can use the image of [0, 1] in the picture below to figure out
its exact coordinates.
[Figure: the image of [0, 1] after the 45◦ counterclockwise rotation.]

Here x = −1/√2 and y = 1/√2, so

    f([0, 1]) = [ −1/√2 ]
                [  1/√2 ] .

Using these two images (in order) as the columns of f ’s matrix, we see

that f (~x) = A~x where

    A = [ 1/√2  −1/√2 ]
        [ 1/√2   1/√2 ] .

Using the entries of A as the coefficients in f ’s formula gives

    f([x1, x2]) = [ (1/√2)x1 − (1/√2)x2 ]
                  [ (1/√2)x1 + (1/√2)x2 ] .
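As a numerical cross-check (mine, not the book's; it assumes NumPy), the same matrix comes out of the general rotation formula from Exercise 35 with θ = 45◦.

    import numpy as np

    theta = np.pi / 4   # 45 degrees, in radians
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(R)
    # roughly [[0.707, -0.707], [0.707, 0.707]];
    # each entry is plus or minus 1/sqrt(2), matching the matrix above.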

Example 17. Give a geometric description of the map f : R2 → R2 given by

    f([x1, x2]) = [ −(1/√2)x1 + (1/√2)x2 ]
                  [ −(1/√2)x1 − (1/√2)x2 ] .
To understand what f does to R2 geometrically, let’s start by computing
the images of the two positive unit vectors along the axes. This gives us

    f([1, 0]) = [ −1/√2 ]
                [ −1/√2 ]

and

    f([0, 1]) = [  1/√2 ]
                [ −1/√2 ] .

(We could also have done this by finding f ’s matrix and remembering that its
columns are f([1, 0]) and f([0, 1]) respectively.)
columns are f and f respectively.)
0 1
Visually, this means we get the following picture of f ’s effect on the plane.

� 1


-2 -1 � 1 2




-1

Looking at the picture, we can see that f rotates R2 clockwise 135◦ .

Our geometric and equation oriented approaches to finding the matrix of


-2

a linear function give us the option to find a function’s matrix whichever way
seems easier for our particular function. The option to view a problem as
f (~x) = ~b, A~x = ~b, or x1~a1 + · · · + xn~an = ~b gives us several different ways to
solve for ~x.

Exercises 2.2.

1. Consider A~x = ~b, where A is a 4 × 6 matrix.


(a) What size vector is ~x?
(b) What size vector is ~b?
2. Consider A~x = ~b, where A is a 3 × 8 matrix.
(a) What size vector is ~x?
(b) What size vector is ~b?
3. Consider A~x = ~b, where A is a 5 × 4 matrix.
(a) What size vector is ~x?
(b) What size vector is ~b?
4. Consider A~x = ~b, where A is a 7 × 2 matrix.
(a) What size vector is ~x?
(b) What size vector is ~b?
 
1 0 0 −2
5. Let f (~x) = A~x where A = .
0 0 1 3
(a) What is the domain of f ?
(b) What is the codomain of f ?
 
1 −3 0
0 0 1

6. Let f (~x) = A~x where A =  .
0 0 0
0 0 0
(a) What is the domain of f ?
(b) What is the codomain of f ?
 
1 −1

7. Let f be the linear map whose matrix is 4 −3.
0 3
(a) What is the domain of f ?
(b) What is the codomain of f ?
 
1 1 0
8. Let f be the linear map whose matrix is A = 0 0 −7.
2 5 3

(a) What is the domain of f ?


(b) What is the codomain of f ?
 
x1  
3x1 − x3
9. Let f x2  =
−x1 + x2 + 2x3
x3
(a) What is the domain of this map?
(b) What is the codomain of this map?
(c) What is the size of f ’s matrix?
 
  −x1 + 4x2
x1
10. Let f = x2 
x2
−2x1 + 3x2
(a) What is the domain of this map?
(b) What is the codomain of this map?
(c) What is the size of f ’s matrix?
 
x1  
x2  −x1 + 4x2 − 3x4
11. Let f   
x3  = x2 + x3 
−2x1 + 3x2
x4
(a) What is the domain of this map?
(b) What is the codomain of this map?
(c) What is the size of f ’s matrix?
   
x1 x1 + 2x2
12. Let f =
x2 2x1 + 5x2
(a) What is the domain of this map?
(b) What is the codomain of this map?
(c) What is the size of f ’s matrix?
 
x  
3 2    2x − y + z
13. Let f : R → R by f y = . Find the matrix A
−x + 10z
z
so that f (~x) = A~x.
   
x 2x − y
14. Let f : R3 → R3 by f y  =  3z . Find the matrix A
z y + 8z − x
so that f (~x) = A~x.
   
x x+y
15. Find the matrix of the linear map f y  = 2x + y .
z 2y − z
 
  −x2 + x3
x1 2x1 + 4x2 + 2x3 
16. Find the matrix of the linear map f x2  =  
 x1 − 3x2 + x3  .
x3
x2 + x3
 
x1  
x2  3x1 − 7x3 + x4
17. Find the matrix of the linear map f    
x3  = x1 + x3 + x4 .

−x1 + 8x3
x4
 
x1  
  2x1 − 3x2 + x3
18. Find the matrix of the linear map f x2 = .
x1 − x3
x3
19. Find the matrix of the linear map f : R2 → R2 which rotates the
plane 270◦ counterclockwise.
20. Find the matrix of the linear map f : R2 → R2 which reflects the
plane about the line y = −x.
   
 21. The picture below shows the effect of the map f on [1, 0] and [0, 1].
     Use this to find f ’s matrix.

     [Figure: the images of [1, 0] and [0, 1] under f .]

   
 22. The picture below shows the effect of the map f on [1, 0] and [0, 1].
     Use this to find f ’s matrix.





     [Figure: the images of [1, 0] and [0, 1] under f .]

 
 23. Compute

         [ 2  1   4 ] [ 1 ]
         [ 0  4  −1 ] [ 2 ] .
                      [ 3 ]
 
 24. Compute

         [ 1  −1 ] [ 2 ]
         [ 4  −3 ] [ 5 ] .
         [ 0   3 ]
25. Compute each of the following or explain why it isn’t possible.
 
  −2
1 2 5  
(a) · 4
3 −1 1
1
   
1 2 5 −1
(b) ·
3 −1 1 2
26. Compute each of the following or explain why it isn’t possible.
 
1 2  
  −1
(a) −1 4 ·
2
0 −5
   
1 2 −5
(b) −1 4  ·  7 
0 −5 9
 
−1      
3 11 −1 3 4
27. Let ~v1 =  2  , ~v2 = , ~v3 = , and A = .
−2 5 0 −2 9
1
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
 
  1    
2   7 1 −2
28. Let ~v1 = , ~v2 = 2 , ~v3 = , and A = . Compute
−4 5 −4 2
3
whichever of A~v1 , A~v2 and A~v3 are possible.
 
−1    
2 4  
0
29. Let ~v1 =   , ~v2 = −3, ~v3 =  0  and A = 1 4 0 2 .
 10  0 1 2 0
1 −5
−3
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
   
4     −1 2
6 −1
30. Let ~v1 = −2 , ~v2 = , ~v3 = and A =  4 −3.
−5 9
8 7 1
Compute whichever of A~v1 , A~v2 and A~v3 are possible.
         
2 4 −2 4 −1
31. Write x1 + x2 + x3 + x4 = as a matrix
1 2 −1 0 9
equation.
       
1 −2 3 −2
32. Write x1 + x2 + x3 = as a matrix equation.
−1 3 1 4
   
−10 3 8
33. Write  1 0  ~x = −2 as a vector equation.
7 −2 3
   
2 1/2 4 10
34. Write −1 0 5  ~x =  0  as a vector equation.
7 8 −4 −9
 
2 2 cos(θ) − sin(θ)
35. (a) Show that fθ : R → R with matrix Rθ =
sin(θ) cos(θ)
rotates
  the plane counterclockwise
  by the angle θ by computing
1 0
fθ and fθ and arguing that these are the images
 0   1
1 0
of and after a counterclockwise rotation by θ.
0 1
(b) We can compute the coordinates of k holes evenly spaced
around a circle of radius r centered at the origin by starting
with a single point on the perimeter of the circle and applying
fθ repeatedly, where θ is k1 th of a complete circle. Use this
method to find the coordinates of 12 evenly spaced holes on
the perimeter of a circle of radius 10.
36. The general formula for the matrix of the map f :"R2 → R2 which
#
2
1−m 2m
1+m2 1+m2
reflects the plane across the line y = mx is Rm = 2m m2 −1
.
1+m2 1+m2

(a) Check your answer from Exercise 20 by plugging in m = −1.


(b) Use this formula to find the matrix of the map which reflects
R2 across the line y =x.Check your
 answer by using geometry
1 0
to find the images of and .
0 1
 
2
(c) Find the reflection of across the line y = 3x.
5

2.3 Matrix Operations


In the last section, we learned how to rewrite a linear function from Rn to
Rm as an m × n matrix. From our studies of functions in other math courses,
we have a number of function operations. Since matrices are just another way
of viewing linear functions, we should be able to use our function operations
to define matrix operations. We’ll do this for three basic function operations:
addition, multiplication by a scalar, and composition. Let’s start with function
addition.
If we have two functions f and g, then their sum should be another function
f + g which satisfies (f + g)(~x) = f (~x) + g(~x). This immediately makes it clear
that our functions f and g must have the same domain and codomain. If f
and g don’t have the same domain, then we won’t be able to plug the same
vector ~x into both functions which means one of f (~x) and g(~x) won’t make
sense. If f and g don’t have the same codomain then their output vectors
won’t have the same size, so f (~x) + g(~x) will be undefined. Thus f + g only
makes sense if we have f : Rn → Rm and g : Rn → Rm for the same n and m.
Before we try to translate this function behavior into matrix behavior, we
need to be sure that the function f + g will have a matrix. Since only linear
functions have matrices, this means we need the following theorem.

Theorem 1. If f and g are linear functions for which f + g is defined, then


f + g is a linear function.

To check that f + g is a linear function, we need to show that it obeys


both the additive and scalar multiplication conditions from 2.1’s definition of
a linear function. Let ~v and ~u be vectors in f +g’s domain and r be any scalar.
Function addition dictates that

(f + g)(~v + ~u) = f (~v + ~u) + g(~v + ~u)

and since both f and g are linear functions we know

f (~v + ~u) = f (~v ) + f (~u) and g(~v + ~u) = g(~v ) + g(~u).

Putting these three equations together, we get

(f + g)(~v + ~u) = f (~v ) + f (~u) + g(~v ) + g(~u).

Rearranging the terms in the right-hand side of this equation gives us

(f + g)(~v + ~u) = f (~v ) + g(~v ) + f (~u) + g(~u)

which can be rewritten as

(f + g)(~v + ~u) = (f + g)(~v ) + (f + g)(~u).



This shows f + g satisfies the additive condition of a linear function.


Moving on to scalar multiplication, we have
(f + g)(r · ~v ) = f (r · ~v ) + g(r · ~v ).
Again, since f and g are linear we have
f (r · ~v ) = r · f (~v ) and g(r · ~v ) = r · g(~v ).
Plugging this back into our initial equation gives us
(f + g)(r · ~v ) = r · f (~v ) + r · g(~v ).
This shows that f + g satisfies the scalar multiplication property of a linear
function, and therefore f + g is a linear function whenever f and g are linear.
Thinking about function addition in the context of matrices, if f has matrix
A and g has matrix B we want to create another matrix which we’ll write
A + B so that A + B is the matrix of the linear map f + g. We can only add f
and g when they have the same domain and codomain, i.e., f, g : Rn → Rm .
In matrix terms, this means we can only add f and g when they both have
m × n matrices. Therefore matrix addition only makes sense for matrices of
the same size, which mirrors what we saw for vector addition. Additionally,
since (f + g)(~x) = f (~x) + g(~x), we need our new matrix A + B to satisfy
(A + B)~x = A~x + B~x.
   
Example 1. Find A + B where

    A = [ 9  −1 ]      and      B = [ 4  2 ] .
        [ 3   7 ]                   [ 6  5 ]
Since we don’t know yet how to combine these matrices directly, let’s
instead find the formulas for their maps, add those maps, and then find the
matrix of that new map. I’ll call A’s map f and B’s map g, so A + B is the
matrix of (f + g)(~x) = f (~x) + g(~x).
Using the techniques we practiced in 2.2, we know
       
    f([x1, x2]) = [ 9x1 − x2  ]      and      g([x1, x2]) = [ 4x1 + 2x2 ]
                  [ 3x1 + 7x2 ]                             [ 6x1 + 5x2 ]

so

    f (~x) + g(~x) = [ 9x1 − x2  ] + [ 4x1 + 2x2 ] = [ (9x1 − x2 ) + (4x1 + 2x2 )  ]
                    [ 3x1 + 7x2 ]   [ 6x1 + 5x2 ]   [ (3x1 + 7x2 ) + (6x1 + 5x2 ) ]

                  = [ (9 + 4)x1 + (−1 + 2)x2 ]
                    [ (3 + 6)x1 + (7 + 5)x2  ] .

Since A + B is the matrix of f + g, we must have

    A + B = [ 9 + 4  −1 + 2 ]
            [ 3 + 6   7 + 5 ] .

Comparing this to A = [ 9  −1 ]  and  B = [ 4  2 ] , we can see that each
                      [ 3   7 ]           [ 6  5 ]
entry of A + B is the sum of the corresponding entries of A and B.

The pattern from our previous example motivates the following definition.
 
Definition. The m × n matrices

        [ a11  a12  ···  a1n ]               [ b11  b12  ···  b1n ]
    A = [ a21  a22  ···  a2n ]     and   B = [ b21  b22  ···  b2n ]
        [  :    :          : ]               [  :    :          : ]
        [ am1  am2  ···  amn ]               [ bm1  bm2  ···  bmn ]

have sum

            [ a11 + b11   a12 + b12   ···   a1n + b1n ]
    A + B = [ a21 + b21   a22 + b22   ···   a2n + b2n ]
            [     :           :                  :    ]
            [ am1 + bm1   am2 + bm2   ···   amn + bmn ] .

Since the ijth entry of a matrix is the coefficient of xj in its function’s


output vector’s ith entry, we get (A + B)~x = A~x + B~x as desired. To see that
the matrix A + B corresponds to the function f + g, we can compute

(A + B)~x = A~x + B~x = f (~x) + g(~x) = (f + g)(~x)

Therefore our newly defined addition for matrices corresponds to our older
notion of addition of functions.
   
Example 2. Compute

    [ −1   0  4 ] + [  2  6  1 ] .
    [ 10  −5  2 ]   [ −8  1  2 ]

Since these two matrices are the same size (both are 2 × 3), we can add
them together by adding corresponding pairs of entries. This gives us

    [ −1   0  4 ] + [  2  6  1 ] = [ −1 + 2       0 + 6   4 + 1 ]
    [ 10  −5  2 ]   [ −8  1  2 ]   [ 10 + (−8)   −5 + 1   2 + 2 ]

                                 = [ 1  6  5 ]
                                   [ 2  4  4 ] .

   
Example 3. Why can’t we compute

    [ 3  −1   0  4 ] + [ 6  1  −2 ] ?
    [ 7  10  −5  2 ]   [ 9  0  −3 ]
Our first matrix is 2 × 4 while the second is 2 × 3. Since they don’t have
the same number of columns, they aren’t the same size. This means that if we
tried to add their corresponding entries, we’d have entries of the first matrix
without partners in the second matrix. Thus it is impossible to add these two
matrices.
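If you like to verify this sort of thing on a computer, here is a small sketch (mine, assuming NumPy) covering both Example 2 and Example 3.

    import numpy as np

    A = np.array([[-1, 0, 4],
                  [10, -5, 2]])
    B = np.array([[2, 6, 1],
                  [-8, 1, 2]])
    print(A + B)   # the entrywise sum from Example 2

    C = np.array([[3, -1, 0, 4],
                  [7, 10, -5, 2]])   # 2 x 4
    D = np.array([[6, 1, -2],
                  [9, 0, -3]])       # 2 x 3
    try:
        C + D
    except ValueError as err:
        print(err)   # the sizes don't match, so the sum is undefined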

Next let’s create a matrix operation to mirror multiplying a function by a


constant. Here there are no restrictions on the size of the domain and codomain
of our function, so we won’t have any restrictions on the number of rows or
columns of our matrix.
 
9 −1
Example 4. Find 4A where A = .
3 7
Since we don’t know yet how to do this directly, let’s instead find the
formula for A’s map, multiply that map by 4, and then find the matrix of
that new function. I’ll call A’s map
 f ,
so 4A is the matrix
 of 4f (~x).
x1 9x1 − x2
From Example 1, we know f = , so
x2 3x1 + 7x2
     
9x1 − x2 4(9x1 − x2 ) 4(9)x1 + 4(−1)x2
4f (~x) = 4 = = .
3x1 + 7x2 4(3x1 + 7x2 ) 4(3)x1 + 4(7)x2

Since 4A is the matrix of 4f , we must have


 
4(9) 4(−1)
4A = .
4(3) 4(7)

Clearly this is just our matrix A with every entry multiplied by 4.

The pattern we saw in the previous example holds for any matrix and
any scalar, because when we multiply a function f by a scalar r, we’re really
creating a new function r · f where (r · f )(~x) = r · f (~x). Thus to multiply a
function by a scalar we’re really just multiplying our function’s output vector
by that scalar. Since multiplying a vector by a scalar means multiplying each
entry of the vector by the scalar, we can find r · f by multiplying each entry
of f (~x) by r. Distributing this multiplication by r through each entry of f (~x)
means multiplying the coefficient of each variable in each entry of f (~x) by r,
so the coefficients for r · f are just r times the coefficients of f . We want to
define multiplication of a matrix by a scalar so that if A is the matrix of f
then r · A is the matrix of r · f . This prompts the following definition.
 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
 
Definition. Let A =  . ..  be an m × n matrix and r be
 .. . 
am1 am2 · · · amn
 
ra11 ra12 · · · ra1n
 ra21 ra22 · · · ra2n 
 
any scalar. Then r · A =  . .. .
 .. . 
ram1 ram2 ··· ramn

Our discussion above explains why the coefficient on xj in the ith entry

of r · f is r times the corresponding coefficient from f . Since those coefficients


are the ijth entries of the matrices corresponding to f and r · f , we see that
r · A is the matrix of r · f .
 
5 3
Example 5. Compute −2 ·  0 −1.
−4 2
To multiply this matrix by −2, we simply multiply each of its entries by
−2. This gives us
     
5 3 −2(5) −2(3) −10 −6
−2 ·  0 −1 =  −2(0) −2(−1) =  0 2 .
−4 2 −2(−4) −2(2) 8 −4

Finally, let’s tackle figuring out how to model function composition. If we


have two functions f and g, then f ◦g is the function with (f ◦g)(~x) = f (g(~x)).
(Notice that even though we read from left to right most of the time, functions
in a composition act right to left since the rightmost function is the one applied
to ~x first.)
As with addition, before we start to translate function composition into
matrices, we need to establish that f ◦ g has a matrix.

Theorem 2. If f and g are linear functions and f ◦ g is defined, then f ◦ g


is a linear function.

As with Theorem 1, we need to show that f ◦ g satisfies both the additive


and scalar multiplication property of a linear function. Let ~v and ~u be vectors
in f ◦ g’s domain and r be any scalar.
Since we know g is linear, we have g(~v + ~u) = g(~v ) + g(~u) which means

(f ◦ g)(~v + ~u) = f (g(~v + ~u)) = f (g(~v ) + g(~u)).

Since f is also linear, the right-hand side can be split up to give us

(f ◦ g)(~v + ~u) = f (g(~v )) + f (g(~u))

which can be rewritten as

(f ◦ g)(~v + ~u) = (f ◦ g)(~v ) + (f ◦ g)(~u).

Thus f ◦ g splits up over vector addition as required.


Similarly, because g is linear we have g(r · ~v ) = r · g(~v ) so

(f ◦ g)(r · ~v ) = f (g(r · ~v )) = f (r · g(~v )).

Since f is also linear, we can pull the scalar r out of the right-hand side to get

(f ◦ g)(r · ~v ) = r · f (g(~v ))

which can be rewritten as

(f ◦ g)(r · ~v ) = r · (f ◦ g)(~v )).

This means f ◦ g also splits up over scalar multiplication and therefore is a


linear function.
If f has matrix A and g has matrix B, then (f ◦ g)(~x) = f (g(~x)) can be
rewritten as
(f ◦ g)(~x) = f (g(~x)) = A(B~x) = (AB)~x.
This means we need to define matrix multiplication so that AB is the matrix
of the linear map f ◦ g.
For f ◦ g to make sense, we must be able to plug g(~x) into f . This means
the codomain of g must be the same as the domain of f . In other words,
we need f : Rk → Rm and g : Rn → Rk . This gives us f ◦ g : Rn → Rm .
Translating this into conditions on our matrices A and B, we get that A is
m × k and B is k × n. In other words, the product AB only makes sense if
the number of columns of A equals the number of rows of B. Since AB is the
matrix of f ◦ g, it is an m × n matrix.
Personally, I remember (m × k)(k × n) as “the touching dimensions
match” and “to get the dimensions of the product, cancel out the matching
dimensions” i.e., removing the k’s leaves m × n. (This is similar to my method
for remembering sizes when multiplying a matrix and a vector.)

Example 6. Is it possible to compute AB if A is 4 × 6 and B is 6 × 9? If it


is possible, give the dimensions of the product matrix AB.

Remember that for AB to make sense we need the number of columns in


A to equal the number of rows in B. Here A has 6 columns and B has 6 rows,
so it is possible to compute their product AB. The product AB will have as
many rows as A and as many columns as B. Since A has 4 rows and B has 9
columns, our product AB is a 4 × 9 matrix.
If you prefer to think of this in terms of functions, this means A’s function
has domain R6 while B’s function has codomain R6 . This means the output
of B’s map is the correct size to plug into A’s map, so the composition is
possible. The domain of the composition is the domain of B’s map, i.e., R9 ,
while the codomain of the composition is the codomain of A’s map, i.e., R4 .
Therefore the matrix of the composition, AB, is 4 × 9.
Alternately, we can plug the dimensions of A and B into the product to
get (4 × 6)(6 × 9). Since the middle numbers match, the product is possible.
If we remove our matching 6s the remaining numbers are the dimensions of
the product, so 4 × 9.

Example 7. Is it possible to compute AB if A is 5 × 3 and B is 5 × 2? If it


is possible, give the dimensions of the product matrix AB.

Here A has 3 columns and B has 5 rows, so this product is impossible.


In terms of maps this means B’s map has outputs which are 5-vectors,
while A’s map has inputs which are 3-vectors. Therefore the composition of
these two maps is impossible.
Alternately, when we plug in the dimensions of A and B we get (5×3)(5×2)
and the middle numbers don’t match up, so this can’t be done.

To make AB the matrix of f ◦ g, we need the ijth entry of AB to be the


coefficient on xj in the ith entry of f (g(~x)). Before we dive into the general
case, let’s look at a small concrete example.
   
9 −1 4 2
Example 8. Find AB where A = and B = .
3 7 6 5
Since we don’t know yet how to combine these matrices directly, let’s
instead find the formulas for their maps, compose those maps, and then find
the matrix of that composition. I’ll call A’s map f and B’s map g, so AB is
the matrix of (f ◦ g)(~x) = f (g(~x)).    
x1 9x1 − x2
Using the techniques we practiced in 2.2, we see f =
    x2 3x1 + 7x2
x1 4x1 + 2x2
and g = . This means
x2 6x1 + 5x2
   
4x1 + 2x2 9(4x1 + 2x2 ) − (6x1 + 5x2 )
f (g(~x)) = f = .
6x1 + 5x2 3(4x1 + 2x2 ) + 7(6x1 + 5x2 )

Simplifying (although not all the way because we’re looking for the pattern)
gives us
 
9(4)x1 + 9(2)x2 + (−1)(6)x1 + (−1)(5)x2
f (g(~x)) =
3(4)x1 + 3(2)x2 + 7(6)x1 + (7)5x2
 
(9(4) + (−1)(6))x1 + (9(2) + (−1)(5))x2
= .
(3(4) + 7(6))x1 + (3(2) + (7)5)x2
Since AB is the matrix of this composition map, we must have
 
9(4) + (−1)(6) 9(2) + (−1)(5)
AB = .
3(4) + 7(6) 3(2) + (7)5
   
9 −1 4 2
Comparing this to A = and B = , we can see that the top
3 7 6 5
left entry of AB is a sum of products of corresponding entries from the top
row of A and the left column of B. This pattern of combining a row of A with
a column of B continues throughout the rest of AB’s entries.

To see that the pattern we observed in the previous example holds more
generally, suppose
   
x1 a11 x1 + a12 x2 + · · · + a1k xk
x2   a21 x1 + a22 x2 + · · · + a2k xk 
   
f  .  =  .. 
 ..   . 
xk am1 x1 + am2 x2 + · · · + amk xk

and    
x1 b11 x1 + b12 x2 + · · · + b1n xn
 x2   b21 x1 + b22 x2 + · · · + b2n xn 
   
g  .  =  .. .
 ..   . 
xn bk1 x1 + bk2 x2 + · · · + bkn xn
To find the ith entry of f (g(~x)), we need to plug the entries of g(~x) into the ith
entry of f (~x) as x1 , . . . , xk . The ith entry of f (~x) is ai1 x1 + ai2 x2 + · · · + aik xk
so plugging in the entries of g(~x) gives us

ai1 (b11 x1 + b12 x2 + · · · + b1n xn ) + · · · + aik (bk1 x1 + bk2 x2 + · · · + bkn xn ).

This looks incredibly messy, but remember that to find the ijth entry of AB,
we only care about the coefficient on xj . The first term in the sum above
contains ai1 b1j xj , the second term contains ai2 b2j xj , and so on until the last
(kth) term which contains aik bkj xj . Thus the xj term in the ith entry of
f (g(~x)) is
(ai1 b1j + ai2 b2j + · · · + aik bkj )xj .
This allows us to make the following definition.
 
a11 a12 ··· a1k
 a21 a22 ··· a2k 
 
Definition. The product of the m × k matrix A =  . .. 
 .. . 
am1 am2 · · · amk
 
b11 b12 ··· b1n
b21 b22 ··· b2n 
 
and the k × n matrix B =  . ..  is the m × n matrix AB
 .. . 
bk1 bk2 · · · bkn
whose ijth entry is ai1 b1j + ai2 b2j + · · · + aik bkj .

Notice that to find the ijth entry of AB, we add up the pairwise products
along the ith row of A and down the jth column of B. This should feel similar
to our method for multiplying a vector by a matrix. In fact, the jth column
of AB is A times the vector which is the jth column of B.
 
Example 9. Compute AB where

    A = [ −2  −1   1 ]      and      B = [  5  3 ]
        [  0   6  −2 ]                   [  0  1 ]
                                         [ −4  2 ] .
The first matrix A is 2 × 3 and the second matrix B is 3 × 2, so the product
AB makes sense and is a 2 × 2 matrix. This means we need to compute each
of the four entries of AB.
Let’s start with the entry in the 1st row and 1st column of AB, so i = 1
and j = 1. (Here k = 3 since that is the number of columns of A and rows of
B.) We want to compute the sum of pairwise products along the 1st row of A
and 1st column of B as shown below.
 
  5 3
− 2 −1 1 
0 1
0 6 −2
−4 2

Multiplying the corresponding entries in this row and column we get

(−2)(5) + (−1)(0) + (1)(−4) = −14


 
−14
so AB = .
Moving to the entry in the 1st row and 2nd column of AB where i = 1
and j = 2, means we’re looking at the 1st row of A and 2nd column of B as
shown below.
 
  5 3
− 2 −1 1 
0 1
0 6 −2
−4 2

Multiplying the corresponding entries in this row and column we get

(−2)(3) + (−1)(1) + (1)(2) = −5


 
−14 −5
so AB = .
The entry in the 2nd row and 1st column of AB has i = 2 and j = 1,
which means we’re looking at the 2nd row of A and 1st column of B as shown
below.
 
  5 3
−2 −1 1 
0 1
0 6 −2
−4 2

Multiplying the corresponding entries in this row and column we get

(0)(5) + (6)(0) + (−2)(−4) = 8


 
−14 −5
so AB = .
8
The entry in the 2nd row and 2nd column of AB has i = 2 and j = 2,
which means we’re looking at the 2nd row of A and 2nd column of B as shown
below.
 
  5 3
−2 −1 1 
0 1
0 6 −2
−4 2

Multiplying the corresponding entries in this row and column we get

(0)(3) + (6)(1) + (−2)(2) = 2

so

    AB = [ −14  −5 ]
         [   8   2 ] .
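Here is the same product checked numerically (a sketch of mine, assuming NumPy).

    import numpy as np

    A = np.array([[-2, -1, 1],
                  [0, 6, -2]])
    B = np.array([[5, 3],
                  [0, 1],
                  [-4, 2]])

    # The 2 x 3 times 3 x 2 product is the 2 x 2 matrix computed above.
    print(A @ B)   # [[-14  -5]
                   #  [  8   2]]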

The fact that this new operation is called matrix multiplication will tempt
you to assume that it behaves like multiplication in the real numbers. However,
that is a dangerous parallel to draw, because matrix multiplication isn’t
based on multiplication at all – it is based on function composition. With
that in mind, let’s explore two very important differences between matrix
multiplication and our usual notion of multiplication.

Theorem 3. Matrix multiplication isn’t commutative, i.e., AB ≠ BA.

The easiest way we could have AB 6= BA is if only one of these products


can be computed.
   
Example 10. If A = [  1  3 ]  and  B = [ 7  −4  0 ] , then the product AB
                   [ −2  4 ]           [ 3  −1  9 ]
makes sense but the product BA does not.

Here A is 2 × 2 and B is 2 × 3. Since A’s number of columns equals B’s


number of rows, we can compute the product AB. However, B’s number
of columns doesn’t equal A’s number of rows, so we can’t compute BA.

The second obvious issue is that even when AB and BA both exist, they
may have different dimensions.

Example 11. The two matrices A and B from Example 9 have both AB
and BA defined, but they are different sizes.

The matrix A from Example 9 is 2×3 and the matrix B is 3×2. No matter
which order we list them, the number of rows in the first matrix equals the
number of columns in the second matrix. The product AB that we computed
in Example 9 has as many rows as A and as many columns as B, so it is 2 × 2.
However, the other product BA has as many rows as B and as many columns
as A, so it is 3 × 3. Since AB and BA are different sizes, it is impossible for
them to be equal.

Even if both of these matrix products make sense and are the same size,
remember that AB is the matrix of f ◦ g while BA is the matrix of g ◦ f .
From working with function composition in calculus, you should be familiar
with the fact that f (g(~x)) and g(f (~x)) may be very different functions. This
means it makes sense that AB can easily be quite different from BA. (We will
see some special cases where AB = BA, but they are the exceptions rather
than the rule.)
   
Example 12. Check that A = [  1  −2 ]  and  B = [ 3  1 ]  have AB ≠ BA.
                           [ −5   1 ]           [ 0  4 ]
Both A and B are 2 × 2 matrices, so AB and BA are both defined and
both have the same size (also 2 × 2). However
      
1 −2 3 1 1(3) − 2(0) 1(1) − 2(4) 3 −7
AB = = =
−5 1 0 4 −5(3) + 1(0) −5(1) + 1(4) −15 −1

while
      
3 1 1 −2 3(1) + 1(−5) 3(−2) + 1(1) −2 −5
BA = = =
0 4 −5 1 0(1) + 4(−5) 0(−2) + 4(1) −20 4

so AB ≠ BA.

The moral of the story here is to be very careful about the order of matrix
multiplication and fight against the urge to assume you can switch that order
without changing the product.
Another oddity of matrix multiplication is given by the following theorem.

Theorem 4. It is possible to have AB = 0 with A ≠ 0 and B ≠ 0.

Note that the “0” in this theorem is the appropriately sized zero matrix.
   
Example 13. Check that AB = 0 where A = [ 2  6 ]  and  B = [ 12  0 ] .
                                        [ 1  3 ]           [ −4  0 ]
First of all, notice that neither A nor B are the zero matrix, since each of
them contains at least one nonzero entry.

Since both A and B are 2 × 2 matrices, their product is defined and is


also a 2 × 2 matrix. To compute the product, we follow the same process as
in Example 9 and get
      
2 6 12 0 2(12) + 6(−4) 2(0) + 6(0) 0 0
= = .
1 3 −4 0 1(12) + 3(−4) 1(0) + 3(0) 0 0

Since all entries of the product are zero, we do indeed have AB = 0.


(For further proof that AB 6= BA, you can check that BA is not the zero
matrix!)
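Both oddities are easy to see numerically; the sketch below is mine and assumes NumPy.

    import numpy as np

    # Example 12: the two products are different matrices.
    A = np.array([[1, -2],
                  [-5, 1]])
    B = np.array([[3, 1],
                  [0, 4]])
    print(A @ B)   # [[  3  -7], [-15  -1]]
    print(B @ A)   # [[ -2  -5], [-20   4]]

    # Example 13: two nonzero matrices whose product is the zero matrix.
    C = np.array([[2, 6],
                  [1, 3]])
    D = np.array([[12, 0],
                  [-4, 0]])
    print(C @ D)   # [[0 0], [0 0]]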

This contradicts our usual intuition from the real numbers where we often
factor an equation of the form f (x) = 0 and then set each factor equal to zero
to solve. This won’t work to solve AB = 0 or even A~x = ~0. We will have to
develop new tools for this more complicated matrix situation.
One property matrix multiplication does share with multiplication of real
numbers is the existence of an identity element which mimics the way 1 acts
in R. There 1r = r1 = r for any real number r. The identity element for
matrix multiplication is the appropriately named identity matrix discussed in
Example 4 of Section 2.2, which is the matrix of the identity map. Since there
is an identity map from Rn to itself for every n, this is actually a collection of
n × n identity matrices
 
         [ 1  0  ···  0 ]
         [ 0  1  ···  0 ]
    In = [ :   :       : ]
         [ 0  0  ···  1 ] .
Any m × n matrix A corresponds to a map f : Rn → Rm . The composition
of f with the n×n identity map is f ◦idn which is just f , while the composition
of the m × m identity map with f is idm ◦ f which is again f . Translating this
into matrix multiplication tells us that AIn = A and Im A = A.

Example 14. Let n = 2. Show that I2 A = A for any 2 × 3 matrix A.


 
The 2 × 2 identity matrix is I2 = [ 1  0 ] , which corresponds to the map
                                  [ 0  1 ]
id2 : R2 → R2 with id2([x1, x2]) = [x1, x2]. This means

    I2 A = [ 1  0 ] [ a  b  c ] = [ 1(a) + 0(d)   1(b) + 0(e)   1(c) + 0(f ) ]
           [ 0  1 ] [ d  e  f ]   [ 0(a) + 1(d)   0(b) + 1(e)   0(c) + 1(f ) ]

         = [ a  b  c ] = A.
           [ d  e  f ]
 
You can check for yourself that AI3 = A where

    I3 = [ 1  0  0 ]
         [ 0  1  0 ]
         [ 0  0  1 ] .
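A quick numerical check of both identities (a sketch of mine, assuming NumPy; the matrix A below is an arbitrary 2 × 3 example):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])   # any 2 x 3 matrix works here

    I2 = np.eye(2)
    I3 = np.eye(3)
    print(np.array_equal(I2 @ A, A))   # True
    print(np.array_equal(A @ I3, A))   # True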

Exercises 2.3.
 
1 −1
1. Compute 3 2 0 .
3 −2
 
10 −5
2. Compute 4 .
5 2
 
2 4
3. Compute 12 −6 0.
1 8
 
−1 3
4. Compute 3  2 4.
−5 0
 
3 7
5. Compute −2 .
1 −1
 
1 −2
6. Compute 3 .
−1 4
 
2 2
7. Find 4 .
1 −4
   
−1 5 1 −2 5
8. Compute + or explain why it isn’t possible.
2 3 2 0 7
   
−1 5 2 −3
9. Compute + or explain why it isn’t possible.
2 3 0 1
     
1 −1 1 2 3 −4 2
10. Let A = 2 0 , B = 2 1 −1, C =  1 −1.
3 −2 3 4 0 2 0
Compute whichever of A + B, B + C and A + C are possible.
   
3 1   −1 3
10 −5
11. Let A = −2 2 , B = , C =  0 −2.
5 2
0 −1 4 9
Compute whichever of A + B, A + C and B + C are possible.
   
2 4   −1 3
1 2
12. Let A = −6 0 , B = , C =  0 10 .
3 4
1 8 2 9
Compute whichever of A + B, A + C, and B + C are possible.
     
1 2 0 −1 3 3 1 7
13. Let A = −2 3 −2 , B =  2 4, C = −2 −5 10.
0 −1 4 −5 0 0 3 4
Compute whichever of A + B, A + C and B + C are possible.
     
1 −4 2 5 2 −3 3 7
14. Let A = ,B= ,C= .
0 −1 3 4 0 6 1 −1
Compute whichever of A + B, A + C and B + C are possible.
     
1 −2 0 2 2 1 0 7
15. Let A = ,B= ,C= .
−4 2 5 1 −4 3 −2 0
Compute whichever of A + B, A + C and B + C are possible.
16. Suppose A is a 10 × 6 matrix.
(a) What size matrix B would make computing AB possible?
(b) What size matrix B would make computing BA possible?
17. Suppose A is a 5 × 9 matrix.
(a) What size matrix B would make computing AB possible?
(b) What size matrix B would make computing BA possible?
18. Suppose A is a 7 × 3 matrix.
(a) What value of n would make computing AIn possible?
(b) What value of m would make computing Im A possible?
 
a b
19. Let A = . Use matrix multiplication to check that I2 A = A
c d
and AI2 = A.
   
1 2 5 −2 2
20. Compute · or explain why it isn’t possible.
3 −1 1 4 −1
   
−2 2 1 2 5
21. Compute · or explain why it isn’t possible.
4 −1 3 −1 1
     
1 −1 1 2 3 −4 2
22. Let A = 2 0 , B = 2 1 −1, C =  1 −1.
3 −2 3 4 0 2 0
Compute whichever of AB, BC and AC are possible.
   
3 1   −1 3
10 −5
23. Let A = −2 2 , B = , C =  0 −2.
5 2
0 −1 4 9
Compute whichever of AB, BC and CA are possible.
   
2 4   −1 3
1 2
24. Let A = −6 0 , B = , C =  0 10 .
3 4
1 8 2 9
Compute whichever of AB, AC, and BC are possible.
     
1 2 0 −1 3 3 1 7
25. Let A = −2 3 −2 , B =  2 4, C = −2 −5 10.
0 −1 4 −5 0 0 3 4
Compute whichever of AB, BC, BA are possible.
     
1 −4 2 5 2 −3 3 7
26. Let A = ,B= ,C= .
0 −1 3 4 0 6 1 −1
Compute whichever of AB, BC and AC are possible.
     
1 −2 0 2 2 1 0 7
27. Let A = ,B= ,C= .
−4 2 5 1 −4 3 −2 0
Compute whichever of AB, BC and AC are possible.
 
  2 4 1
7 −3 5
28. Let A = and B = −4 −2 1. Explain why AB
−2 0 8
0 3 6
can be computed, but BA cannot.
   
3 −2 4 1
29. Let A = and B = . Show AB 6= BA even though
6 1 −1 2
both AB and BA can be computed.
 
1 −1
30. Let A = . Find a nonzero matrix B for which AB = BA.
0 2
 
1 −1
31. Let A = . Find a nonzero matrix B for which AB = 0.
−2 2

2.4 Matrix Vector Spaces


In the last section, we worked out the right way to add matrices and
multiply matrices by scalars so that they correctly mirror adding functions
and multiplying functions by scalars. Since we can only add matrices of the
same size, let’s pick a (generic) size for our matrices and consider only matrices
of that fixed size m × n. The set of all matrices of size m × n is called Mmn .
As usual, we want to take stock of some nice properties of these two matrix
operations. Stay alert and see if you can remember where we’ve discussed
these properties before! First let’s consider matrix addition.
If we add two m × n matrices, the result will be another m × n matrix.
To see this, think of our sum as the addition of two functions f and g which
both map from Rn to Rm . The sum should be a new function which still maps
from Rn to Rm , which will also have an m × n matrix. This means Mmn is
closed under matrix addition.
The order in which we add matrices doesn’t matter. Suppose we have two
matrices A and B, which correspond to linear functions f and g respectively.
Since we saw in Chapter 1 that order of addition of vectors doesn’t matter, it
is clear that
f (~x) + g(~x) = g(~x) + f (~x)
so we must have f + g = g + f and hence

A + B = B + A.

This also jives with our intuition from calculus where x2 + 2x and 2x + x2 are
the same function. This means matrix addition is commutative.
We can use a similar line of reasoning to see that if we’re adding three
matrices, we can start by adding any pair of them and still get the same
result. This property is also inherited from vector addition, this time from the
fact that (~v + ~w) + ~u = ~v + (~w + ~u). This means for any three functions from
Rn to Rm we have

(f (~x) + g(~x)) + h(~x) = f (~x) + (g(~x) + h(~x))

so we also have (f + g) + h = f + (g + h) and therefore

(A + B) + C = A + (B + C).

This means matrix addition is associative.


Matrix addition has an identity element, which is a matrix that we can
add to any other matrix and get that other matrix back. Since the zero vector
is the additive identity in Rn , the function f (~x) = ~0 is our additive identity
in Mmn because adding it to any other function from Rn to Rm has no effect.
This zero function’s matrix is the m × n matrix with all entries equal to 0,
which we’ll call the zero matrix for short.

What about additive inverses? In other words, for each matrix A in Mmn
we want a partner matrix −A so that −A + A is the zero matrix. Here it is
easier to think computationally instead of in terms of functions. To get each
entry of this sum to be zero, we need the entries of −A to be the entries of A
just with the opposite sign. This means that

−A = (−1) · A.

Next let’s consider multiplication of matrices by scalars.


If we multiply an m×n matrix by a scalar, the result will be another m×n
matrix. As with addition, scaling a function f which maps from Rn to Rm
gives another function from Rn to Rm . In other words, Mmn is closed under
scalar multiplication.
Multiplication by scalars is also associative in the sense that if we want to
scale a matrix A by two scalars r and s, we can choose to either scale A by s
and then scale sA by r, or just scale A by the product rs all at once. In other
words,
r(sA) = (rs)A.
Thinking in terms of A’s function f , this is like scaling the output vector
of f by either s and then r or scaling by rs. Vectors have associative scalar
multiplication, so f and A inherit it as well.
The scalar 1 leaves matrices unchanged, i.e., 1·A = A. This is clear because
multiplying each entry of A by 1 produces no change.
Finally, let’s consider how these two operations interact with each other.
We saw that scalar multiplication of vectors distributes over scalar
addition, i.e.,
(r + s) · ~v = r · ~v + s · ~v .
This means that for any linear function f from Rn to Rm we know

(r + s)f (~x) = rf (~x) + sf (~x)

and therefore
(r + s)A = rA + sA
for the m × n matrix of f .
Similarly, we know scalar multiplication of vectors distributes over vector
addition. In function terms, this means we know

rf (~x) + rg(~x) = r(f (~x) + g(~x)).

Translating this to matrices, we see

rA + rB = r(A + B).

Did these properties look familiar? That’s because they’re the same as
the good properties of vector addition and multiplication of vectors by scalars

summarized in Theorem 1 of 1.1. One way to think of this is that Mmn is acting
like Rn if we think of matrices in place of the vectors and use matrix addition
and scalar multiplication instead of vector addition and scalar multiplication.
This motivates the following definition.

Definition. A collection of mathematical objects, $V$, with two operations "+" and "·" is a vector space if for all $\vec v$, $\vec u$, $\vec w$ in $V$ and $r, s$ in $\mathbb{R}$ the following conditions are satisfied:
1. $\vec v + \vec u$ is in $V$ (closure of addition)
2. $\vec v + \vec u = \vec u + \vec v$ (addition is commutative)
3. $(\vec v + \vec u) + \vec w = \vec v + (\vec u + \vec w)$ (addition is associative)
4. There is a $\vec 0$ in $V$ so that $\vec v + \vec 0 = \vec v$ (additive identity)
5. For each $\vec v$, there is a $-\vec v$ so that $-\vec v + \vec v = \vec 0$ (additive inverses)
6. $r \cdot \vec v$ is in $V$ (closure of scalar multiplication)
7. $r \cdot (s \cdot \vec v) = (rs) \cdot \vec v$ (scalar multiplication is associative)
8. $1 \cdot \vec v = \vec v$ (scalar multiplication's identity is 1)
9. $(r + s) \cdot \vec v = r \cdot \vec v + s \cdot \vec v$ (scalar multiplication distributes over addition of scalars)
10. $r \cdot (\vec v + \vec u) = r \cdot \vec v + r \cdot \vec u$ (scalar multiplication distributes over addition of vectors)

Note that we will usually call the elements ~v of a vector space V vectors
and put an arrow over their variables even though we know that they may not
be vectors in Rn in the sense of our definition from 1.1. If this is confusing,
you can feel free to mentally add air quotes around the word “vector” when
thinking about this privately.

Example 1. The set V = Rn with vector addition as + and scalar


multiplication of vectors as · is a vector space.

We saw in 1.1 that vector addition and scalar multiplication of vectors


have these properties.

Example 2. The set V = Mmn with matrix addition as + and scalar


multiplication of matrices as · is a vector space.

We saw at the beginning of this section that matrix addition and scalar
multiplication of matrices have these properties.

This gives us tons of vector spaces to work with: an Rn for every choice of
n and an Mmn for every choice of m and n.
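If you'd like to see these conditions in action on a computer, here is a small Python sketch (an aside, not part of the text; it assumes the numpy library is installed, and the particular matrix size, scalars, and seed are just illustrative) that spot-checks a few of the ten conditions for randomly chosen matrices in $M_{23}$:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))   # three random "vectors" in M_23
B = rng.integers(-5, 5, size=(2, 3))
C = rng.integers(-5, 5, size=(2, 3))
r, s = 3, -2

# Condition 2: addition is commutative
print(np.array_equal(A + B, B + A))
# Condition 3: addition is associative
print(np.array_equal((A + B) + C, A + (B + C)))
# Condition 7: scalar multiplication is associative
print(np.array_equal(r * (s * A), (r * s) * A))
# Condition 9: scalar multiplication distributes over addition of scalars
print(np.array_equal((r + s) * A, r * A + s * A))

Each line prints True. Of course, a spot check like this is only evidence, not a proof; the arguments above are what actually establish the properties.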

To get even more examples of vector spaces, remember that in 1.2 we


discussed subsets of Rn which were self-contained subspaces under our vector
operations. We can now reformulate our definition of a subspace using our
definition of a vector space.

Definition. Let V with + and · be a vector space. A subset W of V is a


subspace if W is a vector space using the same operations + and · as V .

Before we look at some examples, let’s also generalize our subspace test
from 1.2. Otherwise we’d be forced to recheck all ten properties from our vector
space definition every time we wanted to say something was a subspace, which
is much more work than we really need to do.

Theorem 1. Let $V$ with + and · be a vector space. $W \subseteq V$ is a subspace of $V$ if it satisfies the following conditions:
1. $\vec w_1 + \vec w_2$ is in $W$ for every $\vec w_1$ and $\vec w_2$ in $W$ (closure of addition)
4. $\vec 0_V$ is in $W$ (zero vector of $V$ is in $W$)
6. $r \cdot \vec w$ is in $W$ for every scalar $r$ and $\vec w$ in $W$ (closure of scalar multiplication)

I’ve labeled each condition with the same number as its corresponding
property from our definition of a vector space.
To see that this is true, let’s convince ourselves that if we have these three
properties we get the other seven from the definition of a vector space.
Notice that since V is a vector space and W ⊆ V , we automatically
get properties 2, 3, 7, 8, 9, and 10. This means that we really just need to
understand why we get property 5.
Pick any $\vec w$ in $W$. Since $\vec w$ is in $V$, we know that it has an additive inverse $-\vec w$ in $V$. If we can show that $-\vec w$ is also in $W$, we'll be done. To see that, consider
$$\vec 0_V = 0 \cdot \vec w = (-1 + 1) \cdot \vec w = (-1) \cdot \vec w + 1 \cdot \vec w = (-1) \cdot \vec w + \vec w$$
which tells us that
$$(-1) \cdot \vec w = -\vec w.$$
Now we can invoke property 6 to say that since $\vec w$ is in $W$, $-\vec w$ is also in $W$.
Therefore W also satisfies property 5, and is a subspace of V . With this
shortcut in hand, let’s look at an example.
 
Example 3. Show $W = \left\{\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right\}$ with the usual addition and scalar multiplication of matrices is a subspace of $M_{22}$.

We know M22 is a vector space under matrix addition and scalar


multiplication, and W is clearly a subset of M22 . That means we can use
the subspace test to show W is a subspace. Since all three properties used in
the subspace test ask us to show something is in W , we should first remind
ourselves that to be in W a 2 × 2 matrix simply needs to have the top right
and bottom left entries equal 0.
We'll start by checking that $W$ is closed under addition. If we have two matrices $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ and $\begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix}$ in $W$, their sum is
$$\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} + \begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} a+c & 0 \\ 0 & b+d \end{bmatrix}$$

which is clearly also in $W$. Next we need to show $\vec 0_{M_{22}}$ is in $W$. Since
$$\vec 0_{M_{22}} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
we can see that $\vec 0_{M_{22}}$ is in $W$ by letting $a = b = 0$. Finally, we need to show $W$ is closed under scalar multiplication. If we have a matrix $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ in $W$ and a scalar $r$, then their product is
$$r\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} = \begin{bmatrix} ra & 0 \\ 0 & rb \end{bmatrix}.$$

Since this is in W , we’ve satisfied all three conditions of the subspace test.
Therefore, W is a subspace of M22 . This also shows that W is a vector space.

We saw in 1.2 that the span of a set of vectors is automatically a subset of


Rn , which gave us another way to check that something was a subspace. We
can generalize that idea to Mmn , but first we’ll need to update our definition
of a span in Rn to a general vector space.

Definition. Let V with + and · be a vector space. The span of ~v1 , . . . , ~vn
in V is Span{~v1 , . . . , ~vn } = {a1~v1 + · · · + an~vn } where a1 , . . . , an are scalars.

Remember that the + and · in this definition are the operations from V .
   
Example 4. Find the span of $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ in $M_{22}$.
From our definition, we have
$$\mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\} = \left\{a_1\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + a_2\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\}.$$
The right-hand side can be simplified to
$$\begin{bmatrix} a_1 & 0 \\ 0 & a_1 \end{bmatrix} + \begin{bmatrix} a_2 & a_2 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} a_1 + a_2 & a_2 \\ 0 & a_1 \end{bmatrix}$$
so
$$\mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\} = \left\{\begin{bmatrix} a_1 + a_2 & a_2 \\ 0 & a_1 \end{bmatrix}\right\}.$$
Another way to say this is that the span of these two matrices is the set of all
matrices whose bottom left entry is 0 and whose top left entry is the sum of
the top right and bottom right entries.
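As a computational aside (not part of the text), we can test whether a particular matrix lies in this span by flattening each matrix into a 4-vector and solving a small linear system. The sketch below assumes numpy is available, and the specific target matrix is just an illustration:

import numpy as np

v1 = np.array([[1, 0], [0, 1]])
v2 = np.array([[1, 1], [0, 0]])
target = np.array([[5, 2], [0, 3]])   # top left = top right + bottom right, bottom left = 0

# Columns of M are the flattened spanning matrices; solve M a = target (flattened).
M = np.column_stack([v1.flatten(), v2.flatten()])
a, residual, rank, _ = np.linalg.lstsq(M, target.flatten(), rcond=None)
print(a)                                      # approximately [3. 2.], so target = 3*v1 + 2*v2
print(np.allclose(M @ a, target.flatten()))   # True means target really is in the span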

Theorem 2. Let ~v1 , . . . , ~vk be in a vector space V . Then Span{~v1 , . . . , ~vk } is


a subspace of V .

To check that any span is automatically a subspace of $V$, we can use the subspace test. Pick any two vectors $\vec w$ and $\vec u$ in $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$. Since they are in the span of the $\vec v$s, we can rewrite them as
$$\vec w = a_1\vec v_1 + \cdots + a_n\vec v_n$$
and
$$\vec u = b_1\vec v_1 + \cdots + b_n\vec v_n$$
for some scalars $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$. This means their sum is
$$\begin{aligned} \vec w + \vec u &= (a_1\vec v_1 + \cdots + a_n\vec v_n) + (b_1\vec v_1 + \cdots + b_n\vec v_n) \\ &= (a_1\vec v_1 + b_1\vec v_1) + \cdots + (a_n\vec v_n + b_n\vec v_n) \\ &= (a_1 + b_1)\vec v_1 + \cdots + (a_n + b_n)\vec v_n \end{aligned}$$
which is clearly in $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$. If we let all our scalars equal zero, we have
$$0 \cdot \vec v_1 + \cdots + 0 \cdot \vec v_n = \vec 0_V$$
in $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$. If $\vec w$ is in $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$ and $r$ is any scalar, then
$$r \cdot \vec w = r \cdot (a_1\vec v_1 + \cdots + a_n\vec v_n) = r \cdot (a_1\vec v_1) + \cdots + r \cdot (a_n\vec v_n) = (ra_1)\vec v_1 + \cdots + (ra_n)\vec v_n$$
which is also in $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$. Since $\mathrm{Span}\{\vec v_1, \ldots, \vec v_n\}$ satisfies all three conditions of the subspace test, it is a subspace of $V$.
 
Example 5. Show $W = \left\{\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right\}$ is a subspace of $M_{22}$.
We showed this in Example 3 using the subspace test, but it can also be
shown by writing W as a span.

To do this, we need to rewrite our general element of $W$ as a linear combination of a set of other matrices. We'll start by splitting an element of $W$ up into the sum of two parts: one containing terms with $a$ and one containing terms with $b$. This gives us
$$W = \left\{\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}\right\} = \left\{\begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & b \end{bmatrix}\right\}.$$
Now we can factor an $a$ out of the first matrix and a $b$ out of the second to get
$$W = \left\{a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}.$$
We're more used to seeing $x_1$ and $x_2$ as our scalar coefficients, but there's no harm in calling them $a$ and $b$ instead. This means
$$W = \left\{a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\} = \mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}.$$
Since
$$W = \mathrm{Span}\left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right\}$$
Theorem 2 tells us that $W$ is a subspace of $M_{22}$.

Another important idea from Rn that we can generalize to any vector space
is the idea of linear independence and linear dependence. In 1.3 we gave two
equivalent definitions, so we have a choice here about which one to generalize.
I’ll generalize the vector equation version here, and you can work through
generalizing the span version and arguing that the two are still equivalent in
the exercises.

Definition. A set of elements ~v1 , . . . , ~vk from a vector space V are linearly
dependent if the equation x1 · ~v1 + · · · + xk · ~vk = ~0V has a solution where at
least one of the xi s is nonzero. Otherwise ~v1 , . . . , ~vk are linearly independent.

As with span, here + and · are the addition and scalar multiplication of
V and ~0V is V ’s zero vector.
     
Example 6. Are $\begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$, $\begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}$, and $\begin{bmatrix} -5 & 5 \\ 1 & -5 \end{bmatrix}$ linearly independent or linearly dependent?
Here $V = M_{22}$ with matrix addition and scalar multiplication as its operations and $\vec 0_{M_{22}} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$. This means the equation from our generalized definition of linear independence and linear dependence is
$$x_1\begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} + x_2\begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix} + x_3\begin{bmatrix} -5 & 5 \\ 1 & -5 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
If we have no solutions to this equation apart from setting all the variables equal to zero, then our three matrices are linearly independent. If we can find a solution where any of the variables is nonzero, our matrices are linearly dependent.
Simplifying the left-hand side of this equation gives us
$$\begin{bmatrix} x_1 - 2x_2 - 5x_3 & -x_1 + 2x_2 + 5x_3 \\ x_2 + x_3 & 2x_1 + x_2 - 5x_3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Setting corresponding entries of these two matrices equal gives us the


equations x1 − 2x2 − 5x3 = 0, −x1 + 2x2 + 5x3 = 0, x2 + x3 = 0, and
2x1 + x2 − 5x3 = 0. The third equation tells us that x2 = −x3 . Plugging this
into the first equation gives x1 + 2x3 − 5x3 = 0 or x1 − 3x3 = 0, so x1 = 3x3 .
However, if we plug x1 = 3x3 and x2 = −x3 into any of our original equations
they simplify to 0 = 0. Therefore x3 can equal anything and in particular can
be nonzero. This tells us that our three matrices are linearly dependent.
     
Example 7. Are $\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$ linearly independent or linearly dependent?
As in the previous example, $V = M_{22}$ and we can tackle this question of linear independence or dependence via the equation
$$x_1\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} + x_2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + x_3\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
This simplifies to
$$\begin{bmatrix} x_1 + x_2 & -x_1 - x_3 \\ x_1 - x_3 & -x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
which gives us the equations x1 + x2 = 0, −x1 − x3 = 0, x1 − x3 = 0, and
−x1 + x2 = 0. The first equation gives us x1 = −x2 , while the fourth equation
gives us x1 = x2 . This means −x2 = x2 which can only be satisfied if x2 = 0.
This also means x1 = 0. The third equation tells us x1 = x3 , so x3 = 0 as
well. Since our only solution is to have all three variables equal zero, our three
matrices are linearly independent.
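Here is an optional Python check of the last two examples (an aside, not part of the text; it assumes numpy, and the helper name independent is just for illustration): flatten each $2 \times 2$ matrix into a 4-vector, stack them as columns, and compare the rank of the resulting matrix to the number of matrices. Full rank means linearly independent.

import numpy as np

def independent(mats):
    # Each matrix becomes one column of a 4 x k matrix.
    M = np.column_stack([m.flatten() for m in mats])
    return np.linalg.matrix_rank(M) == len(mats)

ex6 = [np.array([[1, -1], [0, 2]]),
       np.array([[-2, 2], [1, 1]]),
       np.array([[-5, 5], [1, -5]])]
ex7 = [np.array([[1, -1], [1, -1]]),
       np.array([[1, 0], [0, 1]]),
       np.array([[0, -1], [-1, 0]])]

print(independent(ex6))  # False: the Example 6 matrices are linearly dependent
print(independent(ex7))  # True: the Example 7 matrices are linearly independent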

Just as we did with subspaces, spans, and linear independence, we can also
generalize our idea of linear functions. We’ll use the same properties as in 2.1,
but now we’ll allow the domain and codomain of our map to be general vector
spaces instead of restricting them to be Rn and Rm .

Definition. Let V and W be vector spaces. A function f : V → W is linear


if f (~v + ~u) = f (~v ) + f (~u) and f (r · ~v ) = r · f (~v ) for all vectors ~v and ~u in V
and all scalars r.

Remember that the addition and scalar multiplication used on the left-
hand side of each of this definition's conditions are the operations from the
vector space V , while the addition and scalar multiplication used on the right-
hand sides are the operations from W . These may be different if V and W are
different types of vector spaces!
 
Example 8. Show that $f : M_{23} \to \mathbb{R}^3$ by $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = \begin{bmatrix} a + d \\ 2b \\ c + e - f \end{bmatrix}$ is linear.
To show that $f$ is a linear function, we need to show that it satisfies both conditions from the definition above. Let's start by checking addition.
$$\begin{aligned} f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} + \begin{bmatrix} r & s & t \\ x & y & z \end{bmatrix}\right) &= f\left(\begin{bmatrix} a+r & b+s & c+t \\ d+x & e+y & f+z \end{bmatrix}\right) \\ &= \begin{bmatrix} (a+r) + (d+x) \\ 2(b+s) \\ (c+t) + (e+y) - (f+z) \end{bmatrix} \\ &= \begin{bmatrix} (a+d) + (r+x) \\ 2b + 2s \\ (c+e-f) + (t+y-z) \end{bmatrix} \\ &= \begin{bmatrix} a+d \\ 2b \\ c+e-f \end{bmatrix} + \begin{bmatrix} r+x \\ 2s \\ t+y-z \end{bmatrix} \\ &= f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) + f\left(\begin{bmatrix} r & s & t \\ x & y & z \end{bmatrix}\right). \end{aligned}$$
This means $f$ satisfies the first condition. Next we need to check scalar multiplication.
$$f\left(r\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = f\left(\begin{bmatrix} ra & rb & rc \\ rd & re & rf \end{bmatrix}\right) = \begin{bmatrix} ra + rd \\ 2rb \\ rc + re - rf \end{bmatrix} = \begin{bmatrix} r(a+d) \\ r(2b) \\ r(c+e-f) \end{bmatrix} = r\begin{bmatrix} a+d \\ 2b \\ c+e-f \end{bmatrix}.$$
Since $f$ also satisfies the second condition, it is a linear function.


   
Example 9. Show that $f : \mathbb{R}^2 \to M_{22}$ by $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 & x_2 \\ x_1 + x_2 & x_1 x_2 \end{bmatrix}$ is not linear.
To show this function isn't linear, we only need to show that it fails one of our two conditions. This function actually fails both, but here I'll just show that it fails the condition for scalar multiplication.
$$f\left(r\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = f\left(\begin{bmatrix} rx_1 \\ rx_2 \end{bmatrix}\right) = \begin{bmatrix} rx_1 & rx_2 \\ rx_1 + rx_2 & r^2 x_1 x_2 \end{bmatrix}$$
but
$$rf\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = r\begin{bmatrix} x_1 & x_2 \\ x_1 + x_2 & x_1 x_2 \end{bmatrix} = \begin{bmatrix} rx_1 & rx_2 \\ r(x_1 + x_2) & r x_1 x_2 \end{bmatrix}.$$
Comparing the bottom right entries shows that these two options are not
equal, so f is not linear.
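If you want a quick numerical sanity check of Examples 8 and 9 (again an aside, not part of the text; it assumes numpy, and the helper names f8 and f9 are just for illustration), you can plug in random inputs and compare both sides of the scalar condition. Passing such a spot check is only evidence of linearity, but failing it is a genuine disproof.

import numpy as np

def f8(M):
    # The map from Example 8: M_23 -> R^3
    (a, b, c), (d, e, f) = M
    return np.array([a + d, 2 * b, c + e - f])

def f9(x):
    # The map from Example 9: R^2 -> M_22
    x1, x2 = x
    return np.array([[x1, x2], [x1 + x2, x1 * x2]])

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 3))
x = rng.standard_normal(2)
r = 2.5

print(np.allclose(f8(r * M), r * f8(M)))  # True: consistent with f8 being linear
print(np.allclose(f9(r * x), r * f9(x)))  # False: f9 fails the scalar condition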

Exercises 2.4.
 
3 17 0
1. What is the additive inverse of in M23 ?
−5 2 −8
 
−1 9
2. What is the additive inverse of in M22 ?
8 0
 
a −2a
3. Show W = with the usual matrix addition and scalar
0 a
multiplication is a subspace of M22 .
 
a b
4. Is W = a subspace of M22 ?
−a 2b
 
a b
5. Let W be the set of all 2 × 2 matrices of the form where
c 0
a + b + c = 1. Is W a subspace of M22 ?
6. The trace of an n × n matrix A, written tr(A), is the sum of A’s
diagonal entries. Is W = {A in M22 | tr(A) = 0} a subspace of M22 ?
  
a b
7. Show that V = a, b, c, d ≥ 0 with the usual matrix
c d
addition and scalar multiplication is NOT a vector space.
8. Explain why we can’t make the set of n × n matrices into a vector
space where the “+” is defined as matrix multiplication, in other
words, where A + B = AB.
9. Explain why we can’t make R into a vector space where “+” is
defined to be the usual multiplication of real numbers, i.e., a + b =
ab.
 
a 0
10. Show that D = with a, b in R is a vector space under
0 2a
the usual operations of M22 .
11. Fix a particular 3×3 matrix A. Let W be the set of all 3×3 matrices
which commute with A, i.e., W = {B in M33 | AB = BA}. Show
that W is a vector space.
   
1 0 0 1
12. Find the span of and in M22 .
0 2 1 0
   
−1 0 1 0 4 0
13. Find the span of and in M23 .
0 1 0 0 −1 1
       
1 −2 2 −1 −1 0 1 1
14. Is in the span of , , and ?
−1 0 0 2 3 −1 1 1
     
2 −2 10 3 −1 4 4 0 −2
15. Is in the span of and ?
−1 1 8 2 0 5 5 −1 2
     
1 2 4 3 −2 1
16. Are ~v1 = , ~v2 = and ~v3 = linearly
3 4 2 −1 4 9
independent or linearly dependent in M22 ?
     
2 0 1 1 0 1
17. Are ~v1 = , ~v2 = , and ~v3 = linearly
0 2 0 1 1 0
independent or linearly dependent in M22 ?
     
2 0 1 1 0 1
18. Are ~v1 = , ~v2 = , and ~v3 = linearly
0 2 0 1 1 0
independent or linearly dependent in M22 ?
     
1 −1 −3 2 0 −1
19. Are ~v1 = , ~v2 = , ~v3 = linearly
0 −2 1 −1 1 −7
independent or linearly dependent in M22 ?
20. In 1.3 we had two equivalent definitions for linear dependence, one
in terms of an equation and one involving spans. In this section,
we generalized the equation definition to get our definition of linear
dependence in a vector space.
(a) Generalize the definition of linear dependence involving spans
to a vector space.
(b) Generalize our explanation from 1.3 of why the two definitions
were equivalent to a vector space.
   
x1 x1 + 3x2 0
21. Let f = .
x2 0 2x1 − x2
(a) What is the domain of f ?
(b) What is the codomain of f ?
 
  a+b
a b
22. Let f =  b − c .
c d
2d

(a) What is the domain of f ?


(b) What is the codomain of f ?
   
a b −a a + b b
23. Let f = .
c d a+c 0 c+d
(a) What is the domain of f ?
(b) What is the codomain of f ?
 
  f 0 c
a b c
24. Let f = 0 a + e 0.
d e f
a 0 b
(a) What is the domain of f ?
(b) What is the codomain of f ?
     
x1 x1 + 3x2 0 3
25. Let f = . Compute f .
x2 0 2x1 − x2 −1
 
  a+b  
a b 4 −1
26. Let f =  b − c . Compute f .
c d 2 5
2d
     
a b −a a + b b 1 −3
27. Let f = . Compute f .
c d a+c 0 c+d −2 6
 
  f 0 c  
a b c   7 −1 2
28. Let f = 0 a + e 0 . Compute f .
d e f 3 1 4
a 0 b
 
  a+b 0 0
a b
29. Show that f = 0 c+d 0  is not a linear
c d
0 0 a−d+1
function.
30. Show that the map from Example 9 doesn’t satisfy the additive
condition f (~x + ~u) = f (~x) + f (~u) from the definition of a linear
function.
   
x1 x1 + 3x2 0
31. Show that f = is a linear function.
x2 0 2x1 − x2
 
  a+b
a b
32. Show that f =  b − c  is a linear function.
c d
2d

2.5 Kernel and Range


In this section we’ll explore how a linear function creates two special subspaces
within its domain and codomain. We’ll start in the domain, where we’ll explore
solutions to the equation f (~x) = ~0. This is an important equation in many
applications, for example, when finding maximums and minimums using the
techniques of calculus. It also helps us figure out what is mapped to the origin
in the codomain, which can be visualized geometrically as the subset of the
domain which is collapsed to the point ~0 by the map f .
 
Example 1. Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ by $f\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ x_1 - 2x_2 \\ -3x_1 + 6x_2 \end{bmatrix}$. Which vectors from $\mathbb{R}^2$ are mapped to $\vec 0$ in $\mathbb{R}^3$?
We can answer this question by solving $f(\vec x) = \vec 0$. In our case this is
$$\begin{bmatrix} 0 \\ x_1 - 2x_2 \\ -3x_1 + 6x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
This means we need $x_1 - 2x_2 = 0$ and $-3x_1 + 6x_2 = 0$. Both of these equations simplify to $x_1 = 2x_2$, so a vector from $\mathbb{R}^2$ maps to $\vec 0$ in $\mathbb{R}^3$ exactly when it has the form $\begin{bmatrix} x_1 \\ \frac{1}{2}x_1 \end{bmatrix}$.
We can see this subset of $\mathbb{R}^2$ geometrically by relabeling $x_1$ and $x_2$ as $x$ and $y$, so our requirement to map to $\vec 0$ becomes $x = 2y$. Now solving for $y$ allows us to view this set in a familiar format as the line $y = \frac{1}{2}x$ pictured below.
[Figure: the line $y = \frac{1}{2}x$ in the $xy$-plane.]

Since this subset of the domain is clearly important, we make the following
definition.

Definition. Suppose f : V → W is a linear function between two vector


spaces. The kernel of f is ker f = {~v in V | f (~v ) = ~0W }.

In other words, the kernel of a linear function is the set of all vectors in
its domain which map to the zero vector of the codomain. (Keep in mind that
we’re using “zero vector of the codomain” in the context of our vector space
definition, so if W = Mmn our zero vector would be the zero matrix.)

Example 2. Find the kernel of the function $f : \mathbb{R}^3 \to M_{22}$ where $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_2 & x_1 - x_3 \\ 0 & x_2 \end{bmatrix}$.
The domain of $f$ is $\mathbb{R}^3$ and the codomain is $M_{22}$, so the kernel of $f$ is the set of 3-vectors which map to the $2 \times 2$ zero matrix. We can solve for those vectors using the equation $f(\vec x) = \vec 0_{M_{22}}$ which can be rewritten as
$$\begin{bmatrix} x_2 & x_1 - x_3 \\ 0 & x_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Setting corresponding matrix entries equal, we get $x_2 = 0$ and $x_1 - x_3 = 0$. This means for $\vec x$ to be in the kernel we need $x_2 = 0$ and $x_1 = x_3$. Thus
$$\ker(f) = \left\{\begin{bmatrix} x_1 \\ 0 \\ x_1 \end{bmatrix}\right\}.$$

In the special case where our linear function maps from Rn to Rm , we


can think of our linear function f as multiplicationby some matrix A. For
0 0
example, the function in Example 1 has matrix A =  1 −2. In this case,
−3 6
the kernel of f has an alternate name.

Definition. Let A be an m × n matrix. The null space of A is


N ul(A) = {~x in Rn | A~x = ~0}.

In other words, the null space of A is all vectors ~x which satisfy A~x = ~0.
(Here our zero vector is the actual vector of zeros in Rm .) If f is the linear
function with matrix A, then A~x = ~0 is the same as f (~x) = ~0, so the null
space of A is the kernel of f .
 
Example 3. Find the null space of $A = \begin{bmatrix} 1 & -2 \\ -\frac{1}{2} & 1 \end{bmatrix}$.
The null space of $A$ is all 2-vectors $\vec x$ where $A\vec x = \vec 0$. We can rewrite this as
$$\begin{bmatrix} 1 & -2 \\ -\frac{1}{2} & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Multiplying out the left-hand side gives us
$$\begin{bmatrix} x_1 - 2x_2 \\ -\frac{1}{2}x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Setting corresponding entries equal gives us the equations $x_1 - 2x_2 = 0$ and $-\frac{1}{2}x_1 + x_2 = 0$. Solving for $x_1$ in either equation gives us $x_1 = 2x_2$, so
$$\mathrm{Nul}(A) = \left\{\begin{bmatrix} 2x_2 \\ x_2 \end{bmatrix}\right\}.$$
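For readers who like to double-check this kind of computation on a computer, the sympy library can produce a basis for the null space directly. This sketch is an aside, not part of the text, and assumes sympy is installed:

from sympy import Matrix, Rational

A = Matrix([[1, -2], [Rational(-1, 2), 1]])
print(A.nullspace())   # [Matrix([[2], [1]])]: every vector in Nul(A) is a multiple of (2, 1)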

So far we’ve seen the kernel (or null space) as a subset of the domain of
our function. However, the situation is more special than that.

Theorem 1. If f : V → W is a linear function, then ker(f ) is a subspace of


V.

Another way to say this (while practicing our new vocabulary) is that
the kernel of a linear function is a subspace of its domain, which we can
check using our subspace test. This means we need to verify that the kernel
contains the zero vector of the domain and is closed under addition and scalar
multiplication.
To show that ~0V (the zero vector of the domain V ) is in the kernel of f ,
we need to show f (~0V ) = ~0W . To do this we’ll use the fact that f is linear
along with a trick: because (−1) · ~x is the additive inverse, −~x, of ~x, we get
~0 = ~x + (−1) · ~x for any vector ~x in any vector space V . This means

f (~0V ) = f (~v + (−1) · ~v ) = f (~v ) + (−1) · f (~v ) = ~0W

so ~0V is in ker(f ).
To show that the kernel of f is closed under addition, suppose we have two
vectors ~v1 and ~v2 in ker(f ). We need to show ~v1 + ~v2 is also in ker(f ). Since
~v1 and ~v2 are in ker(f ) we know f (~v1 ) = ~0W and f (~v2 ) = ~0W . Therefore we
have
f (~v1 + ~v2 ) = f (~v1 ) + f (~v2 ) = ~0W + ~0W = ~0W
so ~v1 + ~v2 is in ker(f ).

To show that the kernel of f is closed under scalar multiplication, suppose


we have a vector ~v1 in ker(f ) and a scalar r. We need to show r · ~v1 is also in
ker(f ). As above, we have f (~v1 ) = ~0W , so

f (r · ~v1 ) = r · f (~v1 ) = r · ~0W = ~0W

which means r · ~v1 is in ker(f ).


Since the kernel satisfies all three conditions of the subspace test, we can
see that the kernel of any linear map is a subspace of its domain.

This means that we automatically know that the sets we found in Examples
1 and 3 are subspaces of R2 and the set we found in Example 2 is a subspace
of R3 .
During our subspace check above, we showed that the kernel of any linear
map contains the zero vector of its domain. One interesting question we can
ask about a function is whether its kernel contains anything besides the zero
vector. This is equivalent to asking when the only vector that maps to the
zero vector of the codomain is the zero vector of the domain. Geometrically,
this would mean that there is no collapsing as we map to ~0 since only a single
unique vector is being mapped there. We can expand this idea of a function
having no collapsing as we map to anything in the codomain, which motivates
the following definition.

Definition. A function $f$ is 1-1 if $\vec x \neq \vec y$ guarantees that $f(\vec x) \neq f(\vec y)$.

If we have $\vec x \neq \vec y$ with $f(\vec x) = f(\vec y)$, then $f$ has collapsed $\vec x$ and $\vec y$ to the


same point in its codomain. This means a function is 1-1 if it doesn’t allow
any collapsing to occur.
   
Example 4. The function $f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} y \\ x \end{bmatrix}$ is 1-1.
This function switches the two components of our vector, which means f
is reflection about the line y = x. It is impossible to map two different points
in the plane to the same point with this reflection, so f is 1-1.

 
Example 5. The function $f : \mathbb{R}^3 \to \mathbb{R}^2$ by $f\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} x \\ y \end{bmatrix}$ is not 1-1.
Geometrically, this function can be thought of as a projection of R3 into
the xy-plane. This projection definitely collapses multiple different vectors to
the same output vector as each line in the z direction is identified with a single
vector in R2 . Algebraically, f identifies all vectors in R3 which differ only in
the z component. From either perspective, f is not 1-1.

Since not every function is as easy to visualize as those in our previous


two examples, it can be useful to have a more computational way to check
whether or not a function is 1-1. It turns out that if a function maps two
different vectors to the same output, then it must also map multiple different
vectors to the zero vector, which is stated more precisely in the Theorem
below.

Theorem 2. A function f : V → W is 1-1 if and only if ker(f ) = {~0V }.

This means another way to view 1-1 functions is that they are the functions
with the smallest possible kernels.
To see why this theorem is true, suppose we have a linear map f : V → W
and two vectors ~x and ~y in V for which f (~x) = f (~y ). This can be rewritten as

f (~x) − f (~y ) = ~0W

and since f is linear, we can rewrite the left-hand side to get

f (~x − ~y ) = ~0W .

This means that $\vec x - \vec y$ is an element of $\ker(f)$ whenever $f(\vec x) = f(\vec y)$. Having $\vec x = \vec y$ is equivalent to having $\vec x - \vec y = \vec 0_V$. Therefore if $\ker(f) = \{\vec 0_V\}$, then $f(\vec x) = f(\vec y)$ forces $\vec x - \vec y = \vec 0_V$, i.e., $\vec x = \vec y$, so $f$ is 1-1. Conversely, if $f$ is 1-1, then no nonzero vector can map to $\vec 0_W$ (since $\vec 0_V$ already does), so $\ker(f) = \{\vec 0_V\}$.
 
Example 6. Use Theorem 2 to show that $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} 4d \\ 3c \\ a - b \\ a + b \end{bmatrix}$ is 1-1.
Since we have $f : M_{22} \to \mathbb{R}^4$, to see that this map is 1-1 we need to show that $\ker(f)$ contains only the zero matrix (which is the zero vector of $M_{22}$). We can do this using the equation $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \vec 0$ which in this case is
$$\begin{bmatrix} 4d \\ 3c \\ a - b \\ a + b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Setting corresponding entries equal, we get $4d = 0$, $3c = 0$, $a - b = 0$, and $a + b = 0$. The first equation gives us $d = 0$, and the second gives us $c = 0$. The third equation tells us $a = b$, which can be plugged into the fourth equation to give $b + b = 2b = 0$, so $a = b = 0$. This means the kernel of $f$ contains only the $2 \times 2$ zero matrix, so $f$ is 1-1.
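Since $f$ here is linear and $M_{22}$ can be identified with $\mathbb{R}^4$ by listing the entries as $(a, b, c, d)$, we can also check this on a computer. The sketch below is an aside, not part of the text; it assumes numpy and the flattening convention just described, and it writes $f$ as a $4 \times 4$ matrix whose rank tells us whether the kernel is trivial:

import numpy as np

# Rows give the outputs 4d, 3c, a-b, a+b in terms of (a, b, c, d).
F = np.array([[0, 0, 0, 4],
              [0, 0, 3, 0],
              [1, -1, 0, 0],
              [1, 1, 0, 0]])

# Rank 4 means the only solution of F x = 0 is x = 0, i.e. the kernel is trivial.
print(np.linalg.matrix_rank(F))  # 4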
 
Example 7. Use Theorem 2 to show that $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_2 & x_1 - x_3 \\ 0 & x_2 \end{bmatrix}$ is not 1-1.

As we saw in Example 2, the kernel of this map contains vectors other


than the zero vector. Thus f is not 1-1.

Suppose for a moment that we are in the special case f : Rn → Rm , where


the kernel of f can be thought of as the null space of its m × n matrix A.
If f is 1-1, this means N ul(A) = {~0}. This is the same as saying A~x = ~0
is only satisfied by ~x = ~0. If we mentally transform this matrix equation
into a vector equation, this is equivalent to saying A’s columns are linearly
independent. Since there are n columns in our matrix, this means the span of
the columns has dimension n. However, A’s columns are m-vectors, so their
span is a subset of Rm , which means its dimension is at most m. Therefore,
we get the following.

Theorem 3. It is only possible for f to be 1-1 if n ≤ m. Thus if n > m, we


know that f cannot be 1-1.

Note that this theorem can’t be reversed, i.e., having n ≤ m doesn’t


guarantee f is 1-1.

Example 8. Use Theorem 3 to check whether or not the function $f$ with matrix $\begin{bmatrix} -3 & 0 & 1 & 2 \\ 4 & 7 & -6 & 1 \end{bmatrix}$ is 1-1.
This matrix is 2 × 4, so we know f : R4 → R2 . Since 4 > 2, Theorem 3
tells us the function corresponding to this matrix is not 1-1.

Example 9. Use Theorem 3 to check whether or not the function f with


−1 7
matrix  0 2  is 1-1.
4 −5
This matrix is 3×2, so we know f : R2 → R3 . Since 2 ≯ 3, Theorem 3 can’t
tell us anything about whether the function corresponding to this matrix is
1-1 or not.

Now that we’ve discussed the kernel within our domain, let’s switch gears
to look at a special subset of our codomain. Since the codomain was defined
as the vector space where the outputs of the function live, it’s very natural to
ask what those outputs are.

Example 10. What are the outputs of the function $f : \mathbb{R}^2 \to \mathbb{R}^2$ by $f\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x - y \\ -x + y \end{bmatrix}$?
The outputs of this function are all vectors in the codomain, $\mathbb{R}^2$, which are $f(\vec x)$ for some vector $\vec x$ in the domain. Looking at the formula for our function, this means our outputs are all 2-vectors of the form $\begin{bmatrix} x - y \\ -x + y \end{bmatrix}$ for some scalars $x$ and $y$. We can rewrite this output vector as $\begin{bmatrix} x - y \\ -(x - y) \end{bmatrix}$, which shows that our outputs can be thought of as any 2-vector whose second entry is the negative of the first entry. To summarize, $f$'s outputs are all 2-vectors of the form $\begin{bmatrix} z \\ -z \end{bmatrix}$ for some scalar $z$. We can visualize this set as the line $y = -x$ in the picture below.
[Figure: the line $y = -x$ in the $xy$-plane.]

Definition. Suppose $f : V \to W$ is a linear function between two vector spaces. The range of $f$ is $\mathrm{range}(f) = \{\vec w \text{ in } W \mid \vec w = f(\vec v) \text{ for some } \vec v \text{ in } V\}$.

In other words, the range of a function is the set of all its outputs. This is
clearly a subset of its codomain.
 
Example 11. Find the range of $f : M_{22} \to \mathbb{R}^3$ by $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c \\ b \\ 2b \end{bmatrix}$.
The range of $f$ is the set of all 3-vectors which are $f(A)$ for some $2 \times 2$ matrix $A$. We can express this as the set of $\vec x$ where $\vec x = f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right)$, i.e., we need some $a$, $b$, $c$, and $d$ so that
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} a + c \\ b \\ 2b \end{bmatrix}.$$
(Remember that we are assuming we know $x_1$, $x_2$, and $x_3$ and are looking for the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$.) Setting corresponding entries equal gives us $a + c = x_1$, $b = x_2$, and $2b = x_3$. We can solve this for any value of $x_1$ and $x_2$ by letting $a = x_1$, $b = x_2$, $c = 0$, and $d$ any number we like. However, this only works if $2b = 2x_2 = x_3$. Therefore we have
$$\mathrm{range}(f) = \left\{\begin{bmatrix} x_1 \\ x_2 \\ 2x_2 \end{bmatrix}\right\}.$$

As with the kernel and null space, the range has another name in the
special case where our linear function f maps from Rn to Rm as in Example
10.

Definition. Let A be an m × n matrix. The column space of A is


Col(A) = {~b in Rm | ~b = A~x for some ~x in Rn }.

In other words, the column space of A is the set of all vectors ~b for which
A~x = ~b has a solution. From 1.2 we know that this set can also be expressed
as the span of A’s columns, so an alternate way of thinking about the column
space is as the span of the columns of A.
 
Example 12. Find the column space of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & -2 \end{bmatrix}$.
Our matrix is $3 \times 2$, so its column space is the set of all 3-vectors $\vec b$ where $A\vec x = \vec b$ for some 2-vector $\vec x$. We can rewrite this equation as
$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
which simplifies to
$$\begin{bmatrix} x_1 + x_2 \\ 0 \\ -2x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$
(As in the previous example, we view $b_1$, $b_2$, and $b_3$ as known quantities and solve for $x_1$ and $x_2$.) Setting corresponding entries equal gives us $x_1 + x_2 = b_1$, $0 = b_2$, and $-2x_2 = b_3$. Solving for $x_2$ in the third equation gives $x_2 = -\frac{1}{2}b_3$. Plugging that into the first equation and solving for $x_1$ gives $x_1 = b_1 + \frac{1}{2}b_3$. This means as long as $b_2 = 0$ we can find a $\vec x$ for which $A\vec x = \vec b$, so
$$\mathrm{Col}(A) = \left\{\begin{bmatrix} b_1 \\ 0 \\ b_3 \end{bmatrix}\right\}.$$
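As a computational aside (not part of the text, and assuming sympy is available), the columnspace method returns a basis for the column space, which matches what we just found by hand:

from sympy import Matrix

A = Matrix([[1, 1], [0, 0], [0, -2]])
print(A.columnspace())
# [Matrix([[1], [0], [0]]), Matrix([[1], [0], [-2]])]
# Both basis vectors have second entry 0, so Col(A) is the set of 3-vectors with b2 = 0.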

As with the kernel and domain, the range (or column space) is more than
just a subset of the codomain.

Theorem 4. If f : V → W is a linear function, then range(f ) is a subspace


of W .

As with the kernel, we’ll use the subspace test by checking that the range
contains the zero vector of the codomain and is closed under addition and
scalar multiplication.
To show that ~0W (the zero vector of the codomain W ) is in the range, we
need to find some ~v in V with f (~v ) = ~0W . However, we’ve already shown that
f (~0V ) = ~0W , so ~0W is in the range of f .
To show that the range of f is closed under addition, suppose we have
two vectors w ~ 1 and w~ 2 in the range. We need to show w ~1 + w
~ 2 is also in the
range. Since w ~ 1 and w ~ 2 are in the range, we must have ~v1 and ~v2 in V with
f (~v1 ) = w
~ 1 and f (~v2 ) = w~ 2 . Since f is linear, we get

~1 + w
w ~ 2 = f (~v1 ) + f (~v2 ) = f (~v1 + ~v2 )

and w
~1 + w~ 2 is in the range of f .
To show that the range of f is closed under scalar multiplication, suppose
~ 1 in the range and a scalar r. We need to show r · w
we have w ~ 1 is also in the
range. As above, we have ~v1 with f (~v1 ) = w
~ 1 , so

r·w
~ 1 = r · f (~v1 ) = f (r · ~v1 )

so r · w
~ 1 is in the range of f .
Therefore the range is a subspace of the codomain.

This means we get that the set from Example 10 is a subspace of R2 and
the sets from Examples 11 and 12 are subspaces of R3 .
Unlike in our discussion of 1-1 where we asked which functions had the
smallest possible kernel, we’ll ask which functions have the largest possible
range. Since the range is a subset of the codomain, the largest the range
could be is the whole codomain. Geometrically, this means that the function’s
outputs fill up the entire codomain, which prompts the following definition.

Definition. A function f : V → W is onto if its range is W .

In other words, a function is onto if its range is its entire codomain.


   
Example 13. The function $f : M_{22} \to M_{22}$ by $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ is onto.
For any $2 \times 2$ matrix $A = \begin{bmatrix} x & y \\ z & w \end{bmatrix}$, we have $f\left(\begin{bmatrix} w & -y \\ -z & x \end{bmatrix}\right) = A$. This means the range of $f$ is all of $M_{22}$, so $f$ is onto.

 
Example 14. The function $f : M_{22} \to \mathbb{R}^3$ by $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a + c \\ b \\ 2b \end{bmatrix}$ is not onto.
We saw in Example 11 that $\mathrm{range}(f) = \left\{\begin{bmatrix} x_1 \\ x_2 \\ 2x_2 \end{bmatrix}\right\}$. Since not all 3-vectors have $x_3 = 2x_2$, this is strictly smaller than $f$'s codomain $\mathbb{R}^3$. Therefore $f$ is not onto.

As we did with the kernel, let’s explore onto functions in the case where
f : Rn → Rm . Here the range of f can also be thought of as the column space
of f ’s m × n matrix A. If f is onto, then Col(A) = Rm . This means Col(A)
has dimension m. Since the column space of A is the span of A’s columns,
this means that A must have m linearly independent columns. Since A has
n columns in total, A has at most n linearly independent columns. Therefore
we get a parallel to Theorem 3.

Theorem 5. It is only possible for f to be onto if m ≤ n. Thus if m > n, we


know that f cannot be onto.

This theorem also can’t be reversed, i.e., having m ≤ n doesn’t guarantee


f is onto.

Example 15. Use Theorem 5 to check whether or not the function f with
matrix $\begin{bmatrix} 7 & -2 & 1 \\ 3 & 4 & 0 \end{bmatrix}$ is onto.
This matrix is 2 × 3, so we know f : R3 → R2 . Since 2 ≯ 3, Theorem 5
can’t tell us anything about whether or not the function corresponding to this
matrix is onto.

Example 16. Use Theorem 5 to check whether or not the function f with
matrix $\begin{bmatrix} 4 & 9 \\ -1 & 0 \\ 2 & 1 \end{bmatrix}$ is onto.
This matrix is 3 × 2, so we know f : R2 → R3 . Since 3 > 2 we know the
function corresponding to this matrix is not onto.

From a geometric perspective Example 16’s conclusion should make sense,


because it says that our linear function f cannot fill up three-dimensional
space (R3 ) with its outputs if its inputs are only two-dimensional (R2 ).
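As a quick computational aside (not in the text; it assumes numpy), both questions can be settled at once with a rank computation, where rank here just means the number of linearly independent columns: a linear map $f : \mathbb{R}^n \to \mathbb{R}^m$ with matrix $A$ is onto exactly when the rank equals $m$ and 1-1 exactly when the rank equals $n$.

import numpy as np

A15 = np.array([[7, -2, 1], [3, 4, 0]])      # 2 x 3, the matrix from Example 15
A16 = np.array([[4, 9], [-1, 0], [2, 1]])    # 3 x 2, the matrix from Example 16

# onto <=> rank equals the number of rows; 1-1 <=> rank equals the number of columns
for A in (A15, A16):
    m, n = A.shape
    r = np.linalg.matrix_rank(A)
    print(f"{m}x{n} matrix: rank {r}, onto: {r == m}, 1-1: {r == n}")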

Exercises 2.5.
   
0 1 0 a b c  
a c
1. Is −1 0 1  in the kernel of f d e f  = ?
2a e+k
2 1 −2 g h k
     
3 −2 a b 4a − 6d
2. Is in the kernel of f = ?
6 2 c d 3b + d
 
x1  
x2  4x1 + 8x3
3. Find the kernel of f  
  = .
x3  x2 − x3 + 2x4
x4
   
x1 x1 − 4x2 + 2x3
4. Find the kernel of f x2  =  −x2 + 3x3 .
x3 x1 − 3x2 − x3
 
  2a − b
a b
5. Find the kernel of f = a + c − d.
c d
−3c
 
x1  
x + x2 0
6. Find the kernel of f x2  = 1 .
x2 + x3 −x1 − x2
x3
   
a b d a−b
7. Find the kernel of f = .
c d a+c 5d
   
a b c 2f + 4b c − 3a
8. Find the kernel of f = .
d e f a − 2b −e
 
1 3 0
9. Find the null space of A = −2 2 −8.
0 −1 1
 
2 0 4 −2
10. Find the null space of A = .
0 1 −3 1
 
1 −1 0 2 0
11. Find the null space of A = 0 0 1 −3 0.
0 0 0 0 1
 
1 −4 0 1 0 0

12. Find the null space of A = 0 0 1 −2 0 −1.
0 0 0 0 1 5
13. Explain, without doing
 any computations,
 why the linear map
1 0 0 −2
whose matrix is A = cannot be 1-1.
0 0 1 3
14. Explain, without doing  any computations,  why the linear map
1 −5 0 0
whose matrix is A = 0 0 1 −2 cannot be 1-1.
0 0 −4 8
 
  x1 + x2
x1
15. Is f = 0  a 1-1 function?
x2
2x1 + x2
   
a b a−d c
16. Is f = a 1-1 function?
c d d b+c
 
  a b c  
1 2    a c
17. Is in the range of f d e f = ?
3 4 2a e + k
g h k
   
−1 1 0 0 −2
18. Is in the range of the function with matrix ?
7 0 0 1 3
 
1 −1
19. Find the range of the linear map f whose matrix is 4 −3.
0 3
20. Find therange of
 the linear map f whose matrix is
1 −3 0
0 0 1
A=  .
0 0 0
0 0 0
 
x1  
  2x1 x2
21. Find the range of f x2 = .
x2 + x3 x3
x3
 
  a + 2e + f
a b c
22. Find the range of f =  d + c .
d e f
−4b
   
a b c a+b c
23. Find the range of f = .
d e f d e+f
 
  a+c 0 0
a b
24. Find the range of f = 0 b+d 0 .
c d
0 0 a+b+c+d
   
−1 1 −4 2
25. Is ~v =  2  in the column space of A = 0 −1 3 ?
1 1 −3 −1
   
4 2 −1 5
26. Is ~v = 10 in the column space of A = 10 3 −1?
2 0 −1 4
 
1 1
27. Find the column space of A = 0 0.
2 5
 
2 5
28. Find the column space of A = .
4 10
29. Explain without
 doing any
 computations why the linear map whose
9 −2
matrix is A =  3 1  cannot be onto.
−4 6
30. Explain without
 doing any computations
 why the linear map whose
2 0 4
−3 1 1
matrix is A =  
 6 −6 2 cannot be onto.
0 −1 1
 
x1  
  2x1
31. Is f x2 = an onto function?
−x2
x3
 
x1  
  x1 x2
32. Is f x2 = an onto function?
x3 x1 + x2
x3
33. Let f : R3 → R2 be a linear map whose matrix is A.
(a) Give an example of an A for which f is an onto map.
(b) Give an example of an A for which f is not an onto map.
(c) Briefly explain why we can’t give an A for which f is 1-1.
34. Let f : R2 → R3 be a linear map whose matrix is A.
(a) Give an example of an A for which f is a 1-1 map.
(b) Give an example of an A for which f is not a 1-1 map.
(c) Briefly explain why we can’t give an A for which f is onto.

2.6 Row Reduction


In many of our previous sections, we’ve worked on problems that eventually
reduced to solving $f(\vec x) = \vec b$ for $\vec x$ or solving $x_1\vec v_1 + \cdots + x_n\vec v_n = \vec b$ for $x_1, \ldots, x_n$.
In Section 2.2, we learned how to find a matrix A so that both types of
problems could be restated as solving A~x = ~b for ~x. In this section, we’ll
develop an algorithm to systematically solve these matrix equations.
Before we start building our algorithm, we need to introduce two more
ways of writing the equation A~x = ~b. The first way doesn’t contain any new
mathematical insights, but allows us to write down the minimum information
needed to specify the matrix equation we’re working with. Since our algorithm
will often be done on a computer or calculator, the efficiency of this format
will be very helpful.
If you consider a matrix equation A~x = ~b, many of the symbols used can be
inferred from the rest of the setup. For example, from the number of columns
of the matrix A we can already tell the number of entries in ~x. This means
we really just need to tell our algorithm the entries of A and ~b.
Definition. The augmented coefficient matrix of $A\vec x = \vec b$ is the matrix $\left[A \mid \vec b\right]$ formed by joining $\vec b$ onto the right-hand side of $A$.

When writing augmented coefficient matrices, I will always include a


vertical line separating the part of the matrix coming from A from the last
column, which gives ~b. (This line occurs where the equals sign is in A~x = ~b.)
If you don’t see that line, I’m not describing an augmented coefficient matrix.
Some people call A the coefficient matrix, but I’ll avoid using that terminology
to avoid confusion.
Note that if A is m × n (making ~b an m-vector) then the augmented
coefficient matrix of A~x = ~b will be m × (n + 1).

Example 1. Find the augmented coefficient matrix of $A\vec x = \vec b$ where $A = \begin{bmatrix} -3 & 4 & -1 \\ 0 & 2 & 6 \end{bmatrix}$ and $\vec b = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$.
The augmented coefficient matrix is the matrix $A$ with $\vec b$ added as its new fourth column to give us
$$\left[\begin{array}{ccc|c} -3 & 4 & -1 & 5 \\ 0 & 2 & 6 & -2 \end{array}\right].$$
(Note the vertical line where we joined $\vec b$ onto $A$, which tells us this is an augmented coefficient matrix.)
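On a computer, the augmented coefficient matrix is just $A$ with $\vec b$ glued on as an extra column. Here is a tiny numpy sketch (an aside, not part of the text):

import numpy as np

A = np.array([[-3, 4, -1], [0, 2, 6]])
b = np.array([5, -2])

augmented = np.column_stack([A, b])   # join b as the last column
print(augmented)
# [[-3  4 -1  5]
#  [ 0  2  6 -2]]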

Since writing down augmented coefficient matrices is easier than writing


down matrix equations and our algorithm to solve A~x = ~b will involve writing
down many matrix equations, I’ll write the algorithm in terms of augmented
coefficient matrices.
Our second new way to rewrite A~x = ~b is a more familiar format from
algebra class: as a list of equations to be solved simultaneously. To see this,
we can use the matrix-vector multiplication we developed in 2.2 to compute
 
$$A\vec x = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}.$$
Setting this vector equal to $\vec b = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$, we get
$$\begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$

From 1.1 we know two vectors are equal precisely when each pair of their
corresponding entries is equal, so the equation above can be broken up into
m separate equations which must all hold at once. These equations are

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m. \end{aligned}$$
Notice that each equation corresponds to a row of our augmented coefficient matrix $\left[A \mid \vec b\right]$, since the coefficients on the $x_i$'s in one equation
all come from the entries of a single row of A and the equation equals the
entry of ~b from that row. Thus a matrix equation involving an m × n matrix
A gives us an m × (n + 1) augmented coefficient matrix which corresponds to
a list of m linear equations. Each variable, on the other hand, corresponds to
one of the first n columns, since the entries down the jth column of A appear
as the coefficients on xj in each equation. The entries of ~b form the added
n + 1st column, which is sometimes called the augmentation column.
The left-hand side of each of these equations is a linear combination of the
variables x1 , . . . , xn . For this reason, many people call each of these equations
a linear equation and the whole list of linear equations a linear system.

Example 2. Write the matrix equation $A\vec x = \vec b$ as a list of linear equations, where $A = \begin{bmatrix} -3 & 4 & -1 \\ 0 & 2 & 6 \end{bmatrix}$ and $\vec b = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$.
Each row of A gives us the coefficients for one equation, which will equal
the corresponding entry in ~b. Since A has 2 rows and ~b has 2 entries, this means
we’ll have 2 linear equations. Each column of A corresponds to a variable, so
A’s 3 columns mean we have 3 variables which I’ll call x1 , x2 , and x3 . Putting
this all together, the first row of A and first entry of ~b give us the equation
−3x1 + 4x2 − x3 = 5. The second row of A and second entry of ~b give us
2x2 + 6x3 = −2. (Note that there is no x1 term in the second equation,
because the first entry in A’s second row is 0, which means x1 ’s coefficient in
the second equation is 0.) This means we can write A~x = ~b as

$$\begin{aligned} -3x_1 + 4x_2 - x_3 &= 5 \\ 2x_2 + 6x_3 &= -2. \end{aligned}$$
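If you'd like to check that the two formats really describe the same problem (a side note, not from the text; it assumes sympy is available), you can hand either one to sympy's linsolve and get the same solution set:

from sympy import Matrix, linsolve, symbols

x1, x2, x3 = symbols("x1 x2 x3")

# Matrix form: pass the augmented coefficient matrix to linsolve.
aug = Matrix([[-3, 4, -1, 5], [0, 2, 6, -2]])
print(linsolve(aug, x1, x2, x3))

# Equation form: the same system written as expressions equal to zero.
eqs = [-3*x1 + 4*x2 - x3 - 5, 2*x2 + 6*x3 + 2]
print(linsolve(eqs, x1, x2, x3))

# Both calls print the same one-parameter family of solutions.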

The benefit of rewriting a matrix equation as a list of linear equations


is that we can then use several tools from algebra to transform our linear
equations or augmented coefficient matrix. Our goal is to change our original
formulation of the problem into a version where the solutions are easy to
find. Of course, we want this transformation to occur without changing our
solutions.
Consider the linear equations 3x1 + x2 = 14 and x1 + x2 = 4 compared to
the linear equations x1 = 5 and x2 = −1. The second pair of equations is much
easier to use, even though both pairs of equations have the same solutions.
What properties made the second pair of equations so much easier? To
start with, each equation only contained one of our variables. Additionally,
those variables had coefficient 1. We also listed our equations so that we
started with the equation for x1 , then proceeded to the equation for x2 .
Even when we aren’t lucky enough to get such a clean final format, we
can still look for a simpler format. For example, we can consider the linear
equations 2x1 − 4x2 + x3 = 12, x1 − 2x2 + 3x3 = 11, and −x1 + 2x2 = −5
compared to the linear equations x1 − 2x2 = 5, x3 = 2, and 0 = 0. This second
pair of equations is simpler, and again has the same solutions as the first pair.
What properties made our second set of equations simpler? We don’t have
each variable in its own equation, but we still have a coefficient of 1 on the
first variable to appear in each equation. Additionally, the first variable in
each equation doesn’t appear in any other equation. This set of equations had
a redundant equation which we’ve replaced with 0 = 0. Finally, we’ve ordered
our list of equations based on their starting variables and placed 0 = 0 at the
end of our list since we can basically ignore it.
We’re planning to write down our algorithm using augmented coefficient
matrices, so let’s try to translate the nice properties of our simpler sets of linear
equations into properties of an augmented coefficient matrix. The augmented

coefficient matrix of x1 = 5 and x2 = −1 is


 
$$\left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -1 \end{array}\right]$$
and the augmented coefficient matrix of $x_1 - 2x_2 = 5$, $x_3 = 2$, and $0 = 0$ is
$$\left[\begin{array}{ccc|c} 1 & -2 & 0 & 5 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
The fact that the first variable in each equation had a coefficient of 1 means
that the first nonzero entry in each row of our matrices is 1. Those variables
didn’t appear in any other equation, so if a column contains one of these 1s
then the rest of that column’s entries are zero. The equations were ordered
by their first variables, so these special 1s appear in lower rows as we move
from left to right across the columns of our augmented coefficient matrices.
Finally, we put 0 = 0 equations last in the list, so any rows that are all zeros
appear at the bottom of our matrices. These properties are summed up in the
following definition.

Definition. A matrix is in reduced echelon form if it satisfies the following


conditions:
• Any rows whose entries are all 0 occur at the bottom.

• The leftmost nonzero entry in each row is a 1, called a leading 1.


• If a column contains a leading 1, then all its other entries are 0s.
• If a leading 1 appears in a row above another leading 1, then it must also
appear in a column to the left of the other leading 1.
 
Example 3. Circle all leading 1s of $A = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 8 \end{bmatrix}$. Is $A$ in reduced echelon form?

To check whether A is in reduced echelon form, we need to check each of


the conditions in the definition above. Our matrix A doesn’t have any rows of
zeros, and it does have a 1 as the leftmost nonzero entry in each row. These
are our leading 1s, which are marked below.
$$\begin{bmatrix} \boxed{1} & 0 & -3 \\ 0 & \boxed{1} & 8 \end{bmatrix}$$
In the first two columns, which have leading 1s, all other entries are 0. The
leading 1 in the first row is in a row above the leading 1 in the second row,
and it is also to the left of the second row’s leading 1. Since A satisfies all the
definition’s conditions, it is in reduced echelon form.
 
Example 4. Circle all leading 1s of $A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Is $A$ in reduced echelon form?

This matrix has a row of zeros, which is at the bottom. The first nonzero
entry in each of the other rows is a 1. These leading 1s are circled below.
 
$$\begin{bmatrix} \boxed{1} & 0 & 0 & -1 \\ 0 & \boxed{1} & 0 & 2 \\ 0 & 0 & \boxed{1} & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
The first three columns, which contain the leading 1s, have all other entries
0. The leading 1 in the top row is above those in the second and third rows,
and it is also to their left. Similarly, the leading 1 in the second row is above
that in the third row, and it is also to its left. Thus A is in reduced echelon
form.

Note that this matrix isn’t in our ideal format because of the bottom row
of zeros. However, this row corresponds to the equation 0 = 0, so isn’t really
a problem.
 
Example 5. Circle all leading 1s of $A = \begin{bmatrix} 1 & 0 & 3 & 0 & -1 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 9 \end{bmatrix}$. Is $A$ in reduced echelon form?

Our matrix doesn’t have any rows of zeros, and the leftmost nonzero entry
of each row is a 1. See the picture below where these leading 1s are circled.
 
$$\begin{bmatrix} \boxed{1} & 0 & 3 & 0 & -1 \\ 0 & \boxed{1} & 1 & 0 & 2 \\ 0 & 0 & 0 & \boxed{1} & 9 \end{bmatrix}$$
(Note that the 1 in the second row and third column is not a leading 1,
because it is not the leftmost nonzero entry.)
The first, second, and fourth columns with the leading 1s have all other
entries 0. The leading 1 in the top row is above those in the second and third
rows, and it is also to their left. Similarly, the leading 1 in the second row is
above that in the third row, and it is also to its left. Thus A is in reduced
echelon form.

This matrix also isn’t in our ideal format, because it doesn’t have an
equation of the form x3 = s. This is because there is no leading 1 in the third

column. We also have x3 terms in the equations defining x1 and x2 . This does
happen sometimes, and we’ll discuss how to deal with this type of outcome in
2.7 and 2.8.
 
Example 6. Why isn't $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -2 & 3 \end{bmatrix}$ in reduced echelon form?
The leftmost nonzero entry in the second row is −2 instead of a leading 1,
so A isn’t in reduced echelon form.
 
Example 7. Why isn't $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ in reduced echelon form?
Note that this isn’t an augmented coefficient matrix. That’s okay, we can
still think about whether or not it is in reduced echelon form.
The leading 1 in the second row of A is above the leading 1 in the third
row, but it is not to the left of the third row’s leading 1. This is why A isn’t
in reduced echelon form.
 
Example 8. Why isn't $A = \begin{bmatrix} 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$ in reduced echelon form?
The third column has a leading 1, but its other entry isn’t 0 (instead it
is 3). This means A isn’t in reduced echelon form. (The fact that the first
column of the matrix doesn’t have a leading 1 isn’t a problem.)

Now that we understand reduced echelon form, we can start developing the
tools needed to create an algorithm which transforms an augmented coefficient
matrix into an augmented coefficient matrix in reduced echelon form while at
every step keeping the solutions to our matrix equation the same. These tools
are based on methods from algebra used to manipulate sets of equations, and
since those equations correspond to rows of our augmented coefficient matrix
our algorithm’s tools are called row operations.
The first of our three row operations is to swap the order of the rows in
our augmented coefficient matrix, which corresponds to swapping the order
of equations in our list. The order in which the equations appear in a list
doesn’t matter, so it is clear that reordering the rows of our augmented
coefficient matrix doesn’t change the solutions. We’ll usually do this reordering
by swapping one pair of rows at a time. I’ll use the notation ri ↔ rj to indicate
that we’re swapping the ith and jth rows of our matrix.
   
Example 9. The solutions to $\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & -1 & 1 \end{array}\right]$ and $\left[\begin{array}{cc|c} 1 & -1 & 1 \\ 1 & 1 & 2 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 1 & -1 & 1 \end{array}\right] \rightarrow_{r_1 \leftrightarrow r_2} \left[\begin{array}{cc|c} 1 & -1 & 1 \\ 1 & 1 & 2 \end{array}\right].$$

The first augmented coefficient matrix corresponds to the pair of equations


x1 + x2 = 2 and x1 − x2 = 1, while the second corresponds to the equations
x1 − x2 = 1 and x1 + x2 = 2. Adding these equations together (in either order)
gives 2x1 = 3 or x1 = 32 . Plugging this back into one of the original equations
then gives x2 = 12 . The order in which we originally listed the equations
doesn’t affect their solutions.

Our next row operation is to multiply one row of our augmented coefficient
matrix by a nonzero scalar. This corresponds to multiplying an entire equation
by a nonzero scalar, i.e., multiplying all coefficients and the value the equation
equals by that scalar. Since our scalar isn’t zero, this doesn’t change the
solutions of that equation. I’ll use the notation c ri to indicate that we’re
multiplying the ith row of the augmented coefficient matrix by c.
   
Example 10. The solutions to $\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 4 & 8 \end{array}\right]$ and $\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 1 & 2 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 4 & 8 \end{array}\right] \rightarrow_{\frac{1}{4}r_2} \left[\begin{array}{cc|c} 2 & 1 & 6 \\ 0 & 1 & 2 \end{array}\right].$$
The first matrix corresponds to the equations $2x_1 + x_2 = 6$ and $4x_2 = 8$, while the second corresponds to the equations $2x_1 + x_2 = 6$ and $x_2 = 2$. When solving the first pair of equations, we'd automatically divide the second equation by 4 (i.e., multiply by $\frac{1}{4}$) to get $x_2 = 2$. This is the second equation from our second augmented coefficient matrix, so this row operation doesn't change the fact that $x_2 = 2$. The first rows of our two matrices are the same, so we'd be plugging $x_2 = 2$ back into the same equation $2x_1 + x_2 = 6$ to get $x_1 = 2$ in both cases. Therefore multiplying our second row by $\frac{1}{4}$ didn't change the solution.

Our final row operation is to add a multiple of one row of our augmented
coefficient matrix to another row. This corresponds to adding a multiple of
one equation to another equation, and is commonly done to use part of one
equation to cancel out one of the variables from another equation (see the
example below). It also doesn’t change the solutions of our list of equations.
I’ll use the notation ri + c rj to indicate that we’re replacing the ith row of
our augmented coefficient matrix by the ith row plus c times the jth row.
(Note that the notation for this row operation is not symmetric, since the row
operation itself isn’t symmetric. The row listed first is the only row of the
matrix being changed.)
   
Example 11. The solutions to $\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 1 & -1 & 5 \end{array}\right]$ and $\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 0 & 1 & 8 \end{array}\right]$ are the same, where
$$\left[\begin{array}{cc|c} -1 & 2 & 3 \\ 1 & -1 & 5 \end{array}\right] \rightarrow_{r_2 + r_1} \left[\begin{array}{cc|c} -1 & 2 & 3 \\ 0 & 1 & 8 \end{array}\right].$$

Here our first matrix corresponds to the equations −x1 + 2x2 = 3 and
x1 − x2 = 5. When solving this, a common strategy is to add two equations
to cancel out one of the variables so we can solve for the other variable. In
this case, if we simply add the equations together, we’ll cancel out x1 to get
x2 = 8. Plugging this back into one of the original equations then gives us
x1 = 13. The second matrix corresponds to the equations −x1 + 2x2 = 3 and
x2 = 8. Essentially, here we’ve already done the addition of equations. In any
case, you can see that the solution is still x1 = 13 and x2 = 8.
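Each of the three row operations is also easy to express as an array manipulation. The short Python sketch below is an aside, not part of the text; it assumes numpy, uses 0-based row indices (so the text's row 1 is index 0), and the helper names are just for illustration. It applies the operation from Example 11:

import numpy as np

def swap(M, i, j):              # r_i <-> r_j
    M[[i, j]] = M[[j, i]]

def scale(M, i, c):             # c * r_i, with c nonzero
    M[i] = c * M[i]

def add_multiple(M, i, j, c):   # r_i + c * r_j
    M[i] = M[i] + c * M[j]

M = np.array([[-1.0, 2.0, 3.0],
              [ 1.0, -1.0, 5.0]])
add_multiple(M, 1, 0, 1)        # the operation r_2 + r_1 from Example 11
print(M)
# [[-1.  2.  3.]
#  [ 0.  1.  8.]]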

Now that we understand the three building blocks of our algorithm, it’s
time to create the algorithm itself. Our goal is to use these three row operations
to transform our augmented coefficient matrix into its reduced echelon form.
Since row operations don’t change the solutions, we’ll be able to solve our
original matrix equation by reading off the solutions of the reduced echelon
form’s matrix equation. The row reduction algorithm has two parts. In the
first half, we start at the top left corner of our matrix and work our way
downward and to the right, creating the leading 1s and the zeros beneath
them. In the second half, we start at the bottom right corner of our matrix
and work our way up and to the left, creating the zeros above each leading
1. Since our matrices have finitely many rows and columns, this means our
algorithm will always terminate in finitely many steps. Formally, the row
reduction algorithm’s instructions are as follows:

Row Reduction Algorithm:

Part 1:
• If possible, swap rows to put a nonzero entry in the top left corner of the
matrix. If not possible, skip to the last step in this part.

• Multiply the top row by a nonzero constant to make its first nonzero entry
into a leading 1.
• Add a multiple of the top row to each lower row to get zero entries below
the top row’s leading 1.

• Ignore the top row and leftmost column of the matrix. If there are any rows
remaining, go back to the top of Part 1, and repeat the process on the
remaining entries of the matrix. If there are no rows remaining after you
ignore the top row, go to Part 2.
Part 2:

• If the bottom row has a leading 1, add a multiple of the bottom row to
each higher row to get zero entries above that leading 1. If the bottom row
doesn’t have a leading 1, skip to the next step.

• Ignore the bottom row of the matrix. If there are two or more rows remaining,
go back to the top of Part 2, and repeat the process on the remaining entries
of the matrix. If you have reached the top row after you ignore the bottom
row, you’re done with the algorithm.
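Because the algorithm is completely mechanical, it is easy to turn into code. The following is a minimal sketch in Python (not the Mathematica approach of Appendix A.2) that follows the two parts above, using exact fractions so no rounding errors creep in; the function name rref and the list-of-lists representation are simply choices made for this illustration.

from fractions import Fraction

def rref(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots = []
    r = 0
    # Part 1: move down and to the right, creating leading 1s and zeros below them.
    for c in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:                      # no nonzero entry available in this column
            continue
        M[r], M[pivot] = M[pivot], M[r]        # swap rows
        M[r] = [x / M[r][c] for x in M[r]]     # scale to create a leading 1
        for i in range(r + 1, m):              # zero out the entries below it
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # Part 2: move back up, clearing the entries above each leading 1.
    for r in reversed(range(len(pivots))):
        c = pivots[r]
        for i in range(r):
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
    return M

# Example 12's matrix: the result is its reduced echelon form (entries print as Fractions).
print(rref([[0, 3, 3, 0], [2, 0, 0, -4], [-1, 0, 1, 6]]))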

Let’s see how this works on an example.


 
Example 12. Find the reduced echelon form of

   [  0  3  3   0 ]
   [  2  0  0  -4 ] .
   [ -1  0  1   6 ]
We have a zero in the top left corner of our matrix, so we’ll start by
swapping the order of rows to put something nonzero up there. Since both
other rows have nonzero first entries, we have a choice of which row to swap
with row 1. It doesn’t matter, so I’ll start by swapping rows 1 and 2 which
gives us

   [  0  3  3   0 ]                 [  2  0  0  -4 ]
   [  2  0  0  -4 ]  → (r1 ↔ r2)    [  0  3  3   0 ] .
   [ -1  0  1   6 ]                 [ -1  0  1   6 ]
Next I have to multiply the top row by a constant to create our first leading
1. Since the first nonzero entry in the top row is 2, I’ll multiply the top row
by 1/2, which gives

   [  2  0  0  -4 ]                    [  1  0  0  -2 ]
   [  0  3  3   0 ]  → ((1/2) r1)      [  0  3  3   0 ] .
   [ -1  0  1   6 ]                    [ -1  0  1   6 ]

(Note that we multiplied the whole row by 1/2, not just the first entry!) Next
we need to add multiples of the top row to the lower rows to get zeros below
our leading 1. The second row already has a zero as its first entry, so we just
need to tackle the third row. Its first entry is −1, so we can simply add row 1
to row 3 which gives us
   
   [  1  0  0  -2 ]                 [  1  0  0  -2 ]
   [  0  3  3   0 ]  → (r3 + r1)    [  0  3  3   0 ] .
   [ -1  0  1   6 ]                 [  0  0  1   4 ]

We’ve reached the end of our first repetition of Part 1 of the algorithm, so
we’ll ignore the top row and left column of our matrix and repeat Part 1 on
the remaining matrix entries shown in the picture below.
 
   [  1  0  0  -2 ]
   [  0  3  3   0 ]
   [  0  0  1   4 ]

(The remaining section is the lower-right block: the last two rows without their first entries.)

The top left corner of this remaining section is nonzero, so we can skip

swapping rows and go directly to creating our leading 1. Since the current
entry is 3, we'll multiply the second row of the matrix by 1/3 and get

   [  1  0  0  -2 ]                    [  1  0  0  -2 ]
   [  0  3  3   0 ]  → ((1/3) r2)      [  0  1  1   0 ] .
   [  0  0  1   4 ]                    [  0  0  1   4 ]

We already have zero underneath our new leading 1, so we can again drop the
top row and left column of our block as shown below.
 
   [  1  0  0  -2 ]
   [  0  1  1   0 ]
   [  0  0  1   4 ]

(Now only the bottom row, minus its first two entries, remains in play.)

Our remaining section’s top left corner is not only nonzero, it is already a
leading 1! There are no matrix entries below this leading 1 and we’ve reached
the bottom row of our matrix, so we’re done with Part 1 of our algorithm and
ready to start Part 2.
The bottom row of our matrix has a leading 1, so we need to add multiples
of this third row to the rows above to get zeros above that leading 1. The top
row already has a zero there, but the second row doesn’t. Since the second
row’s entry is 1, we’ll need to add −1 times the third row to the second row
to cancel out that entry. This looks like
   
   [  1  0  0  -2 ]                 [  1  0  0  -2 ]
   [  0  1  1   0 ]  → (r2 - r3)    [  0  1  0  -4 ] .
   [  0  0  1   4 ]                 [  0  0  1   4 ]

We’ve reached the end of our first repetition of Part 2, so we’ll ignore the
bottom row and repeat Part 2 on the top two rows of our matrix. This makes
our new “bottom row” the second row, which has a leading 1. Since the top
row has a zero above this leading 1, we don’t have to do anything. Ignoring
the second row leaves us at the top row of our matrix, so we’re done with the
algorithm and can see that
 
   [  0  3  3   0 ]
   [  2  0  0  -4 ]
   [ -1  0  1   6 ]

has reduced echelon form


 
   [  1  0  0  -2 ]
   [  0  1  0  -4 ] .
   [  0  0  1   4 ]

Now that we understand how to find the reduced echelon form of a matrix,
let’s practice using that to solve a matrix equation.
 
Example 13. Solve the matrix equation A~x = ~b where

       [  0  3  3 ]              [  0 ]
   A = [  2  0  0 ]   and   ~b = [ -4 ] .
       [ -1  0  1 ]              [  6 ]
In Example 12, we saw that the augmented coefficient matrix of this
equation has reduced echelon form
 
   [  1  0  0  -2 ]
   [  0  1  0  -4 ] .
   [  0  0  1   4 ]

Since the original augmented coefficient matrix and the reduced echelon form
have the same solutions, we can read off the solutions from the reduced echelon
form. This gives us x1 = −2, x2 = −4, and x3 = 4, or

   ~x = [ -2 ]
        [ -4 ] .
        [  4 ]

If you'd
like to check your work, you can multiply this vector by our original matrix
A and see that your answer is ~b.
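If you would like software to do this bookkeeping, here is a short sketch in Python using the SymPy library (assumed to be available); the text itself points to Mathematica for this job in Appendix A.2. The sketch rebuilds Example 13's augmented coefficient matrix, row reduces it, and checks the answer.

from sympy import Matrix

A = Matrix([[0, 3, 3], [2, 0, 0], [-1, 0, 1]])
b = Matrix([0, -4, 6])
augmented = A.row_join(b)           # the augmented coefficient matrix [A | b]
R, pivot_columns = augmented.rref()
print(R)               # Matrix([[1, 0, 0, -2], [0, 1, 0, -4], [0, 0, 1, 4]])
x = R[:, -1]           # every non-augmented column has a leading 1, so the
print(A * x == b)      # last column is the solution; this check prints True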

Most calculators and computer mathematics packages have commands


which implement this row reduction algorithm, and most of the time working
mathematicians use technology to compute the reduced echelon form of a
matrix. (You can read about how to do row reduction using Mathematica in
Appendix A.2.) However, it is important to go through enough examples of
this algorithm by hand that you understand how it works, since we will rely
on some of its theoretical properties in later sections.

Exercises 2.6.
1. Find the augmented coefficient matrix of the matrix equation
   [ 3  -4   0 ]        [ -5 ]
   [ 4   1  17 ] ~x  =  [  8 ] .
2. Find the augmented coefficient matrix of the matrix equation
   [ 3  4 ]        [  4 ]
   [ 6  2 ] ~x  =  [ -2 ] .
   [ 0  1 ]        [  0 ]
3. Find the augmented coefficient matrix of the matrix equation
   [ -1  3   5 ]        [  0 ]
   [  2  4   6 ] ~x  =  [ -6 ] .
   [  0  9  -2 ]        [ 10 ]
4. Find the augmented coefficient matrix of the matrix equation
   [ 9   2  0  -1 ]        [ 1 ]
   [ 5  -8  3   4 ] ~x  =  [ 2 ] .

5. Find the augmented coefficient matrix of the following set of


equations: 2x1 + x2 − 11x3 = 5, −5x1 + 3x3 = −1/2, and
10x1 − 9x2 + 2x3 = 0.
6. Find the augmented coefficient matrix of the following set of
equations: 8x1 − 3x2 + x3 − x4 = 14, x1 + 6x2 + 2x3 + x4 = −4, and
−2x1 − 4x2 + 10x3 + 8x4 = 2.
7. Find the augmented coefficient matrix of the following set of
equations: −4x1 − x3 = 7, x2 + x3 + 2x4 = 9, and x1 + 5x4 = 13.
8. Find the augmented coefficient matrix of the following set of
equations: x1 + 2x2 + 3x3 = 15, x1 − 5x3 = 6, −2x1 + 9x2 = 13, and
4x2 − x3 = −1.
9. Find the augmented coefficient matrix of the vector equation
       [ 2 ]       [ 4 ]       [ -2 ]       [ 4 ]   [ -1 ]
   x1  [ 1 ] + x2  [ 2 ] + x3  [ -1 ] + x4  [ 0 ] = [  9 ] .
10. Find the augmented coefficient matrix of the vector equation
        [ -1 ]       [ 6 ]       [ -9 ]   [ 0 ]
    x1  [  0 ] + x2  [ 2 ] + x3  [  1 ] = [ 0 ] .
        [  4 ]       [ 0 ]       [ -2 ]   [ 1 ]
11. Find the augmented coefficient matrix of the vector equation
        [  3 ]       [ -2 ]   [  0 ]
    x1  [  0 ] + x2  [  1 ] = [ -5 ] .
        [ -6 ]       [  8 ]   [  1 ]
        [  7 ]       [  4 ]   [  2 ]
12. Find the augmented coefficient matrix of the vector equation
        [  3 ]       [  0 ]       [ -5 ]   [  19 ]
    x1  [ -5 ] + x2  [ 12 ] + x3  [  1 ] = [ -11 ] .
13. Consider the augmented coefficient matrix
    [ 0  -1   4 |  2 ]
    [ 6   0  -5 | 11 ] .
    (a) Rewrite this augmented coefficient matrix as a matrix equation.
    (b) Rewrite this augmented coefficient matrix as a list of equations.
    (c) Rewrite this augmented coefficient matrix as a vector equation.
14. Consider the augmented coefficient matrix
    [ -7   2   0 | 1 ]
    [  1   0  -4 | 3 ]
    [  2  -9   1 | 1 ] .
    [  6  -6   1 | 0 ]
    (a) Rewrite this augmented coefficient matrix as a matrix equation.
    (b) Rewrite this augmented coefficient matrix as a list of equations.
    (c) Rewrite this augmented coefficient matrix as a vector equation.
15. Decide whether or not the following matrix is in reduced
 echelon
1 −3 0 0
form. If it is, circle all its leading 1s. 0 0 1 0
0 0 0 1

16. Decide whether or not the following matrix is in


 reduced echelon
1 0 2
form. If it is, circle all its leading 1s. 0 1 0
0 0 −1
17. Decide whether or not the following matrix is in reduced
 echelon
1 0 1 4
form. If it is, circle all its leading 1s.
0 1 0 −3
18. Decide whether or not the following matrix is in reduced
 echelon
1 −3 0 0
form. If it is, circle all its leading 1s. 0 0 1 0
0 1 0 0
 
19. Find the reduced echelon form of
        [ 1  -2  0   4   5 ]
    A = [ 0   0  2   1   3 ] .
        [ 0   0  0  -1  -1 ]
        [ 0   0  1   2   3 ]
20. Find the reduced echelon form of
        [ 4   8   0  -4 ]
    A = [ 0  -1   2   1 ] .
        [ 3   6   0  -3 ]
        [ 0   2  -4   6 ]
21. Find the reduced echelon form of
        [  0   1  -5   0 ]
    A = [ -1   2   0  -2 ] .
        [  0  -4  12   4 ]
22. Find the reduced echelon form of
        [ -5  0  -10   5 ]
    A = [  6  1   12   8 ] .
        [  2  0    3  -1 ]
23. Find the reduced echelon form of
        [  4  0   8 ]
    A = [  2  1   4 ] .
        [ -3  0  -7 ]
24. Find the reduced echelon form of
        [ 4  -2  4  14 ]
    A = [ 7   0  7  14 ] .
25. The reduced echelon form of A~x = ~b's augmentation matrix is
    [ 1  0  0 | -7 ]
    [ 0  1  0 |  2 ] . Use this to solve for ~x.
    [ 0  0  1 |  6 ]
26. The reduced echelon form of A~x = ~b's augmentation matrix is
    [ 1  0  0 |  4 ]
    [ 0  1  0 | -3 ] . Use this to solve for ~x.
    [ 0  0  1 | 12 ]
27. Use row reduction to solve
    [ 2   0   2 ]        [ 14 ]
    [ 3   1  -1 ] ~x  =  [  0 ] .
    [ 1  -3   1 ]        [ 10 ]
   
28. Use row reduction to solve
    [ 2  4   0 ]        [  2 ]
    [ 5  1  -3 ] ~x  =  [ -1 ] .
    [ 1  0   1 ]        [ -7 ]
29. Use row reduction to solve
        [ -2 ]       [ 0 ]       [ 4 ]   [ -14 ]
    x1  [  0 ] + x2  [ 3 ] + x3  [ 2 ] = [   7 ] .
        [ -1 ]       [ 0 ]       [ 4 ]   [  -9 ]
30. Use row reduction to solve
        [  6 ]       [ -3 ]   [ -12 ]
    x1  [ 10 ] + x2  [  4 ] = [   7 ] .
31. Use row reduction to solve the following equations simultaneously:
−3x1 + x3 = −1, x1 + x2 + x3 = 0, and 2x1 − 4x2 + x3 = −5.
32. Use row reduction to solve the following equations simultaneously:
x1 + x2 + x3 = 6, −3x2 + 2x3 = 9, and −5x1 + 4x2 + x3 = 15.

2.7 Applications of Row Reduction


Now that we understand how to use row reduction to solve matrix equations,
we can apply it to many of the computational aspects of ideas discussed
in previous sections. (At this point, I’ll assume you’ve practiced doing row
reduction by hand enough to understand its mechanics and are now row
reducing matrices either on your calculator or a computer. See Appendix A.2
for help using Mathematica.) Let’s start by applying row reduction to solving
f (~x) = ~b using f ’s matrix.
 √ 
  1 3
x − 2 x 1 + 2 x 2
Example 1. Let f : R2 → R2 by f 1
= √ . Find ~x for
x2 3 1
  2 x1 + 2 x2
−1
which f (~x) = .
1
This may look like a function
√ nobody could love, but in fact it is reflection
about the line with slope 3 as pictured below. (See 2.2’s Exercise 36 for the
general formula of the matrix which reflects R2 about y = mx.)
[Figure: the line y = (√3)x plotted in R2 together with the vector ~b = (-1, 1).]


Essentially we're asking which vector reflects across the line y = (√3)x to land on

   ~b = [ -1 ]
        [  1 ] .

This could perhaps be worked out geometrically with 30-60 right triangles, but I
think it is much easier to solve via row reduction. Our function has matrix

   [ -1/2   √3/2 ]
   [  √3/2   1/2 ] ,

so the augmented coefficient matrix

of our equation is

   [ -1/2   √3/2 | -1 ]
   [  √3/2   1/2 |  1 ] .

This augmented coefficient matrix has reduced echelon form

   [ 1  0 | 1/2 + √3/2 ]
   [ 0  1 | 1/2 - √3/2 ]

so our solution is x1 = 1/2 + √3/2 and x2 = 1/2 - √3/2, or

   ~x = [ 1/2 + √3/2 ]  ≈  [  1.366 ]
        [ 1/2 - √3/2 ]     [ -0.366 ] .

We can check this answer geometrically by plotting ~x and ~b to verify that
~b is ~x's reflection across y = (√3)x.

[Figure: ~x ≈ (1.366, -0.366) and ~b = (-1, 1) plotted with the line y = (√3)x, each the mirror image of the other across the line.]
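For a quick numerical double-check of Example 1, here is a short Python sketch using NumPy (assumed to be available); it builds the reflection matrix and solves A~x = ~b directly.

import numpy as np

# Matrix of Example 1's function: reflection of R^2 about the line y = sqrt(3) x.
A = np.array([[-1/2,          np.sqrt(3)/2],
              [np.sqrt(3)/2,  1/2]])
b = np.array([-1.0, 1.0])

x = np.linalg.solve(A, b)   # solve A x = b numerically
print(x)                    # approximately [ 1.366, -0.366]
print(A @ x)                # lands back on b, i.e. approximately [-1.,  1.]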

In economics, matrix equations are used to model the interplay between
industries using the Leontief input-output model. Here we set up an input-
output matrix that describes the inputs and outputs of each industry and
use that information to solve for each industry's production needed to meet a
specified level of demand.

Example 2. Let’s look at three industrial sectors: agriculture, manufactur-


ing, and service. To produce goods or services in one industry typically requires
input from some of the others, including that same industry (think of the seed

corn required to produce more corn). Suppose we know that for each dollar of
agricultural output we need $0.30 of agricultural input, $0.10 of manufacturing
input, and $0.20 of service input. For each dollar of manufacturing output we
need $0.10 of agricultural input, $0.50 of manufacturing input, and $0.20 of
service input. For each dollar of service output we need $0 of agricultural
input, $0.10 of manufacturing input, and $0.40 of service input. If we want to
produce $150,000 of agricultural output, $100,000 of manufacturing output,
and $225,000 of service output above what is needed in the production process,
how much does each industry need to produce?

Let’s start by setting up the vectors that will model demand and
production. We have three industries to keep track of, so we’ll use 3-vectors
to model both the production and demand. I’ll let the first entry track
agriculture, the second track manufacturing, and the third track services.
This means that the net output or external demand vector is

   ~b = [ 150,000 ]
        [ 100,000 ]
        [ 225,000 ]

since this is what we want to produce beyond the input requirements of
these industries' production. We can also set up our overall production vector

   ~x = [ x1 ]
        [ x2 ]
        [ x3 ]

where xi is the total amount, including input requirements, each industry
should produce.
Next we need to set up our input-output matrix A. We want A~x to give the
input requirements of producing ~x, which means A must be a 3×3 matrix. If we
consider the top entry of A~x, it should tell us the amount of agricultural input
needed to produce ~x, i.e., $0.30 times the amount of agricultural production
which is x1 , $0.10 times the amount of manufacturing production which is
x2 , and $0 times the amount of service production which is x3 . Therefore the
top row of our matrix contains the amounts of agricultural input needed to
produce each industry in order of our vector’s entries. If we label the rows and
columns of A in the same order as our vectors, then extending this reasoning
to the other two entries of A~x shows us that aij is the amount of industry i’s
input needed by industry j. For example, a12 was the amount of agricultural
input (and agriculture is our vector’s first entry) needed for manufacturing
(which is our vector’s
 second entry). This means our input-output matrix is
0.3 0.1 0
A = 0.1 0.5 0.1.
0.2 0.2 0.4
Now that we have A, ~x, and ~b, we need to relate them in a matrix equation.
The total output we want to produce is the sum of the outputs needed as
inputs for our various industries, A~x, plus the external demand ~b, so our total
output is A~x + ~b. However, we plan to produce ~x, so our total output is also
~x. Setting these two versions of total output equal gives us ~x = A~x + ~b.

This is a matrix equation, but it doesn’t have the right format for row
reduction so we’ll need to do some algebra to get it there. We can start by
subtracting A~x from both sides to get ~x − A~x = ~b. We can’t factor ~x out of
the left-hand side, because we’d be left with 1 − A, which doesn’t make sense.
However, recall from Example 4 of 2.2 that the n × n identity matrix, In , has
In ~x = ~x. Here n = 3, and if we substitute in I3 ~x for ~x we get I3 ~x − A~x = ~b.
Now we can factor out ~x to get (I3 − A)~x = ~b so our matrix equation is in a
format where we can use row reduction to solve for ~x.
Since

   I3 − A = [ 1  0  0 ]   [ 0.3  0.1  0   ]   [  0.7  -0.1   0   ]
            [ 0  1  0 ] − [ 0.1  0.5  0.1 ] = [ -0.1   0.5  -0.1 ]
            [ 0  0  1 ]   [ 0.2  0.2  0.4 ]   [ -0.2  -0.2   0.6 ]

our augmented coefficient matrix is

   [ I3 − A | ~b ] = [  0.7  -0.1   0   | 150,000 ]
                     [ -0.1   0.5  -0.1 | 100,000 ]
                     [ -0.2  -0.2   0.6 | 225,000 ]

which row reduces to

   [ 1  0  0 | 267,287 ]
   [ 0  1  0 | 371,011 ]
   [ 0  0  1 | 587,766 ]
(with answers rounded to the nearest dollar).
This means that in order to end up with a net output of $150,000 in
agriculture, $100,000 in manufacturing, and $225,000 in service, we actually
need to produce $267,287 in agriculture, $371,011 in manufacturing, and
$587,766 in service.
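Here is the same Leontief computation as a short Python/NumPy sketch, assuming NumPy is available; the variable names are only illustrative.

import numpy as np

# Input-output matrix from Example 2 (order: agriculture, manufacturing, service).
A = np.array([[0.3, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.2, 0.2, 0.4]])
b = np.array([150_000, 100_000, 225_000])   # desired net (external) output

# Total production x must satisfy (I - A) x = b.
x = np.linalg.solve(np.eye(3) - A, b)
print(np.round(x))   # approximately [267287, 371011, 587766]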

In the previous examples, we ended up with the nicest possible reduced


echelon form, which gave us one unique solution to our matrix equation.
However, we saw in Examples 4 and 5 of the last section that it’s possible
to have reduced echelon forms which are more complicated. We’ll spend the
rest of this section exploring what to do in such cases while simultaneously
applying row reduction to previous ideas like kernel, range, and linear
independence. Let’s start by using our ability to quickly and systematically
solve f (~x) = ~b to compute the kernel of a map.

Example 3. Compute the kernel of the function f : R3 → R2 where

       [ x1 ]
   f ( [ x2 ] ) = [ 4x1 + x2 + 3x3 ]
       [ x3 ]     [ 2x1 - 4x3      ] .

The kernel is all 3-vectors ~x with f (~x) = ~0. This function has matrix
 
   A = [ 4  1   3 ]
       [ 2  0  -4 ]

so finding the kernel of f is equivalent to solving A~x = ~0. This equation has
the augmented coefficient matrix
 
   [ 4  1   3 | 0 ]
   [ 2  0  -4 | 0 ]

whose reduced echelon form is


 
   [ 1  0  -2 | 0 ]
   [ 0  1  11 | 0 ] .

This isn’t the simplest kind of reduced echelon form we’d hoped for, but
we can still read off the equations given by its rows. The top row gives us
x1 − 2x3 = 0 and the bottom row gives us x2 + 11x3 = 0. Even though we
don’t have a unique one-number answer for each variable, these two equations
are still enough for us to find the kernel of f . The first equation x1 − 2x3 = 0
can be solved for x1 to give x1 = 2x3 , while the second equation x2 + 11x3 = 0
can be solved for x2 to give x2 = −11x3 . Once we pick a value for x3 , we’ll
get values for x1 and x2 , for example, if x3 = 2 then x1 = 4 and x2 = −22.
One way to write our overall answer is
 
   ker(f) = all vectors of the form [   2x3 ]
                                    [ -11x3 ] .
                                    [    x3 ]
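If you want to check a kernel computation like this with software, SymPy's nullspace method does exactly this bookkeeping. A short sketch, assuming SymPy is installed:

from sympy import Matrix

A = Matrix([[4, 1, 3], [2, 0, -4]])   # the matrix of Example 3's function
print(A.rref())        # (Matrix([[1, 0, -2], [0, 1, 11]]), (0, 1))
print(A.nullspace())   # [Matrix([[2], [-11], [1]])], so ker(f) is all multiples of this vector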

Notice that in the example above, our kernel contained more than just ~0,
so our map wasn’t 1-1. We can tell that because we ended up with a variable,
x3 , which didn’t have its own equation with a leading 1 as its coefficient. That
happened because x3 ’s column didn’t have a leading 1 in A’s reduced echelon
form. This gives us an easy-to-check criterion for a function to be 1-1.

Theorem 1. A function f : Rn → Rm is 1-1 if and only if the reduced echelon


form of its matrix A has a leading 1 in every column.

If we shift our focus from solving equations of the form f (~x) = ~b to solving
vector equations, then we can apply row reduction to answering the question of
whether ~v1 , . . . , ~vk are linearly independent or linearly dependent. Remember
from 1.3 that ~v1 , . . . , ~vk are linearly dependent if we can find a solution to
x1~v1 + · · · + xk~vk = ~0 where some xi is nonzero, otherwise ~v1 , . . . , ~vk are
linearly independent.
       
Example 4. Are

   [  2 ]   [ 0 ]   [ -1 ]        [ 5 ]
   [ -1 ]   [ 0 ]   [ -2 ]        [ 0 ]
   [  0 ] , [ 4 ] , [  0 ] , and  [ 0 ]
   [  3 ]   [ 0 ]   [  4 ]        [ 2 ]

linearly independent or linearly dependent?

We can check this by solving


         
       [  2 ]       [ 0 ]       [ -1 ]       [ 5 ]   [ 0 ]
   x1  [ -1 ] + x2  [ 0 ] + x3  [ -2 ] + x4  [ 0 ] = [ 0 ]
       [  0 ]       [ 4 ]       [  0 ]       [ 0 ]   [ 0 ]
       [  3 ]       [ 0 ]       [  4 ]       [ 2 ]   [ 0 ]

to see whether or not we have solutions where some of the variables are
nonzero. From 2.2 we know that the vector equation x1~v1 + · · · + xk~vk = ~b
is equivalent to the matrix equation A~x = ~b where A is the matrix whose
columns are ~v1 , . . . , ~vk . This means we want to solve A~x = ~b where
   
       [  2  0  -1  5 ]             [ 0 ]
   A = [ -1  0  -2  0 ]   and  ~b = [ 0 ] .
       [  0  4   0  0 ]             [ 0 ]
       [  3  0   4  2 ]             [ 0 ]

This matrix equation has the augmented coefficient matrix


 
   [  2  0  -1  5 | 0 ]
   [ -1  0  -2  0 | 0 ]
   [  0  4   0  0 | 0 ]
   [  3  0   4  2 | 0 ]

whose reduced echelon form is


 
   [ 1  0  0   2 | 0 ]
   [ 0  1  0   0 | 0 ]
   [ 0  0  1  -1 | 0 ] .
   [ 0  0  0   0 | 0 ]

From here we can see x1 = −2x4 , x2 = 0, x3 = x4 , and x4 is free. Since we


can choose a nonzero value for x4 , our vectors are linearly dependent.
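Here is a small SymPy sketch of this test, assuming SymPy is available: put the vectors in as the columns of a matrix, row reduce, and count the columns with leading 1s (SymPy calls them pivot columns).

from sympy import Matrix

vectors = [[2, -1, 0, 3], [0, 0, 4, 0], [-1, -2, 0, 4], [5, 0, 0, 2]]
A = Matrix(vectors).T                        # the vectors become the columns of A
R, pivot_columns = A.rref()
independent = len(pivot_columns) == A.cols   # independent exactly when every column has a leading 1
print(pivot_columns, independent)            # (0, 1, 2) False, so the vectors are linearly dependent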

Notice that our vector equation above had a variable, x4 , which wasn’t
necessarily zero precisely because that variable’s column didn’t have a leading
1 in the reduced echelon form of our augmented coefficient matrix. This
means that we have the following check for linear independence and linear
dependence.

Theorem 2. A set of vectors ~v1 , . . . , ~vk in Rn are linearly independent if the


reduced echelon form of the matrix with columns ~v1 , . . . , ~vk has a leading 1
in every column. If the reduced echelon form has a column with no leading 1,
then ~v1 , . . . , ~vk are linearly dependent.

So far, we’ve always been in the situation where A~x = ~b had a solution.
However, we’ve seen that this isn’t always the case. How can we tell that
our matrix equation has no solution from the reduced echelon form of its
augmented coefficient matrix? The trick is to look at where our leading 1s
occur. If we have a leading 1 in the rightmost (augmentation) column, our
matrix equation has no solutions. To understand why this is true, think back
to our interpretation of a matrix equation as a list of linear equations. Since
a leading 1 is the first nonzero entry in its row, having a leading 1 in the
rightmost column means that its row looks like 0 · · · 0 | 1. This corresponds to
the equation 0x1 +· · ·+0xn = 1, i.e., 0 = 1, which is clearly impossible. Happily
such a leading 1 will be formed during Part 1 of our row reduction algorithm,
which saves us from having to do Part 2. (In fact you can see your matrix
equation has no solution as soon as you see that the first nonzero entry in any
row occurs in the last column.) It’s always important to understand when a
problem cannot be solved, because then you don’t waste time continuing to
work on it.
   
Example 5. The equation A~x = ~b with

       [  1  -5  4 ]             [ -3 ]
   A = [  2  -7  3 ]   and  ~b = [ -2 ]
       [ -2   1  7 ]             [ -1 ]

has no solution.

The augmented coefficient matrix of this equation is


 
   [  1  -5  4 | -3 ]
   [  2  -7  3 | -2 ]
   [ -2   1  7 | -1 ]

which has reduced echelon form


 
   [ 1  0  -13/3 | 0 ]
   [ 0  1   -5/3 | 0 ] .
   [ 0  0     0  | 1 ]

If we look at the bottom row of the matrix, we get the equation 0 = 1. Since
this is clearly a contradiction (there are no values of x1 , x2 , and x3 which will
make zero equal one), our original equation doesn’t have any solutions.

A slightly different perspective on the question of whether or not a matrix


equation has solutions is to ask which vectors ~b make A~x = ~b have solutions.

If our matrix A corresponds to a linear function f , this is the same as asking


for the range of f .
Example 6. Compute the range of the function f : R3 → R3 where f has matrix

   A = [  1  -5  4 ]
       [  2  -7  3 ] .
       [ -2   1  7 ]
Asking for the range of f is asking for every ~b where we have a solution
to A~x = ~b. To figure this out, we'll pick a generic

   ~b = [ b1 ]
        [ b2 ]
        [ b3 ]

and try to solve A~x = ~b. The augmented coefficient matrix of this equation is now

   [  1  -5  4 | b1 ]
   [  2  -7  3 | b2 ] .
   [ -2   1  7 | b3 ]

Mathematica isn’t as good about row reducing a matrix which has variable
entries, so we’ll do this one by hand (and get in a little extra row reduction
practice along the way).
We already have a 1 in the top left corner, so we can move on to adding
multiples of row 1 to rows 2 and 3 to create zeros under that leading 1. In our
row reduction notation, this is
   
   [  1  -5  4 | b1 ]                  [  1  -5   4 | b1       ]
   [  2  -7  3 | b2 ]  → (r2 - 2r1)    [  0   3  -5 | b2 - 2b1 ]
   [ -2   1  7 | b3 ]                  [ -2   1   7 | b3       ]

                                       [  1  -5   4 | b1       ]
                       → (r3 + 2r1)    [  0   3  -5 | b2 - 2b1 ] .
                                       [  0  -9  15 | b3 + 2b1 ]

Next we ignore the top row and first column, and we multiply the second row
by 1/3 to create our next leading 1. This gives us

   [ 1  -5   4 | b1       ]                   [ 1  -5    4  | b1                ]
   [ 0   3  -5 | b2 - 2b1 ]  → ((1/3) r2)     [ 0   1  -5/3 | (1/3)b2 - (2/3)b1 ] .
   [ 0  -9  15 | b3 + 2b1 ]                   [ 0  -9   15  | b3 + 2b1          ]

We add a multiple of row 2 to row 3 to get a zero below our new leading 1,
which looks like
   
   [ 1  -5    4  | b1                ]                  [ 1  -5    4  | b1                       ]
   [ 0   1  -5/3 | (1/3)b2 - (2/3)b1 ]  → (r3 + 9r2)    [ 0   1  -5/3 | (1/3)b2 - (2/3)b1        ] .
   [ 0  -9   15  | b3 + 2b1          ]                  [ 0   0    0  | (b3 + 2b1) + (3b2 - 6b1) ]

Since the first three entries of the bottom row are zero, the bottom row will

prevent us from having any solutions (as in Example 5) unless the bottom
entry of the right column is also 0. This can be expressed as

(b3 + 2b1 ) + (3b2 − 6b1 ) = 0

which simplifies to
−4b1 + 3b2 + b3 = 0.
As long as ~b satisfies this condition, we’ll have a solution to f (~x) = ~b and ~b
will be in the range of f . Another way to write this is
  
   range(f) = all vectors [ b1 ]
                          [ b2 ]  with  -4b1 + 3b2 + b3 = 0 .
                          [ b3 ]
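Because this row reduction was done symbolically, it is easy to mimic in SymPy (assuming that library is available). The sketch below applies row operations equivalent to the ones above and prints the entry that must be zero for ~b to be in the range.

from sympy import Matrix, symbols, simplify

b1, b2, b3 = symbols('b1 b2 b3')
M = Matrix([[1, -5, 4, b1], [2, -7, 3, b2], [-2, 1, 7, b3]])

# Row operations equivalent to Example 6's: r2 <- r2 - 2 r1, r3 <- r3 + 2 r1, r3 <- r3 + 3 r2.
r1, r2, r3 = M[0, :], M[1, :], M[2, :]
r2 = r2 - 2 * r1
r3 = r3 + 2 * r1
r3 = r3 + 3 * r2
print(simplify(r3[3]))   # -4*b1 + 3*b2 + b3, which must equal 0 for b to be in range(f)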

Notice that the range of the example function above is smaller than its
codomain, i.e., f isn’t onto. This happened because there were some vectors
~b where f (~x) = ~b didn’t have a solution. Thinking in terms of row reduction,
this happened because there was a row in the reduced echelon form of f ’s
matrix which was all zeros. This created a row in the augmented coefficient
matrix of the form 0 · · · 0 | ∗ leaving the possibility of an unsolvable equation.
As with 1-1, this gives us an easy-to-check criterion for when a function is
onto.

Theorem 3. A function f is onto if and only if the reduced echelon form of


its matrix A has a leading 1 in every row.

Note that Theorems 1 and 3 are similar, so it’s important to remember


that for onto we’re checking rows and for 1-1 we’re checking columns! This
should be much easier since you’ve seen why each of these criteria hold, which
is why it’s important to understand the reasons behind our mathematical rules
rather than just blindly memorizing them.
Finally, let’s apply row reduction to the question of whether or not a vector
is in the span of a set of vectors. We know from 1.2 that ~b is in the span of
~v1 , . . . , ~vk exactly when we can solve x1~v1 + · · · + xk~vk = ~b. This means we
can check whether or not ~b is in the span of ~v1 , . . . , ~vk by using row reduction
to check whether or not our vector equation has solutions.
       
Example 7. Is

   ~b = [ 1 ]                  [  2 ]   [ 1 ]        [ -4 ]
        [ 2 ]  in the span of  [ -1 ] , [ 0 ] , and  [  5 ] ?
        [ 3 ]                  [ -2 ]   [ 1 ]        [ 16 ]
Asking this question is equivalent to asking whether we can solve the vector
equation

       [  2 ]       [ 1 ]       [ -4 ]   [ 1 ]
   x1  [ -1 ] + x2  [ 0 ] + x3  [  5 ] = [ 2 ] .
       [ -2 ]       [ 1 ]       [ 16 ]   [ 3 ]

This vector equation has augmented coefficient matrix


 
   [  2  1  -4 | 1 ]
   [ -1  0   5 | 2 ]
   [ -2  1  16 | 3 ]

whose reduced echelon form is


 
   [ 1  0  -5 | 0 ]
   [ 0  1   6 | 0 ] .
   [ 0  0   0 | 1 ]

Our augmented coefficient matrix has a leading 1 in the right column of its
reduced echelon form. This means our vector equation has no solution, and
1
therefore ~b = 2 is not in the span of the other vectors.
3

       
Example 8. Is

   ~b = [  0 ]                  [  2 ]   [ 1 ]        [ -4 ]
        [  4 ]  in the span of  [ -1 ] , [ 0 ] , and  [  5 ] ?
        [ 16 ]                  [ -2 ]   [ 1 ]        [ 16 ]
As in the previous example, this is equivalent to asking whether we can
solve

       [  2 ]       [ 1 ]       [ -4 ]   [  0 ]
   x1  [ -1 ] + x2  [ 0 ] + x3  [  5 ] = [  4 ] .
       [ -2 ]       [ 1 ]       [ 16 ]   [ 16 ]
This vector equation has augmented coefficient matrix
 
   [  2  1  -4 |  0 ]
   [ -1  0   5 |  4 ]
   [ -2  1  16 | 16 ]

whose reduced echelon form is


 
   [ 1  0  -5 | -4 ]
   [ 0  1   6 |  8 ] .
   [ 0  0   0 |  0 ]

Since our augmented coefficient matrix doesn’t have a leading 1 in the right
column of its reduced
 echelon form, our vector equation has a solution.
0
Therefore ~b =  4  is in the span of the other vectors.
16
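The span checks in Examples 7 and 8 can be automated the same way. Here is a short SymPy sketch (assuming SymPy is available) that applies the leading-1 criterion to both right-hand sides:

from sympy import Matrix

A = Matrix([[2, 1, -4], [-1, 0, 5], [-2, 1, 16]])   # columns are the three given vectors

for b in (Matrix([1, 2, 3]), Matrix([0, 4, 16])):
    R, pivots = A.row_join(b).rref()
    in_span = A.cols not in pivots     # no leading 1 in the augmentation column
    print(b.T, "in span?", in_span)    # False for Example 7's b, True for Example 8's b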

Exercises 2.7.
 
1
1. Use row reduction to solve f (~x) =  9  where f has matrix
  −1
−1 −2 −1 3
A= 0 −3 −6 4.
−2 −3 0 3
 
18
2. Use row reduction to solve f (~x) = −4 where f has matrix
17
 
2 −4 6
A = −1 3 0.
2 −5 3
 
2
3. Use row reduction to solve f (~x) = 5 where f has matrix
1
 
2 8 4
A = 2 5 1 .
4 10 −1
 
3
4. Use row reduction to solve f (~x) = 1 where f has matrix
  8
1 1 −2
A = 3 −2 4 .
2 −3 6
   
x1 −x1 − x2 + 2x3
5. Find the kernel of f x2  = −9x1 + 4x2 + 5x3 .
x3 6x1 − 2x3 − 4x3
 
x1  
  2x1 + 6x2 − 5x3
6. Find the kernel of f x2 = .
x1 + 3x2 + x3
x3
 
x1  
x2  −x1 − 2x2 − x3 + 3x4
7. Find the kernel of f    −3x2 − 6x3 + 4x4 .
x3  =
−2x1 − 3x2 + 3x4
x4
 
  5x1 − 2x2 + x3
x1  3x2 − 3x3 
8. Find the kernel of f x2  =  
5x1 − 5x2 + 4x3 .
x3
−x2 + x3
 
1 2 −3 0
9. Is the linear map f with matrix A = −1 2 4 1  1-1?
0 4 1 −2
 
1 1 −2
10. Is the linear map f with matrix A = 3 −2 4  1-1?
2 −3 6
   
x1 −4x1 − 4x3
11. Is f x2  =  3x1 + x2 − x3  1-1?
x3 −x1 + 3x2 − x3
   
x1 5 −10 9
12. Is f x2  = 0 −2 1  1-1?
x3 0 6 −3
     
13. Are
    [  3 ]   [  5 ]   [  4 ]
    [  1 ]   [ -2 ]   [  5 ]
    [  0 ] , [  1 ] , [ -1 ]
    [ -1 ]   [  2 ]   [  2 ]
    linearly dependent or linearly independent?
14. Are
    [ 1 ]   [ 2 ]   [ 5 ]
    [ 1 ] , [ 3 ] , [ 6 ]
    [ 1 ]   [ 1 ]   [ 4 ]
    linearly independent or linearly dependent?
15. Are
    [ 2 ]   [ 3 ]   [ 1 ]
    [ 1 ]   [ 0 ]   [ 0 ]
    [ 0 ] , [ 2 ] , [ 2 ]
    [ 1 ]   [ 1 ]   [ 0 ]
    linearly independent or linearly dependent?
16. Let
          [ -1 ]         [  3 ]              [  5 ]
    ~v1 = [  0 ] , ~v2 = [ -2 ] , and ~v3 =  [ -2 ] .
          [ -2 ]         [  2 ]              [  6 ]
    (a) Are these vectors linearly independent or linearly dependent?
    (b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
17. Let
          [  2 ]         [ 0 ]               [ -3 ]
    ~v1 = [ -1 ] , ~v2 = [ 9 ] , and ~v3 =   [  6 ] .
          [  3 ]         [ 5 ]               [ -2 ]
    (a) Are these vectors linearly independent or linearly dependent?
    (b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
18. Let
          [  1 ]         [  2 ]              [  0 ]
    ~v1 = [ -1 ] , ~v2 = [ -1 ] , and ~v3 =  [ -1 ] .
          [  0 ]         [  1 ]              [  1 ]
          [  0 ]         [  0 ]              [  0 ]
    (a) Are these vectors linearly independent or linearly dependent?
    (b) What is the dimension of the space spanned by ~v1 , ~v2 , and ~v3 ?
19. Let
          [  1 ]         [ 0 ]               [ -4 ]
    ~v1 = [  0 ] , ~v2 = [ 2 ] , and ~v3 =   [  4 ] .
          [ -1 ]         [ 1 ]               [  6 ]
          [  1 ]         [ 3 ]               [  2 ]

(a) Show that ~v1 , ~v2 , ~v3 are linearly dependent.


(b) Write one of these vectors as a linear combination of the other
two.
(c) What can you say about the dimension of Span{~v1 , ~v2 , ~v3 }?
 
−1 4
20. Find the range of the linear map f with matrix A =  0 1.
−2 3
 
1 2
21. Find the range of the linear map f with matrix A = 2 5.
0 4
   
x1 x1 − 4x3
22. Find the range of f x2  = −2x1 + x2 + 9x3 .
x3 3x2 + 3x3
   
x1 2x1 − 4x2
23. Find the range of f = .
x2 −x1 + 3x2
 
1 2 −3 0
24. Is the linear map f with matrix A = −1 2 4 1  onto?
0 4 1 −2
 
1 1 −2
25. Is the linear map f with matrix A = 3 −2 4  onto?
2 −3 6
   
x1 5 −10 9
26. Is f x2  = 0 −2 1  onto?
x3 0 6 −3
   
x1 −4x1 − 4x3
27. Is f x2  =  3x1 + x2 − x3  onto?
x3 −x1 + 3x2 − x3
 
1 −5 0 0
28. Let f : R4 → R3 by f (~x) = A~x where A = 0 0 1 −2.
0 0 −4 8
(a) Show f is not onto.
(b) Write down
  an easy-to-check condition we can use to tell if a
x1
vector x2  is in the range of f .
x3
29. Write down a matrix A so the linear map f (~x) = A~x is 1-1 but not
onto.
30. Write down the matrix of a linear map which satisfies each of the
given conditions. (Your matrices need not be square.)
(a) 1-1 and onto

(b) 1-1 but not onto


(c) onto but not 1-1
       
1 
 1 2 −3 
1       
−3
 in Span   ,   ,  0  ?
0
31. Is ~b = 1 −1  0   6 

 
 
1 2 1 −2
       
32. Is
         [  2 ]                       [ 2 ]         [ 1 ]              [  0 ]
    ~u = [ 10 ]  in the span of ~v1 = [ 0 ] , ~v2 = [ 1 ] , and ~v3 =  [ 14 ] ?
         [  7 ]                       [ 1 ]         [ 1 ]              [  8 ]
33. Is
         [ 2 ]                        [  2 ]         [ 0 ]              [ -4 ]
    ~u = [ 1 ]  in the span of ~v1 =  [  0 ] , ~v2 = [ 1 ] , and ~v3 =  [  1 ] ?
         [ 1 ]                        [ -1 ]         [ 0 ]              [  4 ]
34. Do
    [  2 ]   [ 0 ]        [ -4 ]
    [  0 ] , [ 1 ] , and  [  1 ]
    [ -1 ]   [ 0 ]        [  4 ]
    span all of R3 ?
35. Do
    [  1 ]   [ 0 ]        [ -4 ]
    [  0 ] , [ 1 ] , and  [  0 ]
    [ -1 ]   [ 2 ]        [  6 ]
    span all of R3 ?
36. Do
    [ 1 ]   [ 2 ]        [ 0 ]
    [ 0 ] , [ 0 ] , and  [ 1 ]
    [ 1 ]   [ 2 ]        [ 1 ]
    span all of R3 ?
37. Briefly explain how you would figure out whether or not a set of
4-vectors spans all of R4 .
38. Is it possible to span all of R6 with five 6-vectors, i.e., five vectors
from R6 ? Explain why or why not.
39. Redo Example 2 if for each dollar of agricultural output we need
$0.30 of agricultural input, $0.10 of manufacturing input, and $0.40
of service input, for each dollar of manufacturing output we need
$0.10 of agricultural input, $0.50 of manufacturing input, and $0.40
of service input, and for each dollar of service output we need $0
of agricultural input, $0.10 of manufacturing input, and $0.40 of
service input.
40. Suppose we divide industry into two categories: raw materials and
manufacturing. For each dollar of raw materials produced, we need
$0.10 of raw materials and $0.35 of manufactured goods while for
each dollar of manufacturing we need $0.50 of raw materials and
$0.20 of manufactured goods. If we want to end up with a net output
of $750 of raw materials and $1000 of manufacturing, what should
our production level be for each industry?

2.8 Solution Sets


Whether we are finding the range of a linear function from Rn to Rm , solving a
vector equation, or figuring out which points ~x are mapped to ~b by a geometric
transformation, we often want to describe the set of vectors ~x which satisfy
A~x = ~b for some matrix A. In this section, we’ll explore the possible options
for such sets and how to easily describe them.

Definition. A vector ~x which satisfies A~x = ~b is called a solution, and the


set of all such vectors is called the solution set.

We saw in the last section that we may not have any solutions to A~x = ~b,
in which case we can say that there are no solutions or that our solution set
is empty. We also saw that if there are solutions it’s possible to have either a
single solution or a set of infinitely many different solutions determined by our
choice of values for some of the variables. Are no solutions, one solution, or
infinitely many solutions our only options? Let’s explore this question visually.
For simplicity’s sake, we’ll do our exploration in 2D using a 2 × 2 matrix A.
In the 2 × 2 case, A~x = ~b is really
    
   [ a  b ] [ x1 ]   [ b1 ]
   [ c  d ] [ x2 ] = [ b2 ]

which we can write as the equations ax1 + bx2 = b1 and cx1 + dx2 = b2 .
For familiarity’s sake, let’s use x for x1 and y for x2 , so our equations are
ax + by = b1 and cx + dy = b2 . Geometrically, this means we’re looking at
two lines in the plane. A solution to A~x = ~b is a pair of values for x and y
which satisfy both of these equations, which geometrically means a point in
the plane (x, y) which lies on both lines. This means our solution set is the
set of intersection points of our two lines.
If our two lines are in a fairly standard configuration, they’ll intersect in
precisely one point as shown in Figure 2.2.
[Figure 2.2: A single point intersection]

Having only one point of intersection means we have one uniquely
determined solution to A~x = ~b, as in 2.7's Example 1.
If our two lines are parallel, they won’t intersect at all as in Figure 2.3.

[Figure 2.3: No points of intersection]
Since there are no points of intersection, A~x = ~b has no solutions, as in
2.7’s Example 5.
The last way to draw two lines in the plane may sound silly, but it is still
a possibility. Here we draw two lines which are the same line, i.e., draw the
second line directly on top of the first line as in Figure 2.4.
[Figure 2.4: Infinitely many points of intersection]

(Note that the equations of these two lines look different even though

they actually describe the same line.) In this case, any point on our doubled
line is a point of intersection, so there are infinitely many different points
of intersection. This means A~x = ~b has infinitely many solutions as in 2.7’s
Example 3. As in that example, this happens when we can choose a value
for some of the variables and the other variables depend on that choice. Since
each variable can be given any real number as a value, we have infinitely many
choices for its value, and hence infinitely many different possible solutions.
Our exploration showed us that in 2D there are three categories of solution
sets: There are solution sets which are empty, i.e., there are no solutions, there
are solution sets which consist of a single vector, i.e., one unique solution,
and there are solution sets which are infinite, i.e., there are infinitely many
solutions. This was also suggested by our work using row reduction to solve
A~x = ~b in the last section. Although we’ll omit a proof, this is true in general.

Theorem 1. A matrix equation A~x = ~b has either no solutions, one solution,


or infinitely many solutions.

Let’s do one more example of each type. We’ll start with the case where
we have no solutions.
 
Example 1. Find the solution set of A~x = ~b where

       [ -3  -1   5 ]             [  2 ]
   A = [  1   2  -2 ]   and  ~b = [  6 ] .
       [  0   5  -1 ]             [ 15 ]

The augmented coefficient matrix of this equation is


 
   [ -3  -1   5 |  2 ]
   [  1   2  -2 |  6 ]
   [  0   5  -1 | 15 ]

which has reduced echelon form


 
   [ 1  0  -8/5 | 0 ]
   [ 0  1  -1/5 | 0 ] .
   [ 0  0    0  | 1 ]

The last row of this matrix corresponds to the equation 0 = 1, so our equation
has no solutions.
Geometrically, our solution set lives in R3 because ~x is a 3-vector. The three
rows of our original matrix correspond to three planes in R3 with equations:
−3x − y + 5z = 2, x + 2y − 2z = 6, and 5y − z = 15. (In general, a single
linear equation in Rn describes an (n − 1)-dimensional object.) Since we have
no solutions, we know our three planes don’t all intersect at the same point.
Feel free to use Mathematica or another software package to plot these three
planes together to verify this visually.

Again, notice that when we have no solutions we’ll always end up with an
impossible equation of the form 0 = b where b is nonzero. (If we’re in reduced
echelon form, we’ll have b = 1.) This is easily spotted in the reduced echelon
form, because we only have an equation of this format if there is a leading 1
in the rightmost column.
Next let’s look at the case where we have one unique solution.
 
Example 2. Find the solution set of f(~x) = ~b where

   ~b = [ 3 ]           [ x1 ]   [ -3x1 + x3      ]
        [ 4 ]  and  f ( [ x2 ] ) = [ x1 - 2x3       ] .
        [ 1 ]           [ x3 ]   [ -x1 + x2 + 3x3 ]
 
This function has matrix

   A = [ -3  0   1 ]
       [  1  0  -2 ] ,
       [ -1  1   3 ]

so we're looking for the solution set of A~x = ~b. This equation has augmented coefficient matrix
 
   [ -3  0   1 | 3 ]
   [  1  0  -2 | 4 ]
   [ -1  1   3 | 1 ]

which has reduced echelon form


 
   [ 1  0  0 | -2 ]
   [ 0  1  0 |  8 ] .
   [ 0  0  1 | -3 ]

This means x1 = −2, x2 = 8, and x3 = −3, so we have the unique solution

   ~x = [ -2 ]
        [  8 ] .
        [ -3 ]
Again, if we think about this geometrically, we know the three planes in R3
represented by the three rows of our augmented coefficient matrix all intersect
in exactly one point: (−2, 8, −3).

Notice that this case can only occur if we have no impossible equations
(so we have a solution) and each variable has its own equation where it has a
leading 1 as its coefficient. This means we need a reduced echelon form which
has a leading 1 in every column except the rightmost one.
Finally, let’s look at the case where we have infinitely many solutions.
 
Example 3. Find the solution set of A~x = ~b where

       [ 2  -4   1 ]             [ 6 ]
   A = [ 1   0   1 ]   and  ~b = [ 3 ] .
       [ 1  -8  -1 ]             [ 3 ]
This equation has augmented coefficient matrix
 
   [ 2  -4   1 | 6 ]
   [ 1   0   1 | 3 ]
   [ 1  -8  -1 | 3 ]

which has reduced echelon form


 
   [ 1  0   1  | 3 ]
   [ 0  1  1/4 | 0 ] .
   [ 0  0   0  | 0 ]

This gives us the equations x1 + x3 = 3, x2 + (1/4)x3 = 0, and 0 = 0. The last


of these gives us no information, but at least it isn’t an impossible equation
of the form 0 = 1. Solving for x1 and x2 in the first two equations gives us
x1 = 3 − x3 and x2 = −(1/4)x3. There is no equation to determine the value of
x3 , so we can pick any real number we like. This is often expressed by saying
x3 is a free variable. This means we have infinitely many different solutions

depending on our infinite choices for x3. We can write this as x1 = 3 − x3,
x2 = −(1/4)x3, and x3 is free, or as

   ~x = [ 3 - x3    ]
        [ -(1/4) x3 ] .
        [ x3        ]

Geometrically, we can think of our real number line of choices for x3 as


the line of triple intersection between the three planes corresponding to the
rows of our augmented coefficient matrix. (If we had 2 free variables, we would
have two real number lines worth of choices, i.e., a plane of intersection, and
so on.)

To get infinitely many solutions we need to avoid impossible equations,


but we also need at least one variable whose value we get to choose. Those
variables are the ones which don’t have an equation where their coefficient is
a leading 1. In other words, we’re in this case if we have no leading 1 in the
rightmost column and no leading 1 in at least one other column.
Now that we understand how to check the reduced echelon form to see
whether or not we’re in each individual case of our solution set’s format, let’s
combine them into one coherent whole. Suppose we’re solving A~x = ~b. No
matter which type of solution set we end up with, we always start by doing
our row reduction algorithm on the augmented coefficient matrix [A | ~b]. In
the interests of efficiency, we’d like to recognize as early as possible in that
process which type of solution set A~x = ~b has. The earliest criteria to show up
is whether or not we have a leading 1 in the rightmost column, which indicates
no solutions. If we fail to see that, then our equation has at least one solution.
To see whether it has more than one solution, we complete our row reduction
to see whether or not all the rest of our columns contain leading 1s. If they do,
we have only one solution. If they don’t, we have infinitely many solutions.
We can visualize this process as a flow chart:

   Is there a leading 1 in the rightmost column?
      Yes -> no solutions.
      No  -> Is there a leading 1 in every other column?
                Yes -> one solution.
                No  -> infinitely many solutions.
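Here is a small Python/SymPy sketch of this flow chart (assuming SymPy is available); it reproduces the answers of Examples 1 and 3 earlier in this section.

from sympy import Matrix

def classify(A, b):
    """Classify the solution set of A x = b by following the flow chart."""
    R, pivots = A.row_join(b).rref()
    if A.cols in pivots:          # leading 1 in the rightmost (augmentation) column
        return "no solutions"
    if len(pivots) == A.cols:     # leading 1 in every other column
        return "one solution"
    return "infinitely many solutions"

print(classify(Matrix([[-3, -1, 5], [1, 2, -2], [0, 5, -1]]), Matrix([2, 6, 15])))  # no solutions
print(classify(Matrix([[2, -4, 1], [1, 0, 1], [1, -8, -1]]), Matrix([6, 3, 3])))    # infinitely many solutions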

Now that we’ve figured out the possible options for the solution set of
A~x = ~b, we’ll move on to figuring out how to concisely describe the solutions
to a given matrix equation. We’ll start with a special case of a matrix equation
where ~b = ~0, so we’re solving A~x = ~0. (We’ve seen this before in 2.5 when we
looked at the null space of a matrix.) This case is often called the homogeneous
case (having ~b ≠ ~0 is then called the nonhomogeneous case). Although it
isn’t immediately obvious, using linear algebra to balance chemical equations
always results in matrix equations with ~b = ~0. To see how this works, let’s
reexamine our example from Chapter 0.

Example 4. When propane is burned, it combines with oxygen to form


carbon dioxide and water. Propane molecules are composed of three carbon
atoms and eight hydrogen atoms, so they are written as C3 H8 . Oxygen
molecules each contain two atoms of oxygen, so they are written as O2 . Carbon
dioxide molecules each contain one carbon atom and two oxygen atoms, so
they are written CO2 . Water molecules are made up of two hydrogen atoms
and one oxygen atom (which is why some aspiring chemistry comedians call
water dihydrogen monoxide), so they are written H2 O. Using this notation,
our chemical reaction can be written C3 H8 + O2 → CO2 + H2 O. Since atoms
are neither created nor destroyed in a chemical reaction, how many of each
kind of molecule must be included in this reaction?

This may look nothing like a linear algebra problem, let alone one of the
form A~x = ~b, but in fact row reduction is one of the best ways to tackle it.
Let’s suppose we have x1 molecules of propane, x2 molecules of oxygen, x3
molecules of carbon dioxide, and x4 molecules of water in our reaction. In this
notation, our chemical reaction looks like
x1 (C3 H8 ) + x2 (O2 ) → x3 (CO2 ) + x4 (H2 O).
Since the number of a particular type of atom is the same before and after
the reaction, this actually gives us a linear equation for each type of atom
present in our various molecules. In this reaction, our molecules contain three
types of atoms: carbon (C), hydrogen (H), and oxygen (O). Looking at carbon
atoms, we have 3 from each propane molecule, 0 from each oxygen molecule,
1 from each carbon dioxide molecule, and 0 from each water molecule. This
means we have 3x1 + 0x2 before our reaction and 1x3 + 0x4 afterward. Thus
we must have 3x1 = x3 or 3x1 − x3 = 0. Looking at hydrogen atoms, by the
same method we get that 8x1 = 2x4 or 8x1 − 2x4 = 0. Similarly, from oxygen
we get 2x2 = 2x3 + x4 or 2x2 − 2x3 − x4 = 0.
This means we’re really just trying to solve the equation A~x = ~0 where
 
   A = [ 3  0  -1   0 ]
       [ 8  0   0  -2 ] .
       [ 0  2  -2  -1 ]

(This is a homogeneous equation since we’re setting A~x equal to ~0.) In fact
balancing a chemical reaction always gives a homogeneous equation, since we
never have a constant term on either side of our chemical reaction.
The reduced echelon form of our augmented coefficient matrix is
 
   [ 1  0  0  -1/4 | 0 ]
   [ 0  1  0  -5/4 | 0 ]
   [ 0  0  1  -3/4 | 0 ]

so we get x1 = (1/4)x4, x2 = (5/4)x4, x3 = (3/4)x4, and x4 free. Since we have a free


variable, our equation has infinitely many solutions, which shouldn’t surprise
you since we don’t usually worry about only being allowed to burn a fixed
number of propane molecules at one time.
This is perfectly fine as a linear algebra answer, but chemists don’t want to
think about fractions or negative quantities of a molecule. They want positive
integer solutions rather than real number solutions. We can easily fix this
problem by choosing a value of x4 that is a positive integer multiple of 4.
If x4 = 4k for some positive integer k, then our solution becomes x1 = k,
x2 = 5k, x3 = 3k, x4 = 4k.
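For readers who want to let software do this, here is a Python/SymPy sketch (assuming SymPy is available) that balances the propane reaction by computing the null space and then clearing denominators to get whole-number coefficients.

from sympy import Matrix, ilcm

# Atom-balance matrix for x1 C3H8 + x2 O2 -> x3 CO2 + x4 H2O (rows: C, H, O).
A = Matrix([[3, 0, -1, 0],
            [8, 0, 0, -2],
            [0, 2, -2, -1]])

v = A.nullspace()[0]                       # one-dimensional null space; here [1/4, 5/4, 3/4, 1]
scale = ilcm(*[entry.q for entry in v])    # least common multiple of the denominators
print((scale * v).T)                       # Matrix([[1, 5, 3, 4]]), i.e. C3H8 + 5 O2 -> 3 CO2 + 4 H2O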

There are many similar problems where, for practical reasons, we want
our solutions to have integer values. In fact, there are people who specifically
study integer-valued linear algebra, but that is outside the scope of this book.
The nice thing about solving homogeneous equations is that we always have
at least one solution, namely ~x = ~0. This means we only have two possible
options for our solution set instead of the usual three. Additionally, if there is
one unique solution, we already know it is ~x = ~0. This means we can focus our
attention on how to describe our solution set when we have infinitely many
solutions. Since listing all possible solutions would literally take forever, we’ll
instead focus on finding a finite set of vectors which span our solution set.
This is best illustrated by working through an example, which we’ll do below.

Example 5. Find a finite set of vectors which span the solution set of A~x = ~0 where

   A = [ 3  2  0  4  -1 ]
       [ 0  2  6  0  -8 ] .
       [ 1  1  1  4   1 ]
The reduced echelon form of A is
 
   [ 1  0  -2  0   1 ]
   [ 0  1   3  0  -4 ]
   [ 0  0   0  1   1 ]

which gives us the equations x1 − 2x3 + x5 = 0, x2 + 3x3 − 4x5 = 0, and


x4 + x5 = 0. (I didn’t bother to write down the augmentation column here,
because a column of zeros is unchanged by any row operations and so will

remain a column of zeros in the reduced echelon form.) The third and fifth
columns of A’s reduced echelon form had no leading 1s, so x3 and x5 are free.
Solving for x1, x2, and x4 gives us x1 = 2x3 − x5, x2 = −3x3 + 4x5, and
x4 = −x5. So in vector format, our solution set is all vectors of the form

   [  2x3 - x5  ]
   [ -3x3 + 4x5 ]
   [  x3        ] .
   [  -x5       ]
   [  x5        ]
A span is a set of linear combinations of vectors, i.e., a sum of vectors which
each have their own scalar coefficient. There isn’t a single scalar coefficient we
can factor out of each entry of our solution vector, but we can split each entry
up as a sum of two terms; one with x3 and one with x5 . This gives us
     
   [  2x3 - x5  ]   [  2x3 ]   [ -x5 ]
   [ -3x3 + 4x5 ]   [ -3x3 ]   [ 4x5 ]
   [  x3        ] = [  x3  ] + [  0  ] .
   [  -x5       ]   [   0  ]   [ -x5 ]
   [  x5        ]   [   0  ]   [  x5 ]

Now we can pull out our scalars, since every entry of the first vector in our
sum is a multiple of x3 and every entry in the second vector is a multiple of
x5 . This means we have
       
   [  2x3 ]   [ -x5 ]        [  2 ]        [ -1 ]
   [ -3x3 ]   [ 4x5 ]        [ -3 ]        [  4 ]
   [  x3  ] + [  0  ] = x3   [  1 ]  + x5  [  0 ] .
   [   0  ]   [ -x5 ]        [  0 ]        [ -1 ]
   [   0  ]   [  x5 ]        [  0 ]        [  1 ]

This newest format is precisely the span of the two vectors being multiplied
by x3 and x5 ! Therefore our solution set can be written as
   

 2 −1 

   4 
 −3
   

  
Span  1  ,  0  .

   −1 
 0





0 1

(Notice that there is a shortcut through this process: We get one spanning
vector per free variable, and that vector is precisely the vector of coefficients
on that variable in our general solution set vector.)
If we wanted to describe the solutions to this equation to someone else,
the most efficient way would be to tell them our spanning vectors. This same
procedure works for the solution set of any homogeneous equation.
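Here is a small SymPy sketch (assuming SymPy is available) confirming this shortcut for Example 5: the null space basis SymPy returns is exactly one spanning vector per free variable.

from sympy import Matrix

A = Matrix([[3, 2, 0, 4, -1],
            [0, 2, 6, 0, -8],
            [1, 1, 1, 4, 1]])

for v in A.nullspace():      # one basis vector for each free variable (x3 and x5 here)
    print(v.T)
# Matrix([[2, -3, 1, 0, 0]]) and Matrix([[-1, 4, 0, -1, 1]]), matching the spanning vectors above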
As a bonus, it turns out that the spanning set we constructed in this
way conveys even more information. The spanning vectors form a linearly

independent set! To see this, focus on the entries of the vectors which
correspond to the free variables. If xi is a free variable, the spanning vectors
will all have entry 0 in the ith spot except for the spanning vector formed
from xi ’s coefficients. This means that no linear combination of these spanning
vectors equals ~0 unless all coefficients in the combination are zero, i.e., our
spanning vectors are linearly independent. As discussed in 1.3, this means the
dimension equals the number of spanning vectors. Therefore the dimension
of the null space of a matrix A is the same as the number of free variables
in the solution set of A~x = ~0. In the example above, this means A has a
two-dimensional null space.
Now that we’ve successfully figured out a good way to describe the solution
set of A~x = ~0, let's turn our attention to the situation where A~x = ~b and ~b ≠ ~0.
(Remember that ~b ≠ ~0 doesn't mean all its entries are nonzero, but rather that
~b has at least one nonzero entry.) We have no problem describing the solution
set when A~x = ~b has no solutions. Similarly if A~x = ~b has one unique solution,
we can easily describe the solution set by simply giving that solution. As with
A~x = ~0, this leaves us with the task of concisely describing our solution set
when we have infinitely many solutions.
Let’s start with a geometric exploration. As at the beginning of this section
we'll work in R2 for ease of drawing pictures, but here let's look at a single line.
This means A~x = ~b looks like

   [ a  c ] [ x ] = b
            [ y ]

or ax + cy = b, which we can rewrite more familiarly as y = −(a/c)x + (b/c).
If we're in the homogeneous case then b = 0, which means we're talking about a
line through the origin as in Figure 2.5.

[Figure 2.5: A homogeneous solution set]

When we move back to the case where b ≠ 0, we're adding some nonzero
number b to our equation. Specifically, this is the difference between the lines
y = −(a/c)x and y = −(a/c)x + (b/c). Adding a constant to a function shifts that function
up or down depending on the sign of the constant. (Think about how adding
“+C” on the end of a definite integral changes the graph of the antiderivative.)
This shift is shown in Figure 2.6.
[Figure 2.6: A corresponding pair of solution sets]

The picture above suggests that for a fixed matrix A, the solution set of
A~x = ~b is a shifted version of the solution set of A~x = ~0. In our example
above, the vector we added to create that shift is a point on the line which  is
0
the solution set of A~x = ~b. For example, we could have used the point .
2
This relationship also holds more generally, i.e., the solution set of A~x = ~b
is a shifted version of the solution set of A~x = ~0 and that shift can be done
by adding any particular solution to A~x = ~b. In mathematical notation, if ~v
is any solution to A~x = ~b, i.e., A~v = ~b, then the solution set to A~x = ~b is
{~x + ~v | A~x = ~0}. This may sound a bit complicated, so let’s do an example.
 
  0
3 2 0 4 −1 3
 
Example 6. Let A = 0 2 6 0 −8. Use the fact that ~v =  
−1 is a
1 1 1 4 1  1
2
 
8
solution to A~x = ~b for ~b = −16 to describe all solutions of A~x = ~b.
8
Our matrix A is the same asin  Example 5,and  we know from there that

 2x 3 − x 5 


 
 −3x 3 + 4x 5


~
the solution set of A~x = 0 is   

x3 . Since the solution set of

 −x 



5 

x5
A~x = ~b can be written as ~v plus the solution set to A~x = ~0, we get that the
solutions of A~x = ~b are
     

 2x3 − x5 0  
 2x3 − x5 


   
 −3x 3 + 4x 
5
 3
 

 
 
 −3x 3 + 4x 5 + 3 

 x  + −1  =  x − 1 
  3     3  .

  −x   1 
 
  −x + 1 



5 
  
5 

x5 2 x5 + 2

If we didn’t already have a particular solution ~v , we could also just solve


directly for the solution set. It won’t turn out to be a span as with A~x = ~0,
but will be almost a span.
 
~ 1 0 −1 2
Example 7. Find the solution set of A~x = b where A =
  0 1 3 −3
−5
and ~b = .
1
The augmented coefficient matrix of this equation is
 
1 0 −1 2 −5
0 1 3 −3 1

which is already in reduced echelon form. There is no leading 1 in the rightmost


column, so A~x = ~b does have solutions. Our matrix gives us the equations
x1 − x3 + 2x4 = −5 and x2 + 3x3 − 3x4 = 1, so our solution set can be written
x1 = −5 + x3 − 2x4 and x2 = 1 − 3x3 + 3x4 , with x3 and x4 free.
To try to make this look as much as possible like a span, we can follow the
same general procedure as Example 5. First we rewrite our solution set as a
vector to get  
−5 + x3 − 2x4
 1 − 3x3 + 3x4 
 .
 x3 
x4
Each component of this vector is a sum of three terms: a constant, a multiple

of x3 , and a multiple of x4 . Splitting it up as a sum of three vectors gives us


     
   [ -5 ]   [  x3  ]   [ -2x4 ]
   [  1 ] + [ -3x3 ] + [  3x4 ] .
   [  0 ]   [  x3  ]   [   0  ]
   [  0 ]   [   0  ]   [   x4 ]

We can factor an x3 out of the second vector and an x4 out of the third to get
     
   [ -5 ]        [  1 ]        [ -2 ]
   [  1 ]  + x3  [ -3 ]  + x4  [  3 ] .
   [  0 ]        [  1 ]        [  0 ]
   [  0 ]        [  0 ]        [  1 ]

The second two terms are precisely the span of the second and third vectors,
so this can be written as
     
   [ -5 ]          { [  1 ]   [ -2 ] }
   [  1 ]          { [ -3 ]   [  3 ] }
   [  0 ]  + Span  { [  1 ] , [  0 ] } .
   [  0 ]          { [  0 ]   [  1 ] }

In this case we couldn’t communicate our solution set simply by giving a


list of spanning vectors, but we could give a list of spanning vectors along with
the extra vector which needs to be added to that span to give the solution
set. You can check that this meshes with our previous discussion, by checking
that the constant vector is a solution to A~x = ~b and the span is the solution
set to A~x = ~0.
Another way to view the solution sets of matrix equations A~x = ~b is
to connect them to the range of the function f with matrix A. (Since f is
a matrix map, we could also refer to its range as the column space of A,
Col(A).) Recall that the range was all ~b’s where A~x = ~b had a solution, i.e.,
the set of all vectors in the codomain that could be written as A~x for some
~x. One possible question we can ask is how large this range is, i.e., what is
the dimension of the range of A’s map. Since the range is also the span of the
columns of A, we can find the dimension of the range by asking how many of
A’s columns are linearly independent, which prompts the following definition.

Definition. The rank of a matrix A, written rk(A), is the number of linearly


independent columns of A.

 
Example 8. Compute the rank of

   A = [  4  1   3 ]
       [  2  3  -1 ] .
       [ -1  6  -7 ]
We saw in the last section that we could check the linear independence or

linear dependence of a set of vectors by creating a matrix whose columns are


our vectors and checking whether or not every column of its reduced echelon
form had a leading 1. Here we’re starting with a matrix and thinking of the
vectors which are its columns, but otherwise this is the same procedure.
The reduced echelon form of A is

   [ 1  0   1 ]
   [ 0  1  -1 ] .
   [ 0  0   0 ]

This doesn't have a leading
1 in every column, so the columns aren’t linearly independent. This tells us
that the rank of A isn’t 3. Our free variable is x3 , which we saw in 1.2 can
be interpreted as meaning that the third column is in the span of the other
two columns. If we ignore A’s third column, the other two columns do have
leading 1s in their reduced echelon form. This means that they are linearly
independent, so rk(A) = 2. This also tells us that A’s column space has
dimension 2.

The discussion before and during the last example showed us that the
linearly independent columns of A were precisely the ones whose columns
had a leading 1 in A’s reduced echelon form and their number tells us the
dimension of the range of A’s map. This gives the following result.

Theorem 2. The rank of a matrix A equals both the number of columns of


A’s reduced echelon form which contain a leading 1 and the dimension of the
column space of A.

For example, this shows the column space of the matrix in Example 5 is a
2D subspace of R5 .
If we combine this new idea of the rank of a matrix with our earlier
discussion of the dimension of a matrix’s null space, we get the following
theorem, usually called the Rank-Nullity Theorem. (Some people call the
dimension of a matrix’s null space its nullity.)

Theorem 3. Let A be an m × n matrix. Then rk(A) + dim(Nul(A)) = n.

This theorem is remarkable in that it operates on two levels: one where it is


completely obvious that the theorem is true, and another where the theorem
tells us something interesting. We’ll start with the first layer to convince
ourselves this equation always holds.
The easiest term to interpret in the Rank-Nullity Theorem’s equation is n,
which is just the number of columns of our matrix A. In our discussion of rank,
we saw that the easiest way to compute the rank of a matrix is to count the
number of columns with leading 1s in its reduced echelon form. This means
we can view rk(A) as the number of columns of A’s reduced echelon form
which contain a leading 1. In our discussion of solving A~x = ~0, we saw that
the dimension of the null space of A was the number of free variables. Since
each free variable corresponds to a column of the reduced echelon form of A
which doesn’t have a leading 1, dim(Nul(A)) is the number of columns of A’s
reduced echelon form which don’t contain a leading 1. From this perspective,
our equation can be read as saying: the number of columns of A’s reduced
echelon form with a leading 1 plus the number of columns without a leading
1 equals the total number of columns. Since a column must either contain or
not contain a leading 1, this is clearly true.
While the argument above tells us rk(A) + dim(Nul(A)) = n always holds,
it doesn’t provide any evidence that this result is interesting. To see why
we should care about the Rank-Nullity Theorem, let’s switch from thinking
computationally about A’s reduced echelon form to thinking geometrically
about the linear function f with matrix A. Since A is an m × n matrix, we
know f : Rn → Rm , so the n on the right-hand side of our equation is the
size of f ’s domain. The rank of A is the dimension of the column space of A,
i.e., the dimension of the range of f . The null space of A is the kernel of f , so
dim(Nul(A)) is the dimension of f ’s kernel. Putting this all together, we get a
sort of “conservation of dimension” law. The map f starts with n dimensions
of inputs, and ends up with rk(A) dimensions of outputs. What happened to
the other dimensions of the domain? The Rank-Nullity Theorem says those
dimensions must be collapsed down to ~0 in Rm , i.e., they are in the kernel.
To me, conservation laws are very beautiful things, so I love having one here
in linear algebra.
Example 9. Show the Rank-Nullity Theorem holds for A = \begin{pmatrix} 4 & 1 & 3 \\ 2 & 3 & -1 \\ -1 & 6 & -7 \end{pmatrix}.
This is the matrix from Example 8, so we know rk(A) = 2.
To find the dimension of the null space of A, we need to find a basis for
Nul(A) which we can do by writing the solution set of A~x = ~0 as a span and
remembering that the dimension of Nul(A) is the number of spanning vectors
forthe solution
 set. We saw in Example 8 that the reduced echelon form of A
1 0 1
is 0 1 −1, which gives us the equations x1 + x3 = 0, and x2 − x3 = 0.
0 0 0
Therefore the solution set of A~x = ~0 is
\begin{pmatrix} -x3 \\ x3 \\ x3 \end{pmatrix} = Span\left\{ \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} \right\}.
Since there is one spanning vector, we get that dim(Nul(A)) = 1.
Putting this together, we see that
rk(A) + dim(Nul(A)) = 2 + 1 = 3.
A has three columns, so n = 3 and the Rank-Nullity Theorem is satisfied.
Geometrically, this equation tells us that A’s map f has domain R3 , a
two-dimensional range, and a one-dimensional kernel.
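If you'd like to see the Rank-Nullity Theorem verified numerically, here is a small Python sketch. It assumes the NumPy and SciPy libraries are available; the routines named below are theirs, not notation from this book.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[ 4, 1,  3],
                  [ 2, 3, -1],
                  [-1, 6, -7]])

    rank = np.linalg.matrix_rank(A)        # dimension of the column space
    nullity = null_space(A).shape[1]       # number of basis vectors for Nul(A)

    print(rank, nullity)                   # prints 2 1
    print(rank + nullity == A.shape[1])    # prints True, since A has n = 3 columns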
Exercises 2.8.
1. Which type of solution set do the following equations have?
x1 + x2 − 4x3 = 0, −x1 + 3x3 = −1, −3x1 + 4x2 − 15x3 = 5.
2. Which type of solution set does the following vector equation have?
x1 \begin{pmatrix} 2 \\ 0 \\ -4 \\ 1 \end{pmatrix} + x2 \begin{pmatrix} -5 \\ 6 \\ 0 \\ 12 \end{pmatrix} + x3 \begin{pmatrix} 3 \\ -1 \\ 2 \\ 7 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 1 \end{pmatrix}
3. Find the solution set of the following equations or show they have no
solution. x1 +2x2 +3x3 = 0, 4x1 +5x2 +6x3 = 3, 7x1 +8x2 +9x3 = 0
4. Find the solution set of the following vector equation or show it has no solution. x1 \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} + x2 \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + x3 \begin{pmatrix} 2 \\ 3 \\ 7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
5. Find the solution set of the following matrix equation or show that it has no solution. \begin{pmatrix} 2 & -2 & 0 \\ 0 & 5 & 10 \\ 0 & -4 & -7 \end{pmatrix} ~x = \begin{pmatrix} 4 \\ 5 \\ -5 \end{pmatrix}
6. Find the solution set of f(~x) = \begin{pmatrix} 3 \\ 11 \end{pmatrix} or show it has no solution, where f\begin{pmatrix} x1 \\ x2 \end{pmatrix} = \begin{pmatrix} x1 + 2x2 \\ 3x1 + 4x2 \end{pmatrix}.
7. In 1.2’s Exercise 7, we explored the chemical process of rusting
iron, which starts when iron, Fe, oxygen gas, O2, and water, H2O,
combine to form iron hydroxide, Fe(OH)3. Let's match 1.2 and
make the first entry in our vectors count iron atoms (Fe), the second
count oxygen (O), and the third count hydrogen (H). If our reaction
combines x1 molecules of iron, x2 molecules of oxygen, and x3
molecules of water to create x4 molecules of rust, how many of each
kind of molecule must be present in this reaction? (Remember that
chemists want their numbers of molecules to be positive integers!)
8. In 1.2's Exercise 8, we learned sodium chlorate, NaClO3, is
produced by the electrolysis of sodium chloride, NaCl, and water,
H2O, and that the reaction also produces hydrogen gas, H2. Let's
match 1.2 and make the first entry in our vectors count sodium
atoms (Na), the second count chlorine (Cl), the third count oxygen
(O), and the fourth count hydrogen (H). If our reaction combines x1
molecules of sodium chloride with x2 molecules of water to produce
x3 molecules of sodium chlorate and x4 molecules of hydrogen gas,
how many of each kind of molecule must be present in this reaction?
(Remember that chemists want their numbers of molecules to be
positive integers!)
9. (a) How can you tell from the reduced echelon form of an
augmented coefficient matrix that the corresponding matrix
equation has one unique solution?
(b) Give an example of a 3 × 4 augmented coefficient matrix in
reduced echelon form whose corresponding matrix equation has
one unique solution.
10. For each option below, write down the reduced echelon form of an
augmented coefficient matrix whose matrix equation has that type
of solution set. What features of your reduced echelon form caused
you to have that type of solution set?
(a) no solution
(b) 1 solution
(c) infinitely many solutions
11. The solution set of the matrix equation A~x = ~0 is: x1 = 3x2 − 4x4 ,
x3 = 5x4 , and x2 and x4 free. Write this solution set as the span of
a set of vectors.
12. The solution set of the matrix equation A~x = ~0 is: x1 = 0,
x2 = 4x3 − 5x5 , x4 = 8x5 and x3 and x5 free. Write this solution
set as the span of a set of vectors.
13. Let A = \begin{pmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & -4 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}. Write the solution set to A~x = ~0 as
the span of a set of vectors.
14. Let A = \begin{pmatrix} 1 & 1 & 0 & -1 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{pmatrix}. Write the solution set of A~x = ~0 as the
span of a set of vectors.
15. Let A = \begin{pmatrix} 1 & 2 & 0 & -3 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}. Write the solution set of A~x = ~0 as the
span of a set of vectors.
16. Let A = \begin{pmatrix} 1 & -1 & 0 & 2 \\ 0 & 0 & 1 & -3 \end{pmatrix}. Write the solution set of A~x = ~0 as the
span of a set of vectors.
17. Let A = \begin{pmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & 0 & 3 \end{pmatrix}.
(a) Write the solution set of A~x = ~0 as the span of a set of vectors.
(b) Fix some nonzero vector ~b. If A~x = ~b has solutions, how is its
solution set related to your answer to (a)?
18. Let A = \begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
(a) Write the solution set of A~x = ~0 as the span of a set of vectors.
(b) Briefly describe the relationship between the solution set of A~x = ~0 and the solution set of A~x = ~b where ~b = \begin{pmatrix} 2 \\ -3 \\ 9 \end{pmatrix}.
19. Suppose A~x = ~b has at least one solution. Describe the geometric
relationship between its solution set and the solution set of A~x = ~0.
20. Can a linear system with five equations in seven variables have
one unique solution? If you answer yes, give an example of such a
system. If you answer no, briefly explain why this is impossible.
21. Is it possible to have a linear system of three equations in four
variables which has one unique solution? If you answer yes, give an
example of such a system. If you answer no, briefly explain why this
is impossible.
22. Find the rank of A = \begin{pmatrix} -3 & 4 & 0 \\ 1 & 2 & 1 \\ 2 & -6 & -1 \end{pmatrix}.
23. Find the rank of A = \begin{pmatrix} 2 & 5 & -1 \\ 3 & 8 & 4 \\ 1 & 9 & -2 \end{pmatrix}.
24. Find the rank of A = \begin{pmatrix} 1 & 2 & 4 \\ -1 & 3 & 5 \\ 0 & 1 & 0 \\ -2 & 6 & 9 \end{pmatrix}.
25. Find the rank of A = \begin{pmatrix} -5 & 0 & 2 & 1 \\ 2 & 0 & -4 & 6 \end{pmatrix}.
26. Show the Rank-Nullity Theorem holds for A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 3 & 2 \\ 0 & 1 & 1 \\ -2 & 6 & 4 \end{pmatrix}.
27. Show the Rank-Nullity Theorem holds for A = \begin{pmatrix} -3 & 1 & 2 \\ 4 & 2 & -6 \\ 0 & 1 & -1 \end{pmatrix}.
28. Suppose f : R5 → R3 is a linear map whose matrix has rank 2.
What is the dimension of f ’s kernel?
29. Suppose f : R6 → R9 is a linear map whose matrix has rank 4.
What is the dimension of f ’s kernel?
2.9 Large Matrix Computations
In many modern applications, matrices are being used to store vast data sets
worth of information. This may mean that our matrices are so large that even
computers will be slowed down trying to do basic matrix operations or solve
matrix equations. In this section we’ll explore two possible tools to use in these
types of situations that allow these vast problems to be reduced to easier or
smaller cases.
A note before we begin our discussion: the major difficulty in trying to
explore these ideas in a book is that we have the option to either do realistic
examples which are too large to easily write down or to do examples which
are so small that we should do them using the techniques developed in earlier
sections. I’ve made the choice to use smaller examples here, so you’ll have to
use your imagination to extend them to the case where our matrices may have
hundreds or even thousands of rows and columns.
The first idea we’ll discuss helps us compute rA, A + B, or AB when A
and B are very, very large. Our main trick here is to partition our matrices
into smaller pieces and do our matrix operation piece by piece. This also has
the advantage that we can choose to only compute part of rA, A + B, or
AB rather than computing the whole matrix. Before we dive in, let’s be very
specific about what we mean by a partition of a matrix.
Definition. A partition of a matrix A is a set of vertical and horizontal cuts which slice A into a collection of rectangular submatrices.
Example 1. Partition the matrix A = \begin{pmatrix} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ 1 & -1 & 6 & 8 \end{pmatrix} using a vertical
cut between the third and fourth columns and a horizontal cut between the
second and third rows.

The standard way to indicate where we are slicing our matrix to create
our partition is to draw horizontal and vertical lines through the matrix to
visually show each cut. Here we want to cut vertically between the third and
fourth columns and horizontally between the second and third rows. This
means we need to draw a vertical line between the third and fourth columns
and a horizontal line between the second and third rows. This gives us the
following visual illustration of our partitioned matrix
\left(\begin{array}{ccc|c} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ \hline 1 & -1 & 6 & 8 \end{array}\right).
This partition gives us a 2 × 3 top left submatrix, a 2 × 1 top right submatrix, a 1 × 3 bottom left submatrix, and a 1 × 1 bottom right submatrix.
 
3 −1 4
Example 2. Explain why  −2 1 5  is not a partition of the matrix
  8 0 2
3 −1 4
A = −2 1 5.
8 0 2
The vertical cut between the first and second columns is fine, but the
horizontal cuts do not go all the way across the matrix. Therefore this is not
a partition.

Now that we understand how to properly partition a matrix, let’s look


at how to do matrix operations on partitioned matrices. The easiest matrix
operation here is multiplication by a scalar. Since we get rA by multiplying
each entry of A by r, we can compute part of rA quite easily without
computing the whole matrix. In the language of partitions, this means that
if we’ve partitioned A then we can compute the submatrices of the same
partition on rA one submatrix at a time by simply multiplying each submatrix
by r. In other words, if we only want to know one piece of rA, we would choose
a partition of A which isolates the part of rA we want to compute and then
multiply it by r to get the piece of rA that we’re interested in.

Example 3. Compute the submatrix of −2A consisting of the top two rows and middle two columns where A = \begin{pmatrix} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ 1 & -1 & 6 & 8 \end{pmatrix}.
To do this, we first need to create a partition that isolates the requested
submatrix of A. This means we need to do two vertical cuts, between the
first and second columns and between the third and fourth columns, and one
horizontal cut between the second and third rows. This looks like
\left(\begin{array}{c|cc|c} 2 & 1 & 0 & -5 \\ 7 & -3 & 9 & 0 \\ \hline 1 & -1 & 6 & 8 \end{array}\right).

The submatrix we’re interested in is the top middle 2 × 2 piece. Since we’re
only interested in the corresponding part of −2A, we can ignore the rest of
the matrix and just multiply that 2 × 2 submatrix by −2 to get
\begin{pmatrix} -2 & 0 \\ 6 & -18 \end{pmatrix}.
Matrix addition isn’t much more complicated than scalar multiplication,


since we compute each entry of A + B by adding together the corresponding
entries of A and B. This means that if we’re interested in finding only part of
A + B, we can compute only that part by using carefully selected partitions
of A and B. However, we do have one additional restriction that we didn’t
have when computing rA: the sum A + B only makes sense if A and B are
the same size. This means that not only do we need to require A and B to
be the same size, but also that we need to use the same partition on both of
them and on A + B so that we’re adding submatrices of the same size. This
may initially sound confusing, but will hopefully be cleared up in the example
below.

Example 4. Compute the submatrix of A + B consisting of the bottom two rows and left two columns where A = \begin{pmatrix} 3 & -5 & 4 \\ 1 & 0 & -1 \\ 2 & 5 & -1 \end{pmatrix} and B = \begin{pmatrix} 9 & -9 & 2 \\ 5 & 4 & -6 \\ 3 & -3 & 7 \end{pmatrix}.
The submatrix of A + B we’re interested in is the 2 × 2 bottom left corner.
We can create a partition which isolates this part of both A and B by cutting
vertically between the second and third columns and horizontally between the
first and second rows. This gives us
\left(\begin{array}{cc|c} 3 & -5 & 4 \\ \hline 1 & 0 & -1 \\ 2 & 5 & -1 \end{array}\right) and \left(\begin{array}{cc|c} 9 & -9 & 2 \\ \hline 5 & 4 & -6 \\ 3 & -3 & 7 \end{array}\right).
Now we can compute the requested piece of A + B by adding the 2 × 2 submatrices that form the bottom left corners of A and B. This gives us
\begin{pmatrix} 1 & 0 \\ 2 & 5 \end{pmatrix} + \begin{pmatrix} 5 & 4 \\ 3 & -3 \end{pmatrix} = \begin{pmatrix} 6 & 4 \\ 5 & 2 \end{pmatrix}.

The situation for matrix multiplication is a little more complicated,


because both computing the entries of AB and choosing the right partitions
for A and B are a little trickier. If we’re interested in computing part of AB,
we can’t focus only on those entries of A and B as in matrix addition, because
matrix multiplication uses the whole ith row of A and jth column of B to find
the ijth entry of AB. Additionally, we can’t simply use the same partition on
A, B, and AB, because for matrix multiplication to make sense we need the
number of columns of A to equal the number of rows of B and similarly for
any submatrices of A and B that we want to multiply together. Let’s tackle
these problems in two stages, starting with the right choice of partitions.
To figure out the right partitions for A and B to compute a particular
piece of AB, remember our rule that the product of an m × k and a k × n
matrix is an m × n matrix. Reversing the order, we can see that to find an
m × n submatrix of AB, we’ll need to use an m × k partition of A and a k × n
partition of B. In other words, the vertical cuts we use in our partition of A
need to match up with the horizontal cuts in our partition of B since these
correspond to our choice of matching k. The placement of the horizontal cuts
in our partition of A, which correspond to m, and the vertical cuts in our
partition of B, which correspond to n, depend on the size of the submatrix of
AB we want to compute.
Example 5. Find partitions of the matrices A = \begin{pmatrix} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ 4 & -2 & 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{pmatrix} which could be used to compute the 2 × 3 submatrix in the top right corner of AB.

From our discussion above, we know that to create a 2×3 submatrix in the
top right corner of AB we’ll need to partition A with a horizontal cut between
the second and third rows and B with a vertical cut between the second and
third columns. This gives us
\left(\begin{array}{cccc} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ \hline 4 & -2 & 0 & 1 \end{array}\right) and \left(\begin{array}{cc|ccc} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{array}\right).

Additionally, we’ll need to partition A with a vertical cut between the kth
and k + 1st columns and B with a matching horizontal cut between the kth
and k + 1st rows. If there were any way to take advantage of blocks of zero
entries in either A or B, the choice of k could matter more. However, since
there aren’t any zero blocks, let’s divide the four rows of A and columns of
B evenly, i.e., let k = 2. This means we’re adding a vertical cut between the
second and third columns of A and a horizontal cut between the second and
third rows of B. This gives us
\left(\begin{array}{cc|cc} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ \hline 4 & -2 & 0 & 1 \end{array}\right) and \left(\begin{array}{cc|ccc} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ \hline -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{array}\right).
Notice that our partitions of A and B naturally give rise to a partition of AB as seen in the following example.

Example 6. Use the partitions on A and B in the previous example to create a partition on AB.
The partition on AB gets its horizontal cuts from the horizontal cuts in
our partition of A and its vertical cuts from the vertical cuts in our partition
of B. (I remember this by remembering that AB gets its number of rows
from A and its number of columns from B so it seems natural to divide AB’s
rows as A’s rows are divided and AB’s columns as B’s columns are divided.)
Therefore we have one horizontal cut between the second and third rows of
AB and one vertical cut between the second and third columns of AB. This
gives us the partition below (with * used to denote an unknown matrix entry).
\left(\begin{array}{cc|ccc} * & * & * & * & * \\ * & * & * & * & * \\ \hline * & * & * & * & * \end{array}\right)

Now that we understand how to choose partitions for A and B and the
partition this creates for AB, how can we compute the submatrices of AB’s
new partition? This ends up being easier than you might expect. In fact we
can link it back to basic matrix multiplication by replacing each submatrix
of A and B by a variable and then multiplying as if those variables were our
matrix entries. This is illustrated below.
Example 7. Use the partitions of A and B from previous examples to compute the 2 × 3 submatrix in the top right corner of AB.

Recall that we had partitioned A and B as
\left(\begin{array}{cc|cc} -1 & 0 & 2 & 5 \\ 3 & 1 & -3 & 2 \\ \hline 4 & -2 & 0 & 1 \end{array}\right) and \left(\begin{array}{cc|ccc} 0 & 7 & -2 & 4 & 1 \\ 6 & -4 & 2 & 0 & -1 \\ \hline -9 & 5 & -2 & 4 & 3 \\ 3 & 5 & -1 & 1 & 2 \end{array}\right)
which gave us the following partition on AB
\left(\begin{array}{cc|ccc} * & * & * & * & * \\ * & * & * & * & * \\ \hline * & * & * & * & * \end{array}\right).

Each of our matrices has been divided into four submatrices, which we will call
A1 , . . . , A4 , B1 , . . . , B4 , and (AB)1 , . . . , (AB)4 respectively. In this notation,
our matrix multiplication becomes
\begin{pmatrix} A1 & A2 \\ A3 & A4 \end{pmatrix} \begin{pmatrix} B1 & B2 \\ B3 & B4 \end{pmatrix} = \begin{pmatrix} (AB)1 & (AB)2 \\ (AB)3 & (AB)4 \end{pmatrix}.

Now our computation can be done as if we were multiplying two 2 × 2 matrices whose entries are our submatrices of A and B, i.e., (AB)1 = A1 B1 + A2 B3 , (AB)2 = A1 B2 + A2 B4 , (AB)3 = A3 B1 + A4 B3 , and (AB)4 = A3 B2 + A4 B4 .
We only want to compute (AB)2 , so we can ignore most of these submatrix products and just do (AB)2 = A1 B2 + A2 B4 . Going back to our matrix partitions, we get
A1 = \begin{pmatrix} -1 & 0 \\ 3 & 1 \end{pmatrix}, A2 = \begin{pmatrix} 2 & 5 \\ -3 & 2 \end{pmatrix}, B2 = \begin{pmatrix} -2 & 4 & 1 \\ 2 & 0 & -1 \end{pmatrix}, B4 = \begin{pmatrix} -2 & 4 & 3 \\ -1 & 1 & 2 \end{pmatrix}.

Plugging this into our formula above, we get that the 2 × 3 submatrix in
the top right corner of AB is
\begin{pmatrix} -1 & 0 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} -2 & 4 & 1 \\ 2 & 0 & -1 \end{pmatrix} + \begin{pmatrix} 2 & 5 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} -2 & 4 & 3 \\ -1 & 1 & 2 \end{pmatrix}
= \begin{pmatrix} 2 & -4 & -1 \\ -4 & 12 & 2 \end{pmatrix} + \begin{pmatrix} -9 & 13 & 16 \\ 4 & -10 & -5 \end{pmatrix} = \begin{pmatrix} -7 & 9 & 15 \\ 0 & 2 & -3 \end{pmatrix}.

Clearly this represents significant computational savings over having


to compute the entire matrix product AB. Even if you are interested in
computing all of AB, you may still be able to get some work savings out
of this method if either A or B has a block of entries which are all 0. If you
can choose your partitions so that a submatrix of A or B is the zero matrix,
then you’ve removed the need to compute any products with that submatrix.
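To make the submatrix bookkeeping concrete, here is a short Python/NumPy sketch of the computation from Example 7. It is only an illustration of the block formula (AB)2 = A1 B2 + A2 B4; the slicing indices below are my own labels for the partition, not notation from this section.

    import numpy as np

    A = np.array([[-1,  0,  2,  5],
                  [ 3,  1, -3,  2],
                  [ 4, -2,  0,  1]])
    B = np.array([[ 0,  7, -2,  4,  1],
                  [ 6, -4,  2,  0, -1],
                  [-9,  5, -2,  4,  3],
                  [ 3,  5, -1,  1,  2]])

    # submatrices from the partition in Example 7 (cuts after row 2 and column 2)
    A1, A2 = A[:2, :2], A[:2, 2:]
    B2, B4 = B[:2, 2:], B[2:, 2:]

    top_right = A1 @ B2 + A2 @ B4      # the 2 x 3 top right block of AB
    print(top_right)                   # [[-7  9 15] [ 0  2 -3]]
    print((A @ B)[:2, 2:])             # same block, computed from the full product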
Now that we’ve discussed how to break up matrix operations, let’s turn
our attention to solving the matrix equation A~x = ~b when A is terribly large.
The main idea here will be to create a factorization of A as the product of two
special matrices. For the discussion below we will assume that A is square,
i.e., n × n. It is possible to adapt this method to other cases, but we will opt
for the simpler explanation and notation here. We start by noticing that there
are two cases where solving A~x = ~b is especially easy.
Definition. A matrix A is upper triangular if it has aij = 0 for i > j. A matrix A is lower triangular if it has aij = 0 for i < j.

Note that the entries aij = 0 with i > j are the entries below the diagonal
of A while the entries aij = 0 with i < j are the entries above the diagonal
of A. I remember this by recognizing a triangular matrix as one which has a
triangle of zeros either above or below its diagonal. The other entries can be
freely chosen, and if these free entries are in the upper part of the matrix it
is upper triangular. If the free entries are in the lower part of the matrix it is
lower triangular. However you choose to remember this, it is important to note
that the “upper” and “lower” in the definition above refer to the arbitrary
entries rather than the entries which must be zero.
Example 8. Is \begin{pmatrix} -4 & 7 & 2 \\ 0 & 0 & -5 \\ 0 & 0 & 1 \end{pmatrix} upper triangular or lower triangular?
This matrix has nonzero entries only on or above the diagonal, so it is upper triangular.

Example 9. Write down a 3 × 3 lower triangular matrix.

Here we want to create a 3 × 3 matrix whose nonzero entries are all on or below its diagonal. One possible way to do this is
\begin{pmatrix} 2 & 0 & 0 \\ -5 & 4 & 0 \\ 0 & 6 & 9 \end{pmatrix}.

Our factorization for A will be A = LU where L is a lower triangular


matrix and U is an upper triangular matrix. The reason for doing this is that
solving a matrix equation is much easier when the matrix is either upper or
lower triangular. Thus we’ll be able to replace solving the general matrix
equation A~x = ~b with a two-stage process where we first solve a matrix
equation involving L and then solve a matrix equation involving U . Before
we get into the details, let’s explore how triangular matrices help us solve
matrix equations more quickly.
If U is upper triangular, then we can solve U~x = ~b one variable at a time
by starting at the bottom and working our way up. The bottom row of our
augmented coefficient matrix corresponds to an equation where all variables
have coefficient 0 except xn , so we can easily solve for xn . The next row up
will have 0 coefficients on all but the last two variables. Since we already know
the value of xn , we can plug that in and easily solve for xn−1 . Repeating this
process up the rows from the bottom, we soon find all of ~x.
Example 10. Solve \begin{pmatrix} -4 & 7 & 2 \\ 0 & -5 & 3 \\ 0 & 0 & -1 \end{pmatrix} ~x = \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix}.
The augmented coefficient matrix of this equation is
\left(\begin{array}{ccc|c} -4 & 7 & 2 & 4 \\ 0 & -5 & 3 & 2 \\ 0 & 0 & -1 & 6 \end{array}\right).

The bottom row gives us the equation −x3 = 6, which we can quickly solve to
get x3 = −6. The next row up gives us the equation −5x2 + 3x3 = 2. Plugging
in x3 = −6, we get −5x2 − 18 = 2 so x2 = −4. The top row gives us the
equation −4x1 + 7x2 + 2x3 = 4. Plugging in x2 = −4 and x3 = −6, we get
−4x1 − 28 − 12 = 4 so x1 = −11.

Solving L~x = ~b where L is lower triangular is very similar except that


we start with the top row and work our way down. Here the top row of our
augmented coefficient matrix corresponds to an equation where all but the


first variable have 0 coefficients. Thus we can easily solve for x1 . The second
row gives an equation with 0 coefficients except on the first two variables, so
we can plug in the value of x1 and solve for x2 . Repeating this process as we
move down the rows of our matrix, we get ~x.
Example 11. Solve \begin{pmatrix} 2 & 0 & 0 \\ -5 & 4 & 0 \\ 1 & 6 & 9 \end{pmatrix} ~x = \begin{pmatrix} 8 \\ -12 \\ 7 \end{pmatrix}.
The augmented coefficient matrix of this equation is
\left(\begin{array}{ccc|c} 2 & 0 & 0 & 8 \\ -5 & 4 & 0 & -12 \\ 1 & 6 & 9 & 7 \end{array}\right).

The top row gives the equation 2x1 = 8, so x1 = 4. The second row gives the
equation −5x1 + 4x2 = −12. Plugging in x1 = 4 gives −20 + 4x2 = −12, so
x2 = 2. The bottom row gives the equation x1 + 6x2 + 9x3 = 7. Plugging in
x1 = 4 and x2 = 2 gives 4 + 12 + 9x3 = 7, so x3 = −1.

In either case, it was much easier to solve a matrix equation where our
matrix was triangular than it would have been for a general matrix.
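The two examples above are really the same algorithm run in opposite directions, often called back substitution and forward substitution. Here is a Python/NumPy sketch of both (the function names are mine, not standard library routines), checked against Examples 10 and 11.

    import numpy as np

    def back_substitution(U, b):
        # solve U x = b for upper triangular U, starting from the bottom row
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    def forward_substitution(L, b):
        # solve L x = b for lower triangular L, starting from the top row
        n = len(b)
        x = np.zeros(n)
        for i in range(n):
            x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
        return x

    U = np.array([[-4, 7, 2], [0, -5, 3], [0, 0, -1]], dtype=float)
    print(back_substitution(U, np.array([4., 2., 6.])))       # [-11. -4. -6.]

    L = np.array([[2, 0, 0], [-5, 4, 0], [1, 6, 9]], dtype=float)
    print(forward_substitution(L, np.array([8., -12., 7.])))  # [ 4.  2. -1.]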
Now suppose we have factored A into the product of a lower triangular
matrix L and an upper triangular matrix U , so A = LU . Instead of solving
A~x = ~b directly, we can first solve L~y = ~b and then solve U~x = ~y . Then

A~x = (LU )~x = L(U~x) = L~y = ~b,

so ~x is our solution to A~x = ~b.


Example 12. Use the fact that A = \begin{pmatrix} 2 & 4 & 6 \\ -1 & -14 & 1 \\ -2 & -13 & 2 \end{pmatrix} has LU-factorization
A = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 4 & 0 \\ -2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & 1 \\ 0 & 0 & 5 \end{pmatrix} to solve A~x = \begin{pmatrix} -2 \\ 13 \\ -4 \end{pmatrix}.
We’ll use the strategy above and start by solving the equation L~y = ~b
which in our case is
\begin{pmatrix} 2 & 0 & 0 \\ -1 & 4 & 0 \\ -2 & 3 & 1 \end{pmatrix} ~y = \begin{pmatrix} -2 \\ 13 \\ -4 \end{pmatrix}.
As in the previous example, we can start from the top and work our way down
the rows. The top row says 2y1 = −2, so y1 = −1. The second row tells us
−y1 + 4y2 = 13. Plugging in y1 = −1 gives 1 + 4y2 = 13, so y2 = 3. The
last row says −2y1 + 3y2 + y3 = −4. Plugging in y1 = −1 and y2 = 3 gives 2 + 9 + y3 = −4, so y3 = −15. This means ~y = \begin{pmatrix} -1 \\ 3 \\ -15 \end{pmatrix}.
Next we need to solve the equation U~x = ~y which in our case is
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & 1 \\ 0 & 0 & 5 \end{pmatrix} ~x = \begin{pmatrix} -1 \\ 3 \\ -15 \end{pmatrix}.

This is very similar to the process for L~y = ~b except that we’ll start at the
bottom row and work up. The bottom row says 5x3 = −15, so x3 = −3. The
second row says −3x2 + x3 = 3. Plugging in x3 = −3 gives −3x2 − 3 = 3, so
x2 = −2. The top row says x1 + 2x2 + 3x3 = −1. Plugging in x2 = −2 and
x3 = −3 gives us x1 − 4 − 9 = −1, so x1 = 12.
Thus the solution to our original matrix equation A~x = ~b is ~x = \begin{pmatrix} 12 \\ -2 \\ -3 \end{pmatrix}.
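The same two-stage solve is easy to carry out with a library's triangular solver. The sketch below redoes Example 12 in Python; it assumes SciPy's solve_triangular routine is available and is only meant to illustrate the "solve L~y = ~b, then U~x = ~y" strategy.

    import numpy as np
    from scipy.linalg import solve_triangular

    L = np.array([[ 2,  0, 0],
                  [-1,  4, 0],
                  [-2,  3, 1]], dtype=float)
    U = np.array([[ 1,  2, 3],
                  [ 0, -3, 1],
                  [ 0,  0, 5]], dtype=float)
    b = np.array([-2., 13., -4.])

    y = solve_triangular(L, b, lower=True)    # first solve L y = b, top down
    x = solve_triangular(U, y, lower=False)   # then solve U x = y, bottom up
    print(y)   # [ -1.   3. -15.]
    print(x)   # [12. -2. -3.]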

Now that we understand why factoring A into the product of an upper and
lower triangular matrix is so useful, let’s discuss how to compute L and U . If
you think about it, you’ll realize that Part 1 of our row reduction algorithm
actually produces an upper triangular matrix, because it starts from the upper
left and creates zeros below each of our leading 1s. This means that we can
find our upper triangular matrix U simply by running the first half of row
reduction on A. The trickier part will therefore be finding L so that A = LU .
To do this, we’ll need a new way to view row operations as multiplication by
a special type of matrix.
Definition. The elementary matrix of a row operation is the matrix we get by doing that row operation to In .

Example 13. Let n = 3. Find the elementary matrices of the row operations
r3 − 7r1 , r2 ↔ r3 , and (1/2)r1 .
Since n = 3, we'll find the elementary matrices of these row operations by doing each one to I3 .
To find the elementary matrix of r3 − 7r1 we do
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \rightarrow_{r3 - 7r1} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -7 & 0 & 1 \end{pmatrix}
so the elementary matrix of r3 − 7r1 is \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -7 & 0 & 1 \end{pmatrix}.
To find the elementary matrix of r2 ↔ r3 we do
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \rightarrow_{r2 \leftrightarrow r3} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
so the elementary matrix of r2 ↔ r3 is \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
To find the elementary matrix of (1/2)r1 we do
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \rightarrow_{\frac{1}{2}r1} \begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
so the elementary matrix of (1/2)r1 is \begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
These elementary matrices link our row operations to matrix multiplication in the following way. Suppose E is the elementary matrix of some row operation and A is an n × n matrix. Then EA is the matrix we get by doing E's row operation to A. In other words, multiplication on the left by the elementary matrix of a row operation does that row operation to the matrix being multiplied.
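A quick numerical check of this fact, sketched in Python/NumPy: build the elementary matrix of r3 − 7r1 from Example 13 and multiply it against a 3 × 3 matrix A of your choosing (the particular A below is just made up for the test).

    import numpy as np

    E = np.eye(3)
    E[2, 0] = -7                # elementary matrix of the row operation r3 - 7*r1

    A = np.array([[ 2.,  1., 0.],
                  [ 7., -3., 9.],
                  [ 1., -1., 6.]])

    direct = A.copy()
    direct[2] = direct[2] - 7 * direct[0]   # do the row operation to A directly

    print(np.array_equal(E @ A, direct))    # prints True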
Part 1 of our row reduction algorithm is just a series of row operations, so
from this perspective we can write our upper triangular matrix U as

U = Ek Ek−1 · · · E2 E1 A

where Ei is the elementary matrix of the ith row operation we used. Note
that since matrix multiplication is not commutative it is important to place
E1 closest to A since this corresponds to doing its row operation first, then
E2 , etc. However, we want to write A = LU , so we need some way to move
the Ei ’s to the other side of the equation and in particular to the left of U .
Luckily each row operation has another row operation which reverses it.
If we added a multiple of one row to another row, we can subtract that same
multiple. If we multiplied a row by a constant, we can divide by that constant.
If we swapped two rows, we can swap those two rows back. Thus each Ei has
another elementary matrix which undoes it. If we call this other matrix Fi ,
then what we’re saying is that multiplication on the left by Ei and then Fi
is the same as doing nothing, i.e., Fi Ei = In . If we find the F1 , . . . , Fk which
undo E1 , . . . , Ek , then we can start moving the Ei matrices to the other side
of the equation U = Ek Ek−1 · · · E2 E1 A to get something of the form LU = A.
We start by multiplying both sides on the left by Fk to cancel off Ek . This
gives us

Fk U = Fk Ek Ek−1 · · · E2 E1 A = In Ek−1 · · · E2 E1 A = Ek−1 · · · E2 E1 A.

Next we multiply on the left by Fk−1 to cancel off Ek−1 , which gives

Fk−1 Fk U = Ek−2 · · · E2 E1 A.

Repeating this process with Fk−2 through F1 gives us

F1 F2 · · · Fk−1 Fk U = A.

This is very close to what we want, but to finish our factorization we will need
to argue that F1 F2 · · · Fk−1 Fk is a lower triangular matrix which we can use
as our L.
The first step in seeing F1 F2 · · · Fk−1 Fk is lower triangular is to see that
each of the Fi matrices is lower triangular. Remember that each Ei was the
elementary matrix of a row operation used in the first half of our row reduction
algorithm to either create a leading 1 along the diagonal of A or use a leading
1 to create a zero below that leading 1. Thus the row operations we use only
affect entries on or below the diagonal of A. To create the elementary matrix
of a row operation, we perform our row operation on In . Since In is lower
triangular and we’re doing a row operation that changes only entries below
the diagonal, the upper section of each of our Ei ’s will still be all 0s. Thus
each Ei is lower triangular. Since each Fi is the reverse of the row operation
of Ei , it also only affects entries below the diagonal. Therefore each Fi is also
lower triangular. You will explore in the exercises that the product of lower
triangular matrices is again lower triangular, so F1 F2 · · · Fk−1 Fk is a lower
triangular matrix, which we can call L. Therefore we can use careful row
reduction and reversal of row operations to find L and U so that A = LU .
Example 14. Find an LU-factorization of A = \begin{pmatrix} -4 & 12 & -20 \\ 0 & 2 & 4 \\ -1 & 6 & 0 \end{pmatrix}.
The first step of this process is to use Part 1 of our row reduction algorithm
to find our upper triangular matrix U . While we do this, we’ll need to keep
careful track of our row operations (in order) so we can use them to create
our lower triangular matrix L. I’ll use our usual notation to write down the
row operation at each step.
A \rightarrow_{-\frac{1}{4}r1} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 2 & 4 \\ -1 & 6 & 0 \end{pmatrix} \rightarrow_{r3 + r1} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 2 & 4 \\ 0 & 3 & 5 \end{pmatrix} \rightarrow_{\frac{1}{2}r2} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & 2 \\ 0 & 3 & 5 \end{pmatrix}
\rightarrow_{r3 - 3r2} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & -1 \end{pmatrix} \rightarrow_{-r3} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.
This last matrix is our upper triangular matrix
U = \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.
Now we can use our row operations to create L. Remember that
L = F1 F2 · · · Fk−1 Fk
where Fi is the elementary matrix of the row operation that reverses the ith
row operation used above.
Our first row operation was −(1/4)r1 , which is reversed by the row operation −4r1 . Doing −4r1 to I3 gives us the elementary matrix of −4r1 , so
F1 = \begin{pmatrix} -4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Our second row operation was r3 +r1 , which is reversed by r3 −r1 . This means
F2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}.
Our third row operation was (1/2)r2 , which is reversed by 2r2 . Thus
F3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Our fourth row operation was r3 − 3r2 which is reversed by r3 + 3r2 , so
F4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}.
Our fifth and last row operation was −r3 which is reversed by −r3 , so
F5 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
This means
L = F1 · · · F5 = \begin{pmatrix} -4 & 0 & 0 \\ 0 & 2 & 0 \\ -1 & 3 & -1 \end{pmatrix}.
Therefore our LU-factorization is
A = \begin{pmatrix} -4 & 0 & 0 \\ 0 & 2 & 0 \\ -1 & 3 & -1 \end{pmatrix} \begin{pmatrix} 1 & -3 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.
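It is always worth multiplying L and U back together to make sure they really reproduce A. Here is a one-line check in Python/NumPy for the factorization we just found. (Many libraries can also produce such factorizations for you, e.g., SciPy's scipy.linalg.lu, but be aware that it returns a permuted factorization A = P L U because it reorders rows for numerical stability, so its L and U need not match a hand computation like ours.)

    import numpy as np

    L = np.array([[-4, 0,  0],
                  [ 0, 2,  0],
                  [-1, 3, -1]], dtype=float)
    U = np.array([[ 1, -3, 5],
                  [ 0,  1, 2],
                  [ 0,  0, 1]], dtype=float)

    print(L @ U)   # reproduces A = [[-4, 12, -20], [0, 2, 4], [-1, 6, 0]]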
Notice that to use our LU -factorization method to solve A~x = ~b, we need
to find L and U which takes a bit of work. This means LU -factorization makes
the most sense if we want to solve A~x = ~b for several different values of ~b.
In that case, we can find A = LU once and then use our LU -factorization
repeatedly to solve each matrix equation quickly. However, if we want to solve
A~x = ~b for a single value of ~b, it may be quicker to simply row reduce the
augmented coefficient matrix.
This certainly isn’t an exhaustive list of methods for dealing with very
large matrices, and the matrices used in the examples and exercises of this
book are small enough that we won’t need these techniques again. However,
given the increasing importance of large data sets stored in correspondingly
large matrices, I hope you will keep these techniques in mind in case you need
them in future applications.

Exercises 2.9.
1. Let A be a 3×3 matrix. Find the partition which isolates the entries
a21 and a31 or say that no such partition exists.
2. Let A be a 3×4 matrix. Find the partition which isolates the entries
a31 , a32 , and a34 or say that no such partition exists.
3. Let A be a 4×5 matrix. Find the partition which isolates the entries
a22 , a23 , a32 , and a33 or say that no such partition exists.
4. Let A be a 5×6 matrix. Find the partition which isolates the entries
a25 , a26 , a35 , a36 , a45 , and a46 or say that no such partition exists.
 
−4 1 0 9
5. Let A =  3 −2 5 1. Use a partition to compute the
6 3 −2 7
submatrix of 5A consisting of the bottom row and middle two
columns.  
5 −4 2
−1 8 0
6. Let A =  2
. Use a partition to compute the submatrix
3 4
−1 0 6
of −6A consisting of the middle two rows and left column.
 
12 −7 4
7. Let A =  3 −1 4 . Use a partition to compute the submatrix
9 2 −3
of 4A consisting of the bottom two rows and right two columns.
 
7 1 −5 3
−1 0 3 6
8. Let A =  
 5 2 3 −1. Use a partition to compute the
9 3 0 1
submatrix of −2A consisting of the middle two rows and left two
columns.
1 −1 3 12 −7 4
9. Let A = 5 −2 9 and B =  3 −1 4 . Use a partition to
4 1 2 9 2 −3
compute the submatrix of A + B consisting of the middle row and
right two columns.
   
1 6 −7 6 −2 8
10. Let A = −3 4 −10 and B = 0 1 −1. Use a partition
2 −8 1 3 5 2
to compute the submatrix of A + B consisting of the top two rows
and middle column.
   
4 −2 4 3 −9 2
11. Let A = 7 1 −1 and B =  0 −2 4. Use a partition to
1 1 5 −5 1 3
compute the submatrix of A + B consisting of the bottom two rows
and left two columns.
   
1 0 −1 6 1 2
 0 3 −1 −1 3 4
12. Let A =    
−3 1 0  and B =  0 2 1. Use a partition to
1 1 −1 5 2 3
compute the submatrix of A + B consisting of the middle two rows
and right two columns.
13. Let A be a 5 × 3 matrix and B be a 3 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the top two rows and right column.
14. Let A be a 3 × 2 matrix and B be a 2 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the bottom row and left two columns.
15. Let A be a 3 × 3 matrix and B be a 3 × 4 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the middle row and middle two columns.
16. Let A be a 5 × 2 matrix and B be a 2 × 3 matrix. Find partitions
of A and B that you could use to compute the submatrix of AB
consisting of the top three rows and left two columns.
 
1 −1 3  
2  3 1
0 −2
17. Let A = 4 and B = −1 2 . Use a partition to
1 1
4 −2
−1 2 1
compute the submatrix of AB consisting of the middle two rows
and right column.
 
6 1 2  
−1 3 4 −4 1 0 9
18. Let A =    
 0 2 1 and B = 3 −2 5 1 . Use a partition
6 3 −2 7
5 2 3
to compute the submatrix of AB consisting of the bottom two rows


and right two columns.
 
  6 1 2
−4 1 0 9 −1 3 4
19. Let A =  3 −2 5 1 and B =  
 0 2 1. Use a partition
6 3 −2 7
5 2 3
to compute the submatrix of AB consisting of the top two rows and
left two columns.
   
2 −1 3 0 4 2 −1 9
1 1 −1 3   
20. Let A =   and B =  3 −1 2 1. Use a
4 2 0 −1   0 1 1 5
5 2 1 −2 −2 1 −1 6
partition to compute the submatrix of AB consisting of the middle
two rows and middle two columns.
21. Use the factorization \begin{pmatrix} -2 & 0 & 10 \\ 1 & 8 & -1 \\ -3 & 2 & 14 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 4 & 0 \\ 3 & 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 0 & 5 \\ 0 & 2 & 1 \\ 0 & 0 & -2 \end{pmatrix} to solve \begin{pmatrix} -2 & 0 & 10 \\ 1 & 8 & -1 \\ -3 & 2 & 14 \end{pmatrix} ~x = \begin{pmatrix} 12 \\ -6 \\ 14 \end{pmatrix}.
22. Use the factorization \begin{pmatrix} -6 & -3 & 12 \\ 10 & 4 & -23 \\ 0 & 1 & 9 \end{pmatrix} = \begin{pmatrix} -3 & 0 & 0 \\ 5 & -1 & 0 \\ 0 & 1 & 3 \end{pmatrix} \begin{pmatrix} 2 & 1 & -4 \\ 0 & 1 & 3 \\ 0 & 0 & 2 \end{pmatrix} to solve \begin{pmatrix} -6 & -3 & 12 \\ 10 & 4 & -23 \\ 0 & 1 & 9 \end{pmatrix} ~x = \begin{pmatrix} 33 \\ -66 \\ 35 \end{pmatrix}.
23. Use the factorization \begin{pmatrix} 2 & -4 & -1 \\ -4 & 8 & 5 \\ 6 & -12 & 2 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & 2 & -1 \end{pmatrix} \begin{pmatrix} -2 & 4 & 1 \\ 0 & 0 & 3 \\ 0 & 0 & 1 \end{pmatrix} to solve \begin{pmatrix} 2 & -4 & -1 \\ -4 & 8 & 5 \\ 6 & -12 & 2 \end{pmatrix} ~x = \begin{pmatrix} 11 \\ -25 \\ 28 \end{pmatrix}.
24. Use the factorization \begin{pmatrix} 12 & 4 & -8 \\ 0 & 3 & -1 \\ -6 & -5 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 \\ 0 & -1 & 0 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} 3 & 1 & -2 \\ 0 & -3 & 1 \\ 0 & 0 & -1 \end{pmatrix} to solve \begin{pmatrix} 12 & 4 & -8 \\ 0 & 3 & -1 \\ -6 & -5 & 3 \end{pmatrix} ~x = \begin{pmatrix} -8 \\ 13 \\ -13 \end{pmatrix}.
25. Show that multiplying on the left by the elementary matrix \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix} multiplies the second row by −2.
26. Show that multiplying on the left by the elementary matrix \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} swaps the first and second rows.
27. Find the 2 × 2 elementary matrix which adds 6 times the first row
to the second row.
28. Find the 2 × 2 elementary matrix which multiplies the first row by
7.
29. Find the 3 × 3 elementary matrix which swaps the first and third
rows.
30. Find the 3 × 3 elementary matrix which adds −3 times the third
row to the second row.
31. Find the elementary matrix which undoes \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}.
32. Find the elementary matrix which undoes \begin{pmatrix} 1 & -5 \\ 0 & 1 \end{pmatrix}.
33. Find the elementary matrix which undoes \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
34. Find the elementary matrix which undoes \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 9 \\ 0 & 0 & 1 \end{pmatrix}.
35. Find an LU-factorization of A = \begin{pmatrix} -1 & 1 & 3 \\ -5 & 8 & 39 \\ 1 & 1 & 3 \end{pmatrix}.
36. Find an LU-factorization of A = \begin{pmatrix} 2 & -8 & 6 \\ 3 & -15 & 6 \\ 1 & -5 & 3 \end{pmatrix}.
37. Find an LU-factorization of A = \begin{pmatrix} 1 & -2 & 4 \\ 0 & 2 & -2 \\ 1 & 1 & 5 \end{pmatrix}.
38. Find an LU-factorization of A = \begin{pmatrix} 1 & 6 & -7 \\ 1 & 4 & -13 \\ -2 & -8 & 21 \end{pmatrix}.
39. (a) Show that the product of two 3 × 3 lower triangular matrices
is again lower triangular.
(b) Use your result from (a) to explain why any finite product of
3 × 3 lower triangular matrices is lower triangular.
40. Repeat the previous exercise with n × n lower triangular matrices.
2.10 Invertibility
So far, we’ve solved f (~x) = ~b for ~x using row reduction. However, from your
earlier study of functions, you know that some functions can be quickly undone
using another function called the inverse function. For example, f (x) = 2x
has f −1 (x) = (1/2)x. On the other hand, some functions don't have an inverse,
like f (x) = x2 . In this section we’ll develop a technique to simultaneously
determine whether or not a given linear function from Rn to Rm has an
inverse and to find its inverse function if it has one.
Intuitively, the reason f (x) = 2x and f −1 (x) = (1/2)x are inverses is that they undo each other. What we mean by that is that if we compose these two functions in either order we get the identity function. In other words, f (f −1 (x)) = f ((1/2)x) = 2((1/2)x) = x and f −1 (f (x)) = f −1 (2x) = (1/2)(2x) = x.
The general versions of these two equations give us our formal definition of
invertibility.
Definition. A function f : Rn → Rm is invertible if it has an inverse function f −1 : Rm → Rn so that f −1 (f (~x)) = ~x for every ~x in Rn and f (f −1 (~b)) = ~b for every ~b in Rm .

This definition looks a little complicated, but the first condition is saying
that f −1 undoes f while the second says that f undoes f −1 . This means we
get a bonus fact: f −1 is also invertible and has inverse function f .

Theorem 1. A function f is invertible if and only if it is both 1-1 and onto.

This makes sense, because to be able to reverse a function we need


everything in the domain and codomain to match up in unique pairs. If
our function isn’t onto then there are elements of the codomain which aren’t
matched up with anything from the domain, so f can’t be invertible. If our
function isn’t 1-1, then multiple elements from the domain map to some single
element of the codomain, so f also won’t be invertible.
From our work in 2.5, we know that a linear function f : Rn → Rm can be
1-1 only if n ≤ m and can be onto only if m ≤ n. Therefore it is only possible
to have both 1-1 and onto when m = n.

For the rest of this section, we will assume that all functions have
the same domain and codomain, i.e., f : Rn → Rn and all matrices
are n × n.

Since we are only working with linear functions from Rn to Rn , we know


f ’s matrix will be n × n. Our inverse function f −1 will also have an n × n
matrix, which we’ll call A−1 . This means we can reformulate our definition of
invertibility in terms of matrices as follows. (This reformulation uses the fact
that the matrix of the identity map f : Rn → Rn by f (~x) = ~x is the identity matrix In .)
Definition. A matrix A is invertible if it has an inverse matrix A−1 so that A−1 A = In and AA−1 = In .
Note that we need to check A−1 A = In and AA−1 = In separately, since we've seen that matrix multiplication doesn't always commute.
This definition allows us to rephrase our twin problems of deciding whether or not a linear function is invertible and finding its inverse function (if possible)
as deciding whether or not a matrix is invertible and finding its inverse matrix.
This translation may seem semantic, but it allows us to bring row reduction
to bear on the problem which will turn out to be a big help.
Let’s tackle these two problems one at a time, starting with how to tell
whether or not a given n × n matrix A is invertible. We know A is invertible
if its corresponding linear function is both 1-1 and onto. We can check both
of these conditions simultaneously by row reducing A and looking at where
the leading 1s lie in its reduced echelon form. As we saw in 2.5, if we have
a leading 1 in every column, then A’s function is 1-1. If we have a leading 1
in every row, then A’s function is onto. Therefore to get both 1-1 and onto,
which together imply invertibility, we need to see a leading 1 in every row and
column of A’s reduced echelon form. Since A is a square matrix, this means
its reduced echelon form must have a stripe of leading 1s along the diagonal,
i.e., A’s reduced echelon form must be In .
Example 1. A = \begin{pmatrix} 2 & 1 & -2 \\ 18 & 2 & 3 \\ 2 & -1 & 4 \end{pmatrix} is not invertible.
To see this we compute the reduced echelon form of A, which is
\begin{pmatrix} 1 & 0 & 1/2 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{pmatrix}.

Since the reduced echelon form is not I3 , A is not invertible.


Example 2. f : R3 → R3 by f\begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 4x1 + x2 - x3 \\ x2 + 3x3 \\ -4x1 - x2 \end{pmatrix} is invertible.
To see this we first find the matrix of f , which is
A = \begin{pmatrix} 4 & 1 & -1 \\ 0 & 1 & 3 \\ -4 & -1 & 0 \end{pmatrix}.
(See 2.2 if you need a reminder of how to find the matrix of a linear function.)
Next we compute the reduced echelon form of A, which is
I3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Therefore f is invertible.

If it turns out that f and A are not invertible then we are done, since we
can’t find f −1 and A−1 . However, if f and A are invertible, we’d like a method
for finding f −1 and A−1 . To do this, we’ll start by considering the equation
AA−1 = In . We don’t know how to solve an equation like this, so we’ll have to
reduce it to a set of simpler equations of the form A~x = ~b. In 2.3, we saw that
the jth column of a matrix product AB can be found by multiplying A by
the jth column of B. In our equation B = A−1 , so we can find the n columns
of A−1 by solving the n matrix equations

A~x = ~e1 , A~x = ~e2 , . . . , A~x = ~en

where ~ej is the jth column of In and so is all 0s except for a 1 as its jth
entry. (We know that each equation will have one unique solution because A’s
reduced echelon form is In .)
We could solve these equations by row reducing the n augmented coefficient
matrices
[A | ~e1 ] , [A | ~e2 ] , . . . , [A | ~en ]
however that’s a lot of redundant work since we’d have to row reduce A
repeatedly. Additionally, since A’s reduced echelon form is In , we know that
whatever ~ej becomes in the reduced echelon form of [A | ~ej ] is the answer to
A~x = ~ej . A more efficient way to find A−1 is to solve for all its columns at
once by row reducing [A | In ]. The left-hand side will become In , while each
column of the right hand side becomes a column of A−1 . Thus the reduced echelon form of [A | In ] is [In | A−1 ].
The best part of this is that the process for finding A−1 and the process
for deciding if A is invertible can be done at once! Simply row reduce [A | In ].
If the left half turns into In , then A is invertible and A−1 is the right half of
the reduced echelon form. If the left half doesn't turn into In , then A isn't
invertible.
Example 3. Determine whether or not A = \begin{pmatrix} 3 & 7 & 0 \\ 2 & 5 & 0 \\ 0 & 1 & 1 \end{pmatrix} is invertible. If it is
invertible, find A−1 .
To simultaneously decide whether or not A is invertible and compute A−1 if it exists, we row reduce
[A | I3 ] = \left(\begin{array}{ccc|ccc} 3 & 7 & 0 & 1 & 0 & 0 \\ 2 & 5 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right).

This gives us
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 5 & -7 & 0 \\ 0 & 1 & 0 & -2 & 3 & 0 \\ 0 & 0 & 1 & 2 & -3 & 1 \end{array}\right).
Since the left half of our row reduced matrix is I3 , A is invertible. Its inverse is the right half of our row reduced matrix, so
A−1 = \begin{pmatrix} 5 & -7 & 0 \\ -2 & 3 & 0 \\ 2 & -3 & 1 \end{pmatrix}.
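If you simply need the numbers, a library will do the row reduction for you. The Python/NumPy sketch below checks the inverse we just computed; it is only an illustration, not a substitute for understanding the [A | In ] procedure.

    import numpy as np

    A = np.array([[3., 7., 0.],
                  [2., 5., 0.],
                  [0., 1., 1.]])

    A_inv = np.linalg.inv(A)    # raises LinAlgError if A is not invertible
    print(A_inv)                # [[ 5. -7.  0.] [-2.  3.  0.] [ 2. -3.  1.]]
    print(A @ A_inv)            # the 3 x 3 identity (up to rounding error)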

Example 4. Find the inverse function of f\begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 4x1 + x2 - x3 \\ x2 + 3x3 \\ -4x1 - x2 \end{pmatrix}.
This is our function from Example 2, so we already know it is invertible
and that f corresponds to the matrix
A = \begin{pmatrix} 4 & 1 & -1 \\ 0 & 1 & 3 \\ -4 & -1 & 0 \end{pmatrix}.

To find f ’s inverse function, we’ll use the fact that f −1 has matrix A−1 . We’ll
do that as in the last example by row reducing [A|I3 ]. In our case this is
\left(\begin{array}{ccc|ccc} 4 & 1 & -1 & 1 & 0 & 0 \\ 0 & 1 & 3 & 0 & 1 & 0 \\ -4 & -1 & 0 & 0 & 0 & 1 \end{array}\right)
which row reduces to
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -3/4 & -1/4 & -1 \\ 0 & 1 & 0 & 3 & 1 & 3 \\ 0 & 0 & 1 & -1 & 0 & -1 \end{array}\right).
The right half of this matrix gives us
A−1 = \begin{pmatrix} -3/4 & -1/4 & -1 \\ 3 & 1 & 3 \\ -1 & 0 & -1 \end{pmatrix}.
This means f −1 is the function with f −1 (~x) = A−1 ~x. If we want to write f −1 in the same way we originally wrote f , we can compute
A−1 ~x = \begin{pmatrix} -3/4 & -1/4 & -1 \\ 3 & 1 & 3 \\ -1 & 0 & -1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} -(3/4)x1 - (1/4)x2 - x3 \\ 3x1 + x2 + 3x3 \\ -x1 - x3 \end{pmatrix}
to get
f −1 \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} -(3/4)x1 - (1/4)x2 - x3 \\ 3x1 + x2 + 3x3 \\ -x1 - x3 \end{pmatrix}.

Note that there is still one gap here: we know that the matrix we’re calling
A−1 satisfies AA−1 = In , but we haven’t shown that it satisfies A−1 A = In .
Suppose that you have a matrix A which you know is invertible, so there is
some A−1 with AA−1 = In and A−1 A = In . We’ve found a matrix which for
the moment I’ll call B so that AB = In . Multiplying both sides of this equation
on the left by A−1 gives us A−1 (AB) = A−1 (In ) which can be simplified to
B = A−1 . Therefore the matrix we’ve been solving for by row reducing [A | In ]
is actually A−1 .
Recall that in Chapter 0’s Example 4 we talked about multiple linear
regression as a search for an equation y = β0 + β1 x1 + · · · + βp xp which can
predict the value of a variable y in terms of a set of variables x1 , . . . , xp . Now
that we understand how to compute the inverse of a matrix, we can actually
solve for the βs to find these equations.
These regression models are based on data sets where we know the values
of all the variables, including y. This allows us to plug in the values of y and
x1 , . . . , xp from each data point one at a time to create a set of linear equations
in β0 , β1 , . . . , βp . (If the constant term bothers you, pretend there is a variable
x0 whose value for every data point is 1.) If we have n data points, this gives
us the equations

y1 = β0 + β1 x11 + · · · + βp x1p
y2 = β0 + β1 x21 + · · · + βp x2p
...
yn = β0 + β1 xn1 + · · · + βp xnp

where xij is the value of the jth variable and yi is the value of y for the ith
data point. (Don’t let the unusual notation here fool you, we know the values
of the xs and are going to solve for the values of the βs.)
Usually there isn’t a single value for each β that will solve this system of
equations, so we introduce an error term εi onto the end of each equation. We
can rewrite this as the matrix equation ~y = X β~ + ~ε for the n × (p + 1) matrix
X = \begin{pmatrix} 1 & x11 & \cdots & x1p \\ 1 & x21 & \cdots & x2p \\ \vdots & \vdots & & \vdots \\ 1 & xn1 & \cdots & xnp \end{pmatrix}.
Our goal is to find the vector β~ that minimizes the errors. I'll skip over
the derivation of the solution formula which involves tools from statistics and
calculus. However, we will need one new definition: the transpose of a matrix.
Definition. The transpose of an m × n matrix A is the n × m matrix AT where aTij = aji .
In other words, the transpose is the matrix formed by reflecting A’s entries
across the diagonal.
Example 5. Find the transpose of A = \begin{pmatrix} 0 & 4 \\ -1 & 0 \\ -2 & -3 \end{pmatrix}.
Since A is 3 × 2, its transpose, AT , will be 2 × 3. Using A's columns as AT 's rows (or reflecting A's entries across the diagonal) gives us AT = \begin{pmatrix} 0 & -1 & -2 \\ 4 & 0 & -3 \end{pmatrix}.

Using our new notation, the coefficients that minimize the errors in the
regression equation y = β0 + β1 x1 + · · · + βp xp are given by the formula
β~ = (X T X)−1 X T ~y .
You may worry about taking the inverse of X T X since X wasn't a square matrix; however, X is n × (p + 1) so X T is (p + 1) × n, which makes X T X a square ((p + 1) × n) · (n × (p + 1)) = (p + 1) × (p + 1) matrix.

Example 6. A chain of toy stores is trying to predict quarterly sales by city


(in thousands of dollars) based on each city’s population under 12 years old
(in thousands) and the average amount of money people have each month to
spend on nonessential items. They collect data for four of the cities where
they have stores and find the data given in the table below. Use this data set
to find a regression equation to predict sales based on population under 12
and average nonessential spending money.
City Code Sales Population Under 12 Nonessential Spending
1 $174.4 68.5 $167
2 $164.4 45.2 $168
3 $244.2 91.3 $182
4 $154.6 47.8 $163
We want to predict quarterly sales, so that is our variable y. We will base


our prediction on population under 12, x1 , and average monthly nonessential
spending money, x2 . This means our regression equation will have the form
y = β0 + β1 x 1 + β2 x 2 .
To solve for our βs, we need to set up our matrix solution formula. First, our solution vector will be β~ = \begin{pmatrix} β0 \\ β1 \\ β2 \end{pmatrix}. We have four data points, so n = 4. Since yi is the quarterly sales in the ith city, we have ~y = \begin{pmatrix} y1 \\ y2 \\ y3 \\ y4 \end{pmatrix} = \begin{pmatrix} 174.4 \\ 164.4 \\ 244.2 \\ 154.6 \end{pmatrix}.
The xi1 entry of our matrix X is the population under 12 and the xi2 entry
is the nonessential spending for the ith city. This means
X = \begin{pmatrix} 1 & 68.5 & 167 \\ 1 & 45.2 & 168 \\ 1 & 91.3 & 182 \\ 1 & 47.8 & 163 \end{pmatrix}.
We want to find β~ = (X T X)−1 X T ~y , and we can start by finding X's transpose.
X T = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 68.5 & 45.2 & 91.3 & 47.8 \\ 167 & 168 & 182 & 163 \end{pmatrix}.
This means
X T X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 68.5 & 45.2 & 91.3 & 47.8 \\ 167 & 168 & 182 & 163 \end{pmatrix} \begin{pmatrix} 1 & 68.5 & 167 \\ 1 & 45.2 & 168 \\ 1 & 91.3 & 182 \\ 1 & 47.8 & 163 \end{pmatrix}
= \begin{pmatrix} 4 & 252.8 & 680 \\ 252.8 & 17{,}355.8 & 43{,}441.1 \\ 680 & 43{,}441.1 & 115{,}806 \end{pmatrix}.
To find (X T X)−1 , we row reduce [X T X | I3 ], which gives us (with some rounding)
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 453.212 & 0.975 & -3.027 \\ 0 & 1 & 0 & 0.975 & 0.003 & -0.007 \\ 0 & 0 & 1 & -3.027 & -0.007 & 0.020 \end{array}\right).
This means X T X is invertible, and we have
(X T X)−1 = \begin{pmatrix} 453.212 & 0.975 & -3.027 \\ 0.975 & 0.003 & -0.007 \\ -3.027 & -0.007 & 0.020 \end{pmatrix}.
Plugging these elements into our formula β~ = (X T X)−1 X T ~y gives us
\begin{pmatrix} β0 \\ β1 \\ β2 \end{pmatrix} = \begin{pmatrix} 453.212 & 0.975 & -3.027 \\ 0.975 & 0.003 & -0.007 \\ -3.027 & -0.007 & 0.020 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 68.5 & 45.2 & 91.3 & 47.8 \\ 167 & 168 & 182 & 163 \end{pmatrix} \begin{pmatrix} 174.4 \\ 164.4 \\ 244.2 \\ 154.6 \end{pmatrix}
= \begin{pmatrix} -445.48 \\ 0.60 \\ 3.48 \end{pmatrix}

so our regression equation is y = −445.48 + 0.60x1 + 3.48x2 .
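Here is how the same computation might look in Python/NumPy; this is just a sketch of the formula β~ = (X T X)−1 X T ~y applied to the table above, so treat it as an illustration. Small differences from the hand computation come from the rounding used in the example.

    import numpy as np

    X = np.array([[1, 68.5, 167],
                  [1, 45.2, 168],
                  [1, 91.3, 182],
                  [1, 47.8, 163]])
    y = np.array([174.4, 164.4, 244.2, 154.6])

    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    print(beta)          # approximately [-445.5, 0.6, 3.5]

    # np.linalg.lstsq solves the same least-squares problem more stably
    print(np.linalg.lstsq(X, y, rcond=None)[0])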

Exercises 2.10.
  1 
7 0 −1 0
1. Let A = . Verify that A = 7 .
−14 1 2 1
   2

2 1 −1 3 − 23 − 53
 11 
2. Let A =  3 1 4 . Verify that A−1 = − 23 5
3 3 .
−1 0 −2 −3 1 1 1
3 3
 
1 −2 1
3. Find the inverse of A = −3 7 −6.
2 −3 0
 
−3 4 0
4. Find the inverse of A =  1 2 1.
2 0 1
 
12 −4
5. Let A = . Find A−1 or show that A is not invertible.
−6 2
 
5 1
6. Let A = . Find A−1 or show that A is not invertible.
4 1
 
2 0 4
7. Let A =  0 −1 0 . Find A−1 or show that A is not invertible.
−2 6 −3
 
1 −1 2
8. Let A = 0 1 −2 . Find A−1 or show that A is not invertible.
2 0 −1
 
0 −3 −6
9. Let A = −1 −2 −1. Find A−1 or show that A is not invertible.
−2 −3 0
 
1 −4 2
10. Let A = 0 −1 3 . Find A−1 or show that A is not invertible.
0 1 −2
1 2 1
11. Let A = 1 5 2. Find A−1 or show that A is not invertible.
2 0 1
 
1 0 6
12. Let A = 2 2 −9. Find A−1 or show that A is not invertible.
1 1 −3
   
x1 −x1 + 2x2 + x3
13. Let f x2  =  2x2 + x3 . Find f −1 or show that f is
x3 x1 + x2
not invertible.
   
x1 5x1 + x2 + 3x3
14. Let f x2  = 2x1 − 3x2 + 6x3 . Find f −1 or show that f is
x3 3x1 + 4x2 − 3x3
not invertible.
   
x1 −x1 + 2x2 + x3
15. Let f x2  =  −2x2 + 2x3 . Find f −1 or show that f is
x3 x2 − x3
not invertible.
   
x1 x1 + x2 + 3x3
16. Let f x2  =  x1 − 4x3 . Find f −1 or show that f is not
x3 x2 − x3
invertible.
17. Suppose A is an invertible n × n matrix. Explain why A−1 is
invertible.
 
7 −3 0
18. Why can’t the function with matrix A = be invertible?
1 −5 2
19. In Example 2 from 2.7, we solved the matrix equation (I3 − A)~x = ~b
to figure out our production levels. Use matrix inverses to solve this
equation. (You should still get the same answer!)
20. What does your answer to the previous problem say about which net
production levels, ~b, are possible to achieve for the three industries
from Example 2 in 2.7?
21. Use the method from Example 6 and the data in the table below to
find the regression equation which predicts the percentage of people
in a given city who have heart disease based on the percentage of
people in that city who bike to work and the percentage who smoke.
City Code Heart Disease Bikers Smokers
1 17 25 37
2 12 40 30
3 19 15 40
4 18 20 35
22. Use the method from Example 6 and the data in the table below to
find the regression equation which predicts a student’s percentage
on the final exam based on their percentages on the two midterms.
Student Code Final Midterm 1 Midterm 2
1 68 78 73
2 75 74 76
3 85 82 79
4 94 90 96
5 86 87 90
6 90 90 92
7 86 83 95
8 68 72 69
9 55 68 67
10 69 69 70
2.11 The Invertible Matrix Theorem


In this section, we’ll do two things at once: link invertibility to a huge list
of other conditions, and sneakily review the majority of the concepts we’ve
learned about matrix equations and linear functions.

Note: All matrices discussed in this section are n × n matrices!

Theorem 1. Suppose A is an n × n matrix. Then the following conditions are equivalent:
1. A is invertible.
2. The reduced echelon form of A is In .
3. The reduced echelon form of A contains n leading 1s.
4. rk(A) = n.
5. The columns of A are linearly independent.
6. The equation A~x = ~0 has one unique solution.
7. N ul(A) = {~0}.
8. The linear function f (~x) = A~x is 1-1.
9. The columns of A span Rn .
10. The equation A~x = ~b always has a solution.
11. The linear function f (~x) = A~x is onto.
12. Col(A) = Rn .
13. There is a matrix C with AC = In .
14. There is a matrix D with DA = In .

Two notes before we dive in:


One good way to think about “the following are equivalent” theorems is
as mathematical “k for the price of 1” deals. If you know one condition on
this list, then you get all the rest as well. Similarly, if you know one condition
on this list fails, then all the others fail as well.
It might seem like we need to explain why each condition implies every
other condition, but that would require 14 ∗ 13 = 182 different explanations.
Instead, think of each explanation as an arrow, where P → Q means we’ve
explained that if condition P holds then condition Q must also hold. (Note
that the arrow and explanation modeled above are one-way.) Our goal is to
provide just enough connections between our conditions so that we could start
at any one of these 14 statements and travel along arrows to any of the other
statements. This more efficient approach will allow us to get away with only
20 arrows, two of which are left as exercises.
The bird’s-eye view of the 18 arrows we’ll establish together is given in
Figure 2.7.
Figure 2.7: A road map of our proof

We’ll start by establishing that 1 → 2 → 3 → 4 → 1, so the first four


conditions are linked in a circle. We just established in 2.10 that the reduced
echelon form of an invertible matrix is In , so 1 → 2. Since it is clear that
In has n leading 1s, we get 2 → 3. The rank of a matrix equals the number
of leading 1s in its reduced echelon form, so it is clear that a matrix with n
leading 1s has rank n which gives us 3 → 4. Finally, if a matrix has rank n,
then its reduced echelon form has n leading 1s. Since leading 1s can’t share
a row or column, this means we must have our leading 1s down the diagonal
forming In . We saw in 2.10 that if the reduced echelon form of a matrix is
In , then A is invertible. Hence 4 → 1 is established, and we’ve completed our
circle.
Our next stage is to create a second circle out of the next four conditions
by showing 5 → 6 → 7 → 8 → 5. If the columns of A are linearly independent,

then we know from our second definition of linear independence that A~x = ~0
has only the trivial solution. This gives us 5 → 6. The equation A~x = ~0 always
has the solution ~x = ~0, so if A~x = ~0 has one unique solution, that solution
is ~x = ~0. Since the null space of A is the solution set of A~x = ~0, this gives
us 6 → 7. The corresponding map f has ker(f ) = N ul(A) = {~0}, so f is
1-1 and we get 7 → 8. Finally, if A corresponds to a 1-1 map f , we know
ker(f ) = N ul(A) = {~0}, so A~x = ~0 has only the trivial solution which means
that the columns of A are linearly independent. Thus we have 8 → 5 and have
completed our second circle.
Next we’ll create a third circle out of the next four conditions by showing
9 → 10 → 11 → 12 → 9. If the columns of A span Rn , then every n-vector ~b
can be written as a linear combination of the columns of A. Rewriting that
vector equation as a matrix equation, the coefficients on the columns of A
combine to form a solution ~x to A~x = ~b. This gives us 9 → 10. Since the
corresponding map f has f (~x) = A~x, saying A~x = ~b has a solution for every ~b
in Rn means that every ~b in Rn is an output of f . Thus f is onto, and we get
10 → 11. A function is onto if its range equals its codomain. Since the column
space of a matrix is the range of the corresponding map and the codomain
is Rn , this means we have Col(A) = Rn and 11 → 12. Finally, the column
space of a matrix can also be thought of as the span of its columns. Therefore
Col(A) = Rn means the columns of A span Rn , so we get 12 → 9 and have
completed our third circle.
Now that we’ve completed all three circles, we can start linking them
together. We'll link our first and second circles by showing 3 ↔ 7 and our first
and third circles by showing 3 ↔ 11. We saw in 2.5 that A's corresponding
function is 1-1 exactly when there is a leading 1 in every column of its reduced
echelon form. Since A has n columns, this is equivalent to saying A's reduced
echelon form has n leading 1s, so we get 3 ↔ 7. We also saw in 2.5 that A's
function is onto exactly when there is a leading 1 in every row of its reduced
echelon form. Since A has n rows, this is equivalent to saying A's reduced
echelon form has n leading 1s, so we get 3 ↔ 11.
At this point, we've established the equivalence of the conditions 1 through 12.
To finish up our explanation of this theorem, we’ll need to link in the last
two statements. I’ll provide two of those arrows here, and leave the other
two as exercises. If our matrix is invertible, it has an inverse matrix A−1
with AA−1 = In and A−1 A = In . This means we can let A−1 = C to get
1 → 13 and A−1 = D to get 1 → 14. Going back from 13 or 14 to 1 is hard.
We did something similar sounding in 2.10, but it was under the assumption
that A was invertible, which we can’t assume if we’re starting with 13 or 14.
Instead I’d suggest showing 13 → 10, and 14 → 6. These are Exercises 8 and
9 respectively. With these last two conditions connected to the rest, we are
done.
Now that we’ve established our theorem, let’s see some examples of how
it can be used.
Example 1. A = [0 1; 1 0] is invertible.
The linear function f given by f (~x) = A~x can be thought of as reflection
about the line y = x. Geometrically, it isn’t too hard to convince yourself that
this function is onto. In fact, since doing a reflection twice brings you back
where you started, we get f (f (~x)) = ~x. In other words, for every vector ~x, we
know f (~x) maps to ~x, so f is onto.
This means A satisfies condition 11 of the Invertible Matrix Theorem, so
A is invertible.
 
Example 2. A = [−3 1 0; 4 2 −7; 1 0 9] is invertible.
This matrix has reduced echelon form I3 , so it satisfies condition 2 of the Invertible Matrix Theorem and is therefore invertible.
Example 3. A = [1 −5 4; 2 −7 3; −2 1 7] is not invertible.
In Example 5 from 2.7, we saw that A~x = ~b didn't have a solution for ~b = (−3, −2, −1). This means A~x = ~b doesn't have a solution for every ~b.
Thus A fails to satisfy condition 10 of the Invertible Matrix Theorem, and
therefore isn’t invertible.
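If you'd like to see the theorem in action on a computer, here is a small optional sketch (not part of our toolbox) using Python's NumPy library. It tests condition 4, rk(A) = n, for the matrices from Examples 2 and 3; the variable names are mine.

    import numpy as np

    # Optional sketch: check condition 4 (rk(A) = n) for Examples 2 and 3.
    A2 = np.array([[-3, 1,  0],
                   [ 4, 2, -7],
                   [ 1, 0,  9]])
    A3 = np.array([[ 1, -5, 4],
                   [ 2, -7, 3],
                   [-2,  1, 7]])

    for name, A in [("Example 2", A2), ("Example 3", A3)]:
        n = A.shape[0]
        rank = np.linalg.matrix_rank(A)
        print(name, "rank =", rank, "invertible?", rank == n)

Running this prints rank 3 (invertible) for Example 2 and rank 2 (not invertible) for Example 3, matching the conclusions above.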

Exercises 2.11.
1. Give a direct explanation of the arrow 1 → 6.
2. Give a direct explanation of the arrow 1 → 10.
3. Give a direct explanation of the arrow 11 → 2.
4. Give a direct explanation of the arrow 3 → 9.
5. Our explanation of the Invertible Matrix Theorem didn’t include a
direct explanation of why a matrix whose function is 1-1 has rank
n, i.e., 8 → 4. Use the explanations/arrows in our proof of this
theorem to explain how you can get from condition 8 to condition
4.
6. Use the Rank-Nullity Theorem from 2.8 to give a direct explanation
of the arrows 7  12.
 
7. Let A = [−3 0 1 4; 1 0 −5 8; 7 0 −2 6; 2 0 4 −1]. Which of the Invertible Matrix Theorem's thirteen conditions do you think most easily shows A is not invertible? Give an explanation of how your choice shows A−1 does not exist.
8. Let A be an n × n matrix. Show that if there is a matrix C with
AC = In , then the equation A~x = ~b always has a solution. (This is
the missing arrow 13 → 10.)
9. Let A be an n × n matrix. Show that if there is a matrix D with
DA = In , then the equation A~x = ~0 has one unique solution. (This
is the missing arrow 14 → 6.)
10. Show that if A is not invertible then AB is not invertible.
11. Show that if B is not invertible then AB is not invertible.
12. Suppose A is an invertible n × n matrix. Explain why A satisfies
the Rank-Nullity Theorem from 2.8.
3 Vector Spaces

3.1 Basis and Coordinates


In 2.6, we developed row reduction to help us work with Rn and its linear
functions. However, we showed in 2.3 that Mmn is not just the space of linear
functions from Rn to Rm , but also a vector space in its own right. This means
we can ask the same types of questions in Mmn as in Rn , but we don’t have
the same computational toolbox at our disposal. In this section, we’ll develop
a system for creating a map from any vector space V to Rn in order to transfer
our questions about V over to Rn . This will allow us to answer those questions
in Rn using row reduction if that seems easier. We’ll start in the only concrete
case we have where V = Mmn .
Our first step in linking Mmn and Rk is to figure out a way to match
each matrix from Mmn up with a vector in Rk . (I’m using Rk instead of Rn
since we’ve already used n in specifying Mmn .) We’ll do this by finding a
set of matrices which span Mmn and identifying each m × n matrix with the
coefficients used to write it as a linear combination of our spanning set.
       
Example 1. Show B1 = [1 1; 0 0], B2 = [0 0; 1 1], B3 = [1 0; 1 0], B4 = [0 1; 0 1], and B5 = [1 0; 0 1] span M22 . Use this spanning set to identify the matrix A = [2 1; 1 2] with a vector ~x in R5 .
Before we get started, notice that our vector ~x is in R5 because we have
five matrices in our spanning set, which means A’s linear combination will
have five coefficients.
To see that B1 , B2 , B3 , B4 , and B5 span M22 , let's pick a generic 2 × 2 matrix [a b; c d] and try to write it as a linear combination of the Bs. To do this, we need to solve

x1 [1 1; 0 0] + x2 [0 0; 1 1] + x3 [1 0; 1 0] + x4 [0 1; 0 1] + x5 [1 0; 0 1] = [a b; c d].

(Here we're assuming we know a, b, c, d and are solving for x1 , . . . , x5 .) We can
simplify the left side of this equation to get

[x1 + x3 + x5 , x1 + x4 ; x2 + x3 , x2 + x4 + x5 ] = [a b; c d].
Identifying corresponding matrix entries gives us
x1 + x3 + x5 = a
x1 + x4 = b
x2 + x3 = c
x2 + x4 + x5 = d.
This is a much more familiar problem! In fact it is equivalent to asking
when the matrix equation

[ 1 0 1 0 1 ]
[ 1 0 0 1 0 ] ~x = ~b
[ 0 1 1 0 0 ]
[ 0 1 0 1 1 ]

has solutions for every 4-vector ~b. When we discussed finding the range of a
linear function in 2.5, we realized this is equivalent to the matrix having a
leading 1 in every row of its reduced echelon form. The reduced echelon form
of

[ 1 0 1 0 1 ]
[ 1 0 0 1 0 ]
[ 0 1 1 0 0 ]
[ 0 1 0 1 1 ]
is

[ 1 0 0 1 0 ]
[ 0 1 0 1 0 ]
[ 0 0 1 −1 0 ]
[ 0 0 0 0 1 ].
We do have a leading 1 in every row, so our matrix equation always has a
solution, and B1 , . . . , B5 span M22 .
Since the Bs span M22 , our matrix A must be a linear combination of
these five matrices. Notice that A = B1 + B2 + B5 , so our linear combination
has coefficients 1 on B1 , B2 , and B5 and coefficients 0 on B3 and B4 . This
means we identify A with the 5-vector ~x which has 1s in the 1st, 2nd, and 5th
spots and 0s in the 3rd and 4th spots, so ~x = (1, 1, 0, 0, 1).
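For readers who like to double-check this kind of computation, here is an optional sketch using Python's NumPy library (the variable names are mine, not the text's). It verifies that the flattened Bs have rank 4, so they span M22, and that the coefficients (1, 1, 0, 0, 1) really do produce A.

    import numpy as np

    # Optional sketch: the five spanning matrices and A from Example 1.
    B = [np.array(m) for m in (
        [[1, 1], [0, 0]],   # B1
        [[0, 0], [1, 1]],   # B2
        [[1, 0], [1, 0]],   # B3
        [[0, 1], [0, 1]],   # B4
        [[1, 0], [0, 1]],   # B5
    )]
    A = np.array([[2, 1], [1, 2]])

    # Flatten each Bi into a 4-vector and use them as the columns of a 4x5 matrix.
    M = np.column_stack([Bi.reshape(4) for Bi in B])
    print(np.linalg.matrix_rank(M))   # 4, so the Bs span M22

    x = np.array([1, 1, 0, 0, 1])     # the coefficient vector found above
    print(np.allclose(sum(xi * Bi for xi, Bi in zip(x, B)), A))   # True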

There are two possible issues with this process of identifying a matrix with
its vector of coefficients. The first is that our vector of coefficients depends
heavily on the order in which we listed our spanning vectors. For instance, in
the example above we could swap the order of B2 and B3 . This would mean A = B1 + B3 + B5 , so we'd identify A with the 5-vector ~x = (1, 0, 1, 0, 1). We can fix
this problem by being careful to list our spanning set in the particular order
we want and preserving this order throughout the process. I’ll always specify
an order in this book, and I recommend that you always specify an order for
your spanning sets when doing problems on your own.
The second possible problem occurs when we can write our matrix as a
linear combination of the spanning matrices in more than one way, i.e., using
two different sets of coefficients. This can certainly happen, and means we
don’t have a unique vector to match up with our matrix. For instance, in
the example above we can also write A = B3 + B4 + B5 which would mean identifying A with the 5-vector ~y = (0, 0, 1, 1, 1). Clearly this isn't good, because we
want to create a function which maps from M22 to Rn and this example’s
function appears to be multi-valued. Let’s explore this situation further to see
if we can figure out how to avoid it.
Suppose A is in the span of B1 , . . . , Bk with two different sets of coefficients. This means we have two sets of scalars x1 , . . . , xk and y1 , . . . , yk with A = x1 B1 + · · · + xk Bk and A = y1 B1 + · · · + yk Bk . If we subtract one of these linear combinations from the other we get

(x1 B1 + · · · + xk Bk ) − (y1 B1 + · · · + yk Bk ) = A − A = ~0Mmn .

We can combine like terms of the left-hand side to get

(x1 − y1 )B1 + · · · + (xk − yk )Bk = ~0Mmn .

Since our two sets of coefficients are different, we must have at least one place where xi − yi ≠ 0. I'll assume for ease of notation that this happens at i = 1, i.e., x1 − y1 ≠ 0. In that case, we can subtract (x1 − y1 )B1 from both sides of our equation to get

(x2 − y2 )B2 + · · · + (xk − yk )Bk = −(x1 − y1 )B1 .

Since −(x1 − y1 ) ≠ 0, we can divide all our coefficients by it to see that B1 is


in the span of the other Bs. Aha! That’s a condition we’ve discussed before.

What we’ve just seen is that the coefficients used to write a matrix as
a linear combination of our spanning set aren’t unique exactly when the
spanning matrices are linearly dependent. This means that we can avoid this
situation by requiring that our spanning set be linearly independent, which
leads to the following definition. Note that there is nothing in our exploration
above that doesn’t generalize to a spanning set in any vector space, so I’ll
state the definition in those general terms.

Definition. A set of vectors B = {~b1 , . . . , ~bn } is a basis for a vector space V


if V = Span{~b1 , . . . , ~bn } and ~b1 , . . . , ~bn are linearly independent.

We’ve actually already seen this before when we constructed a spanning


set for the solution set of a homogeneous matrix equation.
 
Example 2. Find a basis for the null space of A = [3 2 0 4 −1; 0 2 6 0 −8; 1 1 1 4 1].
The null space of A is the same as the solution set of A~x = ~0, and we saw in Example 5 from 2.8 that

N ul(A) = Span{ (2, −3, 1, 0, 0), (−1, 4, 0, −1, 1) }.

If you look at the 3rd and 5th entries of these vectors, corresponding to the free variables x3 and x5 , you'll see that each vector has a 0 in those spots except for a 1 in the spot of the free variable whose coefficients it contains. This means the only linear combination of these vectors which equals ~0 is the one with all coefficients equal to zero. Thus our spanning vectors are linearly independent and so form a basis for N ul(A).

Note that there was nothing special about the matrix A and its null space
in the example above. Although we didn’t know it at the time, our method
for finding a spanning set for N ul(A) actually finds a basis.
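As an optional computational check (not part of our method), the SymPy library produces a basis of exactly this form, one vector per free variable; the matrix below is the one from Example 2.

    import sympy as sp

    # Optional sketch: a null space basis for the matrix from Example 2.
    A = sp.Matrix([[3, 2, 0, 4, -1],
                   [0, 2, 6, 0, -8],
                   [1, 1, 1, 4,  1]])

    for v in A.nullspace():        # one basis vector per free variable
        print(v.T)                 # (2, -3, 1, 0, 0) and (-1, 4, 0, -1, 1)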
       
1 0 0 1 0 0 0 0
Example 3. The matrices , , , and are a basis
0 0 0 0 1 0 0 1
for M22 .

To see that these four matrices form a basis for M22 , we need to check that
they span M22 and are linearly independent. We can check that they span in
the same way we did in Example 1, by setting a linear combination of them

equal to a generic 2 × 2 matrix. This means we want to solve

x1 [1 0; 0 0] + x2 [0 1; 0 0] + x3 [0 0; 1 0] + x4 [0 0; 0 1] = [a b; c d].
Simplifying the left side of this equation, we get

[x1 x2 ; x3 x4 ] = [a b; c d].
This can always be solved by letting x1 = a, x2 = b, x3 = c, x4 = d, so our four matrices span M22 .
To see that these four matrices are linearly independent, we need to show that

x1 [1 0; 0 0] + x2 [0 1; 0 0] + x3 [0 0; 1 0] + x4 [0 0; 0 1] = [0 0; 0 0]

has only the solution x1 = x2 = x3 = x4 = 0. As above, the left-hand side simplifies to give us

[x1 x2 ; x3 x4 ] = [0 0; 0 0].

Identifying entries gives us that all the xi s are zero, so our matrices are linearly independent. Since they also span, they are a basis for M22 .

We can form a similar basis for Mmn for any m and n by using mn matrices
each of which has a 1 in one entry and zeros everywhere else. This is what
is usually called the standard basis for Mmn . The only occasional point of
confusion is how to order these matrices. Most mathematicians order them
by the position of the 1 as in the example above, i.e., from left to right along
each row from the top to the bottom.
Since spanning and linear independence don’t depend on the order of the
set of matrices, this basis also gives us several more bases for M22 by simply
reordering these four matrices. However, there are also a vast array of other
bases for M22 .
       
Example 4. The matrices [1 0; 0 −1], [0 2; −2 0], [0 0; 0 2], and [0 3; 0 0] are also a basis for M22 .

We'll start by checking linear independence using the equation

x1 [1 0; 0 −1] + x2 [0 2; −2 0] + x3 [0 0; 0 2] + x4 [0 3; 0 0] = [0 0; 0 0].
This can be simplified to

[x1 , 2x2 + 3x4 ; −2x2 , −x1 + 2x3 ] = [0 0; 0 0].

Identifying matrix entries tells us that x1 = 0, 2x2 + 3x4 = 0, −2x2 = 0,


and −x1 + 2x3 = 0. Since x1 = 0, the fourth equation says 2x3 = 0 which
means x3 = 0. The third equation tells us x2 = 0, and plugging that into the
second equation gives 3x4 = 0 or x4 = 0. Since all coefficients in our linear
combination must be zero, our four matrices are linearly independent.
To see that our four matrices span M22 , we'll proceed as in the first example and set a linear combination of them equal to a generic 2 × 2 matrix. This gives us

x1 [1 0; 0 −1] + x2 [0 2; −2 0] + x3 [0 0; 0 2] + x4 [0 3; 0 0] = [a b; c d]
(As in Example 1, remember that we're assuming a, b, c, d are known, and we're solving for the xs.) Simplifying gives us

[x1 , 2x2 + 3x4 ; −2x2 , −x1 + 2x3 ] = [a b; c d]

which can be rewritten as the list of equations below.

x1 = a
2x2 + 3x4 = b
−2x2 = c
−x1 + 2x3 = d

The first equation clearly tells us that x1 = a, and we can solve the third equation for x2 to get x2 = −(1/2)c. Plugging x1 = a into the fourth equation gives us −a + 2x3 = d. Solving for x3 now gives us x3 = (1/2)d + (1/2)a. Plugging x2 = −(1/2)c into the second equation gives us 2(−(1/2)c) + 3x4 = b which simplifies to −c + 3x4 = b. Solving for x4 , we get x4 = (1/3)b + (1/3)c. Since these solutions make sense no matter what values of a, b, c, d we used, our four matrices span M22 and are therefore a basis.
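If you'd like to confirm this algebra, here is an optional SymPy sketch that solves the same four equations symbolically; the symbol names are mine, not the text's.

    import sympy as sp

    # Optional sketch: solve the spanning system from Example 4 symbolically.
    a, b, c, d = sp.symbols('a b c d')
    x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')

    eqs = [sp.Eq(x1, a),
           sp.Eq(2*x2 + 3*x4, b),
           sp.Eq(-2*x2, c),
           sp.Eq(-x1 + 2*x3, d)]
    print(sp.solve(eqs, [x1, x2, x3, x4]))
    # {x1: a, x2: -c/2, x3: a/2 + d/2, x4: b/3 + c/3}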

Even though we already have our usual computational tools available in


the case where V = Rn , we will sometimes still want to use a basis for Rn .
There is certainly nothing in our definition that prevents us from applying
this concept there.
     
Example 5. The (standard) basis for Rn is (1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1).
In Rn , we have more tools for checking linear independence and spanning.

In particular, remember that we can apply the Invertible Matrix Theorem


from 2.11. Since we have n n-vectors, we can view them as the columns of an
n × n matrix. In the case of these particular vectors, they are the columns
of the n × n identity matrix In . This means the Invertible Matrix Theorem
automatically tells us that they are linearly independent and span all of Rn .
Therefore they are a basis.

As with Mmn , our standard basis is by no means the only basis for Rn .
   
Example 6. Another basis for R2 is (1, 1) and (1, −1).
While we could use the Invertible Matrix Theorem to check that these two vectors are a basis for R2 , let's practice a more geometric approach.
If we sketch a picture of these two vectors and the lines they span in the plane, it looks like

[Figure: the vectors (1, 1) and (1, −1) together with the lines they span, drawn in the plane.]

From this picture, it is clear that neither of these vectors lies along the line which is the span of the other. This means they are linearly independent.
To see that they span all of R2 , let's think about the dimension of Span{(1, 1), (1, −1)}. The two spanning vectors are linearly independent, so their span is 2D, i.e., a plane. Since R2 is a plane, these two vectors must span R2 . Therefore (1, 1) and (1, −1) are a basis for R2 .
The picture above also suggests a reason we might choose to work with this basis instead of the standard basis for R2 : working with reflection about the line y = x. Since this line is the span of (1, 1), our first basis vector is fixed by that reflection. The other basis vector, (1, −1), is sent to its negative by that reflection. This means that if we put a general vector in terms of this basis it becomes very easy to see what happens to it under reflection about y = x. In particular, if ~v = a(1, 1) + b(1, −1), then ~v 's reflection is a(1, 1) − b(1, −1).
1 −1 1 −1

Now that we have a solid understanding of how a basis works, we can


use it to help us match elements of any vector space V with vectors in Rn
so we can transfer questions about V as we discussed at the start of this
section. We do this by matching a vector ~v in V with a vector made up of
the coefficients used to write it as a linear combination of the basis vectors
which we’ve guaranteed is both possible and unique by having a basis both
span and be linearly independent. Since we’ll be discussing these vectors of
basis coefficients so often, we give them a name.

Definition. Let B = {~b1 , . . . , ~bn } be a basis for a vector space V . For a vector ~v = a1~b1 + · · · + an~bn in V , the B-coordinate vector of ~v is [~v ]B = (a1 , . . . , an ).

Note that the size of a B-coordinate vector is the number of vectors in the
basis B.
 
Example 7. What is the coordinate vector of the matrix A = [1 −1 5; 2 0 8] with respect to the standard basis [1 0 0; 0 0 0], [0 1 0; 0 0 0], [0 0 1; 0 0 0], [0 0 0; 1 0 0], [0 0 0; 0 1 0], [0 0 0; 0 0 1] for M23 ?
Since our coordinate vector is made up of the coefficients used to write A as a linear combination of the basis vectors, we'll start by solving for those coefficients. This means solving the equation

x1 [1 0 0; 0 0 0] + x2 [0 1 0; 0 0 0] + x3 [0 0 1; 0 0 0] + x4 [0 0 0; 1 0 0] + x5 [0 0 0; 0 1 0] + x6 [0 0 0; 0 0 1] = [1 −1 5; 2 0 8].
The left side simplifies to give us

[x1 x2 x3 ; x4 x5 x6 ] = [1 −1 5; 2 0 8]
which has solution x1 = 1, x2 = −1, x3 = 5, x4 = 2, x5 = 0, x6 = 8. This means A has B-coordinate vector [A]B = (1, −1, 5, 2, 0, 8).

If you noticed that the entries of this coordinate vector were the same as
the entries of the matrix A read off from left to right across each row starting
with the first row and working downward, you have hit on the easiest way
to compute coordinate vectors with respect to the standard basis for M23 . In
fact, this method works for any matrix in any Mmn as long as you are using
the standard basis. The ease of finding coordinate vectors with respect to the
standard basis for Mmn is why this is the basis most commonly used, and
hence called standard.
 
Example 8. Find the coordinate vector of the matrix A = [2 6; −3 14] with respect to the basis B = {[1 0; 0 −1], [0 2; −2 0], [0 0; 0 2], [0 3; 0 0]} for M22 .
Here we are not using the standard basis for M22 , so we can’t just read
off matrix entries and will have to use the general method of finding a linear
combination of the basis vectors which equals our matrix A. We could go back
to Example 4 and plug in our values for a, b, c, d, but I’ll do this problem from
scratch to more accurately reflect the solution process we’d use if we hadn’t
done that previous work.
Let's start by writing our matrix A as a linear combination of our basis via the equation

x1 [1 0; 0 −1] + x2 [0 2; −2 0] + x3 [0 0; 0 2] + x4 [0 3; 0 0] = [2 6; −3 14]

which can be simplified to

[x1 , 2x2 + 3x4 ; −2x2 , −x1 + 2x3 ] = [2 6; −3 14].

This gives us the equations below.

x1 = 2
2x2 + 3x4 = 6
−2x2 = −3
−x1 + 2x3 = 14
These equations have augmented coefficient matrix

[  1  0 0 0  2 ]
[  0  2 0 3  6 ]
[  0 −2 0 0 −3 ]
[ −1  0 2 0 14 ]

which has reduced echelon form

[ 1 0 0 0 2   ]
[ 0 1 0 0 3/2 ]
[ 0 0 1 0 8   ]
[ 0 0 0 1 1   ].

This means x1 = 2, x2 = 3/2, x3 = 8, and x4 = 1, so our B-coordinate vector is [A]B = (2, 3/2, 8, 1).

We can also find coordinate vectors in Rn . You can convince yourself that
if we use the standard basis for Rn discussed above then the coordinate vector
of any vector ~v is just that vector itself. However, we will sometimes want to
work in terms of another basis for Rn , especially in Chapter 4.
   
Example 9. If we use the basis B = {(1, 1), (1, −1)} for R2 , what is the B-coordinate vector of ~v = (−3, 7)?
To find this coordinate vector, we need to solve

x1 (1, 1) + x2 (1, −1) = (−3, 7).

The left side simplifies to give us (x1 + x2 , x1 − x2 ) = (−3, 7), so we need to solve x1 + x2 = −3, x1 − x2 = 7 which has augmented coefficient matrix

[ 1  1 −3 ]
[ 1 −1  7 ].

This row reduces to

[ 1 0  2 ]
[ 0 1 −5 ]
so our solution is x1 = 2 and x2 = −5. Thus [~v ]B = (2, −5).
To check our answer is correct, we can compute

2 (1, 1) − 5 (1, −1) = (2 − 5, 2 + 5) = (−3, 7).

Since we got ~v , our coordinate vector is right.

We came up with the idea of coordinate vectors in order to link any vector
space V to Rn by matching each vector ~v from V with its coordinate vector with
respect to some basis B for V . We can formalize this by creating a map from
V to Rn which sends each vector from V to its B-coordinate vector. Since a
coordinate vector has as many entries as there are vectors in the basis, this
map goes to Rn where n is the number of vectors in V ’s basis B.

Definition. Let B = {~b1 , . . . , ~bn } be a basis for a vector space V , then the
B-coordinate map is the function fB : V → Rn given by fB (~v ) = [~v ]B .

       
Example 10. Take the basis B = {[1 0; 0 −1], [0 2; −2 0], [0 0; 0 2], [0 3; 0 0]} for M22 . Find n so that fB : M22 → Rn , and compute fB ([2 6; −3 14]).
The first part of this question is a fairly straightforward counting problem.
The B-coordinate map connects V , which in our case is M22 with Rn where n
is the number of elements in our basis B. Since B contains four matrices, we
have n = 4. Thus fB : M22 → R4 . This also provides a sanity check for our
computation of fB (A), because we now know our answer must be a 4-vector.
The B-coordinate map sends any matrix to its B-coordinate vector, i.e.,
fB (A) = [A]B . This is the same basis B and matrix A as in Example 8, so

fB (A) = [A]B = (2, 3/2, 8, 1).
No matter which basis B we chose to create fB , this map always has several nice properties. The first is that every fB is 1-1. To see this, remember that 1-1 means we cannot have ~v ≠ ~w with fB (~v ) = fB (~w). In this case, that would mean ~v ≠ ~w with [~v ]B = [~w]B . If two vectors have the same B-coordinate vector, that means they are both equal to the same linear combination of the basis vectors and are therefore equal. Thus having fB (~v ) = fB (~w) means ~v = ~w, so fB is 1-1.

The coordinate map fB is also onto for every B. For any vector ~x in Rn ,
we can find ~v in V which has ~x = [~v ]B by letting ~v = x1~b1 + · · · + xn~bn . Thus
every element of the codomain is also in the range, i.e., fB is onto.
Since fB is both 1-1 and onto, it is invertible. This means our coordinate
map has created an exact correspondence between the set of vectors in V and
the set of vectors in Rn , so it can be used to go either direction between the two
vector spaces. The only thing we might worry about is that fB doesn’t create
a similar correspondence between the operations of V and the operations of
Rn . However, life is as good as possible, i.e., fB is a linear function.
To see that fB : V → Rn is linear, we need to check that for every ~v and
~ in V and every scalar r we have
w
fB (~v + w) ~ and fB (r · ~v ) = r · fB (~v ).
~ = fB (~v ) + fB (w)
This is equivalent to checking that
[~v + w] ~ B and [r · ~v ]B = r · [~v ]B .
~ B = [~v ]B + [w]
(Notice that the “+” and “·” on the left-hand sides of these equations are the
operations from V , while the + and · on the right-hand sides are our usual
vector operations from Rn .)
Suppose

~v = x1~b1 + · · · + xn~bn and ~w = y1~b1 + · · · + yn~bn

so [~v ]B = (x1 , . . . , xn ) and [~w]B = (y1 , . . . , yn ). Then

~v + ~w = x1~b1 + · · · + xn~bn + y1~b1 + · · · + yn~bn = (x1 + y1 )~b1 + · · · + (xn + yn )~bn

so [~v + ~w]B = (x1 + y1 , . . . , xn + yn ). Now it is easy to see that

[~v + ~w]B = (x1 + y1 , . . . , xn + yn ) = (x1 , . . . , xn ) + (y1 , . . . , yn ) = [~v ]B + [~w]B .
Similarly,
r · ~v = r(x1~b1 + · · · + xn~bn ) = rx1~b1 + · · · + rxn~bn
so [r · ~v ]B = (rx1 , . . . , rxn ). Therefore

[r · ~v ]B = (rx1 , . . . , rxn ) = r (x1 , . . . , xn ) = r · [~v ]B

so fB is a linear function.
We can summarize this discussion as follows.

Theorem 1. If B is a basis for a vector space V , then fB is a 1-1 and onto


(and hence invertible) linear function.

Note that if we are in the special case where V = Rn , then fB is an


invertible linear function from Rn to Rn . This means fB corresponds to an
invertible n × n matrix, so it also has all the other equivalent conditions from
the Invertible Matrix Theorem in 2.11.
These good properties of fB allow us to finally do what we set out to
at the beginning of this section: answer a question about a vector space V
by translating our question to Rn and using row reduction. We do this by
choosing a basis B for V and mapping all the vectors from V in our question
over to Rn with fB . Then we ask the same question about the resulting vectors
in Rn . Because fB is a 1-1, onto, linear map, the answer we get in Rn will
be the same as the answer we would have gotten if we’d just answered the
original question in V .
     
Example 11. Are A1 = [2 4; −1 0], A2 = [1 −3; −3 1], and A3 = [1 1; 1 2] linearly independent in M22 ?
We could answer this question directly in M22 by seeing if

x1 [2 4; −1 0] + x2 [1 −3; −3 1] + x3 [1 1; 1 2] = [0 0; 0 0]

has only the trivial solution. However, now that we have the idea of coordinate
maps, we can also answer it by translating the problem into Rn and solving
it there. I’ll show this newer method here, but feel free to go back to 2.4 to
see the older method being used.
To use a coordinate map, I first need to choose a basis for M22 . To make things as easy as possible, I'll use the standard basis

B = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}.
Since we're using the standard basis, we can find the coordinate vectors of our three original matrices by reading left to right across each row starting at the top. This gives us

[A1 ]B = (2, 4, −1, 0), [A2 ]B = (1, −3, −3, 1), and [A3 ]B = (1, 1, 1, 2).

These three coordinate vectors are linearly independent exactly when A1 , A2 , and A3 are linearly independent, but it is much easier to check linear independence of vectors in R4 using row reduction. The matrix whose columns are the three coordinate vectors is

[  2  1 1 ]
[  4 −3 1 ]
[ −1 −3 1 ]
[  0  1 2 ]
which has reduced echelon form

[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 0 ].

Since each column of the reduced echelon form contains a leading 1, the three
coordinate vectors and hence the three matrices A1 , A2 , and A3 are linearly
independent.
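Here is an optional NumPy sketch of exactly this coordinate-map strategy (not part of the text's toolkit): flatten each matrix into its standard-basis coordinate vector and check the rank of the matrix of columns.

    import numpy as np

    # Optional sketch: the coordinate-map check from Example 11.
    A1 = np.array([[ 2,  4], [-1, 0]])
    A2 = np.array([[ 1, -3], [-3, 1]])
    A3 = np.array([[ 1,  1], [ 1, 2]])

    M = np.column_stack([A.reshape(4) for A in (A1, A2, A3)])
    print(np.linalg.matrix_rank(M) == M.shape[1])   # True: A1, A2, A3 are independent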

Now that we have fB : V → Rn for any basis B = {~b1 , . . . , ~bn } for V , there
is still one important question left to be answered: Is it possible to have two
different bases for V which have different numbers of vectors in them? If so, this would link V to Rn and Rm for n ≠ m. This sounds fairly strange since the coordinate map is supposed to preserve everything about V , while we intuitively think of Rn and Rm as different sized spaces for n ≠ m. Therefore
our next goal is to show that this cannot happen.
Suppose we have a vector space V with two different bases B = {~b1 , . . . , ~bn }
and C = {~c1 , . . . , ~cm }. We know that ~b1 , . . . , ~bn are linearly independent
vectors, so their C-coordinate vectors in Rm must be linearly independent.
This means the m × n matrix with columns [~b1 ]C , . . . , [~bn ]C must have a
reduced echelon form with leading 1s in every column which is only possible
if n ≤ m. Similarly, ~c1 , . . . , ~cm are linearly independent vectors, so their B-
coordinate vectors in Rn must also be linearly independent. This means the
n × m matrix whose columns are [~c1 ]B , . . . , [~cm ]B must have a reduced echelon
form with leading 1s in every column. For this to be possible, we must have

m ≤ n. Therefore we must have m = n, i.e., our two bases must contain the
same number of vectors.
One way to think about this number n of basis vectors in any basis for
V is that to write down any vector in V , we can choose n different numbers
to use as coefficients to use in the linear combination of our basis vectors.
This means that V allows us n independent choices. This reminds me of our
discussion of the geometric idea of dimension at the start of 1.3 where we said
an n-dimensional object allowed us n different directions of motion. In either
case, we’re specifying a particular vector/point in a space which allows us n
independent choices/directions. We’ll use this connection to create our linear
algebra definition of dimension.

Definition. The dimension of a vector space V , written dim(V ), is the


number of vectors in any basis for V .

Since we already have an established idea that the dimension of Rn is n,


our first job is to make sure our two notions of dim(Rn ) agree with each other.

Example 12. dim(Rn ) = n

Since every basis contains the same number of vectors, it doesn’t matter
which basis we use to compute the dimension. The standard basis

(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)

contains n vectors – one for each entry in an n-vector. Thus the dimension of
Rn is n.

Example 13. dim(Mmn ) = mn

Again, we might as well compute the dimension of Mmn by counting the


number of matrices in the standard basis. Since the standard basis for Mmn
has one matrix for each entry in an m × n matrix, it contains mn matrices.
Therefore dim(Mmn ) = mn.

 
Example 14. Find the dimension of the null space of A = [3 2 0 4 −1; 0 2 6 0 −8; 1 1 1 4 1].
This is the matrix from Example 2, where we found that N ul(A) had basis (2, −3, 1, 0, 0), (−1, 4, 0, −1, 1). Since there are two basis vectors, the null space of A is two-dimensional.

The most straightforward way to find a basis is to start with a spanning


set and whittle it down to a basis. If your spanning set is already linearly
independent, as in the case of our spanning sets for N ul(A), it is already a
basis and you are done. If not, there is some vector in the spanning set which is
in the span of the others. Remove that vector, which doesn’t change the overall
span. Now check linear independence again. Keep repeating this process until
you are left with a linearly independent spanning set, i.e., a basis for your
original span. As long as you start with a finite set of spanning vectors, this
process will always terminate. (This is very similar to our process from 1.3 for
finding the dimension of a span, which makes sense because basis is so closely
tied to dimension.)
 
Example 15. Find a basis for Span{B1 , B2 , B3 , B4 , B5 } where B1 = [1 1; 0 0], B2 = [0 0; 1 1], B3 = [1 0; 1 0], B4 = [0 1; 0 1], and B5 = [1 0; 0 1].
These are the five 2 × 2 matrices from Example 1 of this section, which
we saw spanned all of M22 . This means we’re really looking for a new basis
for M22 which is a subset of B1 , B2 , B3 , B4 , B5 . Our first step here is to see
if these matrices are linearly independent and already form a basis. Here it is
clear that they are not linearly independent, since B1 + B2 = B3 + B4 which
means B1 = −B2 + B3 + B4 , i.e., B1 is in the span of the other matrices.
According to our algorithm’s instructions, we’ll remove B1 from our set. The
span is still the same, so Span{B2 , B3 , B4 , B5 } = M22 , and our new question
is whether these four matrices are linearly independent. We can check that by
solving

x2 [0 0; 1 1] + x3 [1 0; 1 0] + x4 [0 1; 0 1] + x5 [1 0; 0 1] = [0 0; 0 0]
to see if we have nontrivial solutions. This can be simplified to

[x3 + x5 , x4 ; x2 + x3 , x2 + x4 + x5 ] = [0 0; 0 0]

which can be rewritten as x3 + x5 = 0, x4 = 0, x2 + x3 = 0, x2 + x4 + x5 = 0.


Since x4 = 0, this reduces to x3 + x5 = 0, x2 + x3 = 0, x2 + x5 = 0. The first
and third equations now tell us that x3 = −x5 = x2 , which means the middle

equation can be rewritten as −2x5 = 0. Thus x5 = 0 and hence x2 = x3 = 0


as well. This means B2 , B3 , B4 , B5 are linearly independent and therefore a
basis for their span, M22 .

We’ve seen something similar to this algorithm when we discussed the


rank of a matrix in 2.8 as the number of linearly independent vectors in the
spanning set for the column space of a matrix. Let’s take those ideas to their
logical conclusion by finding a basis for Col(A) below.
 
Example 16. Find a basis for the column space of A = [1 −4 −2 −6; −4 −2 2 −6; 3 3 −1 7].
Since the column space is the span of A’s columns, our spanning set is the
four 3-vectors which are the columns of A. We can use our algorithm above
to reduce this spanning set down to a basis for Col(A).
The first step is to check if they are linearly independent, which we can
do by row reducing and seeing if we have a leading 1 in every column of the
reduced echelon form. Our matrix A row reduces to
 
1 0 − 23 23
 1 5
0 1 3 3
0 0 0 0

which doesn’t have a leading 1 in every column. This means our spanning set
isn’t a basis.
In the case of a general spanning set, we’d find one of our vectors to
remove and start the process over. However, here in Rn with row reduction
at our disposal we can be more efficient. The first two columns of the reduced
echelon form are the only ones which have leading 1s in them. This means
the first two columns of A are linearly independent and the other two are not.
This allows us to skip ahead and simply pick the columns of A which produced
leading 1s in the reduced echelon form as our basis for Col(A). Thus a basis
for Col(A) is (1, −4, 3), (−4, −2, 3).

This procedure works in general, so we can find a basis for the column
space of a matrix by simply finding the reduced echelon form and choosing as
our basis for Col(A) the columns of A which correspond to the columns of the
reduced echelon form containing leading 1s. Note that it is very important to
use the columns of A as your basis rather than the columns of the reduced
echelon form!
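If you want to automate this procedure, SymPy's rref reports which columns hold leading 1s, so an optional sketch (using the matrix from Example 16) looks like this:

    import sympy as sp

    # Optional sketch: pick the pivot columns of A as a basis for Col(A).
    A = sp.Matrix([[ 1, -4, -2, -6],
                   [-4, -2,  2, -6],
                   [ 3,  3, -1,  7]])

    _, pivots = A.rref()            # pivots = (0, 1): leading 1s in columns 1 and 2
    for j in pivots:
        print(A.col(j).T)           # (1, -4, 3) and (-4, -2, 3)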
Here’s another fact about dimension that should seem reasonable.

Theorem 2. If W is a subspace of V , then we have dim(W ) ≤ dim(V ).

You can see this illustrated in the example above for W = Col(A) and
V = R3 , since dim(W ) = 2 and dim(R3 ) = 3. This theorem is true because
a basis for W is linearly independent not only as a set of vectors in W , but
also as a set of vectors in V . By an argument very similar to the one where
we showed that every basis of a vector space has the same size, you can now
see that our basis for W can contain at most dim(V ) vectors.

To finish up this section, we can leverage the ability to translate problems


from V to Rn to create a shortcut for checking that a set of vectors is a basis for
V . The Invertible Matrix Theorem tells us that the two conditions necessary
for a basis of Rn are equivalent as long as we have n n-vectors to form the
columns of a square n × n matrix. (See our discussion of the standard basis for
Rn .) In the theorem below, we’ll generalize that idea to general vector spaces.

Theorem 3. Suppose dim(V ) = n, and B = {~b1 , . . . , ~bn }. Then the following


are equivalent:
1. ~b1 , . . . , ~bn are linearly independent
2. Span{~b1 , . . . , ~bn } = V

In other words, if we know that B contains the correct number of vectors


to be a basis for V , then we only need to check one of the two conditions for
being a basis and we automatically get the other one for free!
To see that this is true, take coordinate vectors of ~b1 , . . . , ~bn with respect
to some other basis to translate the whole situation to Rn . This means we have
~v1 , . . . , ~vn in Rn . If we use these n vectors as the columns in an n × n matrix,
we immediately get that these two conditions are equivalent, since they are
both conditions from the invertible matrix theorem. Thus having ~b1 , . . . , ~bn
linearly independent and having Span{~b1 , . . . , ~bn } = V are equivalent.
       
Example 17. Is B = {[1 0; 0 0], [1 1; 0 0], [1 1; 1 0], [1 1; 1 1]} a basis for M22 ?
The dimension of M22 is 2·2 = 4, so this set contains the correct number of
matrices to be a basis. (If it didn’t contain four matrices, we’d automatically
know it couldn’t be a basis for M22 .) Using the theorem above, we can decide
whether or not our four matrices from B are a basis either by checking if they
span M22 or if they are linearly independent. Whichever condition we choose
to check, we can do it in M22 or use a coordinate map to check it in Rn . In
this case, I’ll use coordinate vectors with respect to the standard basis for M22
to show that B spans M22 .
With respect to the standard basis, our matrices from B have coordinate vectors

(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1).
These coordinate vectors span R4 exactly when the matrices from B span M22 .
Four 4-vectors span R4 if and only if the matrix which has them as columns
is invertible. In our case, this would be the matrix
 
[ 1 1 1 1 ]
[ 0 1 1 1 ]
[ 0 0 1 1 ]
[ 0 0 0 1 ].
This matrix has reduced echelon form I4 , so by the Invertible Matrix Theorem the coordinate vectors span R4 . This means our four matrices from B span M22 and hence are a basis.
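As an optional check, NumPy confirms that this upper triangular matrix has full rank, so its columns span R4; the array below is just my encoding of the matrix above.

    import numpy as np

    # Optional sketch: rank check for the matrix of coordinate vectors in Example 17.
    M = np.array([[1, 1, 1, 1],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1]])
    print(np.linalg.matrix_rank(M) == 4)   # True, so the columns span R^4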

Exercises 3.1.
1. Do (2, 0, 1), (−1, 1, 4), and (3, −1, −3) form a basis for R3 ?
2. Do (3, 0, 4), (4, −1, 5), and (1, 2, −3) form a basis for R3 ?
3. Do (1, 2, 3, 4), (3, −1, 5, 0), (0, 4, −6, 1), (−2, 0, 1, 2) form a basis for R4 ?
4. Do (−6, 4, 0, 2), (1, 0, −5, 3), (−2, 3, 1, 1), and (0, 8, 4, 0) form a basis for R4 ?
5. Use the Invertible Matrix Theorem to redo Example 6.
6. Could there be a basis for R3 consisting of 5 vectors? Briefly explain
why or why not.
7. Show that [1 1; 0 0], [0 0; 1 1], [1 0; 0 1], and [0 1; 0 0] form a basis for M22 .
8. Show that [2 −1; 0 0], [−1 0; 0 2], [0 0; 2 −1], and [0 2; −1 0] form a basis for M22 .
9. Do [1 0; 0 1], [0 1; 1 0], [1 0; 0 0], and [0 0; 1 0] form a basis for M22 ?
10. Do [1 0 0; 1 0 0], [0 1 0; 0 1 0], [0 0 1; 0 0 1], [1 1 1; 0 0 0], [0 0 0; 1 1 1], and [1 0 0; 0 0 1] form a basis for M23 ?
11. Find a basis for the column space of A = [1 −2 0 4 5; 0 0 2 1 3; 0 0 0 −1 −1; 0 0 1 2 3].
12. Find a basis for the column space of A = [1 3 0; −2 2 −8; 0 −1 1].
13. Find a basis for the column space of A = [2 0 4 −2; 0 1 −3 1].
14. Find a basis for the column space of A = [2 −4 0 2; 0 1 −1 2; 0 −1 2 −3].
15. Find a basis for the null space of A = [1 −4 0 1 0 0; 0 0 1 −2 0 −1; 0 0 0 0 1 5].
16. Find a basis for the null space of A = [0 2 1 0; 1 0 1 −1; 0 0 0 9].
17. Find a basis for the null space of A = [1 0 2 −1 0; 0 −1 3 2 −1; 0 2 −6 −3 4].
18. Find a basis for the null space of A = [1 −1 0 2; −2 2 1 8; −1 1 1 10].
19. Do [2 0 0; 0 0 0; 0 0 1], [1 0 0; 0 0 0; 0 0 0], and [−1 0 0; 0 1 0; 0 0 −1] form a basis for the set W of all matrices of the form [a 0 0; 0 b 0; 0 0 c]?
20. Do [1 0 0; 0 1 0; 0 0 1], [1 0 0; 0 −1 0; 0 0 0], and [0 0 0; 0 1 0; 0 0 −1] form a basis for the set W of all matrices of the form [a 0 0; 0 b 0; 0 0 c]?
21. What is the dimension of M49 ?
22. What is the dimension of M25 ?
 
−6 −5 14
1 −4 −59
23. What is the dimension of the null space of 
4
?
1 −32
−1 1 −3
2
 
24. What is the dimension of the column space of [2 6 −1 −7; 1 3 −2 −2; −4 −12 0 16]?
25. What is the codomain of the coordinate map fB for any basis B of
M34 ?
26. Why was it so important to us that every basis of a vector space
contain the
same number of vectors?
27. Let B = {(5, 10, 0), (2, 1, 4), (−1, 0, 1)} be a basis for R3 .
(a) Find the vector ~v where [~v ]B = (3, −2, 6).
(b) Find the coordinate vector of ~w = (−4, 0, −7) with respect to B.
28. Let B = {(0, 3, −4), (1, −1, 2), (−2, 0, 2)} be a basis for R3 .
(a) Find the vector ~v where [~v ]B = (4, −1, 2).
(b) Find the coordinate vector of ~w = (6, −7, 10) with respect to B.
29. Let B = {(3, 2, 0, 1), (−1, 2, 5, 0), (0, 1, 7, −2), (4, −6, 0, 3)} be a basis for R4 .
(a) Find the vector ~v where [~v ]B = (3, 0, −2, −1).
(b) Find the coordinate vector of ~w = (−12, 13, 12, −9) with respect to B.
       

 1 −1 1 −1 
        
−1  ,   ,   ,  0  be a basis for R4 .
0 1
30. Let B = {(1, −1, 0, −1), (−1, 0, 1, −1), (1, 1, 0, −1), (−1, 0, −1, −1)} be a basis for R4 .
(a) Find the vector ~v where [~v ]B = (−2, 3, 1, −4).
(b) Find the coordinate vector of ~w = (−3, 2, 7, 13) with respect to B.
       
31. Let B = {[2 0; 0 2], [0 0; 0 1], [0 −1; 0 0], [0 1; 1 0]} be a basis for M22 .
(a) Find the vector ~v where [~v ]B = (3, 2, −1, 4).
(b) Find the coordinate vector of ~w = [1 2; 3 4] with respect to B.
       
32. Let B = {[1 0; 0 0], [1 0; 0 1], [0 1; 0 0], [0 1; 1 0]} be a basis for M22 .
(a) Find the vector ~v whose B-coordinate vector is (1, −2, 3, −1).
(b) Find the B-coordinate vector of ~v = [2 4; 6 8].
       
33. Let B = {[2 −1; 0 0], [−1 0; 0 2], [0 0; 2 −1], [0 2; −1 0]} be a basis for M22 .
(a) Find the vector ~v where [~v ]B = (2, −1, 3, 1).
(b) Find the coordinate vector of ~w = [5 −2; −3 7] with respect to B.
         
34. Let B = {[1 1 1; 1 1 0], [1 1 1; 1 0 1], [1 1 1; 0 1 1], [1 1 0; 1 1 1], [1 0 1; 1 1 1], [0 1 1; 1 1 1]} be a basis for M23 .
(a) Find the vector ~v where [~v ]B = (−1, 2, 0, 1, −3, 4).
(b) Find the coordinate vector of ~w = [−3 3 −4; 0 −6 5] with respect to B.
   
35. Use a coordinate map to check whether ~v1 = [1 2; 3 4], ~v2 = [4 3; 2 −1], and ~v3 = [−2 1; 4 9] are linearly independent or linearly dependent.
   
36. Use a coordinate map to check whether ~v1 = [2 0; 0 2], ~v2 = [1 1; 0 1], and ~v3 = [0 1; 1 0] in M22 are linearly independent or linearly dependent.
   
37. Use a coordinate map to check whether ~v1 = [1 2; 3 4], ~v2 = [4 3; 2 −1], ~v3 = [−1 0; 4 1], and ~v4 = [−2 1; 4 9] span M22 .
   
38. Use a coordinate map to check whether ~v1 = [2 0; 0 2], ~v2 = [1 1; 0 1], ~v3 = [0 2; 2 0], and ~v4 = [0 1; 1 0] span M22 .

3.2 Polynomial Vector Spaces


In this section, we’ll discuss a new family of commonly used vector spaces
whose vectors are polynomials. While it is possible to have polynomials with
many variables, here we’ll restrict to the case of one variable which I’ll call x
which leaves us with familiar objects like 3x2 − 7x and x + 2.
Note that I used the word objects here very intentionally. You may be
more used to treating polynomials as functions where you’re plugging in
numbers or solving for x. Resist that temptation here! Remember that our
original description of a vector space was a set of mathematical objects with
two operations which followed certain rules. If you find yourself treating a
polynomial as a function in this section there is a very high probability that
you’re making a mistake.

Definition. The set of all polynomials in x is P .

From earlier algebra classes, we already know how to add two polynomials
together by adding the coefficients on corresponding powers of x and how
to multiply a polynomial by a scalar by multiplying each coefficient by that
scalar. Although polynomials don’t look like matrices or vectors, we can use
these operations to view P as a vector space.

Theorem 1. P is a vector space using polynomial addition as its + and


multiplication of a polynomial by a scalar as its ·.

We can check this by looking at each of the conditions in the definition of


a vector space introduced in 2.4. I’ll number the checks to correspond with
the numbering of the conditions in our definition.
1. The first condition is closure of addition. If we add together two
polynomials p(x) and q(x), it is clear that p(x) + q(x) is still a
polynomial, and therefore is in P .
2. Next up is commutativity of addition. If we are adding two
polynomials p(x) and q(x), it doesn’t matter what order we add
them so
p(x) + q(x) = q(x) + p(x)
which shows polynomial addition is commutative.
3. Similarly, if we’re adding three polynomials, it doesn’t matter which
pair we add first. In other words

(p(x) + q(x)) + t(x) = p(x) + (q(x) + t(x))

so polynomial addition is associative.



4. To show P has an additive identity we need to find a polynomial


to act as our zero vector ~0P . This polynomial needs to have the
property p(x)+~0 = p(x) for any polynomial p(x) in P . Since we add
polynomials by adding corresponding coefficients (and assuming
that any missing terms have a coefficient of zero), we can construct
our ~0P polynomial by setting all its coefficients equal to zero. This
means our additive identity is simply ~0P = 0. If it is hard to see
this as a polynomial, you can think of it as 0 + 0x + 0x2 + · · · .
5. Now that we know that ~0P = 0, we can look for additive inverses in
P . For each p(x), we want to find a −p(x) so that −p(x)+p(x) = ~0P
or −p(x) + p(x) = 0. To cancel out a polynomial, we need to cancel
out each of its coefficients, so −p(x) is p(x) with the sign of each
coefficient switched.
6. If we multiply a polynomial by a scalar, we get another polynomial,
so P is closed under scalar multiplication.
7. If we are planning to multiply a polynomial p(x) by two scalars r
and s, we have
r(sp(x)) = (rs)p(x)
so scalar multiplication is associative.
8. Multiplying any polynomial by 1 leaves the polynomial unchanged.
9. If we have two scalars r and s and a polynomial p(x), we know

(r + s)p(x) = rp(x) + sp(x).

10. If we have two polynomials p(x) and q(x) and a scalar r, we know

r · (p(x) + q(x)) = rp(x) + rq(x)

so scalar multiplication distributes over addition.


Since all ten properties check out, P with the usual polynomial operations
is a vector space.

Often we don’t want to consider polynomials of arbitrarily large degree,


but instead want to work with a smaller set like all cubics or all quadratics.
We’ll tackle this whole family of bounded degree polynomial vector spaces at
once by leaving the bound on the degree generic and calling it n.

Definition. For any positive integer n, Pn is the set of all polynomials in x


of degree at most n.

Note that Pn also includes polynomials whose degrees are smaller than n.
If that bothers you, think of those polynomials as having coefficient 0 on the
missing powers of the x.

Theorem 2. Pn is a vector space using polynomial addition as its + and


multiplication of a polynomial by a scalar as its ·.

We could check this using our original vector space definition, but a much
easier way to approach this is to show instead that Pn is a subspace of P .
After all, the operations are the same and the polynomials in a given Pn are
certainly a subset of P . This means we need to show that ~0P is in Pn and
that Pn is closed under polynomial addition and scalar multiplication.
The zero vector of P is ~0P = 0. Since 0 is a constant, its degree is 0 which
is less than any positive n. Therefore ~0P is in Pn for any n.
To show closure of addition, suppose we have two polynomials p(x) and
q(x) in Pn . We need to show p(x) + q(x) is also in Pn . The sum of two
polynomials in x is another polynomial in x. Since p(x) and q(x) are both in
Pn , their degrees are both less than or equal to n. Adding polynomials never
raises their degree, so p(x) + q(x) will have a degree at most the larger of
deg(p) and deg(q). Therefore p(x) + q(x)’s degree is at most n which means
p(x) + q(x) is in Pn .
To show closure of scalar multiplication, suppose we have a polynomial
p(x) in Pn and a scalar r. We need to show rp(x) is also in Pn . A scalar
multiple of a polynomial in x is another polynomial in x. As above, we have
deg(p) ≤ n. Multiplying a polynomial by a constant never raises its degree,
so deg(rp) ≤ deg(p) ≤ n which means rp(x) is in Pn .
Thus Pn is a subspace of P , and therefore a vector space in its own right.

Now that we know P and Pn are vector spaces, we can develop the same
ideas and tools for working with them that we have for Rn and Mmn . We’ll
start with the idea of linear independence.
Our general definition of linear independence involved the vector equation
x1~v1 + · · · + xk~vk = ~0. Here we’ve already got an x as part of our polynomials,
so I’ll call our coefficients a1 , . . . , ak to avoid confusion. This will mean that
we’ll be solving the equation a1~v1 + · · · + ak~vk = ~0 for the ai ’s not for the x
that’s part of our polynomials!

Example 1. Are x2 −2x+1, x3 −2x+1, and 6x−3 linearly independent in P3 ?

To check whether or not these polynomials are linearly independent, we


need to determine if there is a solution to the equation

a1 (x2 − 2x + 1) + a2 (x3 − 2x + 1) + a3 (6x − 3) = 0

other than a1 = a2 = a3 = 0. Notice how I carefully put each polynomial in


parentheses to ensure that each term is multiplied by its coefficient ai .
We can expand our equation by multiplying each polynomial through by
ai to get

a1 x2 − 2a1 x + a1 + a2 x3 − 2a2 x + a2 + 6a3 x − 3a3 = 0.



Collecting like terms gives us

a2 x3 + a1 x2 + (−2a1 − 2a2 + 6a3 )x + (a1 + a2 − 3a3 ) = 0.

For a polynomial to equal the zero polynomial, we need each of its coefficients
to be zero. This gives us four equations: a2 = 0 from the x3 term, a1 = 0 from
the x2 term, −2a1 − 2a2 + 6a3 = 0 from the x term, and a1 + a2 − 3a3 = 0
from the constant term. Since the first two equations tell us a1 = a2 = 0, we
can plug that into either the third or fourth equation to see a3 = 0.
Since all coefficients in our equation must be 0, our polynomials are linearly
independent.
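As an optional check, SymPy can solve for the coefficients a1, a2, a3 symbolically (remember, we are solving for the a's, not for x); the code below is my sketch, not part of the text's method.

    import sympy as sp

    # Optional sketch: solve a1*p1 + a2*p2 + a3*p3 = 0 from Example 1.
    x = sp.symbols('x')
    a1, a2, a3 = sp.symbols('a1 a2 a3')

    combo = a1*(x**2 - 2*x + 1) + a2*(x**3 - 2*x + 1) + a3*(6*x - 3)
    eqs = [sp.Eq(coef, 0) for coef in sp.Poly(combo, x).all_coeffs()]
    print(sp.solve(eqs, [a1, a2, a3]))   # {a1: 0, a2: 0, a3: 0}: only the trivial solution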

We already saw that Pn is a subspace of P , but we can also find many other
subspaces of P and Pn . In the following example we see one such subspace
and also explore the span of a set of polynomials.

Example 2. Show W = {ax2 + bx + (a + b) | a, b ∈ R} is a subspace of P4 .

We could check this using the subspace test along the lines of our check
that Pn is a subspace of P , but I’ll instead write W as the span of a set of
polynomials. As we saw in 2.3, spans are always automatically subspaces.
To write W as a span, we need to find a set of polynomials whose linear
combinations give us all of W . In other words, we need to rewrite our generic
element of W as a linear combination of our spanning polynomials. Notice that
there are two important pieces to any polynomial in W , the part controlled
by a and the part controlled by b. Separating our generic element of W into
the sum of those two parts gives us

ax2 + bx + (a + b) = (ax2 + a) + (bx + b).

We can pull an a out of the first part and a b out of the second part to get

ax2 + bx + (a + b) = a(x2 + 1) + b(x + 1).

Aha! Now our generic element of W is written as a linear combination of x2 +1


and x + 1. This means

W = Span{x2 + 1, x + 1}

so W is automatically a subspace of P4 .
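Here is an optional one-line SymPy check of the rewriting we just did (the symbol names are mine):

    import sympy as sp

    # Optional sketch: confirm a*x^2 + b*x + (a + b) = a*(x^2 + 1) + b*(x + 1).
    a, b, x = sp.symbols('a b x')
    lhs = a*x**2 + b*x + (a + b)
    rhs = a*(x**2 + 1) + b*(x + 1)
    print(sp.simplify(lhs - rhs) == 0)   # True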

Now that we understand how to work within Pn , let’s explore the option
of mapping problems to Rn using a basis by finding standard bases for P and
Pn along with their coordinate maps.
The standard basis vectors for Pn are xn , xn−1 , . . . , x2 , x, 1. However, there
is no widespread consensus about whether this list should start with 1 and go

up to xn or start at xn and go down to 1, so you should always be careful to


specify which order you’re using.

Example 3. B = {xn , xn−1 , . . . , x2 , x, 1} is our standard basis for Pn .

To see that B is a basis for Pn , we need to check that it spans Pn and is


linearly independent.
We can check linear independence by arguing that none of the polynomials
from B are in the span of the rest. This is true because each polynomial in
B has a unique degree. Multiplying by a nonzero scalar doesn’t change the
degree of a polynomial, and multiplication by zero just gives us 0 which isn’t
helpful. Adding polynomials of different degrees can’t cancel out the term
of the largest degree, so will never have a sum of 0. Therefore B is linearly
independent.
To check that B spans Pn , consider a generic element an xn + · · · + a1 x + a0
of Pn . On closer inspection, it is already written as a linear combination of
the polynomials from B. Therefore B spans Pn , so it is a basis for Pn .

Since the dimension of a vector space is defined to be the number of vectors


in any basis, we get the following.

Theorem 3. dim(Pn ) = n + 1

If you are tempted to think the dimension of Pn is n, remember that a


polynomial of degree n has n + 1 terms: n powers of x plus the constant term.
This means when we write down a polynomial with degree n we are choosing
n+1 different coefficients, and we’ve previously described dimension intuitively
as the number of choices needed to specify a particular object/point.

Now that we have our standard basis B for Pn we can take advantage of
the corresponding coordinate map fB : Pn → Rn+1 . For example, fB gives us
a new way to answer the following question.

Example 4. Are x2 −2x+1, x3 −2x+1, and 6x−3 linearly independent in P3 ?

We could solve this problem in P3 as we did in Example 1, but instead let’s


use our coordinate map with respect to the standard basis B = {x3 , x2 , x, 1}.
Since the entries of a B-coordinate vector are just the coefficients of the
polynomial (in order of degree), we get

[x2 − 2x + 1]B = (0, 1, −2, 1), [x3 − 2x + 1]B = (1, 0, −2, 1), [6x − 3]B = (0, 0, 6, −3).
The matrix with these three columns is

[  0  1  0 ]
[  1  0  0 ]
[ −2 −2  6 ]
[  1  1 −3 ]

which has reduced echelon form

[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 0 ].

Since we have a leading 1 in each column of the reduced echelon form, our
three B-coordinate vectors and hence x2 − 2x + 1, x3 − 2x + 1, and 6x − 3 are
linearly independent.
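Here is an optional NumPy version of this check; the columns of M below are the three B-coordinate vectors above.

    import numpy as np

    # Optional sketch: the coordinate-map version of Example 4.
    M = np.array([[ 0,  1,  0],
                  [ 1,  0,  0],
                  [-2, -2,  6],
                  [ 1,  1, -3]])
    print(np.linalg.matrix_rank(M) == 3)   # True, so the polynomials are independent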

As with Rn and Mmn , the standard basis is by no means the only basis
for Pn .

Example 5. Show that x3 − 3, x2 − 2, x − 1, and x3 + x2 + x + 1 form a basis


for P3 .

Since we have four polynomials and dim(P3 ) = 4, we can invoke Theorem


3 from 3.1 to show this is a basis by showing that our four polynomials span
P3 . (We could also have shown that they are linearly independent.) We could
do this directly in P3 , but as in the previous example I’ll use the coordinate
map with respect to the standard basis B = {x3 , x2 , x, 1} to do this in R4 .
Our four polynomials have the following B-coordinate vectors:
       
$$[x^3 - 3]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -3 \end{bmatrix}, \quad [x^2 - 2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -2 \end{bmatrix}, \quad [x - 1]_B = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}, \quad [x^3 + x^2 + x + 1]_B = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$

The matrix whose columns are our four vectors is


 
$$\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ -3 & -2 & -1 & 1 \end{bmatrix}$$

which has reduced echelon form I4 . Since each column of the reduced echelon
form has a leading 1, our four coordinate vectors span R4 and hence our four
polynomials span P3 . Therefore x3 − 3, x2 − 2, x − 1, and x3 + x2 + x + 1 form
a basis for P3 .

The situation for P is a little stranger. Since there is no upper limit on the
degree of polynomials in P we need to include xk for every positive integer k.

Example 6. The standard basis of P is 1, x, x2 , x3 , . . . ..

You can convince yourself that this is a basis using a very similar argument
to that for Pn .

Since our basis for P has infinitely many basis vectors, we are forced to
conclude the following.

Theorem 4. dim(P ) = ∞

While we could conceivably still use a coordinate map for P , we’d have
to map to R∞ (which we haven’t defined) so we’ll just stick with working in P .

We can also explore linear functions with P or Pn as their domain or


codomain. While these functions can’t be written as matrix multiplication,
we can still check linearity and compute their kernels and ranges.
 
Example 7. Let f : P2 → M22 by $f(ax^2 + bx + c) = \begin{bmatrix} a + c & b - 2c \\ 0 & a + c \end{bmatrix}$. Show
that f is linear and compute its kernel and range.

To check that f is a linear function, we need to show that

f (p(x) + q(x)) = f (p(x)) + f (q(x)) and f (r · p(x)) = r · f (p(x))

for all polynomials p(x) and q(x) and all scalars r. Let’s fix the notation
p(x) = ax2 + bx + c and q(x) = αx2 + βx + γ.
Our first check is that f splits up over addition.

$$\begin{aligned}
f(p(x) + q(x)) &= f(ax^2 + bx + c + \alpha x^2 + \beta x + \gamma) \\
&= f((a + \alpha)x^2 + (b + \beta)x + (c + \gamma)) \\
&= \begin{bmatrix} (a + \alpha) + (c + \gamma) & (b + \beta) - 2(c + \gamma) \\ 0 & (a + \alpha) + (c + \gamma) \end{bmatrix} \\
&= \begin{bmatrix} (a + c) + (\alpha + \gamma) & (b - 2c) + (\beta - 2\gamma) \\ 0 & (a + c) + (\alpha + \gamma) \end{bmatrix} \\
&= \begin{bmatrix} a + c & b - 2c \\ 0 & a + c \end{bmatrix} + \begin{bmatrix} \alpha + \gamma & \beta - 2\gamma \\ 0 & \alpha + \gamma \end{bmatrix} \\
&= f(ax^2 + bx + c) + f(\alpha x^2 + \beta x + \gamma) = f(p(x)) + f(q(x))
\end{aligned}$$

Next we’ll check that f splits up over scalar multiplication.

$$\begin{aligned}
f(r \cdot p(x)) &= f(r(ax^2 + bx + c)) = f((ra)x^2 + (rb)x + (rc)) \\
&= \begin{bmatrix} (ra) + (rc) & (rb) - 2(rc) \\ 0 & (ra) + (rc) \end{bmatrix} = \begin{bmatrix} r(a + c) & r(b - 2c) \\ 0 & r(a + c) \end{bmatrix} \\
&= r\begin{bmatrix} a + c & b - 2c \\ 0 & a + c \end{bmatrix} = r \cdot f(ax^2 + bx + c) = r \cdot f(p(x))
\end{aligned}$$

Since f satisfies both conditions, it is a linear function.


The kernel of f is all polynomials which are mapped to the zero vector of
M22 . Since ~0M22 is the 2 × 2 zero matrix, this means we want to find the set
of all ax2 + bx + c so that
 
$$f(ax^2 + bx + c) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

For our function, this means solving for a, b, and c in the equation
   
$$\begin{bmatrix} a + c & b - 2c \\ 0 & a + c \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Setting corresponding entries equal gives us a+c = 0, b−2c = 0, and a+c = 0.


The first and third equations tell us we must have a = −c, and the second
equation tells us we need b = 2c. This means our kernel is

ker(f ) = {−cx2 + 2cx + c}.

If we aren’t sure this is correct, remember we can always check our work by
computing f (−cx2 + 2cx + c) and making sure we get the 2 × 2 zero matrix.
The range of f is the set of all 2 × 2 matrices which are outputs of our function. In other words, it is the set of all matrices $\begin{bmatrix} v & w \\ y & z \end{bmatrix}$ where

$$f(ax^2 + bx + c) = \begin{bmatrix} v & w \\ y & z \end{bmatrix}$$

for some polynomial ax2 + bx + c. We can find these matrices by solving for
a, b, and c in the equation
   
$$\begin{bmatrix} a + c & b - 2c \\ 0 & a + c \end{bmatrix} = \begin{bmatrix} v & w \\ y & z \end{bmatrix}.$$

(Here we are assuming we know the values of v, w, y, and z.) Setting


corresponding entries equal gives us a + c = v, b − 2c = w, 0 = y, and a + c = z. This immediately tells us that for $\begin{bmatrix} v & w \\ y & z \end{bmatrix}$ to be in the range of f, we must have y = 0. What values of v, w, and z can we have? We need

there to be a solution to the other three linear equations. These equations


have augmented coefficient matrix
 
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & v \\ 0 & 1 & -2 & w \\ 1 & 0 & 1 & z \end{array}\right].$$

We can answer this question by row reduction, so let’s do enough of our


algorithm to see where the leading 1s are.
   
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & v \\ 0 & 1 & -2 & w \\ 1 & 0 & 1 & z \end{array}\right] \xrightarrow{r_3 - r_1} \left[\begin{array}{ccc|c} 1 & 0 & 1 & v \\ 0 & 1 & -2 & w \\ 0 & 0 & 0 & z - v \end{array}\right]$$

It is now clear that our first two rows are not an issue, because they contain
leading 1s which aren’t in the last column. Our third row is a potential problem
though, because its equation is 0 = z − v which only has a solution when z = v. This means that to have a 2 × 2 matrix $\begin{bmatrix} v & w \\ y & z \end{bmatrix}$ in the range of f means we need y = 0 and z = v. Thus
 
$$\text{range}(f) = \left\{ \begin{bmatrix} v & w \\ 0 & v \end{bmatrix} \right\}.$$

Note: Since the kernel is a subspace of the domain, our kernel must be a
set of polynomials, not a set of matrices. On the other hand, the range is a
subspace of the codomain, and so must be a set of 2 × 2 matrices not a set of
polynomials. It is often helpful to remind yourself of the types of objects in
the kernel and range to check your work.
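If you want to double-check a kernel computation like the one above, the system of equations is small enough to hand to a computer algebra system. The sketch below uses Python with SymPy (my own illustration; the variable names are arbitrary) to solve the equations coming from the entries of f's output matrix.

```python
import sympy as sp

a, b, c = sp.symbols('a b c')

# Entries of f(ax^2 + bx + c) for the map in Example 7 (the lower left entry is already 0)
top_left, top_right, bottom_right = a + c, b - 2*c, a + c

# Kernel: set every entry of the output matrix equal to zero and solve
solution = sp.solve([top_left, top_right, bottom_right], [a, b, c], dict=True)
print(solution)  # [{a: -c, b: 2*c}], i.e. ker(f) = {-c x^2 + 2c x + c}
```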

Exercises 3.2.
1. Let W = {ax2 + bx} where a is even and b is positive. (Both a
and b are real numbers.) Show that W is closed under the usual
polynomial addition but not under scalar multiplication.
2. Redo Example 2 using the subspace test.

3. Show that W = {ax2} with the usual polynomial addition and
scalar multiplication is a vector space.
4. Let V = {2ax2 + 2bx + 2c} be the subset of P2 where all coefficients
are even. Show that with the usual polynomial addition and scalar
multiplication, V is a vector space.
5. Show that 1 + x2 , x, and 1 are a basis for P2 .
6. Show that x2 + 6x + 9, x2 − 9, and −2x + 1 are a basis for P2 .
7. Is B = {1 + x2 , 1 − x2 , 4} a basis for P2 ?
8. Is B = {−x3 + x2 − x + 1, x3 + x, x2 − x + 1, 2x2 + 2} a basis for P3 ?

9. The set B = {2x2, x2 − x + 1, 3x} is a basis for P2.
   (a) Find the vector ~v in P2 whose B-coordinate vector is $\begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}$.
   (b) Find the coordinate vector of 3x2 + 10x − 1 with respect to basis B.
10. The set B = {1, 1 + x, 1 + x + x2} is a basis for P2.
   (a) Find the polynomial ~v whose coordinate vector with respect to this basis is $[\vec{v}]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
   (b) Find the coordinate vector of ~v = 4 + 2x + 3x2 with respect to this basis.
11. The set B = {x2 + x + 1, 2x2 + 2, x2} is a basis for P2.
   (a) Find the polynomial whose B-coordinate vector is $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
   (b) Find the B-coordinate vector of x2 − 4x + 6.
12. The set B = {x + 2, 3x + 1} is a basis for P1.
   (a) Find the vector ~v which has $[\vec{v}]_B = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$.
   (b) Find the B-coordinate vector of x − 1.
   (c) Find the codomain, W, of the B-coordinate map fB : P1 → W.
13. Decide whether or not 6x + 4 is in the span of x2 + x + 1 and
x2 − 2x − 1 without using a coordinate map.
14. Decide whether or not 2x2 − x + 10 is in the span of x − 2 and
x2 + x + 1 without using a coordinate map.
15. Use a coordinate map to decide whether or not 6x + 4 is in the span
of x2 + x + 1 and x2 − 2x − 1.
16. Use a coordinate map to decide whether or not 2x2 − x + 10 is in
the span of x − 2 and x2 + x + 1.
17. Redo Example 4 in P3 , i.e., without using a coordinate map.
18. Decide whether x2 − x + 1, x2 + 2x + 3, and 5x2 − 2x are linearly
independent or linearly dependent without using a coordinate map.
19. Use a coordinate map to see if x3 − x2 − 2, −3x3 + 2x2 + x − 1, and
−x2 + x − 7 are linearly independent or linearly dependent.
20. Use a coordinate map to see if x2 − x + 1, x2 + 2x + 3, and 5x2 − 2x
are linearly independent or linearly dependent.
21. Let $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = ax^2 + (b + c)x + a$.
    (a) What is the domain of f ?
    (b) What is the codomain of f ?
22. Let $f(ax^2 + bx + c) = \begin{bmatrix} a + b & 0 \\ a + c & a + b \end{bmatrix}$.
    (a) What is the domain of f ?
    (b) What is the codomain of f ?
23. Let f : M23 → P4 by $f\left(\begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}\right) = ax^4 + (b + c)x^2 + (d - f)$.
    (a) Find the kernel of f .
    (b) Find the range of f .
24. Let f : P2 → M22 by $f(ax^2 + bx + c) = \begin{bmatrix} 2a & 0 \\ 0 & b + c \end{bmatrix}$.
    (a) Find the kernel of f .
    (b) Find the range of f .
25. Let f : P2 → M22 be given by $f(ax^2 + bx + c) = \begin{bmatrix} a - b & b + c \\ a + b + 2c & 0 \end{bmatrix}$.
    (a) Find the kernel of f .
    (b) Find the range of f .
26. Let f : P2 → M33 by $f(ax^2 + bx + c) = \begin{bmatrix} a - b + c & 0 & 0 \\ 0 & a - b & 0 \\ 0 & 0 & 2c \end{bmatrix}$.
    (a) Find the kernel of f .
    (b) Find the range of f .

3.3 Other Vector Spaces


Although Rn , Mmn , and Pn are our main examples of vector spaces, there are
many more examples of familiar mathematical environments which also turn
out to be vector spaces. In this section we’ll explore two of these: the complex
numbers and the set of continuous functions from R to R.
The complex numbers, C, are an extension of the real numbers formed by
including a new number i which is defined to be the square root of −1. They
were developed to help solve equations like x2 + 4 = 0 which don’t have a
solution in R. Such equations may not seem relevant to the real world, but
they do come up in practical applications and can help solve real problems.
The way we include i into R is the same way we include any other real number.
It is part of the set, but also can be added to and multiplied by other numbers.
This means we can have things like 7 + i and −3i. We can also multiply i by
itself, but since i2 = −1 any power of i can be reduced to ±i. We’ll formally
define C as follows.

Definition. The complex numbers are C = {a + bi | a, b in R}.

The first term a is typically called the real part of a complex number, while
the second term is called the imaginary part. (For a more detailed discussion
of C, see Appendix A.1.)
Geometrically, we can think of C as a plane by plotting the complex
number a + bi as the point (a, b). For this reason, many people call the x-
axis the real axis and the y-axis the imaginary axis. To illustrate this, look at
the plot of 4 + 2i in Figure 3.1.

[Figure 3.1: A geometric representation of a complex number. The point 4 + 2i is plotted at (4, 2) in the plane.]

Complex numbers are added and multiplied by scalars as if they were


polynomials in i. In other words,

(a + bi) + (x + yi) = (a + x) + (b + y)i

and
r(a + bi) = (ra) + (rb)i.
Keep in mind that our scalar r is from R not C. (To learn about the rule
for multiplying a complex number by another complex number, see Appendix
A.1.)

Example 1. Compute (2 + 4i) + (−1 + 7i).

Following our rule above, to add these two complex numbers, we add their
real and imaginary parts separately. This gives us

(2 + 4i) + (−1 + 7i) = (2 − 1) + (4 + 7)i = 1 + 11i.

Example 2. Compute −10(3 − 2i).

To multiply this complex number by −10, we multiply both the real and
imaginary parts by −10. This gives us

−10(3 − 2i) = −10(3) − 10(−2)i = −30 + 20i.

Theorem 1. C is a vector space using complex addition as its + and


multiplication of a complex number by a scalar as its ·.

As with our check in 3.2 that P is a vector space, I’ll number my


explanations to correspond with the conditions from the definition.
1. If we have two complex numbers a + bi and x + yi, their sum is
defined to be

(a + bi) + (x + yi) = (a + x) + (b + y)i

which is clearly in C. Thus C is closed under addition.


2. The order in which we add complex numbers doesn’t matter since

(a+bi)+(x+yi) = (a+x)+(b+y)i = (x+a)+(y+b)i = (x+yi)+(a+bi)

so complex addition is commutative.


3. If we have three complex numbers a + bi, x + yi, and w + vi, which

pair we add first doesn’t matter since

[(a + bi) + (x + yi)] + (w + vi) = [(a + x) + (b + y)i] + (w + vi)


= (a + x + w) + (b + y + v)i
= (a + bi) + [(x + w) + (y + v)i]
= (a + bi) + [(x + yi) + (w + vi)].

Thus complex addition is associative.


4. Our zero vector in C will be a complex number a + bi so that

(a + bi) + (x + yi) = x + yi

for all x + yi. We can do this by simply setting a = b = 0, so our ~0


in C is simply 0. (Alternately, you can think of ~0C as 0 + 0i.)
5. Since complex numbers add component-wise and ~0C = 0, the
additive inverse of a complex number will be one whose values for a
and b are the opposites of our original complex number. This means
the additive inverse of a + bi is −a − bi.
6. If we have a complex number a + bi and a scalar r, their product is
defined to be
r(a + bi) = (ra) + (rb)i
which is clearly in C, so C is closed under scalar multiplication.
7. Whether we multiply a complex number a + bi by a scalar s and
then another scalar r or simply by the product of the two scalars
rs doesn’t matter since

r(s(a + bi)) = r((sa) + (sb)i) = (rsa) + (rsb)i = (rs)(a + bi)

so scalar multiplication is associative.


8. Multiplying a complex number a + bi by 1 just means multiplying
a and b by 1, so 1(a + bi) = a + bi.
9. Multiplying a complex number by a sum of two scalars distributes
since

(r + s)(a + bi) = ((r + s)a) + ((r + s)b)i = (ra + sa) + (rb + sb)i
= ((ra) + (rb)i) + ((sa) + (sb)i) = r(a + bi) + s(a + bi).

10. Multiplying a sum of complex numbers by a scalar also distributes


since

r((a + bi) + (x + yi)) = r((x + a) + (y + b)i) = r(x + a) + r(y + b)i


= (rx + ra) + (ry + rb)i = r(a + bi) + r(x + yi).

Therefore C is a vector space as claimed.

Now that we know C is a vector space, we can explore all the properties
we looked at in Rn , Mmn , and Pn . Let’s start with subspaces.

Example 3. Show W = {bi | b in R} is a subspace of C.

Before we run our subspace test, notice that W can be thought of as all
complex numbers a + bi where a = 0. Since these complex numbers have no
real part, they are often called purely imaginary numbers. Geometrically, this
means W is the set pictured below.
[Figure: W = {bi} is the imaginary (vertical) axis in the complex plane.]

The zero vector of C is 0, which we can see is in W by letting b = 0 to get


0i. If we have two elements bi and ci in W , their sum is
bi + ci = (b + c)i

which is also in W . Thus W is closed under addition. If we have an element


bi in W and a scalar r, then

r(bi) = (rb)i

which is in W . Thus W is closed under scalar multiplication. Since it satisfies


all three conditions of the subspace test, W is a subspace of C.

Next let’s figure out the dimension of C by finding a basis.



Example 4. Show 1 and i are a basis for C.

From our definition of C it is clear that every complex number is a linear


combination of 1 and i, so C = Span{1, i}. Since there are only two spanning
elements, we can check linear independence by seeing if one of them is in the
span of the other. There is no real number a which will make ai a multiple of
1, so 1 and i are linearly independent. Therefore 1 and i are a basis for C.

In fact, 1 and i are the standard basis for C, although their ordering isn’t
standard so be careful to specify whether you’re using the basis 1, i or the
basis i, 1.
Since C has a basis with two elements, we know:

Theorem 2. dim(C) = 2

Now that we have access to a coordinate map via our standard basis, let’s
explore linear independence in C.

Example 5. Are 1 + i, −2 + 14i, and 10 − 6i linearly independent?

We don’t actually need to do any computations to answer this question.


The dimension of C is 2, so no matter which basis we pick for C, our coordinate
map is fB : C → R2 . This means if we use our coordinate map to translate this
question into R2 , we’re asking if three vectors in R2 are linearly independent.
This is clearly impossible, because we can’t get leading 1s in all three columns
of a 2 × 3 matrix. Therefore 1 + i, −2 + 14i, and 10 − 6i are linearly dependent.

Example 6. Are −2 + 14i and 10 − 6i linearly independent?

Now that we’re only talking about two complex numbers, it is possible for
them to be linearly independent. Let’s check using the coordinate map with
respect to the standard basis B = {1, i}. Coordinate vectors with respect to
this basis have first entry equal to the real part of the complex number (the
part without the i) and second entry equal to the coefficient on i. This means
our two coordinate vectors are
   
$$[-2 + 14i]_B = \begin{bmatrix} -2 \\ 14 \end{bmatrix} \quad \text{and} \quad [10 - 6i]_B = \begin{bmatrix} 10 \\ -6 \end{bmatrix}.$$

The matrix with these two columns is

$$\begin{bmatrix} -2 & 10 \\ 14 & -6 \end{bmatrix},$$

which has reduced echelon form I2 . Therefore, by the Invertible Matrix


Theorem, its columns are linearly independent, which means −2 + 14i and
10 − 6i are linearly independent as well.
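This coordinate-map argument is also easy to automate. The sketch below (Python with NumPy, my own illustration, not part of the text) identifies a complex number a + bi with the vector (a, b) in R2 and tests linear independence by computing a rank.

```python
import numpy as np

def independent_over_R(z1, z2):
    """Treat z1, z2 in C as vectors in R^2 and test linear independence over R."""
    M = np.array([[z1.real, z2.real],
                  [z1.imag, z2.imag]])
    return np.linalg.matrix_rank(M) == 2

print(independent_over_R(-2 + 14j, 10 - 6j))  # True
print(independent_over_R(6 - 9j, 12 - 18j))   # False: 12 - 18i = 2(6 - 9i)
```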

As with the other vector spaces we’ve seen, we can create linear maps with
C as their domain or codomain.

Example 7. Find the kernel of the map f : C → M22 given by


 
$$f(a + bi) = \begin{bmatrix} a - 2b & 0 \\ 0 & -2a + 4b \end{bmatrix}.$$

As usual, we find the kernel of f by setting f ’s output equal to the zero


vector of the codomain, which in this case is M22 . This means we want to
solve

$$\begin{bmatrix} a - 2b & 0 \\ 0 & -2a + 4b \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
so we must have a − 2b = 0 and −2a + 4b = 0. Both of these equations give
us a = 2b, so we get
ker(f ) = {2b + bi}.
Using our geometric identification of C with the plane, we can draw the kernel by rewriting our defining equation of the kernel as b = (1/2)a. This means ker(f ) can be thought of as the line pictured below.

[Figure: the line b = (1/2)a in the complex plane.]
Now that we’ve explored our new vector space C, let’s turn our attention
to another, more familiar, collection of mathematical objects: the set of
continuous functions from R to R, which we will call C . Notice that this set
includes many functions we haven’t talked about in linear algebra so far, like
ex and 2x + sin(x). These may not seem like things that belong in our linear
world, but as we did with polynomials, we can use them as the mathematical
objects, i.e., vectors in a new vector space. Note that as with polynomials, we
mean the “objects” part of this idea seriously. We don’t get to treat ex as a

function if we want to use it here, so as with P , resist the urge to plug things
into your continuous functions, solve for x, etc.
The other important building block of any vector space is its two operations
+ and ·. For our operations, we’ll use the usual notion of adding and scaling
functions you worked with in calculus.

Theorem 3. C is a vector space using addition of functions as its + and


multiplication of a function by a scalar as its ·.

As above, I’ll number my explanations to correspond with the conditions


from the definition.
1. If we have two continuous functions f and g from R to R, their
sum f + g is clearly also a function from R to R. As you saw in
calculus, the fact that limits split up over sums implies that f + g is
continuous. Therefore f + g is in C , so C is closed under addition.
2. In calculus, you also saw that f + g = g + f , so function addition is
commutative.
3. Again, in calculus you saw that (f +g)+h = f +(g +h), so function
addition is associative.
4. To find our additive identity, we need a function ~0 in C so that
f + ~0 = f . Since constant functions are always continuous, we can
let ~0C be the zero function z(x) = 0.
5. To find the additive inverse of a continuous function f , we need to
find a function which cancels out each output of f so that their sum
is the zero function. We can do this by simply multiplying f by −1.
This will be continuous since f is continuous.
6. As with addition, the fact from calculus that scalar multiplication
can be pulled out of a limit tells us that if f (x) is continuous
then rf (x) is also continuous. Therefore C is closed under scalar
multiplication.
7. Since multiplying a function by a scalar really means multiplying
the function’s outputs by that scalar, the fact that multiplication
in R is associative gives us r(sf (x)) = (rs)f (x). Thus scalar
multiplication in C is associative.
8. Multiplying a function by 1 doesn’t change its outputs, so
1f (x) = f (x).
9. From working with continuous functions in calculus you’ve already
seen that (r + s)f (x) = rf (x) + sf (x), so scalar multiplication
distributes over addition of scalars.
10. Similarly, r(f (x) + g(x)) = rf (x) + rg(x), so scalar multiplication
distributes over function addition.

Therefore C with our usual operations on functions is a vector space.

As with the complex numbers, we can now explore some of our vector
space ideas. Again, let’s start with subspaces.

Example 8. Show P is a subspace of C .

Since every polynomial in one variable is a continuous function from R


to R, we know P is a subset of C . In fact, our notions of function addition
and scalar multiplication are exactly the same as those of polynomial addition
and scalar multiplication in the case where our function is a polynomial. Since
we’ve already shown that P is a vector space, we can now conclude that P is
a subspace of C .

However, this exposes a problem with finding a basis for C and using its
coordinate map to tackle spans and linear independence. From 3.1 we know
that dim(P ) ≤ dim(C ) and dim(P ) = ∞, so we must have the following.

Theorem 4. dim(C ) = ∞

Since C is infinite dimensional, it won’t have a nice coordinate map to


Rn . In fact, although it can be proved that a basis for C exists, actually
constructing a basis for C is difficult enough to be beyond the scope of this
book.

Exercises 3.3.
1. Show R is a subspace of C.
2. Is W = {1 + bi} a subspace of C?
3. Show that 1 + i and 1 − i form a basis for C.
4. Show that 4 − 2i and 1 − 4i form a basis for C.
5. Is {2 − i, 4 − 2i} a basis for C?
6. Is {2 + 2i, −3 + 3i} a basis for C?
7. Let B = {1 + i, i} be a basis for C.
   (a) Find the complex number ~v with $[\vec{v}]_B = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$.
   (b) Find the B-coordinate vector of 2 − 6i.
8. Let B = {1 + i, 1 − i} be a basis for C.
   (a) Find the complex number ~v with $[\vec{v}]_B = \begin{bmatrix} 1 \\ 7 \end{bmatrix}$.
   (b) Find the B-coordinate vector of 8 + 3i.
9. Let B = {4 − 2i, 1 − 4i} be a basis for C.
   (a) Find the complex number ~v with $[\vec{v}]_B = \begin{bmatrix} -3 \\ -1 \end{bmatrix}$.
   (b) Find the B-coordinate vector of 12 + 4i.
10. Let B = {−1 + 2i, 2 + 5i} be a basis for C.
   (a) Find the complex number ~v with $[\vec{v}]_B = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$.
   (b) Find the B-coordinate vector of 10 + i.
11. Decide whether 6 − 9i and 12 − 18i are linearly independent or
linearly dependent without using a coordinate map.
12. Decide whether 5+4i and 4+5i are linearly independent or linearly
dependent without using a coordinate map.
13. Use a coordinate map to decide whether 6 − 9i and 12 − 18i are
linearly independent or linearly dependent.
14. Use a coordinate map to decide whether 5+4i and 4+5i are linearly
independent or linearly dependent.
15. Decide whether or not 4 − 2i and 10 + i span C without using a
coordinate map.
16. Decide whether or not 20 + 8i and 15 + 6i span C without using a
coordinate map.
17. Use a coordinate map to decide whether or not 4 − 2i and 10 + i
span C.
18. Use a coordinate map to decide whether or not 20 + 8i and 15 + 6i
span C.
 
19. Let $f(a + bi) = \begin{bmatrix} -a & a \\ 0 & b \end{bmatrix}$.
    (a) Find the domain of f .
    (b) Find the codomain of f .
20. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = (x_1 + x_2) + (x_1 - x_3)i$.
    (a) Find the domain of f .
    (b) Find the codomain of f .
21. Let $f(a + bi) = \begin{bmatrix} -a & a \\ 0 & b \end{bmatrix}$.
    (a) Find the kernel of f .
    (b) Find the range of f .
22. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = (x_1 + x_2) + (x_1 - x_3)i$.
    (a) Find the kernel of f .
    (b) Find the range of f .
23. Let $f\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a - 2b) + ci$.
    (a) Find the kernel of f .
    (b) Find the range of f .
24. Let f (a + bi) = −bi.
(a) Find the kernel of f .
(b) Find the range of f .
25. Let W = {A sin(x)+B | A, B in R}, so W is the set of sine functions
with all possible amplitudes and midlines. (Adjusting the amplitude
and midline of a sine function can be very helpful in modeling
periodic behavior.) Show W is a subspace of C .
26. Show that the integers, Z, do not form a vector space with their
usual addition and scalar multiplication.
27. Show that the set of all differentiable functions, D, from R → R
with our usual addition and scalar multiplication of functions is a
vector space.
28. Let V be the set of positive real numbers. Define “addition” to be
our usual multiplication, so a“+”b = ab, and “scalar multiplication”
by r to be the rth power, so r“ · ”a = ar . Is V a vector space? Show
that your answer is correct.
29. Let V = R with the usual addition and multiplication of real
numbers, and W = R>0 be the vector space of positive real
numbers whose “addition” is multiplication and whose “scalar
multiplication” raises the real number to that scalar power (see
the previous exercise). Define f : V → W by f (x) = 2x . Either
show that f is a linear map, or show that it is not.
4 Diagonalization

4.1 Eigenvalues and Eigenvectors


In Example 5 from 2.2, we developed a function to predict population sizes
in different life stages of the smooth coneflower from one year to the next.
Suppose we aren’t interested in just predicting the next year’s population,
but instead want to know what the population will look like in 10 years or 20
years. We can even ask whether the population will survive in the long term or
whether it is dying out. To do this, we can repeatedly apply our linear function
representing the population change over each year to our initial population
vector. This means we are repeatedly multiplying by a fixed matrix.
This need to multiply many times by the same matrix also comes up
in game theory, when we study movement around the board of games like
Monopoly. Here our vector entries represent the likelihood of ending up on
the various spaces on the board, and the matrix we’re multiplying by contains
the probabilities of moving from one space to another.
In this chapter, we’ll start developing a method which can help us simplify
problems with repeated matrix multiplication. As with scalar multiplication,
we can rewrite repeated matrix multiplication as multiplication by the powers
of a matrix. For example, if f has matrix A, then f (f (~x)) = A(A(~x)) = A2 ~x. In
general, these problems are complicated whether we are repeatedly applying a
linear map, repeatedly multiplying by a matrix, or taking powers of a matrix.
However, there is a case where matrix operations behave much better because
some strategically chosen entries are zero.

Suppose we are multiplying matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $Z = \begin{bmatrix} x & y \\ z & w \end{bmatrix}$. According to our usual rules of matrix multiplication, we'd get

$$AZ = \begin{bmatrix} ax + bz & ay + bw \\ cx + dz & cy + dw \end{bmatrix}.$$

One way to make this less complicated is to set enough entries of A equal to
zero so that each entry of their product contains only one term rather than
being a sum of two terms. We can eliminate one term from the sum in each matrix entry by making b and c equal zero. This would make $A = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ and $AZ = \begin{bmatrix} ax & ay \\ dz & dw \end{bmatrix}$. We could have set other entries of A equal to zero, but
this gives the nice additional result that each row of Z is being scaled by the
corresponding entry of A.
We could also go back and focus on the entries of Z, but instead let’s
generalize the type of matrix we created with A.

Definition. An n × n matrix is diagonal if its entries aij are zero unless


i = j.

(Remember that our notation is that aij is the entry of A in the ith row
and jth column.)
 
Example 1. $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -3 \end{bmatrix}$ is a diagonal matrix.
All entries off the diagonal are zero, so this matrix is diagonal. (Notice
that one of our diagonal entries is also zero, which is fine because diagonal
entries are allowed to be any real number.)

Matrix multiplication is much easier if we stick to diagonal matrices. To


multiply two n × n diagonal matrices, simply multiply their corresponding
diagonal entries.
  
Example 2. Compute $\begin{bmatrix} -2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 7 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -2 \end{bmatrix}$.

If we do this using our regular definition of matrix multiplication, we get

$$\begin{bmatrix} -2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 7 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -2 \end{bmatrix} = \begin{bmatrix} -2(1) & 0 & 0 \\ 0 & 4(3) & 0 \\ 0 & 0 & 7(-2) \end{bmatrix} = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & -14 \end{bmatrix}.$$

Notice that the product above does indeed work out to be a diagonal
matrix whose entries are the products of the diagonal entries of our two
diagonal matrices.
Another way to view multiplication by a diagonal matrix is to decompose
our diagonal matrix into a product of elementary matrices. Each elementary
matrix will correspond to a row operation where we scale some row by a
constant. This allows us to compute the matrix product AB where A is
a diagonal matrix by scaling each row of B by A’s diagonal entry in that
row. (This is the bonus nice property we noticed in our discussion before the
definition of a diagonal matrix.)

Example 3. Use the multiplication technique described above to compute AB where $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 7 & 4 \\ -1 & 5 & 2 \\ 6 & -3 & 0 \end{bmatrix}$.
We could do this using our usual method for matrix multiplication, but
it’s easier to simply multiply each row of B by the corresponding diagonal
entry of A. For our A, this means multiplying B’s first row by 1, its second
row by 3, and its third row by −2. This gives us
   
$$AB = \begin{bmatrix} 2(1) & 7(1) & 4(1) \\ -1(3) & 5(3) & 2(3) \\ 6(-2) & -3(-2) & 0(-2) \end{bmatrix} = \begin{bmatrix} 2 & 7 & 4 \\ -3 & 15 & 6 \\ -12 & 6 & 0 \end{bmatrix}.$$

Be careful to remember that this method doesn’t work for computing AB if


B is diagonal, since matrix multiplication doesn’t commute. It turns out that
there is a related shortcut in that case, which you’ll explore in the exercises.
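If you'd like to see the row-scaling shortcut from Example 3 confirmed numerically, the comparison takes only a few lines. This is a small NumPy sketch of my own, not something the text relies on.

```python
import numpy as np

A = np.diag([1, 3, -2])                   # the diagonal matrix from Example 3
B = np.array([[ 2,  7, 4],
              [-1,  5, 2],
              [ 6, -3, 0]])

full_product = A @ B                      # ordinary matrix multiplication
row_scaled   = np.diag(A)[:, None] * B    # scale row i of B by A's ith diagonal entry

print(np.array_equal(full_product, row_scaled))  # True
```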
We can also think about multiplication by a diagonal matrix geometrically.
A diagonal matrix can be interpreted as a transformation which scales each
axis of Rn by the corresponding diagonal entry. In Example 1 above, if we
think of the three axes of R3 in order as the x-axis, y-axis, and z-axis, our
matrix scales the x-axis by 2, the y-axis by 0, and the z-axis by −3.
Now that we understand diagonal matrix multiplication, let’s shift our
attention back to taking powers of a matrix. If our matrix is diagonal, the
shortcut used in Example 3 says we are repeatedly scaling each row of our
matrix by its own diagonal entries.
 
Example 4. Compute A2 and A3 where $A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}$.
Let’s start with A2 = A·A. We saw in Example 2 that we can compute the
product of two diagonal matrices by multiplying their diagonal entries together
and using those products as the diagonal entries of the matrix product. Since
here we are multiplying A by itself, the diagonal entries will be the products of
the old diagonal entries by themselves, i.e., the squares of the diagonal entries.
Computationally, this looks like
$$\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}^2 = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 3(3) & 0 \\ 0 & -2(-2) \end{bmatrix} = \begin{bmatrix} 3^2 & 0 \\ 0 & (-2)^2 \end{bmatrix} = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}.$$

Finding A3 = A · A · A means doing this process three times, so we get


     
$$\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 3(3)(3) & 0 \\ 0 & -2(-2)(-2) \end{bmatrix} = \begin{bmatrix} 3^3 & 0 \\ 0 & (-2)^3 \end{bmatrix} = \begin{bmatrix} 27 & 0 \\ 0 & -8 \end{bmatrix}.$$

Notice that our cubed matrix’s diagonal entries are the cubes of the original
matrix’s diagonal entries.

We can generalize what happened here to say that we can take the kth
power of a diagonal matrix by simply taking the kth power of each of the
matrix’s diagonal entries.
 5
−1 0 0 0
0 3 0 0
Example 5. Compute 
0
 .
0 2 0
0 0 0 −2
This would be annoying to do without our shortcut, but it’s easy to do by
taking the 5th power of each diagonal entry. Then our computation is just
 5  
−1 0 0 0 (−1)5 0 0 0
0 3 0 0  (3)5 0 
  = 0 0 
0 0 2 0   0 0 (2) 5
0 
0 0 0 −2 0 0 0 (−2)5
 
−1 0 0 0
 0 243 0 0 

= .
0 0 32 0 
0 0 0 −32

Diagonal matrices also make multiplying a matrix and a vector easier. If


we think about our geometric interpretation of scaling each axis in Rn by its
corresponding diagonal entry of the matrix, it should make sense that we can
multiply a vector by a diagonal matrix by multiplying each vector entry by
the matrix’s corresponding diagonal entry.
  
Example 6. Compute $\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -3 \end{bmatrix}\begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix}$.
We can compute this matrix vector product by scaling the first entry of
our vector by 2, the second by 0, and the third by −3 to give us
      
$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -3 \end{bmatrix}\begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix} = \begin{bmatrix} 2(5) \\ 0(11) \\ -3(-2) \end{bmatrix} = \begin{bmatrix} 10 \\ 0 \\ 6 \end{bmatrix}.$$

Combining this with our previous discussion of powers of diagonal


matrices, we see that repeated multiplication of a vector by a diagonal matrix
is not terribly hard.

 4  
2 0 0 5
Example 7. Compute 0 0 0   11 .
0 0 −3 −2
 4  
2 0 0 5
We can either compute 0 0 0  and then multiply it by  11 , or
0 0 −3 −2
think of repeating the multiplication in Example 6 four times in a row. In
either case, we’ll get
 4        
2 0 0 5 24 (5) 16(5) 80
0 0 0   11  =  04 (11)  =  0(11)  =  0  .
0 0 −3 −2 (−3)4 (−2) 81(−2) −162

The only way this could be easier is if all diagonal entries of our matrix were
equal. Then we could simply multiply all entries of our vector by the same
number. This is possible, and sometimes happens even when our matrix A isn't diagonal. For example, consider $A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Their product is $A\vec{x} = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}$. If we compare this with ~x, we see that A~x = 5~x so
multiplication by A scales each entry of ~x by 5 even though A is not diagonal!
This is a very special situation that we’ll spend some time exploring. We start
by giving the vectors that have this special sort of relationship with A a name.

Definition. A nonzero vector ~x is an eigenvector of a matrix A with


eigenvalue λ if A~x = λ~x.

In other words, ~x is an eigenvector if multiplying ~x by A is the same


as multiplying ~x by some scalar λ. Geometrically, this means that A scales
its eigenvector by λ along each axis of Rn . Notice that having A~x = λ~x
means every eigenvector has a unique eigenvalue. (We’ll see later on that each
eigenvalue has many different eigenvectors.)
   
Example 8. Show $\vec{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ is an eigenvector of $A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}$ with eigenvalue −1.

To see this, we need to check that A~y = (−1)~y . Doing this computation
gives us
          
$$\begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 + 2 \\ -4 + 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1(-1) \\ (-1)1 \end{bmatrix} = (-1)\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$

This shows ~y is an eigenvector of A with eigenvalue −1 as claimed.
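Eigenvector checks like this one are easy to automate: multiply the candidate vector by A and compare the result with λ times the vector. Here is a short NumPy sketch of my own (not part of the text) that verifies both eigenvectors of this matrix.

```python
import numpy as np

A = np.array([[1, 2],
              [4, 3]])

def is_eigenpair(A, v, lam):
    """Check whether A v = lam * v (up to floating-point tolerance)."""
    return np.allclose(A @ v, lam * v)

print(is_eigenpair(A, np.array([-1, 1]), -1))  # True
print(is_eigenpair(A, np.array([ 1, 2]),  5))  # True
```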



Since this was the same matrix we discussed before the definition of an eigenvector, we know $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is also an eigenvector of A with eigenvalue 5.
This shows us that a given matrix can have many eigenvectors with different
eigenvalues.

We can use eigenvalues to answer the question of whether a population


grows larger or dies out over time. (A population dies out if eventually all
entries of its population vector go to zero, i.e., its population vector goes
to the zero vector.)

Example 9. Suppose a population has demographic matrix A so that if ~x


is the population vector for one year, then A~x is the population vector for
the next year. If ~x is an eigenvector of A with eigenvalue λ = 0.75, does this
population grow or die out in the long run?

To figure out what happens to our population in the long run, we want
to multiply our population vector ~x by increasingly large powers of our
demographic matrix A. In other words, we want to look at the limit of Ak ~x
as k → ∞.
Since A~x = 0.75~x, we know A2 ~x = (0.75)2 ~x. Multiplying by A again gives
us
A3 ~x = A(0.75)2 ~x = (0.75)2 A~x = (0.75)2 (0.75)~x = (0.75)3 ~x.
Continuing this pattern gives us Ak ~x = (0.75)k ~x.
From calculus we know that $\lim_{k \to \infty} (0.75)^k = 0$. This means that as k → ∞, the limit of $A^k\vec{x} = (0.75)^k\vec{x}$ is $0\vec{x} = \vec{0}$.
Since our population vector had a limit of ~0, our population dies out in
the long run.
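We can also watch this limit happen numerically. The demographic matrix below is hypothetical (I chose it only so that its largest eigenvalue is 0.75 and the starting vector is a matching eigenvector; it is not a matrix from the text), and repeated multiplication drives the population toward the zero vector.

```python
import numpy as np

# A made-up demographic matrix whose largest eigenvalue is 0.75
A = np.array([[0.50, 0.20],
              [0.25, 0.55]])
x = np.array([400.0, 500.0])   # an eigenvector of A with eigenvalue 0.75

for k in [1, 10, 20, 50]:
    print(k, np.linalg.matrix_power(A, k) @ x)
# Each entry shrinks by a factor of 0.75 per year, so this population dies out.
```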

Because of the impact of these eigenvalues on population growth, biologists


often refer to a demographic matrix’s largest eigenvalue as its “population
growth rate.”
Now that we’ve seen how easy eigenvalues and eigenvectors are to work
with, the natural next question is how to find them. Given an n × n matrix
A, we can find its eigenvectors and eigenvalues by solving A~x = λ~x for ~x and
λ. As it stands, this doesn’t have a very familiar format, so we’ll do a bit of
algebraic manipulation to try and massage it back into the shape of our usual
matrix equation A~x = ~b.
We can start by moving both terms to the left-hand side of the equation,
which gives us A~x −λ~x = ~0. At this point, it’s tempting to try to factor ~x out of
both terms on the left-hand side. However, that would leave a factor of A − λ,
which doesn’t make sense because A is a matrix and λ is a scalar. We can
fix this by inserting an extra factor of the n × n identity matrix In between
λ and ~x. (Recall that we did something similar in 2.7’s Example 2.) Since

In ~x = ~x, we must have λIn ~x = λ~x. This makes our equation A~x − λIn ~x = ~0.
Now we can factor ~x out of the left-hand side, leaving A − λIn . Since both A
and λIn are n × n matrices, this now makes sense. Our factored equation is
(A − λIn )~x = ~0 which has the familiar format of a matrix equation.
Unfortunately, we are still at a bit of a loss. The matrix in our matrix
equation above contains the unknown variable λ. This means we are trying
to solve for two unknown quantities, ~x and λ, simultaneously within the same
equation. Additionally, while we usually invoke row reduction to help solve
matrix equations, it is less effective with a variable in the matrix’s entries. We
need a new tool which we’ll develop in the next section. In 4.3, we’ll come
back to the problem of finding eigenvectors and eigenvalues from the equation
(A − λIn )~x = ~0 armed with the idea of the determinant.

Exercises 4.1.
 2
−9 0 0
1. Compute  0 4 0 .
0 0 7
 3
2 0 0
2. Compute 0 −3 0 .
0 0 5
 6
−1 0 0
3. Compute  0 0 0 .
0 0 2
 3
1 0 0
4. Compute 0 −5 0 .
0 0 6
 
2 0 0
5. Let B = 0 −3 0.
0 0 5
 
3 −1 6
(a) Compute AB for A = 2 1 −4.
0 7 −2
(b) Come up with a shortcut for multiplication on the right by a
diagonal matrix which is similar to the one used in Example 3
for multiplication on the left.
 
10 0 0
6. (a) Find the inverse of  0 −5 0 or show it is not invertible.
0 0 6
 
−1 0 0
(b) Find the inverse of  0 0 0 or show it is not invertible.
0 0 2
248 Diagonalization

(c) Give a rule which lets you easily see when a diagonal matrix is
invertible.
(d) Give a formula for the inverse of a diagonal matrix. (Your
formula should not involve row reduction.)
7. Let W be the set of n × n diagonal matrices. Show W is a subspace
of Mnn .
   
8. Let $\vec{v} = \begin{bmatrix} -3 \\ 0 \\ -6 \end{bmatrix}$ and $A = \begin{bmatrix} 2 & -2 & 0 \\ -2 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix}$. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
9. Let $\vec{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$ and $A = \begin{bmatrix} 2 & -2 & 0 \\ -2 & 4 & 1 \\ 0 & 1 & 2 \end{bmatrix}$. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
10. Let $\vec{v} = \begin{bmatrix} 5 \\ 0 \\ -10 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & -4 & 0 \\ -2 & 1 & -2 \end{bmatrix}$. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
11. Let $\vec{v} = \begin{bmatrix} 11 \\ -2 \\ 4 \end{bmatrix}$ and $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & -1 \\ 2 & -1 & -3 \end{bmatrix}$. Decide whether or not ~v is an eigenvector of A. If it is, give its eigenvalue λ.
12. Let $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ be an eigenvector of $A = \begin{bmatrix} 3 & -2 & 4 \\ 0 & 4 & -2 \\ 4 & 0 & 0 \end{bmatrix}$ with eigenvalue λ = 3. Compute $A^4\vec{v}$.
13. Let $\vec{v} = \begin{bmatrix} 1 \\ 17 \\ 4 \end{bmatrix}$ be an eigenvector of $A = \begin{bmatrix} 1 & 0 & -1 \\ 7 & -2 & -6 \\ 4 & 0 & -4 \end{bmatrix}$ with eigenvalue λ = −3. Compute $A^2\vec{v}$.
14. Let $\vec{v} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$ be an eigenvector of $A = \begin{bmatrix} 6 & 2 & 8 \\ 4 & -1 & -9 \\ 0 & 0 & 5 \end{bmatrix}$ with eigenvalue λ = 7. Compute $A^2\vec{v}$.
15. Let $\vec{v} = \begin{bmatrix} -4 \\ 2 \\ 2 \end{bmatrix}$ be an eigenvector of $A = \begin{bmatrix} 3 & -2 & 4 \\ 0 & 4 & -2 \\ -1 & 0 & 0 \end{bmatrix}$ with eigenvalue λ = 2. Compute $A^5\vec{v}$.
16. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 6x_1 + 2x_2 + 8x_3 \\ 4x_1 - x_2 - 9x_3 \\ 5x_3 \end{bmatrix}$. The vector $\vec{v} = \begin{bmatrix} -30 \\ -41 \\ 14 \end{bmatrix}$ is an eigenvector of f 's matrix with eigenvalue λ = 5. Use this to compute f (~v).
17. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -6x_1 + 5x_2 - 4x_3 \\ 10x_2 + 7x_3 \\ -x_2 + 2x_3 \end{bmatrix}$. The vector $\vec{v} = \begin{bmatrix} -13 \\ -35 \\ 5 \end{bmatrix}$ is an eigenvector of f 's matrix with eigenvalue λ = 9. Use this to compute f (~v).
18. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 - x_3 \\ 7x_1 - 2x_2 - 6x_3 \\ 4x_1 - 4x_3 \end{bmatrix}$. The vector $\vec{v} = \begin{bmatrix} 0 \\ -5 \\ 0 \end{bmatrix}$ is an eigenvector of f 's matrix with eigenvalue λ = −2. Use this to compute f (f (~v)).
19. Let $f\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 - 2x_2 + 4x_3 \\ 4x_2 - 2x_3 \\ -x_1 \end{bmatrix}$. The vector $\vec{v} = \begin{bmatrix} -4 \\ 2 \\ 1 \end{bmatrix}$ is an eigenvector of f 's matrix with eigenvalue λ = 3. Use this to compute f (f (f (~v))).
20. In Example 9, we saw that starting with a population eigenvector
of a demographic matrix with eigenvalue λ = 0.75 meant the
population would die out in the long run. What if the population
eigenvector had eigenvalue λ = 1.6?
21. Put together Example 9 and Exercise 20 to come up with a condition
on λ so that a population whose population eigenvector with
eigenvalue λ dies out in the long run. Find another condition on
λ so that the population grows in the long run.

4.2 Determinants
In this section we’ll develop a computational tool called the determinant which
assigns a number to each n × n matrix. Then we’ll discuss what that number
tells us about the matrix. We’ll start in the smallest interesting case where
n = 2. Here our definition of the determinant is a byproduct of computing of
the inverse of a general 2 × 2 matrix. This will get a bit messy in the middle,
but hang in until the computational dust settles and you’ll see something
interesting happen.
 
Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To find the inverse, we need to row reduce $[A \mid I_2]$. This looks like

$$\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] \xrightarrow{\frac{1}{a}\cdot r_1} \left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ c & d & 0 & 1 \end{array}\right] \xrightarrow{r_2 - c\cdot r_1} \left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & d - \frac{bc}{a} & -\frac{c}{a} & 1 \end{array}\right].$$

Since $d - \frac{bc}{a} = \frac{ad - bc}{a}$, the last matrix can be rewritten as

$$\left[\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & \frac{ad - bc}{a} & -\frac{c}{a} & 1 \end{array}\right].$$

Continuing our row reduction from here we get


" # " #
b 1 b 1
1 a a 0 a 1 a a 0
→( · r2 ) c a
0 ad−bc
a − ac 1 ad − bc 0 1 − ad−bc ad−bc
" #
b
b 1 0 F − ad−bc
→ (r1 − · r2 ) c a
a 0 1 − ad−bc ad−bc

where
 
1 b c ad − bc bc ad − bc + bc
F= − − = + =
a a ad − bc a(ad − bc) a(ad − bc) a(ad − bc)
ad d
= = .
a(ad − bc) ad − bc
Substituting this back into our matrix gives us
" #
d b
1 0 ad−bc − ad−bc
c a .
0 1 − ad−bc ad−bc

Since A row reduced to I2 , we get


" #
d b
− ad−bc
A−1 = ad−bc
c a .
− ad−bc ad−bc
4.2 Determinants 251
1
We can make this more appealing looking by factoring out ad−bc from each
entry to get
 −1  
a b 1 d −b
= .
c d ad − bc −c a

This formula for the inverse of a 2 × 2 matrix makes sense as long as


our denominator ad − bc isn’t zero. This means the number ad − bc tells
us something interesting about our matrix A, which prompts the following
definition.
 
Definition. The determinant of $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is det(A) = ad − bc.

Notice that ad is the product of the two diagonal entries, and bc is the
product of the two non-diagonal entries. Therefore one way to remember this
formula is to think of it as “the product of the diagonal entries minus the
product of the non-diagonal entries”.
 
Example 1. Compute the determinant of $A = \begin{bmatrix} 3 & 2 \\ 6 & 5 \end{bmatrix}$.
Here a = 3, b = 2, c = 6 and d = 5. Plugging those values into the formula
above, we get
det(A) = 3(5) − 2(6) = 3.
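Small determinant computations like this one are easy to spot-check against a numerical library, since ad − bc is exactly what it computes in the 2 × 2 case. A brief NumPy sketch of my own:

```python
import numpy as np

A = np.array([[3, 2],
              [6, 5]])

ad_minus_bc = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
print(ad_minus_bc)          # 3
print(np.linalg.det(A))     # 3.0 (up to floating-point rounding)
```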

We can restate the conclusion from our inverse computation at the


beginning of this section as follows: a 2 × 2 matrix A is invertible exactly when
det(A) 6= 0. This means the matrix from Example 1 is invertible, because its
determinant was nonzero. The determinant also tells us more about A than
just whether or not it is invertible, which should make sense. After all, the
determinant’s output gives us more information than just zero or nonzero.
Thinking of a 2 × 2 matrix A as a map from R2 to itself, we can ask what
the determinant of the matrix tells us about its map’s effect on the plane. It
turns out that the determinant tells us two things about the geometric effects
of the map: how it affects areas and whether or not it flips the plane over.
Let’s start with the question of whether A’s map flips the plane over.
Just to make sure we are all interpreting this idea the same way, think of
the maps f and g where f reflects R2 about the line y = x and g rotates R2
counterclockwise by 90◦ . If we imagine R2 as an infinite sheet of paper, we can
identify one side of the paper as the top side and the other as the bottom side.
After applying the map f , our top and bottom sides have switched places,
i.e., f has flipped the plane over. The map g on the other hand leaves the
top and bottom sides in their original positions, so it does not flip over the
plane. If we’re looking at a map which isn’t defined geometrically, it can be
more complicated to figure out whether that map flips the plane over or not.

However, we can use the sign of the determinant of A to quickly determine


whether or not A’s map flips over the plane. If det(A) is positive, then it does
not flip R2 . Alternately, if det(A) is negative, then it does flip R2 . To justify
this relationship, let’s compute the determinants of our two example maps
discussed above.

Example 2. Find the signs of the determinants for the maps f and g
discussed above.

Before we can compute these two determinants, we need to find the matrix
of each map. Let’s call f ’s matrix A and g’s matrix B to avoid any confusion.
We saw in 2.2 that we can find the matrix of a map from a geometric description of its action on R2 by putting the image of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ in the first column and the image of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ in the second column.

Our map f is reflection about the line y = x, so geometrically it sends $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

[Figure: the standard basis vectors and their images under the reflection f.]

This means f's matrix has first column $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and second column $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, so $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

The other map g is counterclockwise rotation by 90°, so it sends $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$.

[Figure: the standard basis vectors and their images under the rotation g.]

This means g's matrix has first column $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and second column $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$, so $B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
Computing our two determinants, we get det(A) = 0(0) − 1(1) = −1 and
det(B) = 0(0) − (−1)(1) = 1. Thus, as expected, the map that flips the plane
has a negative determinant and the map that doesn't flip the plane has a
positive determinant.
Next let's see what determinants can tell us about the effect that a matrix's map has on area. We'll explore this by looking at the effect of the map on the unit square, i.e., the area enclosed by the unit vectors along the x and y axes in Figure 4.1.

[Figure 4.1: The unit square]

This square has area 1, so we can compare the area of the unit square's image under various matrix maps to 1 to see what effect those maps have on areas.

Example 3. The maps f and g from Example 2 don't change areas.

To see this, we need to compute the image of the unit square under each map. These are shown below.

[Figures: the image of the unit square under f and the image of the unit square under g.]

From the pictures above, we can see that both of these images still have area 1.

 
Example 4. What effect does the matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$ have on areas?

To figure this out, we need to find the image of the unit square after multiplication by A. This is shown in the following picture.

[Figure: the image of the unit square under A, a parallelogram.]

This image is a parallelogram, so we can compute its area using the formula Area = base · height, with base b and height h as shown below.

[Figure: the parallelogram with its base b and height h marked.]

We can compute the lengths of both b and h using the formula for the distance between two points in the plane:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$

Plugging in the points at the ends of b and h gives us

$$b = \sqrt{(2 - 0)^2 + (1 - 0)^2} = \sqrt{5}$$

and

$$h = \sqrt{(1 - 2)^2 + (3 - 1)^2} = \sqrt{5}.$$
Thus the area of the image of the unit square is
$$b \cdot h = \sqrt{5} \cdot \sqrt{5} = 5.$$

Since we started with a square of area 1 and ended up with an image that
has area 5, we can say A multiplies areas by 5.
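There is also a vector shortcut for this area computation: the image of the unit square is the parallelogram spanned by the columns of A, so its area can be found from a base and a perpendicular height. The sketch below (NumPy, my own illustration) redoes the Example 4 computation this way and compares it with |det(A)|.

```python
import numpy as np

A = np.array([[2, 1],
              [1, 3]])

# The unit square maps to the parallelogram spanned by A's columns.
u, v = A[:, 0], A[:, 1]

base = np.linalg.norm(u)
# Height: the component of v perpendicular to u.
height = np.linalg.norm(v - (v @ u) / (u @ u) * u)

print(base * height)            # 5.0 (up to rounding)
print(abs(np.linalg.det(A)))    # 5.0
```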

To connect area with the determinant, let’s compare each of our three
example maps’ effects on area with the value of their determinants. The map
f from Examples 2 and 3 has determinant −1 and has no effect on area. (We
can also think of this as saying f multiplies area by 1.) Similarly, the map
g from Examples 2 and 3 has determinant 1 and also multiplies area by 1.
Finally, the map A from Example 4 multiplies area by 5 and has determinant
det(A) = 2(3) − 1(1) = 5. These three examples illustrate the general pattern:
the map of a 2 × 2 matrix A multiplies area by | det(A)|.
We can summarize our geometric exploration of the determinant of a 2 × 2
matrix as follows.

Theorem 1. Let f : R2 → R2 have matrix A. If det(A) is negative, then


f flips the plane over, while if det(A) is positive, then f does not. Also, f
multiplies areas by | det(A)|.

Now that we understand 2 × 2 determinants, let’s extend this idea to n × n


matrices. There are many ways to do this, but we’ll start with an iterative
process which rewrites det(A) in terms of the determinants of (n − 1) × (n − 1)
submatrices of A. We’ll make this transition from n × n to (n − 1) × (n − 1)
by deleting a row and a column of A. Applying this repeatedly will eventually
reduce down to the 2 × 2 case where we can use our ad − bc formula for the
determinant. Before we can we get started, we need to establish some notation.

Definition. If A is an n × n matrix, Aij is the (n − 1) × (n − 1) matrix


formed by removing A’s ith row and jth column.

(Note that this is different from aij which is the entry of A in the ith row
and jth column.)
 
Example 5. Find A24 and A31 for $A = \begin{bmatrix} 2 & 0 & -1 & 5 \\ 3 & 1 & 0 & -4 \\ -1 & 2 & 1 & 1 \\ 7 & -3 & 0 & 1 \end{bmatrix}$.
We can find the first submatrix A24 by removing A’s 2nd row and 4th
column. Crossing out this row and column looks like
[Picture: A with its 2nd row and 4th column crossed out.]

which means

$$A_{24} = \begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & 1 \\ 7 & -3 & 0 \end{bmatrix}.$$
Similarly, we can find the second submatrix A31 by removing A’s 3rd row
and 1st column. Crossing out this row and column looks like
 
[Picture: A with its 3rd row and 1st column crossed out.]

which means

$$A_{31} = \begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ -3 & 0 & 1 \end{bmatrix}.$$

Now that we’ve established our notation, we can state the following
formulas which serve as the n × n to (n − 1) × (n − 1) reduction step at
the heart of our algorithm for computing an n × n determinant. The first
formula is usually called expansion along a row, because it contains one term
for each entry along some particular row of A.

Theorem 2. Let A be an n × n matrix and fix i with 1 ≤ i ≤ n. Then

$$\det(A) = (-1)^{i+1}a_{i1}\det(A_{i1}) + (-1)^{i+2}a_{i2}\det(A_{i2}) + \cdots + (-1)^{i+n}a_{in}\det(A_{in}).$$

Note that there are n terms, each of which contains three parts: a sign
(plus or minus), an entry of A, and the determinant of an (n − 1) × (n − 1)
submatrix of A.

Example 6.  Implement this formula


 for the determinant along the second
2 0 −1 5
3 1 0 −4
row of A = 
−1 2
.
1 1
7 −3 0 1
Since we’re expanding along the second row of A, we have i = 2. Plugging

this into our formula gives us

$$\det(A) = (-1)^{2+1}a_{21}\det(A_{21}) + (-1)^{2+2}a_{22}\det(A_{22}) + (-1)^{2+3}a_{23}\det(A_{23}) + (-1)^{2+4}a_{24}\det(A_{24}).$$

The a2j are the entries (in order) along the second row of A, so we have
a21 = 3, a22 = 1, a23 = 0, and a24 = −4. The A2j are the 3 × 3 submatrices
of A where we’ve deleted the second row and jth column, so we have
   
$$A_{21} = \begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix}, \quad A_{22} = \begin{bmatrix} 2 & -1 & 5 \\ -1 & 1 & 1 \\ 7 & 0 & 1 \end{bmatrix}, \quad A_{23} = \begin{bmatrix} 2 & 0 & 5 \\ -1 & 2 & 1 \\ 7 & -3 & 1 \end{bmatrix}, \quad A_{24} = \begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & 1 \\ 7 & -3 & 0 \end{bmatrix}.$$
Plugging these back into our formula gives us
   
$$\det(A) = (-1)^3\, 3\det\begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} + (-1)^4\, 1\det\begin{bmatrix} 2 & -1 & 5 \\ -1 & 1 & 1 \\ 7 & 0 & 1 \end{bmatrix} + (-1)^5\, 0\det\begin{bmatrix} 2 & 0 & 5 \\ -1 & 2 & 1 \\ 7 & -3 & 1 \end{bmatrix} + (-1)^6(-4)\det\begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & 1 \\ 7 & -3 & 0 \end{bmatrix}.$$

Simplifying the powers of −1 gives us


   
$$\det(A) = -3\det\begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} + 1\det\begin{bmatrix} 2 & -1 & 5 \\ -1 & 1 & 1 \\ 7 & 0 & 1 \end{bmatrix} - 0\det\begin{bmatrix} 2 & 0 & 5 \\ -1 & 2 & 1 \\ 7 & -3 & 1 \end{bmatrix} + (-4)\det\begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & 1 \\ 7 & -3 & 0 \end{bmatrix}.$$

Alternately, we can use the formula below which is often called expansion
down a column, because it contains one term for each entry down a particular
column of A.

Theorem 3. Let A be an n × n matrix and fix j with 1 ≤ j ≤ n. Then

$$\det(A) = (-1)^{1+j}a_{1j}\det(A_{1j}) + (-1)^{2+j}a_{2j}\det(A_{2j}) + \cdots + (-1)^{n+j}a_{nj}\det(A_{nj}).$$

As with expansion along a row, we have n terms each with the same three
parts: a sign, a matrix entry, and the determinant of a submatrix.

Example 7. Implement this formula for the determinant down the first column of $A = \begin{bmatrix} 2 & 0 & -1 & 5 \\ 3 & 1 & 0 & -4 \\ -1 & 2 & 1 & 1 \\ 7 & -3 & 0 & 1 \end{bmatrix}$.
Since we’re expanding down the first column, we have j = 1. Plugging this
into our formula above gives us

$$\det(A) = (-1)^{1+1}a_{11}\det(A_{11}) + (-1)^{2+1}a_{21}\det(A_{21}) + (-1)^{3+1}a_{31}\det(A_{31}) + (-1)^{4+1}a_{41}\det(A_{41}).$$

The ai1 are the entries (in order) down the first column, so we have a11 = 2,
a21 = 3, a31 = −1, and a41 = 7. The Ai1 are the 3 × 3 submatrices of A we
get by deleting the ith row and first column, so we have
   
$$A_{11} = \begin{bmatrix} 1 & 0 & -4 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix}, \quad A_{21} = \begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix}, \quad A_{31} = \begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ -3 & 0 & 1 \end{bmatrix}, \quad A_{41} = \begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ 2 & 1 & 1 \end{bmatrix}.$$
Plugging these into our formula gives us
   
$$\det(A) = (-1)^2\, 2\det\begin{bmatrix} 1 & 0 & -4 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} + (-1)^3\, 3\det\begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} + (-1)^4(-1)\det\begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ -3 & 0 & 1 \end{bmatrix} + (-1)^5\, 7\det\begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ 2 & 1 & 1 \end{bmatrix}.$$

The signs on our four terms simplify to


   
$$\det(A) = 2\det\begin{bmatrix} 1 & 0 & -4 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} - 3\det\begin{bmatrix} 0 & -1 & 5 \\ 2 & 1 & 1 \\ -3 & 0 & 1 \end{bmatrix} + (-1)\det\begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ -3 & 0 & 1 \end{bmatrix} - 7\det\begin{bmatrix} 0 & -1 & 5 \\ 1 & 0 & -4 \\ 2 & 1 & 1 \end{bmatrix}.$$

Note that together these formulas give us 2n different choices on how to


tackle det(A), because A has n rows and n columns. Miraculously, it doesn’t
matter which row or column we pick, so let’s make the best, i.e., easiest, choice

possible. Usually this means picking the row or column with the most zero
entries, because if aij = 0 its whole term is multiplied by zero and can be
ignored.

Example 8. Which row(s) or column(s) make computing the determinant


2 0 −1 5
3 1 0 −4
of A = 
−1
 easiest?
2 1 1
7 −3 0 1
This question can be rephrased as “Which row(s) or column(s) of A have the most 0 entries?” Stated in this format, the answer is clearly the third column of A, which contains two 0s. If for some reason we didn't want to expand down the third column, the first, second, and fourth rows and the second column would be our next best choices since they each contain one 0 entry.

While we can continue to use these two formulas directly as in Examples


6 and 7, there is a commonly used way to find the sign and submatrix for a
given term based on the position of that term’s matrix entry. In every term
of our formula for det(A), the subscripts on our matrix entry aij and our
(n − 1) × (n − 1) submatrix Aij match. This means that if we’re looking at the
term with aij , we’re removing the ith row and jth column of A to get Aij . In
other words, we can find the submatrix which goes with aij by removing the
row and column of A where aij appears as shown below.
 
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}$$

We can also link aij ’s position to the sign associated with its term in our
sum. The term containing aij has sign (−1)i+j which is positive if i + j is
even and negative if i + j is odd. The top left corner of any matrix is a11 .
Since 1 + 1 = 2 is even, the sign piece of a11 ’s term is positive. If we travel
along a single row, we see that the signs of the terms alternate, because i + j
is followed by i + (j + 1) = (i + j) + 1, which will have the opposite even/odd
parity. Similarly, as we travel down a single column, the signs alternate, since
i + j is followed by (i + 1) + j = (i + j) + 1 which again has the opposite
even/odd parity. We can use this to fill in a matrix with plus and minus signs

corresponding to where (−1)i+j is positive and negative respectively as follows


 
$$\begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & & \\ - & + & & \ddots & \\ \vdots & \vdots & & & \end{bmatrix}.$$

We can read off the sign of aij ’s term of our determinant sum by checking
which sign is in the ijth spot in the matrix above.

Example 9. What are thesigns of the terms if we compute the determinant


2 0 −1 5
3 1 0 −4

of A =   along the fourth row?
−1 2 1 1
7 −3 0 1
In this case, our matrix of pluses and minuses above is the 4 × 4 matrix
 
$$\begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}.$$

Therefore if we compute the determinant of A by expanding along the fourth


row, our signs will be − + − +.

If you like these ideas for finding (−1)i+j and Aij from the position of aij ,
feel free to use them. If not, feel free to use the original formulas for expansion
along a row or down a column.
Now that we understand how to rewrite the determinant of an n × n
matrix A in terms of determinants of (n − 1) × (n − 1) matrices, we can give
the procedure for computing an n × n determinant. Pick a row or column of
A, and use one of the formulas above to rewrite det(A) in terms of smaller
determinants. Pick a row or column of each smaller matrix and repeat this
process. Eventually the smaller matrices will be 2 × 2, where we can compute
their determinants using our ad − bc formula. This process is fairly tedious
for large matrices, but is easily implemented on a computer. We’ll usually
practice with n ≤ 4.
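Here is one way this procedure looks as a program. The sketch below is my own minimal Python implementation of cofactor expansion along the first row; it is fine for the small matrices we practice on, though a serious implementation would use row reduction instead because the recursion repeats a great deal of work.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (A is a list of lists)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(n):
        # Submatrix A_{1j}: delete row 1 and column j+1 (0-indexed: row 0 and column j).
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

A = [[ 2,  0, -1,  5],
     [ 3,  1,  0, -4],
     [-1,  2,  1,  1],
     [ 7, -3,  0,  1]]
print(det(A))  # -169
```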
 
Example 10. Compute the determinant of $A = \begin{bmatrix} 3 & 0 & -5 \\ 1 & 1 & 2 \\ 4 & -2 & -1 \end{bmatrix}$ using expansion along a row.
262 Diagonalization

Since we are allowed to pick whichever row we like, we should pick the
1st row since it is the only one which has a zero entry. This means we’ll have
i = 1, so

det(A) = (−1)1+1 a11 det(A11 ) + (−1)1+2 a12 det(A12 ) + (−1)1+3 a13 det(A13 ).

Plugging in our matrix entries a1j along the first row and our submatrices A1j
with the first row and jth column deleted gives us
   
    det(A) = (−1)2 (3) det [  1    2 ] + (−1)3 (0) det [ 1    2 ]
                           [ −2   −1 ]                 [ 4   −1 ]

             + (−1)4 (−5) det [ 1    1 ] .
                              [ 4   −2 ]

Our middle term is 0, so we can remove it and simplify our signs to get

    det(A) = 3 det [  1    2 ] − 5 det [ 1    1 ] .
                   [ −2   −1 ]         [ 4   −2 ]

Our smaller submatrices are now 2 × 2, so we can use our ad − bc formula
to compute each of their determinants. This gives us

    det [  1    2 ] = 1(−1) − 2(−2) = −1 + 4 = 3
        [ −2   −1 ]

and

    det [ 1    1 ] = 1(−2) − 1(4) = −2 − 4 = −6.
        [ 4   −2 ]
Plugging these 2 × 2 determinants back into our formula for det(A) gives us

det(A) = 3(3) − 5(−6) = 39.

To check that it doesn’t matter which method we use to compute the


determinant, let’s recompute det(A) using expansion down a column instead
of along a row.
 
Example 11. Compute the determinant of

    A = [ 3    0   −5 ]
        [ 1    1    2 ]
        [ 4   −2   −1 ]

using expansion down a column.

Again, we can pick whichever column we like, so let’s pick the second
column since it has a zero entry. Plugging j = 2 into our formula for the
determinant gives us

det(A) = (−1)1+2 a12 det(A12 ) + (−1)2+2 a22 det(A22 ) + (−1)3+2 a32 det(A32 ).

Next we can plug in our matrix entries ai2 down the second column and our
submatrices Ai2 with the ith row and second column removed to get
   
    det(A) = (−1)3 (0) det [ 1    2 ] + (−1)4 (1) det [ 3   −5 ]
                           [ 4   −1 ]                 [ 4   −1 ]

             + (−1)5 (−2) det [ 3   −5 ] .
                              [ 1    2 ]

Our first term is 0, so this simplifies to

    det(A) = 1 det [ 3   −5 ] + 2 det [ 3   −5 ] .
                   [ 4   −1 ]         [ 1    2 ]

As in our previous example, we can now use our 2 × 2 determinant formula
to get

    det [ 3   −5 ] = 3(−1) − (−5)(4) = −3 + 20 = 17
        [ 4   −1 ]

and

    det [ 3   −5 ] = 3(2) − (−5)(1) = 6 + 5 = 11.
        [ 1    2 ]
Plugging these back into our formula for det(A) gives us

det(A) = 1(17) + 2(11) = 39.

(Notice that this is the same answer we got by expanding along the 1st row!)

There are two classes of matrices where computing the determinant is easy
even when they are very large: lower triangular matrices and upper triangular
matrices. (These were discussed in 2.9 as tools for solving A~x = ~b with an
extremely large A.) If A is lower triangular, then all its entries above the
diagonal are zeros, i.e., aij = 0 if i < j. In particular, this means that the
first row of A looks like a11 0 . . . 0. If we start computing det(A) by expanding
along this top row, we’ll get

det(A) = +a11 det(A11 ).

The matrix A11 is also lower triangular as shown by the picture below.
 
    [ a11    0     0    · · ·    0  ]
    [ a21   a22    0    · · ·    0  ]
    [  ..    ..        . .       .. ]
    [  ..    ..             . .  0  ]
    [ an1   an2         · · ·   ann ]

The first row of A11 is a22 0 . . . 0, so expanding along this row gives

det(A11 ) = +a22 det(B)

where B is A11 with its first row and first column removed. Plugging this back
into our formula for det(A) gives us

det(A) = a11 a22 det(B)

where B is A with its top 2 rows and leftmost 2 columns removed. We can
keep repeating this process until we get down to the 2 × 2 case where our
submatrix is  
a(n−1)(n−1) 0
an(n−1) ann
which has determinant a(n−1)(n−1) ann . This gives us the following fact.

Theorem 4. An n × n lower triangular matrix has det(A) = a11 a22 · · · ann .

 
Example 12. Find the determinant of

    A = [ −3    0    0    0 ]
        [  7    2    0    0 ]
        [  0    1    8    0 ]
        [  4   −3    5   −1 ] .
Since A is lower triangular, its determinant is the product of its diagonal
entries. Thus
det(A) = −3(2)(8)(−1) = 48.

On the other hand, if A is upper triangular, then all its entries below the
diagonal are zeros, i.e., aij = 0 if i > j. This means that the nth row of A has
the form 0 . . . 0 ann . If we start computing det(A) by expanding along this
bottom row, we’ll get

det(A) = +ann det(Ann ).

The matrix Ann is also upper triangular as shown by the picture below.
 
    [ a11   · · ·   · · ·    a1(n−1)       a1n    ]
    [  0     . .               ..           ..    ]
    [  ..          . .         ..           ..    ]
    [  0    · · ·    0     a(n−1)(n−1)   a(n−1)n  ]
    [  0    · · ·    0          0          ann    ]

The last row of Ann is 0 . . . 0 a(n−1)(n−1) , so expanding along this row gives

det(Ann ) = +a(n−1)(n−1) det(B)



where B is Ann with its last row and last column removed. Plugging this back
into our formula for det(A) gives us
det(A) = ann a(n−1)(n−1) det(B)
where B is A with its bottom 2 rows and rightmost 2 columns removed. We
can keep repeating this process until we get down to the 2 × 2 case where our
submatrix is  
a11 a12
0 a22
which has determinant a11 a22 . This gives us the following fact.

Theorem 5. An n × n upper triangular matrix has det(A) = a11 a22 · · · ann .

 
Example 13. Compute the determinant of

    A = [ 7    0   −1    9 ]
        [ 0    3    4    0 ]
        [ 0    0    1   −1 ]
        [ 0    0    0    4 ] .
Since A is upper triangular, its determinant is the product of its diagonal
entries. This means we have

det(A) = 7(3)(1)(4) = 84.
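
If you want a quick numerical sanity check of Theorems 4 and 5 (purely optional, assuming Python with the NumPy library is available), the product of the diagonal entries matches the determinant NumPy computes:

    import numpy as np

    A = np.array([[7., 0., -1., 9.],
                  [0., 3., 4., 0.],
                  [0., 0., 1., -1.],
                  [0., 0., 0., 4.]])

    print(np.prod(np.diag(A)))  # 84.0, the product of the diagonal entries
    print(np.linalg.det(A))     # also 84.0 (up to floating point roundoff)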

Theorem 5 is intriguing, because the first half of our row reduction


algorithm transforms a general matrix A into an upper triangular form U .
This means we can link det(A), which is harder to compute, with det(U )
which is just the product of U ’s diagonal entries. In order to link these
determinants, we’ll need to understand what each of our row operations do to
the determinant.
Suppose we multiply the ith row of our matrix A by some scalar s, to create
a new matrix B. (This is how we create leading 1s during row reduction.) We
can choose how to compute the determinant of B, so let’s expand along the
ith row. This gives us
det(B) = (−1)i+1 bi1 det(Bi1 ) + · · · + (−1)i+n bin det(Bin ).
However, all rows of A and B are exactly the same except for the ith row, so
Bij = Aij since both have the ith row removed. The ith row of B is s times
the ith row of A, which means bij = saij . Plugging this into our formula for
B’s determinant gives us
det(B) = (−1)i+1 sai1 det(Ai1 ) + · · · + (−1)i+n sain det(Ain ).
Factoring an s out of every term gives us
det(B) = s((−1)i+1 ai1 det(Ai1 ) + · · · + (−1)i+n ain det(Ain ))
= s det(A).

Next let’s consider the effect of swapping two rows of A. (This is how we
put a nonzero entry into the top left corner during row reduction.) Suppose
we swap the ith and i + 1st rows of A to create B. If we expand along the ith
row of B we get

det(B) = (−1)i+1 bi1 det(Bi1 ) + · · · + (−1)i+n bin det(Bin ).

However, the ith row of B is the i + 1st row of A, so bij = a(i+1)j and
Bij = A(i+1)j . Plugging this back into our computation of B’s determinant,
we get

det(B) = (−1)i+1 a(i+1)1 det(A(i+1)1 ) + · · · + (−1)i+n a(i+1)n det(A(i+1)n ).

This is almost the expansion of det(A) along the i + 1st row, but the signs
have changed. Instead of (−1)(i+1)+j we have (−1)i+j . Changing the power on
−1 by 1 means we’ve switched the sign of each term, so det(B) = − det(A).
Since we don’t always swap two adjacent rows of A, we’ll also need to
consider the more general case of swapping the ith and jth rows. For the ease
of explanation, I’ll suppose i < j and rewrite j as i + k for some positive k.
I’ll label the rows of A as r1 through rn . After swapping the ith and i + kth
rows of A, the rows are in the following order from top to bottom (with the
swapped rows in bold):

r1 , . . . , ri−1 , ri+k , ri+1 , . . . , ri+(k−1) , ri , ri+(k+1) , . . . rn .

Since swapping two consecutive rows multiplies the determinant by −1, we’ll
perform consecutive swaps on the rows of A until we’ve gotten them into the
order listed above. If the total number of consecutive swaps needed is odd,
the net effect will be to multiply the determinant by −1. If it is even, the
determinant will be multiplied by 1 and so remain unchanged.
We start with the rows in their original order r1 , . . . , rn . First we’ll perform
consecutive swaps of ri with the rows below it until it is directly below ri+k .
Again I’ll put the two rows being swapped in bold. The first consecutive swap
switches ri and ri+1 to give us

r1 , . . . , ri−1 , ri+1 , ri , ri+2 . . . , rn .

The second switches ri and ri+2 to give us

r1 , . . . , ri−1 , ri+1 , ri+2 , ri , ri+3 , . . . , rn .

Continuing in the same fashion, the kth consecutive swap switches ri and ri+k
to give us
r1 , . . . , ri−1 , ri+1 , . . . , ri+k , ri , ri+(k+1) , . . . , rn .
Now we’ll perform consecutive swaps of ri+k with the rows above it until it
is between ri−1 and ri+1 where ri was in our original matrix. Again, the two

rows being switched are in bold. The first of these consecutive swaps switches
ri+k with ri+(k−1) to give us

r1 , . . . , ri−1 , ri+1 , . . . , ri+(k−2) , ri+k , ri+(k−1) , ri , ri+(k+1) , . . . , rn .

The second of these consecutive swaps switches ri+k with ri+(k−2) to give us

r1 , . . . , ri−1 , ri+1 , . . . , ri+(k−3) , ri+k , ri+(k−2) , ri+(k−1) , ri , ri+(k+1) , . . . , rn .

The k − 1st of these switches ri+k with ri+1 to give us

r1 , . . . , ri−1 , ri+k , ri+1 , . . . , ri+(k−1) , ri , ri+(k+1) , . . . , rn .

This is the same as simply swapping the ith and i + kth rows directly, and
we got there via k + (k − 1) consecutive swaps. Since k + k − 1 = 2k − 1 is
always odd, this means that swapping any two rows of a matrix multiplies
the determinant by −1.
Our final row operation is adding a multiple of one row to another row. (We
use this to create the needed zeros below each leading 1 during row reduction.)
To make this precise, suppose that we add s times the kth row of A to the
ith row of A to get a new matrix B. If we compute B’s determinant along the
ith row, we get

det(B) = (−1)i+1 bi1 det(Bi1 ) + · · · + (−1)i+n bin det(Bin ).

In our new matrix B we have bij = aij + sakj and Bij = Aij , so our
determinant is really

det(B) = (−1)i+1 (ai1 + sak1 ) det(Ai1 ) + · · · + (−1)i+n (ain + sakn ) det(Ain ).

We can split this sum up as


 
det(B) = [(−1)i+1 ai1 det(Ai1 ) + · · · + (−1)i+n ain det(Ain )]
         + [(−1)i+1 sak1 det(Ai1 ) + · · · + (−1)i+n sakn det(Ain )].

The sum inside the first set of brackets is just det(A), and we can factor an s
out of the second set of brackets to get

det(B) = det(A) + s[(−1)i+1 ak1 det(Ai1 ) + · · · + (−1)i+n akn det(Ain )].

Now the sum inside the remaining set of brackets is the determinant of the
matrix C we’d get by replacing the ith row of A by its kth row. This means
C has identical ith and kth rows, so swapping these two rows would leave
the determinant unchanged. However, we just saw that swapping two rows
changes the sign of the determinant. This means det(C) = − det(C), so we
must have det(C) = 0. Plugging this back into our equation for det(B) gives
us
det(B) = det(A) + s det(C) = det(A).

Therefore adding a multiple of one row to another row doesn’t change the
determinant.
We summarize these results in the following theorem.

Theorem 6. Let A be an n × n matrix. Multiplying a row of A by the scalar


s multiplies det(A) by s, swapping two rows of A multiplies det(A) by −1,
and adding a multiple of one row to another doesn’t change det(A).
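
As an optional check of Theorem 6 (again in Python with NumPy; nothing here is needed for the rest of the section), we can apply each type of row operation to a matrix and compare the determinants:

    import numpy as np

    A = np.array([[3., 0., -5.], [1., 1., 2.], [4., -2., -1.]])
    d = np.linalg.det(A)

    B = A.copy(); B[0] *= -4              # multiply row 1 by -4
    C = A.copy(); C[[0, 1]] = C[[1, 0]]   # swap rows 1 and 2
    E = A.copy(); E[2] += 3 * E[0]        # add 3 times row 1 to row 3

    print(np.isclose(np.linalg.det(B), -4 * d))  # True
    print(np.isclose(np.linalg.det(C), -d))      # True
    print(np.isclose(np.linalg.det(E), d))       # True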

Now that we understand how row operations change the determinant and
how to compute the determinant of an upper triangular matrix, we have an
alternative to our expansion method for finding det(A). The plan here is to
use row operations to transform A into an upper triangular matrix U . As we
go we’ll keep track of each time we swapped rows or multiplied a row by a
constant. (We don’t need to keep track of adding a multiple of one row to
another since that doesn’t have any effect on the determinant.) Then we can
compute det(U ) and then undo each of the changes our row operations made
to the determinant to get det(A). This is illustrated in the example below.
 
Example 14. Compute the determinant of

    A = [ 3    0   −5 ]
        [ 1    1    2 ]
        [ 4   −2   −1 ]

by using row operations to link A to an upper triangular matrix.

Our first job here is to use row operations to transform A into an upper
triangular matrix. I’ll use our usual notation for row operations since we need
to keep track of when we’ve swapped two rows or scaled a row by a constant.
     
    [ 3    0   −5 ]           [ 1    1    2 ]            [ 1    1    2 ]
    [ 1    1    2 ]  →r1 ↔r2  [ 3    0   −5 ]  →r2 −3r1  [ 0   −3  −11 ]
    [ 4   −2   −1 ]           [ 4   −2   −1 ]            [ 4   −2   −1 ]

              [ 1    1    2 ]            [ 1    1    2 ]
    →r3 −4r1  [ 0   −3  −11 ]  →r2 ↔r3   [ 0   −6   −9 ]
              [ 0   −6   −9 ]            [ 0   −3  −11 ]

                 [ 1    1     2  ]             [ 1    1     2   ]
    →−(1/6)r2    [ 0    1    3/2 ]  →r3 +3r2   [ 0    1    3/2  ]
                 [ 0   −3   −11  ]             [ 0    0   −13/2 ]

This last matrix is upper triangular, so let’s call it U . We can compute
det(U ) = −13/2 by taking the product of U ’s diagonal entries. However, we also
have a link between det(U ) and det(A). To figure out what that relationship
is, we need to look back at the row operations we used to get from A to U . We
can ignore all the places we added a multiple of one row to another, since they
don’t affect the determinant. We swapped two rows twice during the process
and scaled a row by − 16 once. This means to get det(U ) we multiplied A’s

determinant by −1 twice (once for each row swap) and by −1/6 once (when we
scaled a row by −1/6). Thus

    det(U ) = (−1)(−1)(−1/6) det(A)

or

    det(U ) = −(1/6) det(A).

Plugging in det(U ) gives us

    −13/2 = −(1/6) det(A).

Solving for det(A) now gives


 
    det(A) = (−6)(−13/2) = 39.

(Notice that this matrix is the same one we used for Examples 6 and 7 where
we also got det(A) = 39.)
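
Here is a rough sketch of how this row-reduction method might be automated in Python with NumPy (an optional illustration only; it uses just row swaps and adding multiples of rows, so the determinant is (−1) raised to the number of swaps times the product of U 's diagonal entries):

    import numpy as np

    def det_by_row_reduction(A):
        U = np.array(A, dtype=float)
        n = U.shape[0]
        swaps = 0
        for i in range(n):
            pivot = i + np.argmax(np.abs(U[i:, i]))   # pick a nonzero pivot
            if np.isclose(U[pivot, i], 0.0):
                return 0.0                            # no pivot available: det(A) = 0
            if pivot != i:
                U[[i, pivot]] = U[[pivot, i]]         # swap rows (sign flips)
                swaps += 1
            for k in range(i + 1, n):
                U[k] -= (U[k, i] / U[i, i]) * U[i]    # doesn't change the determinant
        return (-1) ** swaps * np.prod(np.diag(U))

    print(det_by_row_reduction([[3, 0, -5], [1, 1, 2], [4, -2, -1]]))  # approximately 39.0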

We now have 2n + 1 choices when computing the determinant of an n × n


matrix: the 2n options for expansion along a row or down a column method
plus this new way via row reduction. Sometimes we’ll want to use one of
them, sometimes another. If A has a row or column which is mostly zeros,
I’d personally prefer to use expansion. If A is close to triangular already,
I’ll use the method outlined in the example above. You may choose to use
another method from the one I or your friends would use, but the beauty of
this mathematical choice is that we’ll all end up with the same value for the
determinant.
Now that we understand how to compute the determinant of any n × n
matrix, let’s go back to our inspiration for the 2 × 2 determinant: that a 2 × 2
matrix is invertible if and only if its determinant is nonzero. Is this also true
in the n × n case? We know that a square matrix is invertible if and only if
its reduced echelon form is the identity matrix, so we’d like to link that to
the determinant. If we think for a minute about how row operations change
the determinant, none of them change it from zero to nonzero or vice versa
unless we multiply a row of our matrix by the constant 0. Since we aren’t
allowed to multiply a row by 0 as we move from A to its reduced echelon
form, A and its reduced echelon form either both have a nonzero determinant
or both have determinant equal to zero. The reduced echelon form of A is
always an upper triangular matrix, so its determinant is simply the product
of its diagonal entries. If A is invertible, then its reduced echelon form is In .
Since det(In ) = 1, we must have det(A) 6= 0. If A isn’t invertible, then its
reduced echelon form must have at least one zero on the diagonal. This would

mean that the determinant of the reduced echelon form of A, and hence also
that of A itself, would be 0. Therefore we get the following addition to the
Invertible Matrix Theorem.

Theorem 7. An n × n matrix A is invertible if and only if det(A) ≠ 0.

We will rely on this fact when we return to finding eigenvectors and


eigenvalues in the next section.
In addition to our link between invertibility and determinants, we can
use our understanding of row operations’ effects on the determinant to show
that determinants interact nicely with matrix multiplication. To do this, recall
the elementary matrices we discussed in 2.9 which link our row operations to
matrix multiplication in the following way. Suppose E is the elementary matrix
of some row operation and A is any n × n matrix. Then EA is the matrix
we get by doing that row operation to A. In other words, multiplication on
the left by the elementary matrix of a row operation does that row operation
to the matrix we multiplied. Let’s explore the determinant of each type of
elementary matrix.
If E is the elementary matrix of a row operation which adds a multiple
of one row to another row, then its row operation has no effect on the
determinant, so det(EA) = det(A). Since we got E by doing this row operation
to In , we must have det(E) = det(In ) = 1. This means that for any n × n
matrix A we have

det(EA) = det(A) = det(E) det(A).

If E is the elementary matrix of a row operation which swaps two


rows, then its row operation changes the sign the determinant, which means
det(EA) = − det(A). Since we got E by doing this row operation to In , we
must have det(E) = − det(In ) = −1. This means that for any n × n matrix
A we have
det(EA) = − det(A) = det(E) det(A).
If E is the elementary matrix of a row operation which multiplies a row
by a constant s, then its row operation multiplies the determinant by s, so
det(EA) = s det(A). Since we got E by doing this row operation to In , we
must have det(E) = s det(In ) = s. This means that for any n × n matrix A
we have
det(EA) = s det(A) = det(E) det(A).
Looking at our three types of row operations, we can see that in all three
cases we get
det(EA) = det(E) det(A).
We would like to extend this idea that the determinant splits up over matrix
products to any two n × n matrices A and B.

If A or B isn’t invertible, we know from 2.11’s Exercises 10 and 11 that AB


also isn’t invertible. This means that det(AB) = 0. Since either det(A) = 0
or det(B) = 0, we have det(AB) = det(A) det(B).
If both A and B are invertible, they both have reduced echelon form In , i.e.,
they each have a series of row operations which takes them to In . Since each
row operation is reversible by another row operation, we can also go from In to
A or B via row operations. This means we have elementary matrices E1 , . . . Ek
and F1 , . . . F` so that A = E1 · · · Ek In and B = F1 · · · F` In . Repeatedly using
the fact that det(EA) = det(E) det(A), we get
det(A) = det(E1 ) · · · det(Ek ) det(In )
and
det(B) = det(F1 ) · · · det(F` ) det(In ).
Since det(In ) = 1, this means
det(A) det(B) = det(E1 ) · · · det(Ek ) det(F1 ) · · · det(F` ).
Their product is
AB = E1 · · · Ek In F1 · · · F` In = E1 · · · Ek F1 · · · F` .
Taking the determinant and applying the fact that det(EA) = det(E) det(A)
repeatedly, we get
det(AB) = det(E1 ) · · · det(Ek ) det(F1 ) · · · det(F` )
which is equal to our computation of det(A) det(B) above.
We can summarize this as follows:

Theorem 8. For any n × n matrices A and B, det(AB) = det(A) det(B).

If A is invertible, this theorem gives us a nice fact about the determinant


of A−1 .

Theorem 9. Let A be an invertible matrix. Then det(A−1 ) = 1/ det(A).

To see why this is true, remember AA−1 = In , so we must have


det(AA−1 ) = det(In ) = 1.
But the determinant on the left-hand side of the equation can be split up into
the product of the determinants of A and A−1 to give us
det(A) det(A−1 ) = 1.
Since det(A) ≠ 0, we can divide both sides by det(A) to get

    det(A−1 ) = 1/ det(A).
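
As an optional numerical check of Theorems 8 and 9 (assuming Python with NumPy is available; this is only an aside), we can test them on a pair of matrices:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-5, 6, size=(4, 4)).astype(float)
    B = rng.integers(-5, 6, size=(4, 4)).astype(float)

    # det(AB) = det(A) det(B)
    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True

    # det(A^{-1}) = 1/det(A), provided A is invertible
    if not np.isclose(np.linalg.det(A), 0.0):
        print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))  # True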

Exercises 4.2.
 
−53
1. Compute det .
16
 
2 7
2. Compute det .
−1 −8
 
1 4
3. Compute det .
3 −2
 
9 0
4. Compute det .
−6 7
 
−1 2
5. Use the determinant to describe the geometric effect ’s map
−2 1
has on the plane?
 
4 0
6. Use the determinant to describe the geometric effect ’s map
2 3
has on the plane?
 
1 3
7. Use the determinant to describe the geometric effect ’s map
0 −1
has on the plane?
 
3 5
8. Use the determinant to describe the geometric effect ’s map
6 2
has on the plane?
 
5 0 0
9. Compute the determinant of 6 −2 0.
1 7 4
 
−1 0 0
10. Compute the determinant of  4 3 0.
9 −2 6
 
2 0 7
11. Compute the determinant of 0 8 −3.
0 0 1
 
−4 1 −6
12. Compute the determinant of  0 −3 −5.
0 0 −2
 
2 −1 0
13. Find A23 where A = 4 −5 2 .

1 0 −2
 
1 0 8
14. Find A12 where A = 6 −3 1.
3 −5 0
 
0 −5 2
15. Find A31 where A = 8 3 1 .
7 −4 −1
 
−3 1 4
16. Find A22 where A =  2 0 −1.
9 3 6
17. What effect does swapping rows 1 and 2 have on the determinant?
18. What effect does adding 3 times row 2 to row 1 have on the
determinant?
19. What effect does multiplying row 1 by −4 have on the determinant?
20. What effect does adding −2 times row 3 to row 1 have on the
determinant?
 
1 3 4
21. Compute the determinant of A =  2 0 −2 three times: once
−1 1 −1
by expanding along a row, once by expanding down a column, and
once by using row operations to reduce it to an upper triangular
matrix.
 
1 0 6
22. Compute the determinant of A = 2 2 −9 three times: once
1 1 −3
by expanding along a row, once by expanding down a column, and
once by using row operations to reduce it to an upper triangular
matrix.
 
4 8 10
23. Compute the determinant of A =  1 −1 3  three times: once
−4 0 2
by expanding along a row, once by expanding down a column, and
once by using row operations to reduce it to an upper triangular
matrix.
 
2 3 −1
24. Compute the determinant of A =  0 5 3  three times: once
−4 −6 2
by expanding along a row, once by expanding down a column, and
once by using row operations to reduce it to an upper triangular
matrix.
 
−5 2 7
25. Use the determinant of A =  3 0 −2 to decide whether or
4 0 1
not A is invertible.
 
−7 3 0 1
4 0 0 0
26. Use the determinant of A = 5
 to decide whether
2 1 6
−8 −2 1 3
or not A is invertible.
 
2 0 0 4
2 0 3 0
27. Use the determinant of A = 
−7
 to decide whether
3 10 −5
−1 0 6 0
or not A is invertible.
 
−1 3 0 2
0 1 0 −4
28. Use the determinant of A = 
7
 to decide whether
−4 2 9
−2 0 0 1
or not A is invertible.
29. Let A and B be n × n matrices with det(A) = 3 and det(B) = −6.
(a) Compute det(AB).
(b) Compute det(A−1 ).
30. Let A and B be 2 × 2 matrices with det(A) = 10 and det(B) = −1.
(a) Compute det(AB).
(b) Compute det(B −1 ).
31. Let A and B be 4 × 4 matrices with det(A) = 12 and det(B) = −2.
(a) Compute det(AB).
(b) Compute det(A−1 ).
32. Let A and B be 3 × 3 matrices with det(A) = −2 and det(B) = 4.
(a) Find det(A−1 ).
(b) Find det(AB).
33. Let A be a 3 × 3 matrix with det(A) = 6. What is det(2 · A)?
34. Let A be an n × n matrix with det(A) = 6. What is det(2 · A)?
35. If A is a 1 × 1 matrix, we can define its determinant to be its
only
 entry
 a11 . Use this definition to show that the determinant of
a b
is ad − bc if we compute it using our formulas for expansion
c d
along a row and down a column.
36. Let V = {A ∈ Mnn | A is diagonal and det(A) 6= 0} with operations
A“ + ”B = AB and r“ · ”A = Ar . Show V is a vector space.

4.3 Eigenspaces
After our interlude developing determinants in 4.2, let’s get back to finding
the eigenvalues and eigenvectors of an n × n matrix A. Recall from the end
of 4.1 that we’d reduced the problem of finding the eigenvalues, λ, and the
eigenvectors, ~x, of a matrix A to solving the equation (A − λIn )~x = ~0. Our
issue back then was that our equation had two unknowns and we didn’t know
how to solve for both of them at once. In this section we’ll use the determinant
to solve for λ first, and then use our value of λ to solve for ~x later.
To isolate solving for λ and solving for ~x, remember that the Invertible
Matrix Theorem from 2.11 tells us that we have a nonzero solution to A~x = ~0
exactly when A isn’t invertible. Since eigenvectors are defined to be nonzero,
we can apply this to (A − λIn )~x = ~0 and get that we’ll have eigenvectors for
A precisely when A − λIn isn’t invertible. Usually we check invertibility of
matrices by row reducing to see if their reduced echelon form is In , however
the presence of the variable λ makes this unappealing. Instead we’ll rely on
our newest addition to the Invertible Matrix Theorem: the fact that a matrix
is invertible exactly when its determinant is nonzero. In other words, whenever
det(A − λIn ) = 0 we’ll know A − λIn isn’t invertible so we’ll get eigenvectors
for A and λ will be an eigenvalue of A.
 
1 2
Example 1. Find all the eigenvalues of A = .
4 3
(Note that this is the matrix from 4.1’s Example 8.)
We want to find all values of λ for which det(A − λIn ) = 0. Our matrix is
2 × 2, so n = 2. We’ll start by computing
         
    A − λI2 = [ 1   2 ] − λ [ 1   0 ] = [ 1   2 ] − [ λ   0 ] = [ 1−λ    2  ] .
              [ 4   3 ]     [ 0   1 ]   [ 4   3 ]   [ 0   λ ]   [  4    3−λ ]

Taking the determinant gives us

det(A − λI2 ) = (1 − λ)(3 − λ) − (2)(4) = 3 − 4λ + λ2 − 8 = λ2 − 4λ − 5.

This means our eigenvalues are the solutions to

λ2 − 4λ − 5 = 0.

We can factor our polynomial in λ to get

(λ − 5)(λ + 1) = 0

which has solutions λ = 5 and λ = −1. Therefore A has two eigenvalues: 5


and −1.
 
1 0 7
Example 2. Find all the eigenvalues of A = −2 2 14 .
1 0 −5
As in the previous example, we start by computing det(A − λIn ). Here
n = 3, so we get
     
    A − λI3 = [  1   0    7 ]     [ 1   0   0 ]   [ 1−λ    0      7   ]
              [ −2   2   14 ] − λ [ 0   1   0 ] = [ −2    2−λ    14   ] .
              [  1   0   −5 ]     [ 0   0   1 ]   [  1     0    −5−λ  ]

The best choice for computing the determinant is to expand down the 2nd
column since it has two zero entries. This gives us
 
    det [ 1−λ    0      7   ]
        [ −2    2−λ    14   ] = (−1)2+2 (2 − λ) det [ 1−λ     7   ]
        [  1     0    −5−λ  ]                       [  1    −5−λ  ]

                               = (2 − λ)((1 − λ)(−5 − λ) − 1(7)).

Since we want to set this equal to zero and solve for λ, I’ll leave the factor of
(2 − λ) in the front and try to expand and factor the rest of this polynomial.
This gives us

(2 − λ)(λ2 + 4λ − 5 − 7) = (2 − λ)(λ2 + 4λ − 12) = (2 − λ)(λ − 2)(λ + 6).

Now we can find the eigenvalues of A by setting det(A − λI3 ) = 0. Solving

(2 − λ)(λ − 2)(λ + 6) = 0

gives us λ = 2, λ = 2, and λ = −6, so 2 and −6 are the eigenvalues of A.
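
If you have Python with NumPy available, you can confirm an eigenvalue computation like this one numerically (a sketch only; it skips the algebra and just reports the roots of det(A − λIn )):

    import numpy as np

    A = np.array([[1., 0., 7.], [-2., 2., 14.], [1., 0., -5.]])
    print(np.linalg.eigvals(A))   # approximately 2, 2, and -6 (order may vary)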

Example 3. Find the population growth rate of a population which has
demographic matrix

    A = [  0     0.25   0.25 ]
        [ 0.5    0.25   0.5  ]
        [ 0.5    0.5    0.25 ] .
Recall from 4.1 that the population growth rate is the largest eigenvalue
of a demographic matrix, so we need to find the eigenvalues of A and select
the biggest one. A is 3 × 3, so we start by computing
 
0−λ 0.25 0.25
A − λI3 =  0.5 0.25 − λ 0.5  .
0.5 0.5 0.25 − λ

Computing the determinant by expanding down the first column gives us


   
    det(A − λI3 ) = −λ det [ 0.25−λ    0.5    ] − 0.5 det [ 0.25    0.25   ]
                           [  0.5     0.25−λ  ]           [ 0.5    0.25−λ  ]

                    + 0.5 det [  0.25     0.25 ] .
                              [ 0.25−λ    0.5  ]

Expanding each of our 2 × 2 determinants gives us

det(A − λI3 ) = −λ((0.25 − λ)2 −(0.5)2 ) − 0.5((0.25)(0.25 − λ) − (0.25)(0.5))


+ 0.5((0.25)(0.5) − (0.25)(0.25 − λ)).

This simplifies to

−λ3 + 0.5λ2 + 0.4375λ + 0.0625 = −(λ − 1)(λ + 0.25)2

so our eigenvalues are the solutions to

−(λ − 1)(λ + 0.25)2 = 0,

namely λ = 1 and λ = −0.25.


The largest of these eigenvalues is λ = 1, which means the population
growth rate is 1.

The examples above show us if A is an n × n matrix, det(A − λIn ) is


always a polynomial in λ of degree n. This tells us a few things. First of all,
an n × n matrix can have at most n different eigenvalues, although it can
have fewer than n as in our Example 2. Second, it is possible to have a matrix
with no eigenvalues among the real numbers, because not every polynomial
has real roots. (Think of x2 + 4.) We have chosen to work with vector spaces
where our scalars are real numbers, but many people choose to work with
vector spaces whose scalars are complex numbers so that they can be assured
of always having eigenvalues. If you’re interested in learning more about this,
see Appendix A.1. For the rest of this book, I’ll make sure to choose matrices
for examples and exercises whose eigenvalues are real numbers.
In chemistry, eigenvalues are used as part of the Hückel method to compute
molecular orbitals for planar organic molecules like hydrocarbons. These
in turn allow chemists to calculate things like charge densities, molecular
reactivity, and even the color of a molecule! To do this, chemists compute
the eigenvalues of a matrix H with one row and column for each carbon atom
where Hii = α and Hij = β if atoms i and j share an atomic bond and
Hij = 0 if the ith and jth atoms are not bonded. (Here α and β are different
relative energy levels of electrons.) To find eigenvalues of H we need to solve
det(A) = 0 where A = H − λIn has diagonal entries α − λ and the same off-
diagonal entries as H. If we divide all entries by β, then our diagonal entries
are (α − λ)/β, which we will call x, and our off-diagonal entries are 1 for bonded
β
atoms and 0 otherwise. (This is similar to the adjacency vectors discussed in
Example 7 from Chapter 0.) Solving det(A) = 0 for x and using the values of
x, α, and β then allows us to find λ.

Example 4. Find the matrix A and solve det(A) = 0 for x for the molecule
1,3-butadiene.

Since our matrix A depends on the number and bonds between the carbon
atoms in a molecule of 1,3-butadiene, let’s start by looking at a picture of its
molecular structure. (Here Hs are hydrogen atoms and Cs are carbon.)

[Structural diagram of 1,3-butadiene: a chain of four carbon atoms labeled C1 , C2 , C3 , C4 , with bonds between C1 and C2 , C2 and C3 , and C3 and C4 ; the remaining bonds go to hydrogen atoms.]

There are 4 carbon atoms which I’ve helpfully labeled C1 through C4 , so


our matrix A will be 4 × 4. Remember that A’s diagonal entries are all x, and
Aij is 1 if there is a bond between Ci and Cj and 0 otherwise. Looking at the
picture above, we can see that there are bonds between C1 and C2 , C2 and
C3 , and C3 and C4 . Therefore our matrix is

    A = [ x   1   0   0 ]
        [ 1   x   1   0 ]
        [ 0   1   x   1 ]
        [ 0   0   1   x ] .
We want to solve det(A) = 0 for x, so let’s take the determinant of A by
expanding along the first row. This gives us
   
    det(A) = x det [ x   1   0 ] − 1 det [ 1   1   0 ]
                   [ 1   x   1 ]         [ 0   x   1 ]
                   [ 0   1   x ]         [ 0   1   x ] .

There are choices of how to expand each of our 3 × 3 subdeterminants, so let’s


expand down the first column of each. This gives us
 
    det [ x   1   0 ]
        [ 1   x   1 ] = x det [ x   1 ] − 1 det [ 1   0 ]
        [ 0   1   x ]         [ 1   x ]         [ 1   x ]

                      = x(x2 − 1) − 1(x − 0)

and

    det [ 1   1   0 ]
        [ 0   x   1 ] = 1 det [ x   1 ] = 1(x2 − 1).
        [ 0   1   x ]         [ 1   x ]
Plugging these subdeterminants back into our formula for det(A) gives us

det(A) = x(x(x2 − 1) − 1(x − 0)) − 1(1(x2 − 1)) = x4 − 3x2 + 1.

This doesn’t factor nicely over the integers, but we can solve det(A) = 0 using
technological assistance to get the solutions x = (1/2)(1 + √5), x = (1/2)(1 − √5),
x = (1/2)(−1 + √5), and x = (1/2)(−1 − √5).
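
The “technological assistance” can be almost any tool that finds polynomial roots; for instance, in Python with NumPy (just one option, not something the text assumes):

    import numpy as np

    print(np.roots([1, 0, -3, 0, 1]))                    # roots of x^4 - 3x^2 + 1
    print((1 + np.sqrt(5)) / 2, (np.sqrt(5) - 1) / 2)    # the two positive roots, for comparison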

Now that we know how to find the eigenvalues of a matrix A, we can go


back to our original equation (A − λIn )~x = ~0 to solve for the eigenvectors for
each eigenvalue. Recall that a matrix equation has either no solutions, one
solution, or infinitely many solutions. Clearly the equation (A − λIn )~x = ~0
always has at least the solution ~x = ~0, and if we have chosen λ to be an
eigenvalue, then there will also be at least one nonzero eigenvector solution as
well. This means that when λ is an eigenvalue of A the equation (A−λIn )~x = ~0
has infinitely many solutions which are eigenvectors of A with eigenvalue λ.

Definition. Let λ be an eigenvalue of a matrix A. The eigenspace of λ is


Eλ = {~x|A~x = λ~x}.

In other words, the eigenspace of an eigenvalue λ is the set of all


eigenvectors which have λ as their eigenvalue.
 
1 2
Example 5. Find the eigenspace of A = for λ = 5.
4 3
We can find all eigenvectors of A whose eigenvalue is 5 using the equation
(A − λI2 )~x = ~0. Plugging in our A and λ gives us
   
    ( [ 1   2 ] − 5 [ 1   0 ] ) ~x = ~0
    ( [ 4   3 ]     [ 0   1 ] )

which can be simplified to

    [ −4    2 ] ~x = ~0.
    [  4   −2 ]

At this point, we are finding the null space of

    [ −4    2 ]
    [  4   −2 ]

just as we did in 2.5 and 2.7. The reduced echelon form of this matrix is

    [ 1   −1/2 ]
    [ 0     0  ]

so x1 = (1/2)x2 and x2 is free. As in 2.8, we can write this solution set as a span.
Since our solution vector is

    [ (1/2)x2 ]
    [    x2   ] ,

our null space is

    Span{ [ 1/2 ] } .
          [  1  ]

We saw in 3.1 that this spanning set is also a basis for the null space.
This means that

    E5 = Span{ [ 1/2 ] }
               [  1  ]

and the vector

    [ 1/2 ]
    [  1  ]

is a basis for E5 .

 
1 0 7
Example 6. Find the eigenspace of A = −2 2 14  for λ = 2.
1 0 −5
Mirroring the previous example, we need to find the null space of
 
−1 0 7
A − 2I3 = −2 0 14  .
1 0 −7

This matrix row reduces to  


1 0 −7
0 0 0
0 0 0
so x1 = 7x3 and x2 and x3 are free. Thus our eigenspace E2 is the set of
7x3
vectors of the form  x2 .
x3
As we did above, we can write this as
   
 0 7 
E2 = Span 1 , 0
 
0 1
   
0 7
or say E2 has basis 1 , 0.
0 1
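
For readers who like to double-check with software, the SymPy library for Python (purely optional here) computes this null space exactly: the eigenspace E2 is the null space of A − 2I3 , and nullspace() returns a basis for it.

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 7], [-2, 2, 14], [1, 0, -5]])
    basis = (A - 2 * eye(3)).nullspace()   # basis vectors for E_2
    print(basis)   # two column vectors, e.g. (0, 1, 0) and (7, 0, 1)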

Biologists call the eigenvector of length 1 with the largest eigenvalue (aka
the population growth rate) the “stable stage distribution”. We’ll learn how
to compute the length of a vector and find vectors of given lengths within a
span in Chapter 5, but for now we can at least find a basis for the eigenspace
containing the stable stage distribution.

Example 7. Find the eigenspace containing the stable stage distribution for
the population discussed in Example 3.

In Example 3 we found that the population growth rate was λ = 1, so we


want to find the eigenspace E1 of our demographic matrix
 
0 0.25 0.25
A = 0.5 0.25 0.5  .
0.5 0.5 0.25

This means finding the null space of


 
−1 0.25 0.25
A − 1I3 = 0.5 −0.75 0.5 
0.5 0.5 −0.75

which row reduces to  


1 0 −0.5
0 1 −1 
0 0 0
so x1 = 0.5x3 , x2 = x3 , and x3 is free. This means E1 is all vectors of the
0.5x3
form  x3 .
x3
Thistellsus our stable stage distribution is (the vector of length 1) in the
0.5
span of  1 .
1

As we’ve seen in the examples above, there is always at least one free
variable when we solve for ~x. If there were no free variables, we’d know the only
solution was ~x = ~0, which would mean our matrix was invertible and λ wasn’t
an eigenvalue. For me, this usually means I’ve messed up my determinant
somehow. If we have more than one free variable, as in Example 6, that just
means our eigenspace has dimension greater than 1.
We can get an upper bound on the dimension of an eigenspace Eλ by
looking at how its eigenvalue solves the polynomial det(A − λIn ). Some roots
of a polynomial occur only once, while some occur multiple times. We saw
this in Example 2, where det(A − λIn ) was (λ − 2)2 (λ + 6). In this case, −6
is a single root, because only one copy of (λ + 6) factors out of det(A − λIn ).
However, two copies of (λ−2) factor out of det(A−λIn ), so 2 is a double root.
In general, the dimension of an eigenspace is bounded above by the number

of times it is a root of det(A − λIn ). Applying this to Example 2, we get that


dim(E−6 ) ≤ 1 while dim(E2 ) ≤ 2.

Exercises 4.3.
1. Suppose det(A − λI4 ) = (4 − λ)(−1 − λ)2 ( 12 − λ). Find all the
eigenvalues of A, and say how many times each eigenvalue is a root
of det(A − λI4 ).
2. Suppose det(A−λI4 ) = (7−λ)3 (−10−λ). Find all the eigenvalues of
A, and say how many times each eigenvalue is a root of det(A−λI4 ).
3. Suppose det(A − λI3 ) = λ3 − λ2 − 6λ. Find all the eigenvalues of A,
and say how many times each eigenvalue is a root of det(A − λI3 ).
4. Suppose det(A−λI3 ) = λ3 +8λ2 +7λ. Find all the eigenvalues of A,
and say how many times each eigenvalue is a root of det(A − λI3 ).
 
1 2 1
5. Find all the eigenvalues of A =  0 −4 0 .
−2 1 −2
 
−1 0 8
6. Find all the eigenvalues of A =  6 2 −5.
0 0 7
 
0 9 −1
7. Find all the eigenvalues of A = 0 3 6 .
0 8 −5
 
1 0 −2
8. Find all the eigenvalues of A = −4 0 8 .
0 2 1
 
1 0 −1
9. Find the eigenspace of A = 7 −2 −6 for eigenvalue λ = −2.
4 0 −4
 
6 2 8
10. Find the eigenspace of A = 4 −1 −9 for eigenvalue λ = 5.
0 0 5
 
3 −2 4
11. Find the eigenspace of A =  0 4 −2 for eigenvalue λ = 3.
−1 0 0
 
0 −2 0
12. Find the eigenspace of A = 1 6 2 for eigenvalue λ = 0.
2 3 4
 
1 2 −11
13. Compute all the eigenvalues and eigenspaces of A = 2 −2 6 .
0 0 −9
 
0 −7 0
14. Compute all the eigenvalues and eigenspaces of A = 1 −6 2.
2 −9 4
 
−6 5 −4
15. Compute all the eigenvalues and eigenspaces of A =  0 10 7 .
0 −1 2
 
−2 0 0
16. Compute all the eigenvalues and eigenspaces of A =  7 4 2.
1 5 1
17. As we did in Example 4, set up the matrix A and solve for x for
ethylene (pictured below).
[Structural diagram of ethylene: two carbon atoms C1 and C2 bonded to each other, each also bonded to two hydrogen atoms.]
18. As we did in Example 4, set up the matrix A and solve for x for
benzene (pictured below).
[Structural diagram of benzene: six carbon atoms C1 through C6 arranged in a ring, each bonded to its two neighbors in the ring and to one hydrogen atom.]
19. Find the growth rate
 of a population whose demographic matrix is
0 1 6
A = 0.7 0.6 0. (As with many realistic examples, the numbers
0.3 0.4 0
here do not come out clean. You will probably want to use some
technology to help, see Appendix 2 for help using Mathematica.)
20. For the demographic matrix in the previous problem, find a basis
for the eigenspace which contains the stable stage distribution.

4.4 Diagonalization
Remember that we started investigating eigenvectors because we were looking
for an easier way to repeatedly multiply a vector by the same matrix. Obvi-
ously this is very easy if our vector is an eigenvector, but in our applications we
won’t always have that luxury. For example, our population vector is unlikely
to magically turn out to be an eigenvector of our demographic matrix for
every population. In this section we’ll explore a way to use eigenvectors to
make things easier for general vectors.
To start, let’s suppose ~x1 and ~x2 are eigenvectors of an n × n matrix A
with eigenvalues λ1 and λ2 respectively. If ~v = ~x1 + ~x2 , then

Ak~v = Ak (~x1 + ~x2 ) = Ak ~x1 + Ak ~x2 = (λ1 )k ~x1 + (λ2 )k ~x2 .

Similarly, if ~w = a~x1 , then

Ak ~w = Ak (a~x1 ) = a(Ak ~x1 ) = a(λ1 )k ~x1 .

This suggests that multiplication by A would be substantially easier if we


could write any vector as a linear combination of eigenvectors. To make this
possible, we’ll try to find a basis for Rn made up of eigenvectors of A. Since
dim(Rn ) = n, we know it is enough to find n linearly independent eigenvectors
of A. Each eigenvector of A is in one of A’s eigenspaces, so we’ll start there.
In the last section, we practiced finding the eigenspace for each eigenvalue
of A, and ended up finding a basis for each eigenspace. This means by the
time we’ve finished finding all of A’s eigenspaces, we’ve created several linearly
independent sets of eigenvectors. To help combine these eigenspace bases into
a basis for Rn , we’ll use the following fact.

Theorem 1. Eigenvectors with different eigenvalues are linearly independent.

Suppose we have eigenvectors ~v1 , . . . , ~vk with eigenvalues λ1 , . . . , λk . If they


are linearly dependent, there is some ~vi which is in the span of the others.
Even stronger, there is some ~vi which is in the span of ~v1 , . . . , ~vi−1 . Pick the
first place that this happens, i.e., the smallest such i. Then

~vi = a1~v1 + · · · + ai−1~vi−1

and ~v1 , . . . , ~vi−1 are linearly independent. Substituting a1~v1 + · · · + ai−1~vi−1


for ~vi in the equation A~vi = λi~vi gives us

A(a1~v1 + · · · + ai−1~vi−1 ) = λi (a1~v1 + · · · + ai−1~vi−1 ).

Expanding both sides gives

A(a1~v1 ) + · · · + A(ai−1~vi−1 ) = λi (a1~v1 ) + · · · + λi (ai−1~vi−1 ).



Since A(a1~vj ) = λj a1~vj , we get


(λ1 a1 )~v1 + · · · + (λi−1 ai−1 )~vi−1 = (λi a1 )~v1 + · · · + (λi ai−1 )~vi−1 .
Moving all terms to the right-hand side of the equation gives us
~0 = (λi a1 )~v1 + · · · + (λi ai−1 )~vi−1 − (λ1 a1 )~v1 − · · · − (λi−1 ai−1 )~vi−1

which simplifies to
~0 = (λi − λ1 )a1~v1 + · · · (λi − λi−1 )ai−1~vi−1 .

Since ~v1 , . . . , ~vi−1 are linearly independent, we must have (λi − λj )aj = 0 for
j = 1, . . . , i − 1. Since the eigenvalues were different, we must have λi − λj 6= 0
for all j, which means aj = 0 for all j. Thus it wasn’t possible to have any ~vi
in the span of the other ~v s, so these eigenvectors are linearly independent.
Since eigenvectors with different eigenvalues are linearly independent, we
can combine our bases from all the eigenspaces of A to create a set of linearly
independent eigenvectors. In fact, this is the largest linearly independent
collection of eigenvectors of A. We know from 3.1’s Theorem 3 that if a linearly
independent set contains dim(V ) vectors, it is a basis for V . This means that
if we’ve collected n vectors from the bases of A’s eigenspaces, we’ve found a
basis for Rn composed of eigenvectors of A. However, if there are fewer than
n vectors in our collection, then we cannot create a basis of eigenvectors for
Rn . Because the number of vectors in a basis equals the dimension, another
way to state this is as follows.

Theorem 2. There is a basis for Rn made up of eigenvectors of an n × n


matrix A exactly when the sum of the dimensions of A’s eigenspaces is n.

Below we’ll see examples of both cases.

Example 1. Find a basis for R3 which is made up of eigenvectors of

    A = [  1   0    7 ]
        [ −2   2   14 ]
        [  1   0   −5 ] .

This is the matrix from Examples 2 and 6 in 4.3, so we already know that
A has eigenvalues 2 and −6 and that E2 has basis

    [ 0 ]   [ 7 ]
    [ 1 ] , [ 0 ] .
    [ 0 ]   [ 1 ]

This means we’ve already got two out of three basis vectors for R3 .
To complete our basis for R3 , we need to find another basis vector from
E−6 . We can do this by solving (A − (−6)I3 )~x = ~0. Plugging in A and
simplifying gives us  
7 0 7
−2 8 14 ~x = ~0.
1 0 1

This new matrix has reduced echelon form


 
1 0 1
0 1 2
0 0 0

so x1 = −x3 , x2 = −2x3 , and x3 is free. This means E−6 is the set of all
vectors of the form

    [  −x3 ]
    [ −2x3 ]
    [   x3 ] ,

which has basis

    [ −1 ]
    [ −2 ] .
    [  1 ]

Putting this newest basis vector for E−6 together with our two basis vectors
for E2 gives us the basis

    [ −1 ]   [ 0 ]   [ 7 ]
    [ −2 ] , [ 1 ] , [ 0 ]
    [  1 ]   [ 0 ]   [ 1 ]

for R3 made up of eigenvectors of A.

Example 2. Show that we can’t find a basis for R3 made up of eigenvectors of

    A = [ −6   −3    5 ]
        [  3    0   −2 ]
        [  0    0    4 ] .
This isn’t a matrix we’ve discussed before, so we’ll have to take it from
the top: find the eigenvalues, find a basis for each eigenspace, and notice that
we don’t get three basis vectors from that process.
To find the eigenvalues, we need to compute
 
−6 − λ −3 5
det  3 0−λ −2  .
0 0 4−λ

Let’s do that by expanding along the 3rd row since it has two zero entries.
This gives us
 
    det [ −6−λ   −3     5  ]
        [   3    −λ    −2  ] = (−1)3+3 (4 − λ) det [ −6−λ   −3 ]
        [   0     0    4−λ ]                       [   3    −λ ]

                             = (4 − λ)((−6 − λ)(−λ) − (−3)(3)).

Factoring our determinant and setting it equal to zero gives us


(4 − λ)((−6 − λ)(−λ) − (−3)(3)) = (4 − λ)(λ2 + 6λ + 9)
= (4 − λ)(λ + 3)2 = 0
which means A’s eigenvalues are 4 and −3.
Next we need to find a basis for E4 and E−3 . Let’s start with E4 . This
eigenspace is the null space of
   
    A − 4I3 = [ −6−4   −3     5  ]   [ −10   −3    5 ]
              [   3    0−4   −2  ] = [   3   −4   −2 ]
              [   0     0    4−4 ]   [   0    0    0 ]

which has reduced echelon form

    [ 1   0   −26/49 ]
    [ 0   1    5/49  ]
    [ 0   0     0    ] .
This means E4 is the set of all vectors of the form

    [  (26/49)x3 ]
    [ −(5/49)x3  ]
    [     x3     ]

which has basis

    [  26/49 ]
    [ −5/49  ] .
    [    1   ]

Our other eigenspace E−3 is the null space of

    A − (−3)I3 = [ −6−(−3)    −3       5    ]   [ −3   −3    5 ]
                 [    3     0−(−3)    −2    ] = [  3    3   −2 ]
                 [    0        0    4−(−3)  ]   [  0    0    7 ]
which has reduced echelon form
 
1 1 0
0 0 1 .
0 0 0
Here x1 = −x2 and x3 = 0, so E−3 is the set of all vectors of the form

    [ −x2 ]
    [  x2 ]
    [  0  ]

which has basis

    [ −1 ]
    [  1 ] .
    [  0 ]
Notice that between our two eigenspaces, we only have two basis vectors.
Since we’re looking for a basis for R3 , this isn’t enough. Therefore we can’t
get a basis for R3 made up of eigenvectors of A.

From the discussion at the end of 4.3, we know that the dimension of
any eigenspace satisfies 1 ≤ dim(Eλ ) ≤ k where k is the number of times
λ is a solution to the polynomial det(A − λIn ) = 0. (This is often called
the multiplicity of λ.) If λ is a so-called single root, i.e., k = 1, then clearly
dim(Eλ ) = 1. For larger values of k, we’ve seen that it’s possible to have
dim(Eλ ) < k as in Example 2 for λ = −3. If we list each repeated root the
number of times it solves a polynomial, the degree of any polynomial equals
the number of its roots. Thus our degree n polynomial, det(A − λIn ), has n
roots in total, so the sum of the values of k over all eigenvalues λ1 , . . . , λℓ is
n. As a formula, this looks like n = k1 + · · · + kℓ . Since ki is the upper bound on
the number of basis vectors for Eλi , to get the sum of the dimensions of our
eigenspaces equal to n we must have dim(Eλi ) = ki for every eigenvalue
of A. We can see that this is true in Example 1 but not in Example 2.
Practically speaking, this means if you come across any eigenspace whose
dimension is smaller than the multiplicity of its eigenvalue, you already know
it is impossible to find a basis of eigenvectors for Rn , which may save you
some work. As a consequence of this discussion we also get the following fact.

Theorem 3. An n × n matrix with n different eigenvalues always has a basis


of eigenvectors for Rn .

For example, the matrix in Example 1 of 4.3 was 2 × 2 and had eigenvalues
5 and −1, so we know there is a basis for R2 made up of its eigenvectors.
Suppose A has enough linearly independent eigenvectors to form a basis B
for Rn . We found B to simplify multiplication by A for vectors which aren’t
eigenvectors, which we can do by rewriting those vectors in terms of our
eigenvector basis B. In other words, we want to replace vectors in Rn by
their B-coordinate vectors. We’re already working in Rn , so our coordinate
vector function fB is a map from Rn to itself. We’ll explore this special case of
coordinate functions in the next section. For now, let’s see what we’ve gained
by computing A[~v ]B instead of A~v .
Let B = {~b1 , . . . , ~bn } be a basis for Rn where ~bi is an eigenvector of A with
eigenvalue λi . Let ~v be any vector in Rn . We can write ~v = a1~b1 + · · · + an~bn
so  
a1
 
[~v ]B =  ...  .
an
Now

A~v = A(a1~b1 + · · · + an~bn ) = a1 (A~b1 ) + · · · + an (A~bn )


= a1 (λ1~b1 ) + · · · + an (λn~bn ) = (a1 λ1 )~b1 + · · · + (an λn )~bn .

This may not look immediately better, but if we think in terms of B-coordinate
vectors, this means

    [A~v ]B = [ a1 λ1 ]
              [  ...  ]
              [ an λn ] .
We can summarize this relationship as follows.

Theorem 4. Let A be an n × n matrix and B = {~b1 , . . . , ~bn } be a basis
for Rn of A’s eigenvectors with eigenvalues λ1 , . . . , λn respectively. If we let

    D = [ λ1    0   · · ·    0  ]
        [  0   . .           .. ]
        [  ..        . .     0  ]
        [  0   · · ·   0     λn ] ,

then [A~v ]B = D[~v ]B and [Ak~v ]B = Dk [~v ]B .

This means that if we are willing to work in terms of a basis of eigenvectors


we can make A behave like a diagonal matrix, which makes repeated
multiplication much easier.
   
Example 3. Let B = { [ 1 ] , [ −1 ] } be a basis for R2 made up of eigenvectors
                     { [ 2 ]   [  1 ] }

of A = [ 1   2 ] . Use the diagonal form of A, which is D = [ 5    0 ] , to show
       [ 4   3 ]                                            [ 0   −1 ]

that we have [A~v ]B = D[~v ]B for ~v = [ 5 ] .
                                       [ 4 ]
We saw in 4.1 that our two basis vectors are eigenvectors of A with
eigenvalues 5 and −1 respectively, which means

    D = [ 5    0 ]
        [ 0   −1 ]

is the diagonal matrix whose diagonal entries are the eigenvalues of our basis
vectors (in order).
To check the rest of this claim, let’s compute both [A~v ]B and D[~v ]B to
show they’re the same.

    A~v = [ 1   2 ] [ 5 ] = [ 13 ] .
          [ 4   3 ] [ 4 ]   [ 32 ]

To get [A~v ]B we need to solve

    x1 [ 1 ] + x2 [ −1 ] = [ 13 ]
       [ 2 ]      [  1 ]   [ 32 ]

which has augmented coefficient matrix

    [ 1   −1   13 ]
    [ 2    1   32 ] .

This row reduces to

    [ 1   0   15 ]
    [ 0   1    2 ]

so

    [A~v ]B = [ 15 ] .
              [  2 ]
To find D[~v ]B , we first need to compute [~v ]B . This means we need to solve

    x1 [ 1 ] + x2 [ −1 ] = [ 5 ]
       [ 2 ]      [  1 ]   [ 4 ]

which has augmented coefficient matrix

    [ 1   −1   5 ]
    [ 2    1   4 ] .

This row reduces to

    [ 1   0    3 ]
    [ 0   1   −2 ]

so

    [~v ]B = [  3 ] .
             [ −2 ]

Multiplying by D gives us

    D[~v ]B = [ 5    0 ] [  3 ] = [ 15 ]
              [ 0   −1 ] [ −2 ]   [  2 ]
which matches [A~v ]B as claimed.
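
The same check can be run numerically in Python with NumPy (an optional sketch only; here P is the matrix whose columns are the basis vectors of B, so solving P x = ~v gives the B-coordinate vector of ~v ):

    import numpy as np

    A = np.array([[1., 2.], [4., 3.]])
    P = np.array([[1., -1.], [2., 1.]])   # columns are the eigenvector basis B
    D = np.diag([5., -1.])
    v = np.array([5., 4.])

    v_B = np.linalg.solve(P, v)           # [v]_B, which is [ 3, -2]
    Av_B = np.linalg.solve(P, A @ v)      # [Av]_B, which is [15,  2]
    print(Av_B, D @ v_B)                  # the two agree
    print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}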

Since putting Rn in terms of a basis of eigenvectors makes A act like a


diagonal matrix, we make the following definition.

Definition. An n × n matrix A is diagonalizable if there is a basis for Rn


made up of eigenvectors of A.

 
Example 4. The matrix

    A = [  1   0    7 ]
        [ −2   2   14 ]
        [  1   0   −5 ]

is diagonalizable.
This is the matrix from Example 1, which had the basis of eigenvectors

    [ −1 ]   [ 0 ]   [ 7 ]
    [ −2 ] , [ 1 ] , [ 0 ]
    [  1 ]   [ 0 ]   [ 1 ]

for R3 .

Example 5. The matrix

    A = [ −6   −3    5 ]
        [  3    0   −2 ]
        [  0    0    4 ]

is not diagonalizable.
This is the matrix from Example 2, which we showed did not have a basis
of eigenvectors for R3 .

Exercises 4.4.
 
7 −1 0
1. How many linearly independent eigenvectors does  2 4 3
−4 5 8
need to be diagonalizable?
 
−5 0 2 1
 3 −4 1 9
2. How many linearly independent eigenvectors does 
2

0 0 −4
1 6 −2 5
need to be diagonalizable?
 
9 0
3. How many linearly independent eigenvectors does A =
4 13
need to be diagonalizable?
4. Is a 3×3 matrix A with eigenvalues λ = 4, λ = −2, and λ = 1 where
dim(E4 ) = 1, dim(E−2 ) = 1, and dim(E1 ) = 1 diagonalizable?
5. Is a 3 × 3 matrix A with eigenvalues λ = 0 and λ = −5 where
dim(E0 ) = 1 and dim(E−5 ) = 1 diagonalizable?
6. Is a 4×4 matrix A with eigenvalues λ = 2, λ = −8, and λ = 3 where
dim(E2 ) = 1, dim(E−8 ) = 1, and dim(E3 ) = 1 diagonalizable?
7. Is a 4 × 4 matrix A with eigenvalues λ = 7, λ = 0, and λ = 10 where
dim(E7 ) = 1, dim(E0 ) = 1, and dim(E10 ) = 2 A diagonalizable?
 
 −2 
8. Is a 3 × 3 matrix A with eigenspaces E0 = Span −1 ,
 
    1
 0   1 
E3 = Span 4 , and E−1 = Span 0 diagonalizable?
   
1 1
 
 −2 
9. Is a 3 × 3 matrix A with eigenspaces E1 = Span  3  and
 
    1
 8 −3 
E−11 = Span 1 ,  0  diagonalizable?
 
0 1
 

 1 
  
1
10. Is a 4 × 4 matrix A with eigenspaces E2 = Span  0,

 
 
    1

 12   9 
   
  
0  , and E5 = Span  0  diagonalizable?
E−6 = Span   1  −2

  
 
   
0 1
 
 −3 

 
 2 
11. Is a 4 × 4 matrix A with eigenspaces E−2 = Span   , and
 0 

 
 
      1

 8 −1 5 
      
1  ,   , 0 diagonalizable?
0
E4 = Span       

 0 1 0 

 
0 0 1
 
8 −3
12. Is A = diagonalizable?
9 −4
 
3 0 5
13. Is A = 2 1 4 diagonalizable?
5 0 3
 
2 3 0
14. Is A = 5 4 0  diagonalizable?
6 6 −1
 
2 5 1
15. Is A = 3 4 8 diagonalizable?
0 0 7
16. Explain why a matrix is diagonalizable if and only if the dimension
of each eigenspace, Eλ , equals the multiplicity of λ.

4.5 Change of Basis Matrices


In the last section, we saw that while it is usually easiest to use the
standard basis for a vector space, there are situations where it is easier to use
another basis instead. Our example from 4.4 was using a basis of eigenvectors
to simplify matrix multiplication, but you’ve probably seen echoes of this
idea in other areas of mathematics as well – think of using cylindrical or
spherical coordinates in 3D modeling. Guidance systems for some machining
tools also use matrices to change coordinates. In this section we’ll develop
a computational technique to make this change of basis go smoothly. In
particular, we’ll see what is gained when our change of basis happens inside
Rn .
Suppose we have an n-dimensional vector space V where we have been
working with a basis B. However, we now want to work with V in terms of
another basis C. Since dim(V ) = n, working with any basis for V means using
the coordinate map to create a correspondence between V and Rn . If we’re
using basis B, this means using fB and working with B-coordinate vectors.
If we’re using basis C, it’s fC and C-coordinate vectors. We can think of this
visually in Figure 4.2.

[Figure 4.2: Two coordinate maps. A triangle with V at the top and two copies of Rn below, with fB mapping V to the left copy and fC mapping V to the right copy.]

If we have a vector ~v from V , this means we have two different coordinate


vectors of ~v in Rn : [~v ]B and [~v ]C . It is certainly possible to get from [~v ]B to
[~v ]C by using [~v ]B to find ~v and then using fC to get [~v ]C . However, tracing
that route on the picture above lets us see that we’d be traveling along two
legs of the triangle. It would be quicker to just cut across the bottom of the
diagram by creating a map directly from Rn in terms of B-coordinates to Rn
in terms of C-coordinates. This function is appropriately called the change of
coordinates map or sometimes the change of basis map. Since it is a map from
Rn to itself, it corresponds to an n × n matrix as defined below.

Definition. Let B and C each be a basis for an n-dimensional vector space


V . The change of coordinates matrix is the n × n matrix PC←B which has
PC←B [~v ]B = [~v ]C for every ~v in V .

In other words, multiplication by this change of coordinates matrix changes


a vector’s B-coordinate vector into its C-coordinate vector. This allows us to
fill in our diagram of coordinate maps as in Figure 4.3.

[Figure 4.3: The change of coordinates matrix. The same triangle as Figure 4.2, with the bottom edge labeled PC←B going from the B-coordinate copy of Rn to the C-coordinate copy.]

Pay attention to the notation here! As with function composition, the


subscript of PC←B is written in the opposite direction from how we usually
read. This is because we put [~v ]B on the right of PC←B in our multiplication.
I remember this by thinking that the B on the subscript needs to touch the
B-coordinate vector.
       
Example 1. Use basis

    B = { [ 1   0 ]   [ 0   1 ]   [ 1    0 ]   [ 0   −1 ] }
        { [ 0   1 ] , [ 1   0 ] , [ 0   −1 ] , [ 1    0 ] }

and basis

    C = { [ 1   0 ]   [ 1   1 ]   [ 1   1 ]   [ 1   1 ] }
        { [ 0   0 ] , [ 0   0 ] , [ 1   0 ] , [ 1   1 ] }

for M22 . The matrix ~v = [ 3    5 ] has B-coordinate vector
                          [ 1   −5 ]

    [~v ]B = [ −1 ]
             [  3 ]
             [  4 ]
             [ −2 ] .

Use the change of coordinates matrix

    PC←B = [  1   −1    1    1 ]
           [  0    0    0   −2 ]
           [ −1    1    1    1 ]
           [  1    0   −1    0 ]

to find [~v ]C .
Plugging our PC←B and [~v ]B into [~v ]C = PC←B [~v ]B gives us

    [~v ]C = [  1   −1    1    1 ] [ −1 ]   [ −2 ]
             [  0    0    0   −2 ] [  3 ] = [  4 ]
             [ −1    1    1    1 ] [  4 ]   [  6 ]
             [  1    0   −1    0 ] [ −2 ]   [ −5 ] .

To check our work, we can make sure that [~v ]B and [~v ]C give us the same
matrix ~v . Since

    [~v ]B = [ −1 ]
             [  3 ]
             [  4 ]
             [ −2 ] ,

we get

    ~v = (−1) [ 1   0 ] + 3 [ 0   1 ] + 4 [ 1    0 ] − 2 [ 0   −1 ] = [ 3    5 ] .
              [ 0   1 ]     [ 1   0 ]     [ 0   −1 ]     [ 1    0 ]   [ 1   −5 ]

Similarly, [~v ]C tells us

    ~v = (−2) [ 1   0 ] + 4 [ 1   1 ] + 6 [ 1   1 ] − 5 [ 1   1 ] = [ 3    5 ] .
              [ 0   0 ]     [ 0   0 ]     [ 1   0 ]     [ 1   1 ]   [ 1   −5 ]
Since our matrices agree, our change of coordinates computation checks out.

Given bases B and C, how do we compute PC←B ? We want PC←B to satisfy


PC←B [~v ]B = [~v ]C for every vector ~v in V . Suppose B = {~b1 , . . . , ~bn } and
~v = x1~b1 + · · · + xn~bn , so  
x1
 .. 
[~v ]B =  .  .
xn
This means we really want
 
x1
 
PC←B  ...  = [~v ]C .
xn

However, we can also compute [~v ]C as fC (~v ). From this perspective,

[~v ]C = fC (~v ) = fC (x1~b1 + · · · + xn~bn ).

Since coordinate functions are linear, this splits up as

x1 fC (~b1 ) + · · · + xn fC (~bn )

which can also be written as


    x1 [~b1 ]C + · · · + xn [~bn ]C .

Plugging this back into our equation with PC←B gives us


 
    PC←B [ x1 ]
         [ .. ]  = x1 [~b1 ]C + · · · + xn [~bn ]C .
         [ xn ]

This equation identifies two different versions of a linear system: on the left
as a matrix equation and on the right as a vector equation. This gives us

Theorem 1. If B and C are bases for V , then

    PC←B = [ [~b1 ]C   [~b2 ]C   · · ·   [~bn ]C ] ,

where each C-coordinate vector [~bi ]C is written as a column.

In other words, PC←B is the matrix whose columns are the C-coordinate
vectors of the basis vectors from B. Since this formula isn’t symmetric, I
remember that I’m transforming vectors from B into vectors with respect to
C to get PC←B .

Example 2. Using M22 with basis B and C from Example 1, compute PC←B .

To find PC←B , we need to compute the C-coordinate vector of each matrix


from our basis B. Since
       
    C = { [ 1   0 ]   [ 1   1 ]   [ 1   1 ]   [ 1   1 ] }
        { [ 0   0 ] , [ 0   0 ] , [ 1   0 ] , [ 1   1 ] }

this means solving the equation

    x1 [ 1   0 ] + x2 [ 1   1 ] + x3 [ 1   1 ] + x4 [ 1   1 ] = [ b1   b2 ]
       [ 0   0 ]      [ 0   0 ]      [ 1   0 ]      [ 1   1 ]   [ b3   b4 ]

for each [ b1   b2 ] in B. We can simplify the left-hand side to get
         [ b3   b4 ]

    [ x1 + x2 + x3 + x4    x2 + x3 + x4 ] = [ b1   b2 ] .
    [      x3 + x4              x4      ]   [ b3   b4 ]

Setting entries equal gives us the four equations x1 + x2 + x3 + x4 = b1 ,


x2 + x3 + x4 = b2 , x3 + x4 = b3 , and x4 = b4 . This set of linear equations has
augmented coefficient matrix
 
1 1 1 1 b1
0 1 1 1 b2 
 
0 0 1 1 b3  .
0 0 0 1 b4
   
b b 1 0
For 1 2 = , this means our augmented coefficient matrix is
b3 b4 0 1
 
1 1 1 1 1
0 1 1 1 0
 
0 0 1 1 0
0 0 0 1 1

which has reduced echelon form


 
1 0 0 0 1
0 1 0 0 0
 .
0 0 1 0 −1
0 0 0 1 1

Therefore  
  1
1 0 0
=  .
0 1 C −1
1
   
b b2 0 1
Similarly, when 1 = , our augmented coefficient matrix is
b3 b4 1 0
 
1 1 1 1 0
0 1 1 1 1
 
0 0 1 1 1
0 0 0 1 0
which has reduced echelon form
 
1 0 0 0 −1
0 1 0 0 0
 .
0 0 1 0 1
0 0 0 1 0
Therefore  
  −1
0 1 0
=  .
1 0 C 1
0
   
b b2 1 0
When 1 = , our augmented coefficient matrix is
b3 b4 0 −1
 
1 1 1 1 1
0 1 1 1 0
 
0 0 1 1 0
0 0 0 1 −1
which has reduced echelon form
 
1 0 0 0 1
0 1 0 0 0
 .
0 0 1 0 1
0 0 0 1 −1
Therefore 

  1
1 0 0
= 
 1 .
0 −1 C
−1
   
b1 b2 0 −1
Finally, when = , our augmented coefficient matrix is
b3 b4 1 0
 
1 1 1 1 0
0 1 1 1 −1
 
0 0 1 1 1 
0 0 0 1 0

which has reduced echelon form


 
1 0 0 0 1
0 1 0 0 −2
 .
0 0 1 0 1
0 0 0 1 0

Therefore  
  1
0 −1 −2
=  .
1 0 C 1
0
Using these four C-coordinate vectors (in order) as the columns of our change
of coordinates matrix we get
 
    PC←B = [  1   −1    1    1 ]
           [  0    0    0   −2 ]
           [ −1    1    1    1 ]
           [  1    0   −1    0 ] .

Notice that in our last example we solved four matrix equations whose
augmented coefficient matrices were the same except for the augmentation
column. This means we could have borrowed the shortcut we use to compute
matrix inverses and solved all four equations simultaneously by row reducing
the 4 × 8 matrix formed by putting the common piece of the augmented
coefficient matrices on the left and the four augmentation columns on the
right. As with matrix inverses, the left-hand side would reduce to I4 while
the right-hand side would become PC←B . If B and C are bases for Rn , these
augmentation columns are just the basis vectors from B and the common part
of the augmented coefficient matrix has columns that are the basis vectors
from C. That means we can find PC←B as the right half of the reduced echelon
form of
\begin{pmatrix} \vec{c}_1 & \cdots & \vec{c}_n & \vec{b}_1 & \cdots & \vec{b}_n \end{pmatrix}.
     
 2 −3 5 
Example 3. In R3 , find PC←B where B =  1  ,  0  , 4 and
 
      −1 2 1
 1 0 0 
C = −1 ,  1  , 0 .
 
1 −1 1
Since B and C are bases for R3 , we can use the shortcut described above

to find PC←B by row reducing


 
1 0 0 2 −3 5
−1 1 0 1 0 4
1 −1 1 −1 2 1

to get  
1 0 0 2 −3 5
0 1 0 3 −3 9 .
0 0 1 0 2 5
Thus  
2 −3 5
PC←B = 3 −3 9 .
0 2 5

We can extend this trick to a more general vector space V if we are willing
to use a coordinate map to translate our problem from V to Rn . You can
redo Example 2 by using the standard basis for M22 to translate to R4 . (This
will actually give you the same augmented coefficient matrix pieces as we
computed there.)
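For bases of Rn this shortcut is essentially one line in a numerical library. Here is a hedged NumPy sketch (the helper name change_of_basis is mine, not the book's) applied to the vectors from Example 3.

    import numpy as np

    def change_of_basis(C, B):
        # P_{C<-B} for bases of R^n given as the columns of C and B.
        # Solving C P = B is the same as row reducing [C | B] and
        # reading off the right half.
        return np.linalg.solve(C, B)

    C = np.array([[1, 0, 0],
                  [-1, 1, 0],
                  [1, -1, 1]], dtype=float)
    B = np.array([[2, -3, 5],
                  [1, 0, 4],
                  [-1, 2, 1]], dtype=float)

    print(change_of_basis(C, B))
    # [[ 2. -3.  5.]
    #  [ 3. -3.  9.]
    #  [ 0.  2.  5.]]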
Since B is linearly independent, the only vector with [~v ]B = ~0 is ~0V . This
means PC←B ~x = ~0 has only one solution, ~x = ~0, and therefore by the Invertible
Matrix Theorem PC←B is invertible. Its inverse, (PC←B )−1 , satisfies

(PC←B )−1 PC←B = In

so
(PC←B )−1 [~v ]C = (PC←B )−1 (PC←B [~v ]B ) = [~v ]B .
This gives us the following:

Theorem 2. (PC←B )−1 = PB←C

In other words, the inverse of a change of coordinates matrix is the change


of coordinates matrix in the other direction. We can visualize this in Figure
4.4.

V
fB fC

PC←B
Rn Rn
PB←C

Figure 4.4: The inverse change of coordinates matrix



Example 4. Find PB←C for M22 with the same B and C as in Examples 1
and 2.

We could redo our original process for finding change of coordinates


matrices as we did in Example 2, by finding the B-coordinate vectors of the
matrices from our basis C and using them as the columns of our matrix.
However, it’s much shorter to just compute the inverse of our change of
coordinates matrix PC←B in the other direction.
We can find this inverse by row reducing
 
1 −1 1 1 1 0 0 0
0 0 0 −2 0 1 0 0
 
−1 1 1 1 0 0 1 0
1 0 −1 0 0 0 0 1

to get
\begin{pmatrix} 1 & 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\ 0 & 1 & 0 & 0 & 0 & \tfrac{1}{2} & 1 & 1 \\ 0 & 0 & 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 & 1 & 0 & -\tfrac{1}{2} & 0 & 0 \end{pmatrix}.
This means
P_{B←C} = (P_{C←B})^{-1} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 1 \\ 0 & \tfrac{1}{2} & 1 & 1 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 0 & -\tfrac{1}{2} & 0 & 0 \end{pmatrix}.
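Theorem 2 is also easy to confirm numerically. A quick NumPy check (mine, not the author's): inverting the P_{C←B} from Example 2 reproduces the P_{B←C} we just found.

    import numpy as np

    P_CB = np.array([[1, -1, 1, 1],
                     [0, 0, 0, -2],
                     [-1, 1, 1, 1],
                     [1, 0, -1, 0]], dtype=float)

    print(np.linalg.inv(P_CB))
    # [[ 0.5  0.5  0.5  1. ]
    #  [ 0.   0.5  1.   1. ]
    #  [ 0.5  0.5  0.5  0. ]
    #  [ 0.  -0.5  0.   0. ]]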

There is one special case where the process of finding PC←B is much easier:
when V = Rn , and C is the standard basis for Rn . This may sound too
specific to be very useful, but we’ll want it when we’re working with a basis
of eigenvectors.
Suppose V = Rn and C is its standard basis. Our process for finding PC←B
is to find the C-coordinate vector of each basis vector from our basis B and
use them as the columns of PC←B . However, the way we write vectors in Rn
is as their coordinate vectors in terms of the standard basis. This means that
each basis vector from B is its own C-coordinate vector, which gives us the
following.

Theorem 3. Let C be the  standard basis for Rn and B be any other basis.
.. ..
. .
Then PC←B = 

~b1 . . . ~bn .

.. ..
. .

In other words, PC←B is just the matrix whose columns are the basis vectors
from B.
     
 −1 0 7 
Example 5. Find PC←B where B = −2 , 1 , 0 is the basis of
 
  1 0 1
1 0 7
eigenvectors of A = −2 2 14  from Example 1 in 4.4 and C is the
1 0 −5
standard basis for R3 .

Since C is the standard basis for R3 , the basis vectors from B are already
written as C-coordinate vectors. This means the columns of PC←B are simply
the basis vectors from B (in order), which means we have
 
−1 0 7
PC←B = −2 1 0 .
1 0 1

As we saw in the previous example, this links back to our work in the
previous section, by letting B be a basis of eigenvectors for an n × n matrix
A. From 4.4, we know that A acts like an n × n diagonal matrix D when
multiplied by B-coordinate vectors. (Recall that D’s diagonal entries are the
eigenvalues of our basis vectors from B (in order).) With our new change of
coordinates matrices, we get an alternate way to compute Ak~v in three stages:
change ~v from standard C-coordinates to B-coordinates using PB←C , multiply
by Dk which is the diagonal version of Ak , and change the result back to
standard C-coordinates using PB←C . We can visualize this as follows, where
we start at the top left corner and end up at the top right corner. (Simply
multiplying ~v by Ak can be visualized as starting in the same place but simply
going straight across the top to the same endpoint.)

Ak
Rn −−−−→ Rn
 x

PB←C y
P
 C←B
Dk
Rn −−−−→ Rn
Figure 4.5: Visualizing diagonalization

Since our change of coordinates matrices are in the special case discussed
above, we know PC←B is just the matrix P whose columns are the eigenvectors
in our basis B and PB←C = P −1 . This means we can update Figure 4.5 to get
Figure 4.6.
Ak
Rn −−−−→ Rn
 x
 
P −1 y P
Dk
Rn −−−−→ Rn
Figure 4.6: Simplified notation for diagonalization

As in our previous picture, computing Ak~v directly means going from


the top left to the top right via the arrow for the function Ak . Our other
computational option is changing to B-coordinates, multiplying by Dk , and
changing back to C-coordinates, which means going from the top left to the
bottom left, then across to the bottom right, and finally up to the top right
via the functions P −1 , Dk , and P respectively. Either route gives us the
same answer for Ak~v , so we can choose whichever option seems easier for
our particular problem. For example, if k is small and we’re only planning
to do this for one vector ~v , it may be easier to just go across the top, i.e.,
compute Ak~v directly. On the other hand, if k is large or we’re planning to
do this many times for many different ~v s, it may be easier to find Ak~v using
eigenvectors instead. Many people sum up the relationship between these two
paths with one of the following formulas.

Theorem 4. Ak~v = P Dk P −1 ~v

or simply

Theorem 5. Ak = P Dk P −1
 
1 0 7
Example 6. Find P , P −1 , and D for A = −2 2 14 .
1 0 −5
This is the matrix we worked with in Example 5 of this section and
Example 1 of 4.4. From 4.4’s Example 1, we have the basis
     
 −1 0 7 
B = −2 , 1 , 0
 
1 0 1

of eigenvectors of A which have eigenvalues −6, 2, and 2 respectively. This


means our change of coordinates matrix from B-coordinates to standard
coordinates is  
−1 0 7
P = −2 1 0 .
1 0 1
The change of coordinates matrix in the other direction is P −1 , which we

can compute by row reducing


 
−1 0 7 1 0 0
−2 1 0 0 1 0
1 0 1 0 0 1

to get
\begin{pmatrix} 1 & 0 & 0 & -\tfrac{1}{8} & 0 & \tfrac{7}{8} \\ 0 & 1 & 0 & -\tfrac{1}{4} & 1 & \tfrac{7}{4} \\ 0 & 0 & 1 & \tfrac{1}{8} & 0 & \tfrac{1}{8} \end{pmatrix}
so
P^{-1} = \begin{pmatrix} -\tfrac{1}{8} & 0 & \tfrac{7}{8} \\ -\tfrac{1}{4} & 1 & \tfrac{7}{4} \\ \tfrac{1}{8} & 0 & \tfrac{1}{8} \end{pmatrix}.

Once we change to B-coordinates, A behaves like the diagonal matrix


whose diagonal entries are the eigenvalues of the basis vectors from B (in
order), so  
−6 0 0
D =  0 2 0 .
0 0 2
Note that if we were interested in dealing with Ak , we’d simply replace D
with  
(−6)k 0 0
D = 0
k
2k 0 .
0 0 2k
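The two routes around Figure 4.6 are easy to compare by machine. The following NumPy sketch (my own check, not part of the text) computes A^k directly and as P D^k P^{-1} for the matrix from Example 6 and confirms that the two answers agree.

    import numpy as np

    A = np.array([[1, 0, 7],
                  [-2, 2, 14],
                  [1, 0, -5]], dtype=float)
    P = np.array([[-1, 0, 7],
                  [-2, 1, 0],
                  [1, 0, 1]], dtype=float)
    eigenvalues = np.array([-6.0, 2.0, 2.0])

    k = 5
    direct = np.linalg.matrix_power(A, k)                         # across the top of Figure 4.6
    via_eigen = P @ np.diag(eigenvalues ** k) @ np.linalg.inv(P)  # down, across, and back up

    print(np.allclose(direct, via_eigen))  # True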

Exercises 4.5.
   
3 −4 2
1. Suppose PC←B = and [~v ]B = . Find [~v ]C .
−1 5 6
   
2 7 −1
2. Suppose PC←B = and [~v ]B = . Find [~v ]C .
−3 1 4
   
5 −2 1 2
3. Suppose PC←B = 0 8 −2 and [~v ]B =  0 . Find [~v ]C .
1 5 3 −1
   
4 0 −3 3
4. Suppose PC←B = 1 6 2  and [~v ]B = −1. Find [~v ]C .
−2 −1 0 1
   
8 3 3
5. Suppose PC←B = and [~v ]C = . Find [~v ]B .
2 5 1
   
4 6 10
6. Suppose PC←B = and [~v ]C = . Find [~v ]B .
3 5 2
   
−3 2 0 3
7. Suppose PC←B =  8 −12 4 and [~v ]C = −4. Find [~v ]B .
1 −1 0 2
   
1 0 1 −3
8. Suppose PC←B =  −1 2 0 
and [~v ]C = 1 . Find [~v ]B .
0 −1 1 4
           
 −11 13 1   2 −3 0 
9. Let B =  −5  , 19 , 12 and C = 4 ,  1  , −1
   
2 10 1 2 0 1
be bases for R3 . Compute PC←B .
           
 −12 −2 −28   −4 2 −1 
10. Let B =  24  ,  8  ,  34  and C =  4  , 0 ,  5 
   
−13 11 −1 1 3 −2
be bases for R3 . Compute PC←B .
       
−3 3 6 32 4 18 8 8
11. Use the bases B = , , , and
   −3  −7  8 44 5 26 0 12
2 4 1 1 0 0 −1 3
C = , , , for M22 . Compute
2 8 −1 −1 1 0 0 1
PC←B .
       
−10 3 −12 3 0 5 0 11
12. Use the bases B = , , ,
−6 12 0 1 18 2 12 15
and        
−1 2 2 1 0 −1 5 0
C = , , , for M22 . Compute
0 4 6 1 0 3 0 5
PC←B .

13. Use thebases B = 5x2 − 3x + 5, −12x2 − 44x, 7x2 + 31x − 1 and
C = 3x2 + 7x + 1, −2x2 + 2x − 2, −8x + 2 for P2 . Compute
PC←B .
 2 2 2
14.  2 B = 14x 2 − 9x + 1,219x − 3x − 4, 9x + x − 10
Use the bases
and C = x − x + 4, 5x − 3, −2x + x + 3 for P2 . Compute
PC←B .
     
 3 1 5 
15. Let B =  7  ,  0  , 9 be a basis for R3 and C be the
 
−1 −3 1
standard basis. Compute P = PC←B .
     
 10 −3 9 
16. Let B = −6 ,  4  ,  2  be a basis for R3 and C be the
 
0 7 −5
standard basis. Compute P = PC←B .
       

 8 −11 4 −2  
    −5  3 
 0   6     
17. Let B =   ,  , , be a basis for R4 and C be

 9 3  −1  4  
 
1 −3 2 11
the standard basis. Compute P = PC←B .
       
 4
 −2 0 9 
 −1  7   3  0
18. Let B =   ,   ,   ,   be a basis for R4 and C be
  −1 −6 2
 5

 

0 3 5 1
the standard basis. Compute P = PC←B .
     
 2 −3 0 
19. Let B = −4 ,  1  , −1 be a basis for R3 and C be the
 
2 0 1
−1
standard basis. Compute P = PB←C .
     
 −2 6 2 
20. Let B =  0  , 4 , 0 be a basis for R3 and C be the
 
1 2 1
standard basis. Compute P −1 = PB←C .
       
 3
 2 −1 0 
        
 0  1  1  0
21. Let B =   ,   ,   ,   be a basis for R4 and C be
 −2
 0 1 1  
 
1 0 0 1
the standard basis. Compute P −1 = PB←C .
       

 1 −2 0 1 
        
0 −1
 ,   ,   , 2 be a basis for R4 and C be
1
22. Let B =  1  0   2  0

 
 
0 1 0 1
the standard basis. Compute P −1 = PB←C .
     
 4 −1 3 
23. Let B = 1 ,  0  , −4 be a basis for R3 made up of
 
0 6 2
eigenvectors of A and C be the standard basis. If the eigenvalues of
the basis vectors of B are (in order) λ1 = −7, λ2 = 2, and λ3 = −1,
compute the matrices P and D used in our formula A = P DP −1 .
     
 5 −2 0 
24. Let B = 1 ,  6  , −4 be a basis for R3 made up of
 
0 1 1
eigenvectors of A and C be the standard basis. If the eigenvalues of
the basis vectors of B are (in order) λ1 = −4, λ2 = 5, and λ3 = 0,
compute the matrices P and D used in our formula A = P DP −1 .
       
 −13


4 0 −2  
 10  −1 3  1 
25. Let B =  
 ,   ,   ,   be a basis for R4 made up
  0  2  5 
 0

 

1 2 0 −3

of eigenvectors of A and C be the standard basis. If the eigenvalues


of the basis vectors of B are (in order) λ1 = −1, λ2 = −1, λ3 = 21 ,
and λ4 = 6, compute the matrices P and D used in our formula
A = P DP−1 .
       

 −8 3 −1 5 
        
1  ,   ,   ,  0  be a basis for R4 made up
0 0
26. Let B =         

 0 1 −4 −2  
 
0 0 1 6
of eigenvectors of A and C be the standard basis. If the eigenvalues
of the basis vectors of B are (in order) λ1 = 3, λ2 = −6, λ3 = 0,
and λ4 = 5, compute the matrices P and D used in our formula
A = P DP −1 .
27. Describe a situation where you might want to diagonalize a matrix.
5
Computational Vector Geometry

5.1 Length
In this chapter, we’ll return to Rn and start developing a set of tools that will
allow us to easily compute geometric quantities without needing to visualize
them first. This is most obviously useful in dealing with Rn for n > 3, but
can also be easier than drawing a picture even in complicated situations in
R2 or R3 . (It is also appreciated by less than stellar artists like me!) The two
basic geometric quantities we’ll work with are length and angle. These are
both scalar quantities, so we’ll need a way to create scalars from vectors. Our
basic tool is the following.

Definition. Let ~x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} and ~y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} be n-vectors. Their dot product is
~x · ~y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.

Note that we cannot take the dot product of two vectors unless they are
both the same size.

Example 1. Compute \begin{pmatrix} -2 \\ 1 \\ 6 \end{pmatrix} \cdot \begin{pmatrix} 3 \\ 7 \\ -1 \end{pmatrix}.

Since both of our vectors are from R3 , this dot product makes sense. To compute it, we multiply corresponding entries of the two vectors and add up those products to get
\begin{pmatrix} -2 \\ 1 \\ 6 \end{pmatrix} \cdot \begin{pmatrix} 3 \\ 7 \\ -1 \end{pmatrix} = -2(3) + 1(7) + 6(-1) = -5.
6 −1

As with any other new vector operation, we want to explore its properties,
including how it interacts with addition and scalar multiplication.


The first nice property to notice about the dot product is that unlike
matrix multiplication it is commutative, i.e., ~x · ~y = ~y · ~x. This is because

~x · ~y = x1 y1 + x2 y2 + · · · + xn yn

which equals
y1 x1 + y2 x2 + · · · + yn xn = ~y · ~x
since multiplication of real numbers commutes.
Next, let’s see how the dot product interacts with vector addition. Suppose
~x, ~v , and w ~ are in Rn . If we take the dot product of one vector with the sum
of the other two vectors we get
         
x1 v1 w1 x1 v1 + w1
 x2   v2   w2   x2   v2 + w2 
         
~x · (~v + w)
~ =  .  ·  .  +  .  =  .  ·  .. 
 ..   ..   ..   ..   . 
xn vn wn xn vn + wn
= x1 (v1 + w1 ) + x2 (v2 + w2 ) + · · · + xn (vn + wn )
= x 1 v1 + x 1 w 1 + x 2 v2 + x 2 w 2 + · · · + x n vn + x n w n
= (x1 v1 + x2 v2 + · · · + xn vn ) + (x1 w1 + x2 w2 + · · · + xn wn )
= ~x · ~v + ~x · w
~

so ~x · (~v + w)
~ = ~x · ~v + ~x · w.
~ Thus the dot product distributes over vector
addition.
 
 
3 8
Example 2. Check that ~x · (~v + w)
~ = ~x · ~v + ~x · w
~ where ~x = , ~v = ,
  −2 5
−6
and w
~= .
1
The left-hand side of this equation is ~x · (~v + w)
~ which is
         
3 8 −6 3 2
· + = · = 3(2) + (−2)(6) = −6.
−2 5 1 −2 6

The right-hand side is ~x · ~v + ~x · w


~ which is
       
3 8 3 −6
· + · = (3(8) + (−2)(5)) + (3(−6) + (−2)(1)) = −6.
−2 5 −2 1

Both sides match, so we’ve verified that the equation holds.

Finally, let’s explore how dot products interact with scalar multiplication.
Suppose ~x and ~y are in Rn and r is a scalar. Multiplying the dot product of

~x and ~y by r gives us
   
x1 y1
 x2   y2 
   
r(~x · ~y ) = r  .  ·  .  = r(x1 y1 + x2 y2 + · · · + xn yn )
 ..   .. 
xn yn
= rx1 y1 + rx2 y2 + · · · + rxn yn .
This is interesting, because we can split this up one of two ways: as
rx1 y1 + rx2 y2 + · · · + rxn yn = (rx1 )y1 + (rx2 )y2 + · · · + (rxn )yn
   
rx1 y1
 rx2   y2 
   
=  .  ·  .  = (r~x) · ~y
 ..   .. 
rxn yn
or as
x1 ry1 + x2 ry2 + · · · + xn ryn = x1 (ry1 ) + x2 (ry2 ) + · · · + xn (ryn )
   
x1 ry1
 x2   ry2 
   
=  .  ·  .  = ~x · (r~y ).
 ..   .. 
xn ryn
This means that scalar multiplication can be thought of as halfway distributing
over the dot product, in that multiplying a dot product by a scalar is the same
as multiplying one of the vectors in the dot product by that scalar, i.e.,
r(~x · ~y ) = (r~x) · ~y = ~x · (r~y ).
 
2
Example 3. Check that r(~x · ~y ) = (r~x) · ~y = ~x · (r~y ) holds for ~x = −3,
4
 
5
~y = 1, and r = 10.
2
As in the previous example, we’ll compute each part of the equation to
check that they match. The left piece is r(~x · ~y ), which in our case is
   
2 5
10 −3 · 1 = 10(2(5) + (−3)(1) + 4(2)) = 150.
4 2
The middle piece is (r~x) · ~y , which is
        
2 5 20 5
10 −3 · 1 = −30 · 1 = 20(5) + (−30)(1) + 40(2) = 150.
4 2 40 2

The right piece is ~x · (r~y ), which here is


        
2 5 2 50
−3 · 10 1 = −3 · 10 = 2(50) + (−3)(10) + 4(20) = 150.
4 2 4 20

All three parts are equal, so we are done.

Now that we understand dot products, we can start using them to explore
a computational version of geometry in Rn .
In R2 , we have a nice formula for the length of a vector because we can
view our vector as the hypotenuse of a right triangle.

Figure 5.1: Vector length in R2

Thus the length of ~x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} is \sqrt{(x_1)^2 + (x_2)^2}. We have a similar formula in R3 which says that the length of ~x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} is \sqrt{(x_1)^2 + (x_2)^2 + (x_3)^2}.
If we think about dot products, the quantities inside these square roots
should look suggestive since they are sums of products of vector entries. In
fact, they are the dot products of the vectors with themselves. This allows us
to make the following definition.

Definition. Let ~x be in Rn . The norm of ~x is ||~x|| = \sqrt{~x \cdot ~x}.

This could also be called the length of ~x, but we’ll follow the usual
conventions and use norm. Notice that this definition does not require us
to have any picture of ~x, but is purely computational. As mentioned at the
start of this section, this is extremely exciting if we want to talk about the
length of vectors in Rn for n > 3.

Example 4. Compute \left\| \begin{pmatrix} -2 \\ 1 \\ 0 \\ 4 \end{pmatrix} \right\|.

From the definition above, we know
\left\| \begin{pmatrix} -2 \\ 1 \\ 0 \\ 4 \end{pmatrix} \right\| = \sqrt{ \begin{pmatrix} -2 \\ 1 \\ 0 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} -2 \\ 1 \\ 0 \\ 4 \end{pmatrix} }.
Computing the dot product gives us
\sqrt{(-2)^2 + 1^2 + 0^2 + 4^2} = \sqrt{21}.
Therefore \left\| \begin{pmatrix} -2 \\ 1 \\ 0 \\ 4 \end{pmatrix} \right\| = \sqrt{21}.

Another formula with a similar flavor to the length formula from R2 is


the formula for the distance between two points. From calculus, we recall that the distance between (x_1 , x_2 ) and (y_1 , y_2 ) is given by the formula \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}. If we think of these two points as vectors, the distance formula starts to look like a norm - specifically the norm of ~x − ~y where ~x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} and ~y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}. To see this more precisely, we can rewrite ||~x − ~y|| as
||~x − ~y|| = \left\| \begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \end{pmatrix} \right\| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}

as claimed.
Geometrically, we can look at this in Figure 5.2. Notice that the distance
between ~x and ~y , labeled d, is exactly the same length as the vector ~x − ~y
geometrically constructed via our parallelogram rule as the sum of ~x and −~y .
Figure 5.2: The distance between vectors

There is nothing special here about R2 . In fact we get the following general
way to compute distances between vectors in any Rn .

Theorem 1. The distance between ~x and ~y in Rn is k~x − ~y k.

   
−2 −3
0 1
Example 5. Find the distance between ~x =    
 6  and ~y =  4 .
1 0
Theorem 1’s formula tells us the distance between ~x and ~y is
     
−2 −3 1
0 1 −1
k~x − ~y k =    
 6 − 4  =  2 
 

1 0 1
p √
= 12 + (−1)2 + 22 + 12 = 7 ≈ 2.65.
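Both the norm and the distance formula translate directly into code. A brief NumPy sketch (mine, not the book's) reproducing Examples 4 and 5:

    import numpy as np

    x = np.array([-2, 1, 0, 4])
    print(np.sqrt(x @ x))      # 4.5825... = sqrt(21)
    print(np.linalg.norm(x))   # the same value, using the built-in norm

    x = np.array([-2, 0, 6, 1])
    y = np.array([-3, 1, 4, 0])
    print(np.linalg.norm(x - y))  # 2.6457... = sqrt(7)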

As we did with the dot product, we next want to explore the basic
properties of the norm.
One of the first major things to notice about the norm is that it is always
the square root of a sum of squares. This means that no matter what the
entries of ~x are, ||~x|| can never be negative. Additionally, since the sum of
squares doesn’t allow for any cancelation within the square root, the only way
we can have ||~x|| = 0 is to have all entries of ~x be zero. In other words:

Theorem 2. ||~x|| ≥ 0 for any ~x, and ||~x|| = 0 if and only if ~x = ~0.

This result also makes sense geometrically, since the length of a vector
can’t be negative and the only way to get a vector ~x to have length 0 is to
make ~x the point at the origin, i.e., ~0.
To see how the norm interacts with scalar multiplication, let’s consider ~x
in Rn and a scalar r. Then
   
x1 rx1
 x2   rx2  p
   
||r~x|| = r  .  =  .  = (rx1 )2 + (rx2 )2 + · · · (rxn )2
 ..   .. 
xn rxn
p p
= r2 (x 1 )2 + r2 (x 2)
2 + · · · r2 (xn )2 =
r2 ((x1 )2 + (x2 )2 + · · · (xn )2 )
√ p p
= r2 (x1 )2 + (x2 )2 + · · · (xn )2 = |r| (x1 )2 + (x2 )2 + · · · (xn )2
= |r|(||~x||).
Thus multiplication by a positive scalar can be done before or after the norm
without changing the answer. Multiplication by a negative scalar multiplies
the norm by the absolute value of the scalar. Geometrically this makes sense,
because scalar multiplication multiplies the length of the vector by r and
reverses its direction if r is negative. Since reversing the direction of a vector
does nothing to its length, only the magnitude of r matters when figuring out
the effect on the length of the vector.
 
3
Example 6. Verify ||r~x|| = |r|(||~x||) when ~x = and r = −2.
4
The left-hand side is ||r~x||, which in our case is
   
3 −6 p √
−2 = = (−6)2 + (−8)2 = 100 = 10.
4 −8

The right-hand side is |r|(||~x||), which is


    p
3 √
| − 2| = 2 32 + 42 = 2 25 = 2(5) = 10.
4

Since both sides are equal, we are done.

The interaction between vector addition and the norm is more complicated.
This shouldn’t surprise us, since we know from 2D geometry that the length
of the sum of two vectors is unlikely to be the sum of their lengths. However,
we are all familiar with the idea that for two vectors ~x and ~y in R2 at right
angles we can use the Pythagorean theorem to say that
||~x||2 + ||~y ||2 = ||~x + ~y ||2

as shown in Figure 5.3.


Figure 5.3: Pythagorean theorem with vectors

In the next section we’ll generalize our idea of right angle as we have our
idea of length, after which we can tackle a generalization of the Pythagorean
Theorem. For now, let’s finish up this section by introducing and exploring
the idea of normalizing a vector.
The main idea here is that many computations are easier if the length of
your vector is 1. This is why if you ask a mathematician to embed a square
in R2 , they’ll usually choose to make each of its sides have length 1. Vectors
of length 1 are usually called unit vectors. If we aren’t lucky enough to start
out with ||~x|| = 1, we can fix that by multiplying ~x by the scalar ||~x1|| . This is
called normalizing ~x. Geometrically, normalizing a vector changes its length
but not its direction, which we can see because the normalized vector lies
along the same line as ~x since it is in ~x’s span. This is illustrated in Figure
5.4.

Figure 5.4: Normalizing a vector

However, this is generally done computationally even when we can draw a


picture, because it is both quicker and more accurate.
 
−4
Example 7. Find the unit vector in the same direction as ~x =  0 .
3
If we think about it, the unit vector in the same direction as a given vector
is just the normalization of that vector, so this question is really asking us to
normalize ~x.
To do this, we first need to find ||~x|| which is
p √
(−4)2 + 02 + 32 = 25 = 5.

Next we need to divide ~x by 5, i.e., multiply by \tfrac{1}{5}. This gives us
\tfrac{1}{5} \begin{pmatrix} -4 \\ 0 \\ 3 \end{pmatrix} = \begin{pmatrix} -\tfrac{4}{5} \\ 0 \\ \tfrac{3}{5} \end{pmatrix}.
Therefore the unit vector in the same direction as \begin{pmatrix} -4 \\ 0 \\ 3 \end{pmatrix} is \begin{pmatrix} -\tfrac{4}{5} \\ 0 \\ \tfrac{3}{5} \end{pmatrix}.
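Normalizing is just one scalar multiplication, so it is a one-liner in code. A small sketch (not from the text) that redoes Example 7:

    import numpy as np

    x = np.array([-4.0, 0.0, 3.0])
    unit = x / np.linalg.norm(x)
    print(unit)                  # [-0.8  0.   0.6]
    print(np.linalg.norm(unit))  # 1.0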

Recall from Chapter 4 that the stable stage distribution of a population is


the eigenvector of length 1 with the largest eigenvalue, aka population growth
rate. At that point, we couldn’t compute that, because we didn’t have a way
to find a vector of length 1 in the span of a given vector. Now we can do this
by normalizing the basis vector of that eigenspace.

Example 8. Find the stable stage distribution of the population from 4.3’s
Example 7.

We computed in 4.3 that


our 
population’s
 stable stage distribution was in
 0.5 
the eigenspace E1 = Span  1  . This vector has length
 
1
 
0.5 p √
 1  = (0.5)2 + (1)2 + (1)2 = 2.25 = 1.5.
1

Normalizing our spanning vector gives us
\tfrac{1}{1.5} \begin{pmatrix} 0.5 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ \tfrac{2}{3} \end{pmatrix}.
This means our stable stage distribution is \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ \tfrac{2}{3} \end{pmatrix}.

If we are dealing with more than one vector, we can’t normalize each of
them without changing the relationship between their lengths. However, we
can preserve the basics of the situation while normalizing one of the vectors,
say ~x, by multiplying all our vectors by the scalar needed to normalize ~x.

  9. Suppose we
Example  are looking at the triangle formed by the two vectors
4 2
~v1 = and ~v2 = , but we would really prefer to look at the similar
5 1
triangle whose longest side has length 1. Find the two vectors which define
that similar triangle.

To give ourselves a better idea what’s going on here, let’s look at our
original triangle.

The vector ~v1 is clearly the longest side, so we can solve this problem by
finding the scalar needed to normalize ~v1 and then multiplying both vectors
by it. Since
p √
||~v1 || = 42 + 52 = 41

we’ll need to multiply both of our vectors by √1 . This gives us


41

  " √4 #  
1 4 41 .63
√ = ≈
41 5 √5 .78
41

and
  " √2 #  
1 2 41 .31
√ = ≈ .
41 1 √1 .16
41

Thus our new similar triangle


  with longest
 side of length 1 is given (ap-
.63 .31
proximately) by the vectors and . We can check this geometrically
.78 .16
by plotting both triangles together as seen below.
6
318 Computational Vector Geometry

-1 1 2 3 4 5 6

Exercises 5.1.
   
−3 2
1. Compute  1  ·  7  or say it is impossible.
5 −2
 
−1  
2. Compute  1  · 4 or say it is impossible.
6
13
   
2 1
3. Compute −1 · 2 or say it is impossible.
8 0
   
5 −2
4. Compute · or say it is impossible.
7 1
5. Suppose ~v · ~u = 8. Find ~v · (4~u).
6. Suppose ~v · ~u = 4, ~v · w
~ = −9, and ~u · w
~ = 12. Find ~v · (~u + w).
~
 
4
7. Compute −1 .
2
 
−1
1
8. Compute  1 .

−1
 
0
9. Compute 5 .
7
 
−2
10. Compute .
6
   
7 3
11. Find the distance between and .
−2 −1
   
2 3
12. Find the distance between  8  and 9.
−3 1
   
4 7
13. Find the distance between −5 and −2.
0 4
   
5 4
−2 0
14. Find the distance between    
 0  and −1.
6 8
15. Suppose ||~v|| = 18. Find ||−\tfrac{1}{2}~v||.
 
5
16. Normalize .
12
 
6
17. Normalize  0 .
−6
 
1
0
18. Normalize  1 .

−1
 
−2
19. Normalize  1 .
0
20. Give an example of a situation when it is convenient to be able to
compute length without a picture?

5.2 Orthogonality
In the last section we explored a way to compute the length of a vector in a
non-geometric way. Toward the end of the section we wanted to generalize the
Pythagorean theorem, but didn’t have a good notion of how to tell vectors
are at right angles without using a picture. In this section we’ll develop a
computational test for that and use it to explore several related ideas.
We’ve already developed a quick algebraic way to compute the dot product
of two vectors, but it turns out there is also a geometric formula for ~x ·~y which
computes the dot product in terms of the lengths of the vectors and the angle
between them.

Theorem 1. Let ~x and ~y be in Rn . Then ~x · ~y = ||~x||||~y || cos(θ) where θ is


the angle between ~x and ~y .

In other words, the dot product of two vectors can also be found by
multiplying together their lengths and the cosine of the angle between them.
While this formula holds for any n, I’ll provide an explanation here in R2
so we can draw pictures. The first step in our explanation is to rewrite our
vectors ~x and ~y in polar coordinates. If we let α_x be the angle between ~x and the x-axis as shown in Figure 5.5, then we have ~x = \begin{pmatrix} ||~x|| \cos(α_x) \\ ||~x|| \sin(α_x) \end{pmatrix}.


Figure 5.5: ~x in polar coordinates


 
Similarly we can rewrite ~y as \begin{pmatrix} ||~y|| \cos(α_y) \\ ||~y|| \sin(α_y) \end{pmatrix}. Taking the dot product of ~x
and ~y using our algebraic formula from 5.1 now gives us
   
||~x|| cos(αx ) ||~y || cos(αy )
~x · ~y = ·
||~x|| sin(αx ) ||~y || sin(αy )
= ||~x|| cos(αx )||~y || cos(αy ) + ||~x|| sin(αx )||~y || sin(αy )
= ||~x||||~y || cos(αx ) cos(αy ) + ||~x||||~y || sin(αx ) sin(αy )
 
= ||~x||||~y || cos(αx ) cos(αy ) + sin(αx ) sin(αy ) .

Using the trigonometric identity cos(β) cos(γ) + sin(β) sin(γ) = cos(γ − β),
this gives us
~x · ~y = ||~x||||~y || cos(αy − αx ).
Now we need to relate the difference between the angles associated with
our two vectors to the angle between them.

Figure 5.6: The angle between vectors
If we look at Figure 5.6, we can see that the angle between ~x and ~y is
θ = αy − αx . Therefore ~x · ~y = ||~x||||~y || cos(θ) as claimed.
This relates back to our original goal of checking when vectors are at right

angles because when θ = 90° = π/2 we have cos(θ) = 0. This means we can
use the dot product to define perpendicular vectors as follows.

Definition. ~ in Rn are orthogonal if ~v · w


Two vectors ~v and w ~ = 0.

As with saying norm instead of length, we’ll typically use orthogonal


instead of perpendicular or at right angles.
   
1 5
Example 1. Show 2 and −1 are orthogonal.
3 −1
The dot product of these two vectors is
   
1 5
2 · −1 = 1(5) + 2(−1) + 3(−1) = 0.
3 −1

Since their dot product is zero, they are orthogonal.


   
1 1
Example 2. Show 2 and  1  are not orthogonal.
3 −5
As in the previous example, we take the dot product of these vectors and
get    
1 1
2 ·  1  = 1(1) + 2(1) + 3(−5) = −12.
3 −5
Since the dot product is nonzero, these vectors are not orthogonal.
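Testing orthogonality is therefore a single dot product. A short NumPy sketch (the helper is my own, not the book's) covering Examples 1 and 2:

    import numpy as np

    def are_orthogonal(v, w):
        # Exact zero works for integer entries; use a tolerance for floats.
        return np.isclose(np.dot(v, w), 0.0)

    print(are_orthogonal(np.array([1, 2, 3]), np.array([5, -1, -1])))  # True
    print(are_orthogonal(np.array([1, 2, 3]), np.array([1, 1, -5])))   # False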

With this analog for right angles, we can state the following theorem.

Theorem 2. If ~v and ~w from Rn are orthogonal, then ||~v||^2 + ||~w||^2 = ||~v + ~w||^2.

In R2 , this is the Pythagorean Theorem for the triangle with sides a, ~v , and ~v + ~w as shown by Figure 5.7, since a has the same length as ~w.
Figure 5.7: Vector Pythagorean theorem
   
1 5
Example 3. Check that Theorem 2 holds for 2 and −1.
3 −1
These are our vectors from Example 1, so we already know they are
orthogonal. To check that they satisfy Theorem 2, let’s compute each side
of its equation to make sure they match. For our vectors,
  2   2
1 5
~ 2 = 2 + −1
||~v ||2 + ||w||
3 −1
p 2 p 2
= 12 + 22 + 32 + 52 + (−1)2 + (−1)2
= 14 + 27 = 41.

The other side of the equation is


    2   2
1 5 6
~ 2 = 2 + −1 = 1
||~v + w||
3 −1 2
p 2
= 62 + 12 + 22 = 41.

Since the two sides of the equation are equal, the theorem is satisfied.

   
1 1
Example 4. Check that Theorem 2 does not hold for 2 and  1 .
3 −5
These are our vectors from Example 2, so we know they aren’t orthogonal.
As in the previous example, we can tackle this by computing each side of the
theorem’s equation. For our vectors,
  2   2
1 1
~ 2 = 2 +  1 
||~v ||2 + ||w||
3 −5
p 2 p 2
= 12 + 2 2 + 3 2 + 12 + 12 + 52
= 14 + 27 = 41.

The other side of the equation is


    2   2
1 1 2
||~v + w||
~ =2 2 +  1  =  3
3 −5 −2
p 2
= 22 + 32 + (−2)2 = 17.

Since the two sides of the equation are different, the theorem doesn’t hold.

Suppose ~v is orthogonal to ~w, i.e., ~v · ~w = 0. This also means any multiple of ~v is orthogonal to ~w since (r~v) · ~w = r(~v · ~w) = 0. Therefore it makes sense to talk about a set of vectors which are orthogonal to ~w. Additionally, we know ~v is orthogonal to any multiple of ~w since ~v · (r~w) = r(~v · ~w) = 0. Geometrically, this is the same as saying that if ~v is orthogonal to ~w, then any vector on the line spanned by ~v is orthogonal to any vector on the line spanned by ~w as shown in Figure 5.8.

Figure 5.8: Orthogonal spans

Now that we understand what it means for two vectors to be orthogonal to


each other, let’s extend that notion to talk about when a vector is orthogonal
to a whole set of vectors.

Definition. Let W be a subset of Rn . The orthogonal complement of W is


W⊥ = {~v in Rn | ~v · ~w = 0 for every ~w in W }.

If this notation is confusing, we can restate this in words by saying that


the orthogonal complement of W is the set of all vectors which are orthogonal
to everything in W .
 
−6
Example 5. Show ~v = is orthogonal to W where W is the line y = 2x
3
2
in R .

Before we can check that ~v is in W ⊥ , we need  to write W as a set of


x
vectors in R2 . Since we often write 2-vectors as , adding in the condition
  y
x
that y = 2x means W = .
2x
Now that we have W written as a set of 2-vectors, we can check that ~v is
orthogonal to anything in W by taking the dot product of ~v with a generic
vector from W . In our case, this is
   
−6 x
· = −6(x) + 2(3x) = 0.
3 2x

Since this dot product is always zero, ~v is orthogonal to everything in W .


Therefore ~v is in W ⊥ .

   
1  5x1 + x2 
Example 6. Show ~v = 2 is not in W ⊥ for W = −x1 + 2x2  .
 
3 −x1 − x2
A vector is in W ⊥ only if it is orthogonal to everything in W . Therefore
to show ~v is not in W ⊥ , we simply need to find some w ~ in W which is not
orthogonal to ~v .  
1
If we let x1 = 0 and x2 = 1, we get w ~ =  2 . Taking the dot product
−1
with ~v gives us
   
1 1
2 ·  2  = 1(1) + 2(2) + 3(−1) = 2.
3 −1

This means ~v is not orthogonal to w, ~ and hence ~v is not orthogonal to


everything in W . Therefore ~v is not in W ⊥ .  
5
Notice that setting x1 = 1 and x2 = 0 actually gives us w ~ = −1 which
−1
we showed in Example 1 was orthogonal to ~v . However, this is irrelevant, since
W ⊥ is the set of vectors orthogonal to all of W .

In the previous examples we were only testing whether or not a single


vector was in W ⊥ , and our set W was fairly simple. In general this is a

complicated problem, however, if W is a span then we have the following


easier criteria for W ⊥ .

Theorem 3. If W = Span{~w_1, . . . , ~w_k}, then ~v is in W⊥ if ~v · ~w_i = 0 for each of ~w_1, . . . , ~w_k.

In other words, if W is a span then it is enough to check whether ~v is


orthogonal to each spanning vector. If it is, ~v is automatically orthogonal to
everything else in W . You will check that this is true in Exercise 22 of this
section.
     
1  −3 7 
Example 7. Show that ~v = 2 is in W ⊥ for W = Span  0  , −2 .
 
3 1 −1
Rather than checking that ~v is orthogonal to every w
~ in W , we can instead
just show ~v is orthogonal to the two spanning vectors of W . This is
   
1 −3
2 ·  0  = 1(−3) + 2(0) + 3(1) = 0
3 1

and    
1 7
2 · −2 = 1(7) + 2(−2) + 3(−1) = 0.
3 −1
Since both of these dot products are zero, ~v is orthogonal to all spanning
vectors of W and hence by Theorem 3 we know ~v is in W⊥.

   

 4 −1 
    
−2 , 5  .
Example 8. Find W ⊥ where W = Span   1   2 

 
 
0 9
 
To be in W⊥, a vector ~x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} needs to satisfy the equations
~w_1 · ~x = 4x_1 − 2x_2 + x_3 = 0
~w_2 · ~x = −x_1 + 5x_2 + 2x_3 + 9x_4 = 0

This means we are really solving the linear system whose augmented coefficient
matrix is  
4 −2 1 0 0
.
−1 5 2 9 0

This matrix has reduced echelon form


 
\begin{pmatrix} 1 & 0 & \tfrac{1}{2} & 1 & 0 \\ 0 & 1 & \tfrac{1}{2} & 2 & 0 \end{pmatrix}
so we have x_1 = −\tfrac{1}{2}x_3 − x_4 and x_2 = −\tfrac{1}{2}x_3 − 2x_4 where x_3 and x_4 are free variables. Thus
W⊥ = \left\{ \begin{pmatrix} -\tfrac{1}{2}x_3 - x_4 \\ -\tfrac{1}{2}x_3 - 2x_4 \\ x_3 \\ x_4 \end{pmatrix} \right\}.
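One way to sanity-check a computation like Example 8 is to plug numbers in for the free variables and confirm that the resulting vectors really are orthogonal to the spanning vectors of W. A NumPy sketch of that check (mine, not the author's):

    import numpy as np

    w1 = np.array([4, -2, 1, 0])
    w2 = np.array([-1, 5, 2, 9])

    def perp_vector(x3, x4):
        # A general element of W-perp, from Example 8's parametric solution.
        return np.array([-0.5 * x3 - x4, -0.5 * x3 - 2 * x4, x3, x4])

    for x3, x4 in [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0)]:
        v = perp_vector(x3, x4)
        print(np.dot(v, w1), np.dot(v, w2))  # both dot products are 0 every time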

In the case where W is a subspace of Rn (which is automatically true if


W is a span), we also get the following fact.

Theorem 4. If W is a subspace of Rn , then W ⊥ is also a subspace of Rn .

To see that this is true, we can use the subspace test. Since ~0 is orthogonal
to everything (~v · ~0 = 0 for every ~v in Rn ), we clearly have ~0 in W ⊥ . If ~x and
~y are in W ⊥ , then we have ~x · w ~ = 0 and ~y · w
~ = 0 for all w
~ in W . Therefore
(~x + ~y ) · w
~ = (~x · w)
~ + (~y · w)
~ =0+0=0
so ~x + ~y is in W . Finally, suppose ~x is in W ⊥ , so ~x · w

~ = 0 for every w
~ in
W , and r is any scalar. Then
(r~x) · w
~ = r(~x · w)
~ = r(0) = 0
so r~x is in W ⊥ . Since W ⊥ satisfies all three conditions of our subspace check,
it is a subspace of Rn .

Exercises 5.2.
   
2 1
1. Are −4 and 1 orthogonal?
1 2
   
−3 1
2. Are  0  and  5  orthogonal?
6 −1
   
8 1
−1 1
3. Are 
3
 and   orthogonal?
−2
2 1
   
4 1
−4 1
4. Are 
1
 and   orthogonal?
1
0 1
   
1 7
5. Does Theorem 2 hold for ~v =  0  and w ~ = 12?
−1 7
   
3 −6
6. Does Theorem 2 hold for ~v = and w
~= ?
4 3
 
−5
7. Find a nonzero vector which is orthogonal to  5 .
1
 
1
8. Find a nonzero vector which is orthogonal to  2 .
−9
   
−2 2x1 − 4x2
9. Is in W ⊥ for W = ?
1 4x1 + x2
   
3 ⊥ 2x1 − 4x2
10. Is in W for W = ?
−1 −6x1 + 12x2
    
1  x1 + x2 
11. Is  1  in W ⊥ for W = x1 − 6x2  ?
 
−1 2x1 − x2
   
−3  5x1 + x2 
12. Is  4  in W ⊥ for W =  2x1 − x2  ?
 
1 7x1 + 7x2
     
−1  3 2 
13. Is −1 in W ⊥ for W = Span 5 , 0 ?
 
2 4 1
     
0  −3 4 
14. Is 1 in W ⊥ for W = Span  3  , −3 ?
 
1 4 3
     
4  0 2 
15. Is −2 in W ⊥ for W = Span 1 , 3 ?
 
−1 2 5
     
1  −1 2 
16. Is 3 in W ⊥ for W = Span  2  , −1 ?
 
1 −5 1
   
 −1 3 
17. Compute W ⊥ for W = Span  2  , −1 ?
 
1 0
   
 4 −3 
18. Compute W ⊥ for W = Span  2  ,  1  ?
 
−1 2
   

 6 −3 
    
−2 , 2  ?
19. Compute W ⊥ for W = Span 
−1  0 

 
 
1 1
   

 1 0 
    
0 , 1  ?
20. Compute W ⊥ for W = Span 
−2  5 

 
 
5 −6
     

 2 −1 4 
      
−2 2 −4
21. Compute W ⊥ for W = Span  , ,  ?
   4  −5
 0

 

6 1 2
22. (a) Let W = Span{w ~ 1, w~ 2 } be a subspace of Rn , and ~v be any
n-vector. Show that if ~v · w ~ 1 = 0 and ~v · w~ 2 = 0 then ~v is in
W ⊥.
(b) Let W = Span{w ~ k } be a subspace of Rn , and let ~v be
~ 1, . . . , w
an n-vector. Show that if ~v · w ~ i = 0 for w ~ k then ~v is in
~ 1, . . . , w
W ⊥.

5.3 Orthogonal Projection


In this section we’ll use our concept of orthogonality to develop a technique
which allows us to decompose a vector into two parts: one in the span of
another vector or set of vectors and the other orthogonal to it. This is
particularly applicable to physics, where we often want to decompose a force
vector into its parts along a certain line or plane of motion and orthogonal
to that motion. (There the part orthogonal to the line or plane of motion is
commonly called the normal vector.)
Let’s start in the simplest possible case, where our span W has only one
spanning vector ~y . Here we want to take our starting vector ~x and write it as a
sum of two pieces: ~x = ~w + ~v. The first piece ~w should be in the span of ~y , which
we can think of either computationally as all multiples of ~y or geometrically
as the line defined by ~y . The second piece ~v should be orthogonal to W , i.e.,
in W⊥. Since W is the span of ~y , Theorem 3 from the last section tells us we simply need ~v orthogonal to ~y . Geometrically, this means we want to find ~w along ~y 's line and ~v orthogonal to ~y 's line so that ~x is the diagonal of the parallelogram formed by ~w and ~v as in Figure 5.9.

Figure 5.9: Decomposing ~x

Putting this more computationally, we want to find two vectors ~w and ~v so that
~x = ~w + ~v, ~w = r~y for some scalar r, and ~v · ~y = 0.

We know the values of ~x and ~y , and want to solve for r (and hence w)
~ and
~v . We can use the first two equations to solve for ~v by plugging w~ = r~y into
~x = w
~ + ~v to get
~x = r~y + ~v .
Solving this for ~v gives us
~v = ~x − r~y .
It may feel as if we are stuck, but remember we have one more equation we
haven’t used yet: ~v · ~y = 0. Plugging ~v = ~x − r~y into this equation gives us

(~x − r~y ) · ~y = 0.

Using the properties of the dot product explored in 5.1 to expand the left-hand
side of this equation gives us

(~x − r~y ) · ~y = ~x · ~y − (r~y ) · ~y = ~x · ~y − r(~y · ~y ).

This means we have


~x · ~y − r(~y · ~y ) = 0.
Now that r is the only unknown quantity in this equation, we can solve for it
to get
r(~y · ~y ) = ~x · ~y
or
~x · ~y
r= .
~y · ~y
(Note that since dot products are real numbers, this value of r is a scalar.)
Now that we know the value of r, we can plug it back into the equations for ~w and ~v to get
~w = \left( \frac{~x · ~y}{~y · ~y} \right) ~y
and
~v = ~x − \left( \frac{~x · ~y}{~y · ~y} \right) ~y.
Both of these vectors have special names which are defined below.

Definition. The orthogonal projection of ~x onto ~y is \left( \frac{~x · ~y}{~y · ~y} \right) ~y and the component of ~x orthogonal to ~y is ~x − \left( \frac{~x · ~y}{~y · ~y} \right) ~y.

These names should intuitively make sense, because the orthogonal


projection of ~x onto ~y is the piece of ~x that lies along the line defined by
~y and the component of ~x orthogonal to ~y is the piece of ~x that lies in the
orthogonal complement of ~y ’s span.

   
Example 1. Find the orthogonal projection of \begin{pmatrix} 2 \\ 9 \end{pmatrix} onto \begin{pmatrix} 3 \\ 1 \end{pmatrix} and the component of \begin{pmatrix} 2 \\ 9 \end{pmatrix} orthogonal to \begin{pmatrix} 3 \\ 1 \end{pmatrix}.

Since these formulas are not symmetric in ~x and ~y , it is important to start


by identifying which vector to plug in for ~x and which to plug in for ~y . The
vector that we’re computing the projection of is always ~x and the vector we’re
projecting onto is always ~y , so in this example
   
2 3
~x = and ~y = .
9 1

Next we need to compute ~x · ~y and ~y · ~y . Plugging in our ~x and ~y gives us


   
2 3
~x · ~y = · = 2(3) + 9(1) = 15
9 1

and
   
3 3
~y · ~y = · = 32 + 12 = 10.
1 1

Plugging these dot products into our formula for the orthogonal projection
gives us
      
15 3 3 4.5
= (1.5) = .
10 1 1 1.5
   
2 3
To find the component of orthogonal to , we just need to subtract
9 1
 
2
the orthogonal projection from . This gives us
9
     
2 4.5 −2.5
− = .
9 1.5 7.5

     
Therefore, the orthogonal projection of \begin{pmatrix} 2 \\ 9 \end{pmatrix} onto \begin{pmatrix} 3 \\ 1 \end{pmatrix} is \begin{pmatrix} 4.5 \\ 1.5 \end{pmatrix}, and the component of \begin{pmatrix} 2 \\ 9 \end{pmatrix} orthogonal to \begin{pmatrix} 3 \\ 1 \end{pmatrix} is \begin{pmatrix} -2.5 \\ 7.5 \end{pmatrix}.
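The projection formulas are only a couple of lines of code. The following sketch (my own, not the book's) redoes Example 1:

    import numpy as np

    def project(x, y):
        # Orthogonal projection of x onto y, and the component of x orthogonal to y.
        w = (np.dot(x, y) / np.dot(y, y)) * y
        return w, x - w

    x = np.array([2.0, 9.0])
    y = np.array([3.0, 1.0])
    w, v = project(x, y)
    print(w)             # [4.5 1.5]
    print(v)             # [-2.5  7.5]
    print(np.dot(v, y))  # 0.0, so v really is orthogonal to y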
Geometrically, you can think of finding the orthogonal projection of ~x
onto ~y as shining a light perpendicularly down onto the line spanned by ~y and
looking at the shadow cast by ~x. This is shown in the picture below.


The component of ~x orthogonal to ~y is then the vector perpendicular to


~y ’s line that is the right length so its sum with the orthogonal projection is ~x,
which is shown in the following picture.

Now that we understand how to compute the orthogonal projection and


orthogonal component with respect to the span of a single vector, let’s extend
that to the span of more vectors. Here we’re still looking to decompose a
vector ~x into the sum of two vectors ~v and w ~ with w ~ in our span W and ~v
orthogonal to W . In other words, we want ~x = w ~ + ~v where w ~ is in W and ~v
is in W ⊥ .
Suppose W = Span{w ~ k }. We know how to create the orthogonal
~ 1, . . . , w
projection of ~x onto the span of any one of the w ~ i ’s, by plugging w~ i in for ~y
in the formula for the orthogonal projection of ~x onto ~y . Let’s try doing that
for each of our spanning vectors and adding up the results. This gives us
   
~x · w
~1 ~x · w
~k
w~1 + · · · + w
~ k.
~1 · w
w ~1 ~k · w
w ~k

This vector is a linear combination of w ~ 1, . . . , w~ k , so it is in W . Let’s try using


this as our w~ and see if we get lucky, by which I mean, see if the resulting
~ is in W ⊥ . To check this, we need to compute
~v = ~x − w
     
~x · w
~1 ~x · w~k
~x − ~1 + · · · +
w w
~k ·w~i
~1 · w
w ~1 w~k · w ~k

for w ~ k . This dot product can be expanded to give


~ 1, . . . , w
   
~x · w
~1 ~x · w
~k
~x · w~i − ~1 · w
w ~i − · · · − ~k · w
w ~ i.
~1 · w
w ~1 ~k · w
w ~k

The ith term of this sum is


 
~x · w
~i
− ~i · w
w ~i
~i · w
w ~i

and we can cancel the w ~i · w


~ i on the top and bottom to get −~x · w ~ i . This
cancels the first term ~x · w
~ i of our entire dot product, so we are left with
   
~x · w
~1 ~x · w
~ i−1
− ~1 · w
w ~i − · · · − ~ i−1 · w
w ~i
~1 · w
w ~1 ~ i−1 · w
w ~ i−1
   
~x · w
~ i+1 ~x · w
~k
− w~ i+1 · w
~i − · · · − ~k · w
w ~ i.
~ i+1 · w
w ~ i+1 ~k · w
w ~k

We’d like this sum to be zero, but unfortunately we don’t know much
about the dot products of our spanning vectors with each other. However,
this does suggest that we consider the special case where our spanning vectors
are orthogonal.

Definition. An orthogonal set is a set of vectors ~v1 , . . . , ~vk in Rn where ~vi


is orthogonal to ~vj whenever i ≠ j.

Geometrically, we can think of an orthogonal set as one where each vector is


at right angles to all the other vectors, which means each points in a completely
different direction from the others. (This is harder to imagine once we move
beyond 3 dimensions.)
     
0 −3 5
Example 2. Show that −2,  2 , and 6 form an orthogonal set.
4 1 3
To show this we need to verify that all three possible dot products are
zero. These three dot products are
   
0 −3
−2 ·  2  = 0(−3) + (−2)(2) + 4(1) = 0,
4 1
   
−3 5
 2  · 6 = −3(5) + 2(6) + 1(3) = 0,
1 3
and    
0 5
−2 · 6 = 0(5) + (−2)(6) + 4(3) = 0.
4 3
Since all three dot products are zero, this is an orthogonal set.

    
0 −3 1
Example 3. Show −2,  2 , and 4 are not an orthogonal set.
4 1 2
Unlike in the previous example where we needed to show that all possible
dot products equaled zero, here we just need to find one dot product which
isn’t zero. The first two vectors are the same as in Example 2, so their dot
product is zero. That doesn’t help, so we’ll need to compute dot products
involving the third vector.
These are    
0 1
−2 · 4 = 0(1) + (−2)(4) + 4(2) = 0
4 2
and    
−3 1
 2  · 4 (−3)(1) + 2(4) + 1(2) = 7.
1 2
Since the dot product of the second and third vectors is nonzero, this is not
an orthogonal set.

Now that we understand orthogonal sets, let’s return to our attempt to



decompose ~x with respect to W = Span{w ~ k } but add the condition


~ 1, . . . , w
that the spanning vectors are an orthogonal set. In this case, w ~i · w
~j = 0
whenever i ≠ j, which we can use to simplify our previous computation.
There we were trying to use
   
~x · w
~1 ~x · w~k
~=
w ~1 + · · · +
w w
~k
~1 · w
w ~1 ~k · w
w ~k

as our orthogonal projection of ~x onto W . We were checking whether ~v = ~x − w ~


was in W ⊥ by computing the dot product ~v · w ~ i , and had reduced this down
to
   
~x · w
~1 ~x · w
~ i−1
− ~1 · w
w ~i − · · · − ~ i−1 · w
w ~i
w~1 · w~1 ~ i−1 · w
w ~ i−1
   
~x · w
~ i+1 ~x · w
~k
− w~ i+1 · w
~i − · · · − ~k · w
w ~i
w~ i+1 · w ~ i+1 ~k · w
w ~k

before we got stuck. Every term of this sum has as its rightmost factor a dot
product of the form w ~j · w
~ i with j 6= i, and now all such dot products are
zero because the w~ i s are an orthogonal set. Therefore we get ~v · w
~ i = 0, so
~ is in W ⊥ . This allows us to make the following definition.
~v = ~x − w

Definition. Let W = Span{~w_1, . . . , ~w_k} where ~w_1, . . . , ~w_k are an orthogonal set. The orthogonal projection of ~x onto W is
\left( \frac{~x · ~w_1}{~w_1 · ~w_1} \right) ~w_1 + \cdots + \left( \frac{~x · ~w_k}{~w_k · ~w_k} \right) ~w_k
and the component of ~x orthogonal to W is
~x − \left[ \left( \frac{~x · ~w_1}{~w_1 · ~w_1} \right) ~w_1 + \cdots + \left( \frac{~x · ~w_k}{~w_k · ~w_k} \right) ~w_k \right].

Notice that the orthogonal projection’s coefficient on each w ~ i is the same


as the one for the orthogonal projection onto a single vector ~y .
   
 −3 5 
Example 4. Let W = Span  2  , 6 . Find the orthogonal projection
 
  1 3
1 1
of 1 onto W and the component of 1 orthogonal to W .
8 8
In the language of our formulas for the orthogonal projection onto W and
component orthogonal to W , we have
     
−3 5 1
~1 =  2  , w
w ~ 2 = 6 , and ~x = 1 .
1 3 8

In order to use the formulas in the previous definition, we first need to check
that the spanning vectors of W are an orthogonal set. Taking their dot product

gives us    
−3 5
 2  · 6 = −3(5) + 2(6) + 1(3) = 0
1 3
so w
~ 1 and w
~ 2 are orthogonal and we can proceed as planned.
The formulas for the orthogonal projection of ~x onto W and component of
~x orthogonal to W involve the dot products w ~ 1 · ~x, w
~ 2 · ~x, w
~1 · w ~2 · w
~ 1 , and w ~ 2.
For our vectors, we get
   
−3 1
w~ 1 · ~x =  2  · 1 = (−3)(1) + 2(1) + 1(8) = 7,
1 8
   
5 1
~ 2 · ~x = 6 · 1 = 5(1) + 6(1) + 3(8) = 35,
w
3 8
   
−3 −3
w~1 · w ~ 1 =  2  ·  2  = (−3)2 + 22 + 12 = 14,
1 1
and    
5 5
~ 2 = 6 · 6 = 52 + 62 + 32 = 70.
~2 · w
w
3 3
Plugging these dot products into the formulas from our definition, we get
that the orthogonal projection of ~x onto W is
     
      −3   5 1
~x · w
~1 ~x · w
~2 7   35    
~1 +
w ~2 =
w 2 + 6 = 4
~1 · w
w ~1 ~2 · w
w ~2 14 70
1 3 2
and the component of ~x orthogonal to W is
     
     1 1 0
~x · w
~1 ~x · w
~2
~x − ~1 +
w w
~2 = 1 − 4 = −3 .
~1 · w
w ~1 ~2 · w
w ~2
8 2 6
If we want to check that these are correct, we can check that the component
orthogonal to W is in W ⊥ . To do this we need to check its dot products with
~ 1 and w
w ~ 2 . These are
   
0 −3
−3 ·  2  = 0(−3) + (−3)(2) + 6(1) = 0
6 1
and    
0 5
−3 · 6 = 0(5) + (−3)(6) + 3(6) = 0.
6 3


Since they are both zero, \begin{pmatrix} 0 \\ -3 \\ 6 \end{pmatrix} is in W⊥ as required.
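When the spanning set is orthogonal, the projection onto W is just the sum of the single-vector projections. Here is a NumPy sketch (mine; it assumes the spanning vectors have already been checked to form an orthogonal set) that reproduces Example 4:

    import numpy as np

    def project_onto_span(x, spanning_vectors):
        # Valid only when the spanning vectors are an orthogonal set.
        w = np.zeros_like(x, dtype=float)
        for wi in spanning_vectors:
            w += (np.dot(x, wi) / np.dot(wi, wi)) * wi
        return w, x - w

    x = np.array([1.0, 1.0, 8.0])
    w1 = np.array([-3.0, 2.0, 1.0])
    w2 = np.array([5.0, 6.0, 3.0])

    proj, comp = project_onto_span(x, [w1, w2])
    print(proj)  # [1. 4. 2.]
    print(comp)  # [ 0. -3.  6.]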

Let’s wrap up this section by doing a physics problem of the type discussed
at the beginning of the section.

Example 5. A 1.5 kg metal sled is sliding down a hill with a flat slope of 45◦
downward. The only force acting on the sled is gravity, which pulls it straight
downward with a force of −14.7 N. Find the component of the gravitational
force which is pushing the sled along the slope of the hill.

Often in examples like this it can be helpful to sketch the situation. Ours
looks like this:


We can view this as an orthogonal projection problem by placing the sled


at the origin and viewing the hill as W . In this context, W is the line through

the origin with a negative 45° slope, i.e., the line y = −x. This means we can
1
view W as the span of . (Note that there are literally infinitely many
−1
other choices for our spanning
 vector.)
 We can view the force of gravity acting
0
on the sled as the vector since it pulls straight downward with a total
−14.7
force of −14.7.
Within this framework, the component of gravity pulling  the sledalong the
0
hill is just the orthogonal projection of gravity’s vector ~x = onto the
  −14.7
1
hill’s spanning vector ~y = . As in Example 1, we can start by computing
−1
   
0 1
~x · ~y = · = 0(1) + (−14.7)(−1) = 14.7
−14.7 −1

and    
1 1
~y · ~y = · = 12 + (−1)2 = 2.
−1 −1
Plugging these into our orthogonal projection formula we get
      
~x · ~y 14.7 1 7.35
~y = = .
~y · ~y 2 −1 −7.35

Recalling the application which motivated this orthogonal projection


computation, we can now say that the component  of
 the gravitational force
7.35
pulling the sled along the slope of the hill is . We can even go a step
−7.35
further and compute the norm of this vector which is
p
(7.35)2 + (−7.35)2 ≈ 10.39

to conclude that a little more than 2/3 of the magnitude of our gravitational
force is moving the sled along the hill.

Exercises 5.3.
 
−3
1. If we want to compute the orthogonal projection of  4  onto
  −4
0
 2 , which vector is ~x in our formula and which is ~y ?
−2
   
15 4
2. If we want to compute the orthogonal projection of onto ,
−2 9
which vector is ~x in our formula and which is ~y ?
   
8 2
3. Compute the orthogonal projection of onto and the
    −1 1
8 2
component of orthogonal to .
−1 1
   
−1 1
4. Compute the orthogonal projection of onto and the
    7 −1
−1 1
component of orthogonal to .
7 −1
   
3 1
5. Compute the orthogonal projection of −1 onto −1 and the
2 1
   
3 1
component of −1 orthogonal to −1.
2 1
   
6 1
6. Compute the orthogonal projection of −5 onto  0  and the
   8 −1
6 1
component of −5 orthogonal to  0 .
8 −1
     
 8 0 1 
7. Is −4 , 0 , 2 an orthogonal set?
 
0 1 0
     
 −4 −1 5 
8. Is  6  ,  3  , 2 an orthogonal set?
 
2 −11 4
     

 3 2 4 
      
0 4 13
9. Is   ,   ,   an orthogonal set?
−5  3   1 

 
 
−1 −9 7
     

 6 −2 4 
      
1 ,   ,  3  an orthogonal set?
6
10. Is  2 −3 −7

 
 
3 4 −5
     
 1 0  3
11. Let W = Span  0  , −2 and ~x = 8. Compute the
 
−1 0 5
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
     
 1 1  3
12. Let W = Span −2 , 1 and ~x = −4. Compute the
 
1 1 −5
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
     

 1 −2   10
    3
−1 0
13. Let W = Span      
 2  ,  1  and ~x =  4 . Compute the

 
 
0 −1 −6
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
     

 −1 0  2
     −2
−1 1
14. Let W = Span      
 1  , −1 and ~x = −3. Compute the

 
 
0 2 1
orthogonal projection of ~x onto W and the component of ~x
orthogonal to W .
 
2
15. What is the shortest distance between and the line spanned by
9
 
1
?
2
 
−2
16. What is the shortest distance between  3  and the plane spanned
−1
   
−1 2
by  1  and 0?
2 1
17. A man is dragging a sled up a hill by pulling on a rope attached to
the sled. The line formed by the rope has a slope of 3, and the hill
has a slope of 1. The man exerts 225 N of force in the direction of
the rope.
(a) What is the vector of the man’s force?
(b) What set of vectors represents the hill?
(c) How much of that force is exerted along the hill in the plane of
motion of the sled?

5.4 Orthogonal Basis


In the last section, we happened onto the idea of an orthogonal set as part
of our exploration of orthogonal projections. In this section, we’ll merge the
ideas of an orthogonal set and a basis to create orthogonal bases for Rn , and
see why that might make our lives easier. Our observation from the previous
section that orthogonal vectors point in completely different directions should
remind you of our original geometric idea of linear independence as having no
vector in the span of any of the others. The following theorem formalizes that
connection.

Theorem 1. If ~v1 , . . . , ~vk are nonzero vectors in Rn which form an orthogonal


set, then they are linearly independent.

To see why this is true, suppose we have nonzero ~v1 , . . . , ~vk in Rn which are
an orthogonal set. We want to show that they are also linearly independent,
i.e., none of these vectors is in the span of the others. We’ll do this by assuming
one of the ~v s is in the span of the others, and showing that assumption
produces an impossible consequence. For notational ease I’ll assume ~v1 is in
the span of ~v2 , . . . , ~vk . This means

~v1 = a2~v2 + · · · + ak~vk

for some scalars a2 , . . . , ak . Because ~v1 is orthogonal to ~v2 , we have

0 = ~v1 · ~v2 = (a2~v2 + · · · + ak~vk ) · ~v2 = (a2~v2 ) · ~v2 + · · · + (ak~vk ) · ~v2


= a2 (~v2 · ~v2 ) + · · · + ak (~vk · ~v2 ).

The fact that the ~v s are an orthogonal set means ~vi · ~v2 = 0 for i ≠ 2, so our
equation reduces to
0 = a2 (~v2 · ~v2 ).
Since ~v2 ≠ ~0, we know ~v2 · ~v2 ≠ 0. Therefore we must have a2 = 0. A similar
argument shows that each of our scalars a2 , . . . , ak must be zero. But this
means ~v1 = ~0, which is impossible since we required our orthogonal set to be
made up of nonzero vectors! Therefore we cannot have any of the ~v s in the
span of the others, and our orthogonal set must also be linearly independent.
While this theorem is great, we don’t want to get carried away and try
to reverse it. There are plenty of linearly independent sets which aren’t
orthogonal. Geometrically, we can see this by thinking of any two vectors
in R2 which aren’t on the same line but don’t lie at right angles to each other.
Since a basis is defined as a linearly independent spanning set, Theorem 1
tells us that having an orthogonal set gets us halfway to having a basis. This
leads to the following special case of a basis.

Definition. ~b1 , . . . , ~bn are an orthogonal basis for Rn if they are a basis for
Rn and an orthogonal set.

Example 1. Our standard basis for Rn is an orthogonal basis.

The standard basis for Rn is ~e1 = (1, 0, . . . , 0), ~e2 = (0, 1, . . . , 0), . . . , ~en = (0, 0, . . . , 1), which we already know
form a basis. The 1s do not occur in the same entry of any pair of these basis
vectors, so the dot product of any two of them is zero. This means they form
an orthogonal set and hence an orthogonal basis.

We could check that ~b1 , . . . , ~bn form an orthogonal basis by checking that
they are linearly independent and span Rn (and hence are a basis) and
then checking that they are an orthogonal set. However, Theorem 1 provides
a shortcut since orthogonality automatically implies linear independence.
Therefore we can check that ~b1 , . . . , ~bn form an orthogonal basis by checking
that they span Rn and are an orthogonal set. If we like, we can do even
less work by using Theorem 3 from 3.1. That theorem said that any linearly
independent set of n vectors was a basis for Rn . Combining this with Theorem
1 from this section we get the following:

Theorem 2. Any orthogonal set of n nonzero vectors in Rn is an orthogonal basis for Rn .

Example 2. Show that (0, −2, 4), (−3, 2, 1), and (5, 6, 3) form an orthogonal basis for R3 .

These are the three vectors from 5.3’s Example 2, so we’ve seen that they
form an orthogonal set. Since there are three of them and we are in R3 ,
Theorem 2 tells us that our three vectors are an orthogonal basis for R3 .
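
If you'd rather verify the orthogonality with software than by hand, the dot products are quick to check in Mathematica (a small sketch with my own variable names; see Appendix A.2 for the notation):

    (* the three vectors from Example 2, entered as lists *)
    b1 = {0, -2, 4}; b2 = {-3, 2, 1}; b3 = {5, 6, 3};
    {b1.b2, b1.b3, b2.b3}   (* returns {0, 0, 0}, so the set is orthogonal *)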

   
Example 3. Show that (1, 1) and (1, −2) do not form an orthogonal basis for R2 .

A set of vectors can fail to be an orthogonal basis either by failing to be orthogonal or failing to be a basis. It's easier to check whether or not these two vectors are orthogonal, so let's start there. Since

(1, 1) · (1, −2) = 1(1) + 1(−2) = −1 ≠ 0

these vectors are not orthogonal, which means they do not form an orthogonal basis for R2 .
(You can check if you like that they do form a basis for R2 ; perhaps the easiest way to see this is to show that the matrix with those two columns has nonzero determinant.)

One reason to use an orthogonal basis is that it makes finding coordinate


vectors much faster. If you recall from 3.1, we can find the coordinate vector
of ~v with respect to a basis B = {~b1 , . . . , ~bn } by solving the vector equation
~v = x1~b1 + · · · + xn~bn
and setting

[~v ]B = (x1 , . . . , xn ).
With row reduction, this is fairly straightforward, if sometimes a bit tedious.
It also requires us to compute every entry of the coordinate vector, i.e., we
cannot just compute x2 and ignore the rest of the problem. If our basis B is
orthogonal, then we have another way to compute coordinate vectors which
is both quicker and allows us to compute any entry of the coordinate vector
without computing the other entries.
To do this, let’s forget about the fact that B is a basis and simply think of it
as an orthogonal spanning set for Rn . We can use the techniques developed in
the last section to compute the orthogonal projection of any vector ~v onto the
space spanned by ~b1 , . . . , ~bn . Since W = Span{~b1 , . . . , ~bn } = Rn , we know ~v is
in W . This means when we decompose ~v as the sum of its orthogonal projection
onto W and its component orthogonal to W , the component orthogonal to
W will be ~0 since no part of ~v lies outside of W = Rn . This means ~v ’s
B-coordinate vector is simply its orthogonal projection onto Rn using the
orthogonal spanning set ~b1 , . . . , ~bn . As we saw in 5.3, this orthogonal projection
is

( (~v · ~b1)/(~b1 · ~b1) ) ~b1 + · · · + ( (~v · ~bn)/(~bn · ~bn) ) ~bn ,

which gives the following formula for ~v 's B-coordinate vector.

Theorem 3. If B is an orthogonal basis for Rn , then

[~v ]B = ( (~v · ~b1)/(~b1 · ~b1) , . . . , (~v · ~bn)/(~bn · ~bn) ).

Not only is this computation faster, but each entry’s computation is


independent of the computation of the other entries, so we can choose to
compute only those components of [~v ]B that we want.
Example 4. Let B = {(0, −2, 4), (−3, 2, 1), (5, 6, 3)} be an orthogonal basis for R3 . Compute the B-coordinate vector of ~v = (−1, 1, 2).

Here we have ~b1 = (0, −2, 4), ~b2 = (−3, 2, 1), and ~b3 = (5, 6, 3). To use our formula for [~v ]B , we need to compute the six dot products ~v · ~b1 , ~v · ~b2 , ~v · ~b3 , ~b1 · ~b1 , ~b2 · ~b2 , and ~b3 · ~b3 . For our vectors, these are

~v · ~b1 = (−1)(0) + 1(−2) + 2(4) = 6,
~v · ~b2 = (−1)(−3) + 1(2) + 2(1) = 7,
~v · ~b3 = (−1)(5) + 1(6) + 2(3) = 7,
~b1 · ~b1 = 0² + (−2)² + 4² = 20,
~b2 · ~b2 = (−3)² + 2² + 1² = 14,

and

~b3 · ~b3 = 5² + 6² + 3² = 70.

Plugging these into our coordinate vector formula gives us

[~v ]B = ( (~v · ~b1)/(~b1 · ~b1), (~v · ~b2)/(~b2 · ~b2), (~v · ~b3)/(~b3 · ~b3) ) = (6/20, 7/14, 7/70) = (3/10, 1/2, 1/10).
Example 5. Let B = {(0, −2, 4), (−3, 2, 1), (5, 6, 3)}. Compute the third entry of the B-coordinate vector of ~v = (9, 8, 4).

The third entry of [~v ]B is (~v · ~b3)/(~b3 · ~b3), so we need to compute ~v · ~b3 and ~b3 · ~b3 . For our ~v , we have

~v · ~b3 = 9(5) + 8(6) + 4(3) = 105

and from Example 4 we know ~b3 · ~b3 = 70. Therefore the third entry of [~v ]B is

(~v · ~b3)/(~b3 · ~b3) = 105/70 = 1.5.
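
If you want to check computations like these with software, Theorem 3 translates directly into one line of Mathematica. Here is a minimal sketch (the variable names are mine, not part of the original examples) that recomputes the coordinate vector from Example 4:

    (* orthogonal basis and vector from Example 4 *)
    b1 = {0, -2, 4}; b2 = {-3, 2, 1}; b3 = {5, 6, 3};
    v = {-1, 1, 2};
    (* entry i of the coordinate vector is (v.bi)/(bi.bi) *)
    Table[(v.b)/(b.b), {b, {b1, b2, b3}}]   (* returns {3/10, 1/2, 1/10} *)

Running the same line with v = {9, 8, 4} should reproduce Example 5's third entry, 105/70 = 3/2.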

Now that we’ve computed a few coordinate vectors with respect to an


orthogonal basis, you may have noticed that each entry of those coordinate
vectors is a fraction. The ith denominator from that fraction is the dot product
of the basis vector ~bi with itself. Since
~bi · ~bi = ( √(~bi · ~bi) )² = ||~bi ||²

the denominators in our coordinate vectors are the squares of the norms of
our basis vectors. This suggests an easy way to simplify this computation:
normalize each basis vector so that its length is 1. Then we’d have
~bi · ~bi = ||~bi ||2 = 1

for every i, which would make

[~v ]B = (~v · ~b1 , . . . , ~v · ~bn ).

An orthogonal basis which has all basis vectors of length 1 is often called an
orthonormal basis.
It turns out that we can take any basis ~v1 , . . . , ~vn for Rn and transform
it into an orthonormal basis using an algorithm called the Gram-Schmidt
process. Like row reduction, it has two stages. The first stage transforms our
original basis into an orthogonal basis ~b1 , . . . , ~bn , and the second normalizes

each of the orthogonal basis vectors produced in the first stage to produce the
orthonormal basis ~u1 , . . . , ~un . Formally, this can be stated as follows:

Gram-Schmidt Process:

Part 1:
• Let ~b1 = ~v1 .
• Starting with i = 2 and repeating with each successive i until you reach
i = n, let
~bi = ~vi − [ ((~vi · ~b1)/(~b1 · ~b1)) ~b1 + · · · + ((~vi · ~bi−1)/(~bi−1 · ~bi−1)) ~bi−1 ].

(This means ~bi is the component of ~vi orthogonal to the span of ~b1 , . . . , ~bi−1 .)
Part 2:
• Let ~ui = (1/||~bi ||) ~bi .

Example 6. Use the Gram-Schmidt process to create an orthonormal basis from the basis (1, 0, −1), (−3, 4, 1), (−1, 7, −7) for R3 .

We'll start by applying the first part of the process to create an orthogonal basis ~b1 , ~b2 , ~b3 for R3 . Since ~b1 = ~v1 , here we have ~b1 = (1, 0, −1).
Next we need to compute

~b2 = ~v2 − ((~v2 · ~b1)/(~b1 · ~b1)) ~b1 .

To do this, we need the dot products ~v2 · ~b1 and ~b1 · ~b1 . Plugging in our ~v2 and ~b1 gives us

~v2 · ~b1 = (−3)(1) + 4(0) + 1(−1) = −4

and

~b1 · ~b1 = 1² + 0² + (−1)² = 2.

Thus

~b2 = ~v2 − ((~v2 · ~b1)/(~b1 · ~b1)) ~b1 = (−3, 4, 1) − (−4/2)(1, 0, −1) = (−1, 4, −1).

(We can check our work by computing ~b1 · ~b2 = 0 to see that ~b1 and ~b2 are orthogonal.)
To complete part 1, we need to compute

~b3 = ~v3 − [ ((~v3 · ~b1)/(~b1 · ~b1)) ~b1 + ((~v3 · ~b2)/(~b2 · ~b2)) ~b2 ].

Here we need the dot products ~v3 · ~b1 , ~b1 · ~b1 , ~v3 · ~b2 , and ~b2 · ~b2 . From the previous step we know ~b1 · ~b1 = 2, and we can compute

~v3 · ~b1 = (−1)(1) + 7(0) + (−7)(−1) = 6,
~v3 · ~b2 = (−1)(−1) + 7(4) + (−7)(−1) = 36,

and

~b2 · ~b2 = (−1)² + 4² + (−1)² = 18.

Plugging these into our formula for ~b3 gives us

~b3 = ~v3 − [ ((~v3 · ~b1)/(~b1 · ~b1)) ~b1 + ((~v3 · ~b2)/(~b2 · ~b2)) ~b2 ]
    = (−1, 7, −7) − [ (6/2)(1, 0, −1) + (36/18)(−1, 4, −1) ]
    = (−1, 7, −7) − (1, 8, −5) = (−2, −1, −2).

(Again, we can check our work by computing ~b1 · ~b3 = 0 and ~b2 · ~b3 = 0 to see that ~b1 , ~b2 , ~b3 are an orthogonal set.)
We are now done with Part 1 of the Gram-Schmidt process and have created the orthogonal basis

~b1 = (1, 0, −1), ~b2 = (−1, 4, −1), ~b3 = (−2, −1, −2)

for R3 .
Moving on to part 2 of the Gram-Schmidt process, we want to normalize each of our orthogonal basis vectors using the formula ~ui = (1/||~bi ||) ~bi to create our orthonormal basis ~u1 , ~u2 , ~u3 for R3 . To do this, we need to compute the norm of each ~bi . These are

||~b1 || = √(1² + 0² + (−1)²) = √2,
||~b2 || = √((−1)² + 4² + (−1)²) = √18 = 3√2,

and

||~b3 || = √((−2)² + (−1)² + (−2)²) = √9 = 3.

Plugging these norms into our formula for ~ui gives us

~u1 = (1/||~b1 ||) ~b1 = (1/√2)(1, 0, −1) = (1/√2, 0, −1/√2),
~u2 = (1/||~b2 ||) ~b2 = (1/(3√2))(−1, 4, −1) = (−1/(3√2), 4/(3√2), −1/(3√2)),

and

~u3 = (1/||~b3 ||) ~b3 = (1/3)(−2, −1, −2) = (−2/3, −1/3, −2/3).

(You can check on your own that each of these vectors has norm 1.)
This means our orthonormal basis for R3 is

~u1 = (1/√2, 0, −1/√2), ~u2 = (−1/(3√2), 4/(3√2), −1/(3√2)), ~u3 = (−2/3, −1/3, −2/3).

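Mathematica's built-in Orthogonalize command (described in A.2) runs this same process for us, so it can be used to double-check Example 6. This is only a sketch with inputs I have typed in myself:

    (* the original basis vectors v1, v2, v3 from Example 6 *)
    Orthogonalize[{{1, 0, -1}, {-3, 4, 1}, {-1, 7, -7}}]

The output should match ~u1 , ~u2 , ~u3 above, though Mathematica may write 4/(3√2) in the equivalent form 2√2/3.
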
Our newfound ability to create orthogonal bases also allows us to show the
following interesting fact.

Theorem 4. Let W be a subspace of Rn . Then dim(W ) + dim(W ⊥ ) = n.

Suppose dim(W ) = k. This means any basis for W contains k vectors, so pick a basis w~1 , . . . , w~k for W . Similarly, suppose dim(W ⊥ ) = ℓ and ~v1 , . . . , ~vℓ is a basis for W ⊥ .
We can use the Gram-Schmidt process on w~1 , . . . , w~k to create an orthonormal basis ~u1 , . . . , ~uk for W . Similarly, we can use the Gram-Schmidt process on ~v1 , . . . , ~vℓ to create an orthonormal basis ~y1 , . . . , ~yℓ for W ⊥ . The set ~u1 , . . . , ~uk , ~y1 , . . . , ~yℓ is an orthogonal set, since the sets ~u1 , . . . , ~uk and ~y1 , . . . , ~yℓ are orthogonal by Gram-Schmidt and ~ui · ~yj = 0 for all i and j because ~ui is in W and ~yj is in W ⊥ . Therefore by Theorem 1, we know ~u1 , . . . , ~uk , ~y1 , . . . , ~yℓ is a linearly independent set.
If we take any ~x in Rn , we can use our orthogonal projection process from 5.3 to write ~x = w~ + ~v where w~ is in W and ~v is in W ⊥ . This means we can write w~ as a linear combination of ~u1 , . . . , ~uk and ~v as a linear combination of ~y1 , . . . , ~yℓ , so ~x is a linear combination of ~u1 , . . . , ~uk , ~y1 , . . . , ~yℓ . Thus Span{~u1 , . . . , ~uk , ~y1 , . . . , ~yℓ } = Rn .
Putting this together, we see that ~u1 , . . . , ~uk , ~y1 , . . . , ~yℓ is a basis for Rn and therefore must contain n vectors. This means k + ℓ = n, so we are done.

Example 7. Show that Theorem 4 holds for W = Span{(4, −2, 1, 0), (−1, 5, 2, 9)}.
Since W is a subset of R4 , we need to show that dim(W ) + dim(W ⊥ ) = 4.
Our W is the set from 5.2's Example 8, where we showed that

W ⊥ = { (−(1/2)x3 − x4 , −(1/2)x3 − 2x4 , x3 , x4 ) }.
Therefore our next step is to compute dim(W ) and dim(W ⊥ ).
One way to compute the dimension of a span like W is to row reduce the
matrix whose columns are the spanning vectors. The dimension is then the
number of leading 1s in that reduced echelon form. (For more details, see 2.8’s
discussion of rank as it relates to linear independence.) In our case this matrix
is

4 −1
−2 5
1 2
0 9

which row reduces to

1 0
0 1
0 0
0 0 .

Since our reduced echelon form has two leading 1s, we know dim(W ) = 2.
To compute the dimension of W ⊥ , notice that it is written as the solution
set of a matrix equation of the form A~x = ~0. This means we can find a basis
for W ⊥ which has one basis vector corresponding to each of the free variables
x3 and x4 which appear in the entries of the vectors in W ⊥ . (See Example 2
of 3.1 for more discussion of this idea.) Since we have two free variables, we
have dim(W ⊥ ) = 2.
Thus
dim(W ) + dim(W ⊥ ) = 2 + 2 = 4
as claimed.
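
If you'd like to check dimension counts like these with software, the rank and null space commands described in A.2 do the work for us. Here is a sketch for Example 7 (the variable name A is mine):

    (* the columns of A are the vectors spanning W *)
    A = {{4, -1}, {-2, 5}, {1, 2}, {0, 9}};
    MatrixRank[A]                       (* 2, so dim(W) = 2 *)
    (* W-perp is the null space of the matrix whose rows are the spanning vectors *)
    Length[NullSpace[Transpose[A]]]     (* 2, so dim(W-perp) = 2, and 2 + 2 = 4 *)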

Exercises 5.4.

1. Is {(3, −1, 4), (2, 10, 1)} an orthogonal basis for R3 ?

2. Is {(2, 10, 1), (41, −5, 32), (3, −1, 4)} an orthogonal basis for R3 ?

3. Is {(0, 6, −2, 1), (−1, 1, 3, 0), (9, 0, 3, 6), (41, 14, 9, −66)} an orthogonal basis for R4 ?

4. Is {(−4, 0, 1, 1), (1, −2, −1, −1), (3, 4, 6, 6), (0, 5, −1, 1)} an orthogonal basis for R4 ?

5. Let B = {(−3, 1), (1, 3)} be an orthogonal basis for R2 . Compute the first component of [~v ]B for ~v = (4, 6).

6. Let B = {(1, 1), (1, −1)} be an orthogonal basis for R2 . Compute the second component of [~v ]B for ~v = (8, −2).

7. Let B = {(−3, 4, 1), (1, 0, 3), (−6, −5, 2)} be an orthogonal basis for R3 . Compute the second component of [~v ]B for ~v = (4, 7, 6).

8. Let B = {(1, 1, −2), (4, −2, 1), (1, 3, 2)} be an orthogonal basis for R3 . Compute the first component of [~v ]B for ~v = (7, −2, 2).

9. Let B = {(0, 1, −1), (−1, 1, 1), (2, 1, 1)} be an orthogonal basis for R3 . Compute [~v ]B for ~v = (4, −2, 3).

10. Let B = {(1, 1, 1), (−2, −1, 3), (−4, 5, −1)} be an orthogonal basis for R3 . Compute [~v ]B for ~v = (2, 1, 4).

11. Let B = {(1, 1, −2), (4, −2, 1), (1, 3, 2)} be an orthogonal basis for R3 . Compute [~v ]B for ~v = (2, 1, 1).

12. Let B = {(−1, 0, 1), (1, 2, 1), (−1, 1, −1)} be an orthogonal basis for R3 . Compute [~v ]B for ~v = (3, 3, 1).

13. Use Part 1 of the Gram-Schmidt process on B = {(1, 3), (9, 7)} to create an orthogonal basis for R2 .

14. Use Part 1 of the Gram-Schmidt process on B = {(4, 2, 0), (10, 10, −1), (−8, 6, −2)} to create an orthogonal basis for R3 .

15. Use the Gram-Schmidt process on B = {(−1, 2), (−4, 3)} to create an orthonormal basis for R2 .

16. Use the Gram-Schmidt process on B = {(−1, 0, 1), (1, −1, 3), (4, 3, 2)} to create an orthonormal basis for R3 .

17. Check that Theorem 4 holds for W = Span{(3, −1, 5), (2, 0, 4)}.

18. Check that Theorem 4 holds for W = Span{(2, −1, 0, 1), (−8, 3, 2, 5)}.
19. Suppose W is a subset of R4 with dim(W ) = 1. What is dim(W ⊥ )?
20. Suppose W is a subset of R3 with dim(W ) = 2. What is dim(W ⊥ )?
21. Suppose W is a subset of R8 with dim(W ) = 2. What is dim(W ⊥ )?
22. Suppose W is a subset of R5 with dim(W ) = 3. What is dim(W ⊥ )?
23. Show that the standard basis for Rn is an orthogonal basis.
24. Can you find an orthogonal set of 3-vectors which is not a basis for
R3 ?
A
Appendices

A.1 Complex Numbers


We briefly introduced the complex numbers, C, in 3.3 as another example of
a vector space. In this appendix, we’ll explore them as an interesting space in
their own right. We’ll show that we can redo much of what we did in this book
with scalars from C instead of R. At the end, we’ll provide a justification for
the practical need to study C, and I’ll make an argument that this is really
the correct set of scalars for a vector space. (If you want to skip straight to
that, see Theorem 1 below.)
Let’s start by restating our definition of C as R with the addition of the
number i from 3.3.

Definition. The complex numbers are C = {a + bi | a, b in R}. Here a is


called the real part and bi is called the imaginary part of the complex number
a + bi.

Example 1. Find the real and imaginary parts of 14 − 7i.

The real part is the piece without a factor of i, which is 14. The imaginary
part is the piece with a factor of i, which is −7i.

Since we saw in 3.3 that the complex numbers are a 2-dimensional vector
space over R, it shouldn’t be surprising to learn that C is often identified with
the plane R2 . This is usually done by identifying a complex number a + bi with its coordinate vector (a, b) with respect to the standard basis {1, i} for C.
In other words, we use the real part a as the x-coordinate and coefficient b on
i in the imaginary part bi as the y-coordinate. This sometimes leads people to
refer to the x-axis as the real axis and the y-axis as the imaginary axis. This
is illustrated by the following figure.

[Figure: the complex plane, with the real axis horizontal and the imaginary axis vertical, showing the point −3 + 2i plotted at (−3, 2).]

Definition. Two complex numbers are equal if their real parts are equal and their imaginary parts are equal.

This makes sense geometrically as well, since two complex numbers with
equal real and imaginary parts occupy the same point in the complex plane.
In 3.3, we mostly treated i as a placeholder variable. However, i is really the key to C's good algebraic properties, and to explore C more deeply we need to connect it to R as follows.

Definition. The complex number i is defined as i = √−1.

We can state this in words by saying that i is the positive square root of
−1. (The negative square root of −1 is therefore −i.)
We’ll use the same definition for addition of complex numbers as we did
in 3.3.

Definition. If a + bi and x + yi are in C, then complex addition is defined


to be (a + bi) + (x + yi) = (a + x) + (b + y)i.

Note that this says we add component-wise, i.e., the real part of the sum
is the sum of the real parts and similarly for the imaginary parts.

Example 2. Find the sum of 3 + 5i and 2 − 9i.

This sum is (3 + 5i) + (2 − 9i) = (3 + 2) + (5 − 9)i = 5 − 4i.


 
Since we're identifying a + bi with the 2-vector (a, b), complex addition can be visualized in the plane exactly the same way as addition in R2 .
Complex addition behaves very similarly to addition of real numbers. The
most important properties are commutativity, associativity, additive identity,

and additive inverses, which are explained below. (We also showed these
properties in 3.3 as part of our check that C is a vector space.)
The order in which we add complex numbers doesn’t matter since

(a + bi) + (x + yi) = (a + x) + (b + y)i = (x + a) + (y + b)i = (x + yi) + (a + bi)

so complex addition is commutative.


If we have three complex numbers a + bi, x + yi, and w + vi, which pair
we add first doesn’t matter since

[(a + bi) + (x + yi)] + (w + vi) = [(a + x) + (b + y)i] + (w + vi)


= (a + x + w) + (b + y + v)i
= (a + bi) + [(x + w) + (y + v)i]
= (a + bi) + [(x + yi) + (w + vi)]

so complex addition is associative.


The complex number 0, which can be thought of as 0+0i, has the property
that for any complex number a + bi we have

(a + bi) + 0 = (a + bi) + (0 + 0i) = (a + 0) + (b + 0)i = a + bi.

Thus 0 is our additive identity for complex addition.


Since complex numbers add component-wise and the additive identity is
0, the additive inverse of a complex number will be one whose values for a and
b are the opposites of our original complex number. This means the additive
inverse of a+bi is −a−bi. Thus every complex number has an additive inverse.

Unlike in 3.3, here we want to define multiplication between two complex


numbers. Let’s explore this first by thinking again of i as acting like a variable.
This means a + bi times x + yi can be expanded to get

(a + bi)(x + yi) = ax + ayi + bix + biyi.

Since a, b, x, y are real numbers, this can be rearranged as

ax + ayi + bix + biyi = ax + ayi + bxi + byi2 .

But remember that i2 = −1, so we have

ax + ayi + bxi + byi2 = ax + ayi + bxi − by.

Now we can write this in our standard complex number format by grouping
the terms without a factor of i to form the real part and the terms with a
factor of i to form the imaginary part. This gives us

ax + ayi + bxi − by = (ax − by) + (ay + bx)i

so we can create the following definition.



Definition. Let a + bi and x + yi be in C, then complex multiplication is


defined to be (a + bi)(x + yi) = (ax − by) + (ay + bx)i.

Example 3. Compute (1 − 3i)(4 + 2i).

We can either find the values of a, b, x, y and plug them into the formula
above, or we can expand out as we would for (1 − 3x)(4 + 2x) and remember
that i2 = −1. I prefer the latter option so I don’t have to memorize a formula,
but feel free to do this from the formula if you prefer.
Expanding (1 − 3i)(4 + 2i) gives us

4 + 2i − 12i − 6i2 = 4 + 2i − 12i + 6.

Combining like terms now gives

(1 − 3i)(4 + 2i) = 10 − 10i.
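
Mathematica (see A.2) handles complex arithmetic directly, using I for the imaginary unit, so a computation like this can be checked in one line. A quick sketch, not part of the original example:

    (1 - 3 I) * (4 + 2 I)   (* Mathematica simplifies this to 10 - 10 I *)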

As with complex addition, complex multiplication also behaves much like


multiplication in R. Some of the most important properties are commutativity,
associativity, a multiplicative identity, and multiplicative inverses for nonzero
complex numbers. These properties are explained below.
To see that the order in which we multiply complex numbers doesn’t
matter, let’s compute (a + bi)(x + yi) and (x + yi)(a + bi). This gives us
(a + bi)(x + yi) = (ax − by) + (ay + bx)i
and
(x + yi)(a + bi) = (xa − yb) + (xb + ya)i.
Since multiplication of real numbers commutes we know ax − by = xa − yb
and ay + bx = ya + bx, so the real and imaginary parts of (a + bi)(x + yi) and
(x + yi)(a + bi) are equal. Therefore
(a + bi)(x + yi) = (x + yi)(a + bi)
so complex multiplication is commutative.
To check that it doesn’t matter which pair of a trio we multiply first,
consider the complex numbers a + bi, x + yi, and w + vi. We can see that
[(a + bi)(x + yi)](w + vi) = [(ax − by) + (ay + bx)i](w + vi)
= ((ax − by)w − (ay + bx)v) + ((ax − by)v + (ay + bx)w)i
= (axw − byw − ayv − bxv) + (axv − byv + ayw + bxw)i
= (axw − ayv − bxv − byw) + (axv + ayw + bxw − byv)i
= (a(xw − yv) − b(xv + yw)) + (a(xv + yw) + b(xw − yv))i
= (a + bi)[(xw − yv) + (xv + yw)i]
= (a + bi)[(x + yi)(w + vi)]

so complex multiplication is associative.


As in R, we have 1(a + bi) = a + bi so 1, or 1 + 0i, is the identity element
for complex multiplication.
If a + bi ≠ 0, then

(a + bi) ( a/(a² + b²) − (b/(a² + b²)) i )
= a²/(a² + b²) − (ab/(a² + b²)) i + (ba/(a² + b²)) i − (b²/(a² + b²)) i²
= (a² + b²)/(a² + b²) + ((−ab + ba)/(a² + b²)) i = 1 + 0i = 1.
This means that every nonzero complex number a + bi has the multiplicative
inverse
a/(a² + b²) − (b/(a² + b²)) i.

As long as a + bi is nonzero, this inverse is well defined, since a + bi ≠ 0 means a ≠ 0 or b ≠ 0 and hence a² + b² ≠ 0.
Finally on our list of parallels between these complex operations and the
operations in R, we get the expected relationship between multiplication and
addition. To see this, we can compute

(a + bi)[(x + yi) + (v + wi)] = (a + bi)[(x + v) + (y + w)i]


= (a(x + v) − b(y + w)) + (a(y + w) + b(x + v))i
= (ax + av − by − bw) + (ay + aw + bx + bv)i
= (ax − by + av − bw) + (ay + bx + aw + bv)i
= [(ax − by) + (ay + bx)i] + [(av − bw) + (aw + bv)i]
= (a + bi)(x + yi) + (a + bi)(v + wi).

This means complex multiplication distributes over complex addition.


Visualizing complex multiplication using coordinates with respect to the
standard basis 1, i is very difficult. However, our identification of C with the
plane also gives rise to an alternate way to write complex numbers via a radius
and angle. We can connect this to our a + bi notation by identifying a + bi with the vector (a, b), then with the point in the plane (a, b) and finally with that point's
polar coordinates r and θ. In this r and θ notation, multiplying complex
numbers is much easier. If we want to multiply two complex numbers z1 with

polar coordinates r1 and θ1 and z2 with polar coordinates r2 and θ2 , then


their product z1 z2 has polar coordinates r1 r2 and θ1 + θ2 . In other words, we
multiply radii and add angles.
Generally speaking, complex multiplication problems are easier in this
polar notation while complex addition problems are easier in terms of the
a + bi notation. The major downside of using the polar notation for complex
numbers is that it isn’t unique, since angles of the form θ and θ + 2πk give
the same position.
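
If you want to see the "multiply the radii, add the angles" rule numerically, Mathematica's Abs and Arg commands return the radius and angle of a complex number. Here is a small sketch with two numbers of my own choosing (keep in mind that in general the angles only agree up to a multiple of 2π):

    z1 = 1 + I; z2 = 3 I;
    Abs[z1 z2] == Abs[z1] Abs[z2]    (* True: the radii multiply *)
    Arg[z1 z2] == Arg[z1] + Arg[z2]  (* True for these two numbers: the angles add *)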
Perhaps the biggest complication of working with C rather than R is that
geometric methods break down in most cases. For example, we can’t draw
the graph of a function f : C → C. This is because the dimension of a graph
of f : V → V is twice the dimension of V , since we need enough axes to
describe the domain and then enough more to describe the codomain. (This
is why the graph of f : R → R requires the 2D plane to draw even though
dim(R) = 1.) Since dim(C) = 2, this means the graph of a complex-valued
function will be 4-dimensional. Similarly, this means as soon as we have a
complex vector space with dimension over C at least 2, we’ll be unable to
graph it. Fortunately, we’ve spent some time in this book working in Rn for
n > 3, so have developed computational tools which can replace geometric
arguments in these situations.
And now, finally, here is the real reason that I think linear algebra is a
better place if you’re willing to use complex numbers as your scalars: the
fantastic Fundamental Theorem of Algebra. This is basically a polynomial
analog of the Unique Factorization Theorem (also known as the Fundamental
Theorem of Arithmetic) which states that every integer can be completely
factored into a unique set of prime numbers. Here, instead of prime numbers,
we factor into linear polynomials, i.e., x − a for some scalar a.

Theorem 1. Let f (x) be a polynomial with coefficients in C. If f has degree


n, then we can always factor f as f (x) = b(x − a1 )(x − a2 ) · · · (x − an ) for
some complex numbers b, a1 , . . . , an .

This means that unlike polynomials with real coefficients, our complex
polynomials factor completely into linear factors. In fact, since R is a subset
of C, if we view a real polynomial like x2 + 1 which doesn’t factor completely
over R as a complex polynomial, it will factor completely over C. (In this case
some of the roots will be complex numbers.)

Example 4. Factor x2 + 1 into linear factors.

We can use the factorization formula


x2 − a2 = (x + a)(x − a)
where a2 = −1 since then
x2 − (a)2 = x2 − (−1) = x2 + 1.

This shows that it is impossible to factor x2 +1 over R, because no real number


squares to −1. However, in C, we have i2 = −1 so we get

x2 + 1 = (x + i)(x − i).

This comes up in linear algebra when we are solving det(A − λIn ) = 0 to


find the eigenvalues of A. We know det(A − λIn ) is a polynomial, so solving
det(A−λIn ) = 0 is equivalent to factoring. If we allow ourselves to think over C
instead of R, we are guaranteed to be able to solve for all the eigenvalues! Even
if we restrict our matrix entries and therefore our polynomial’s coefficients to
R, there will still be times when our eigenvalues will be in C because our
polynomial didn’t factor completely over R. In fact this is one of my main
counterarguments to the notion that studying C isn’t relevant to a practical
mathematics course. Even if all your input data is from the observable world,
i.e., R, there will still be times when you need to consider complex numbers
to solve your problem.
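
As a concrete illustration (my own example, not one from the text), the matrix which rotates R2 by 90° has det(A − λI2 ) = λ² + 1, so it has no real eigenvalues at all, yet Mathematica reports its complex eigenvalues immediately:

    A = {{0, -1}, {1, 0}};      (* rotation of the plane by 90 degrees *)
    Eigenvalues[A]              (* {I, -I}, possibly listed in the other order *)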
Now that you’re hopefully convinced that it is worth it to let your scalars
be complex numbers instead of real numbers, let’s spend a moment thinking
about that situation. We can still define vectors and matrices, except that
their entries are now complex numbers instead of being limited to R. The set
of n-vectors with complex entries is called Cn , which generalizes our notation
for Rn as the set of n-vectors with real number entries. More general vector
spaces with C as their set of scalars, for example, the set of m × n complex
matrices, are usually called complex vector spaces, while the vector spaces we
discuss in the body of this book are called real vector spaces.
The good news is that other than smoother sailing when we find
eigenvalues, not much is different structurally. The complex numbers share
all the good properties of R that enabled us to construct vector spaces: C has
addition, subtraction, multiplication, and division with all the good properties
of R’s similar operations. In fact, R and C are both examples of an algebraic
object called a field, and any field can be used as the scalars for a family of
vector spaces which behave in the same way real vector spaces do.

A.2 Mathematica
The vast majority of calculations from this book have been implemented in
Mathematica. Of course, this is also true of many graphing calculators and
other mathematical software packages. I’ve chosen to discuss Mathematica
over other software packages because it is the software package available at
the school where I teach, and I’ve chosen not to discuss graphing calculators
because there are so many different makes and models that having a unified
discussion is too difficult. If you prefer to use another software package or a
graphing calculator, my advice is to google the linear algebra computation
and the name of your technology of choice. In general, the resulting resources
will be fairly good and accurate.
I’ll start our tour of Mathematica by saying that its help section is
usually excellent. In my version, this is accessed through the Help menu
under Wolfram Documentation where you can then search by topic. Usually
examples are provided to walk you through a sample problem. However, a little
bit of knowledge starting out can help you avoid some of my past frustrations
with this program.
Before we get into linear algebra topics, here are a few basic tips for using
Mathematica.

• Mathematica documents are organized into cells which are each evaluated
as one unit. You can tell where each cell starts and ends by looking at the
right edge of the document where a bracket will indicate what is in that cell.
To leave a cell and start a new cell, simply click below that cell. Just start
typing to default to a Wolfram Language Input which is the basic mode for
executing mathematical commands, or click on the tab with the + at the
left of the page to choose which type of new cell you want. I often use Plain
Text cells for times when I’m trying to write up my work nicely and want to
access the formatting options we commonly associate with a program like
Microsoft Word.
• All commands start with a capital letter. (Not knowing this made me cry
as a student, but hopefully I’ve spared you!)
• All inputs to commands are put in square brackets rather than parentheses,
i.e., Factor[x2 − 1] instead of Factor(x2 − 1).

• To execute the contents of a cell, press Shift and Enter at the same time.
The output(s) will appear in another cell below the one you executed. Just
pressing Enter will simply take you down to the next line without executing
anything. If you don’t want your commands to execute all at once, put each
of them in a separate cell. If you want to execute a command without seeing
its output, put a semicolon (;) after that command.

• You can copy and paste commands either within the same cell or into a new
cell. This is very useful when you’re using the same type of command several
times. You can also copy and paste an entire cell, which is especially useful
if you’ve customized that cell’s format. To select a cell, click on its bracket
on the right-hand side of the document.
• While you can type in most things manually, many of the most common
commands can be entered using buttons on the Classroom Assistant which
is accessed through the Palettes menu. Of particular use are the Calculator
and Basic Commands sections.
Now that you’ve had a whirlwind tour of the basics of Mathematica, let’s
discuss some topics of particular relevance to our study of linear algebra.
• One of the first things you’ll want to do if you’re using Mathematica for
linear algebra computations is to enter vectors and matrices.
Mathematica doesn’t care whether you write your vector as a column of
numbers (which we’ve been doing) or as a row of numbers. Either way,
Mathematica treats a vector as an ordered list of numbers. This means the
easiest way to enter a vector is as a list. Mathematica denotes lists using
curly braces to contain the list and commas to separate the entries as in
{1,2,3}.
The default format used by Mathematica is to write a matrix as a “list
of lists.” Mathematica treats each row of the matrix as a list and then lists
those row lists in order as in {{1,2,3},{3,2,1}}. Each row is listed from left to
right and the rows are listed from top to bottom, basically mirroring the way
English is read. This means that the 3 × 3 identity matrix would be written
{{1,0,0},{0,1,0},{0,0,1}}. Personally I find this presentation of a matrix
less digestible than our “grid of numbers” approach, but it is important
to understand what it means if Mathematica gives you something in this
format.
If you prefer to enter your matrices as grids of numbers, you can use the
piece of the Classroom Assistant devoted to matrices. This is under the
Basic Commands section and is accessed by clicking on the button which is
highlighted blue in the picture below.

To create a matrix, click on the button that looks like a 2 × 2 matrix of


squares directly under the word Algebra in the previous picture. The default
size is 2 × 2, but you can add rows and columns using the buttons to the
right of the matrix button.
Now that you’ve created the template for your matrix, you can fill in its
entries by clicking on one of the boxes in the matrix and entering the
appropriate number. You can get from one box to the next by hitting Tab,
which will take you through the matrix by going across each row from left
to right and then down to the left of the row below (again as you would read
English).
To make Mathematica give its outputs in our usual format instead of a
list of lists, use the command MatrixForm. Simply add it as the outermost
command when you do a matrix computation as in MatrixForm[other
commands].
• Now that you can enter vectors and matrices, let’s turn to getting Mathe-
matica to do our matrix operations. (Remember that you may want to layer
on the MatrixForm command.) Matrix addition is pretty straightforward,
simply use + between your matrices to add them. To multiply a matrix A
by a scalar r, you can type rA or r ∗ A.
Multiplication of matrices and vectors or matrices and other matrices is
slightly trickier because the natural impulse is to use ∗, but DO NOT do
that! Using ∗ causes Mathematica to multiply corresponding entries as if
we’d used the format for matrix addition in our multiplication. This is not
proper matrix multiplication, so avoid using ∗ at all costs! Instead, use ·
as your multiplication symbol. This can be used to multiply a matrix by
a vector or a matrix by another matrix. When multiplying a matrix by a
vector, Mathematica allows you to enter that vector as a row using the list
format instead of a column and still gives the correct answer.
You can ask Mathematica to find the inverse of a matrix A using the
command Inverse[A]. Again you may want to add on the MatrixForm
command.
Mathematica will also find the determinant of a matrix A via the command
Det[A].
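Putting the operations in this item together, a short sample session (with small matrices I made up purely for illustration) might look like this:

    A = {{1, 2}, {3, 4}}; B = {{0, 1}, {1, 0}}; v = {5, 6};
    A + B                      (* matrix addition: {{1, 3}, {4, 4}} *)
    3 A                        (* scalar multiplication *)
    A.B                        (* matrix multiplication; use the dot, never the star *)
    A.v                        (* matrix times vector: {17, 39} *)
    Det[A]                     (* -2 *)
    Inverse[A] // MatrixForm   (* the inverse, displayed as a grid of numbers *)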
• Now that we’ve worked through most of our matrix operations, let’s move
on to solving matrix equations. If you know the augmented coefficient
matrix of your equation, Mathematica can row reduce it using the command
RowReduce[ ]. If you don’t want to create the augmented coefficient matrix,
you can ask Mathematica to solve your matrix equation directly using the
command LinearSolve. If you’re solving A~x = ~b, then your Mathematica
input would be LinearSolve[A, ~b]. Unless you add MatrixForm, Mathe-
matica will return the solution vector ~x as a list.
Sometimes we want to solve A~x = ~b where ~b is a vector of variables, for
example when finding the range of a linear function. If A is m × n with m ≤ n,
then RowReduce handles this just fine. However, if m > n, then RowReduce
will give numeric results that aren't useful. In this case, you can use the
more complicated command Reduce. To solve A~x = ~b where the entries of
~x are x1 , . . . , xn and the entries of ~b are b1 , . . . , bm , your Mathematica input
would be

    Reduce[A.{x1 , . . . , xn }=={b1 , . . . , bm },{x1 , . . . , xn }]

The output will list any conditions on the entries of ~b needed to ensure that
there is a solution, and then the values of x1 , . . . , xn in terms of b1 , . . . , bm .

    In:  Reduce[{{1, 2}, {-1, 5}, {0, -1}}.{x1, x2} == {b1, b2, b3}, {x1, x2}]
    Out: b1 == -b2 - 7 b3 && x1 == -b2 - 5 b3 && x2 == -b3

In the example above, we need b1 = −b2 − 7b3 in order to have the solution
x1 = −b2 − 5b3 and x2 = −b3 .
If no condition on the entries of ~b is needed to ensure a solution, then
Mathematica will simply return the solution itself.

    In:  Reduce[{{1, 2, 3}, {0, 1, 0}, {1, 0, 1}}.{x1, x2, x3} == {b1, b2, b3}, {x1, x2, x3}]
    Out: x1 == -(b1/2) + b2 + (3 b3)/2 && x2 == b2 && x3 == b1/2 - b2 - b3/2

In the example above, there is no condition involving only b1 , . . . , bm , so we
have the solution x1 = −b1 /2 + b2 + 3b3 /2, x2 = b2 , and x3 = b1 /2 − b2 − b3 /2 for
any value of ~b.
• Mathematica has several commands to help you find eigenvalues and
eigenvectors depending on which information you want and how you prefer
to view it.
The command Eigenvalues[A] gives you all the eigenvalues of your matrix
A as a list. Note that these values are the roots of a polynomial, and therefore
might be complicated in their exact formats. For this reason, it is sometimes
easier to use the command Eigenvalues[N[A]] which gives you a list of
decimal representations of the eigenvalues.
The command Eigenvectors[A] gives you a list of linearly independent
eigenvectors of A as a list of lists. This corresponds to our compilation
of basis vectors from each of our eigenspaces. For an n × n matrix A,
Mathematica always returns n vectors in this list. As we've seen, sometimes
an n × n matrix A will not have n linearly independent eigenvectors. In this
case, Mathematica will simply add enough zero vectors to the end of the list
to make the list contain n vectors.
If you want to find both the eigenvalues and eigenvectors at once, then you
can use the command Eigensystem[A]. The formatting of the output can
initially be a little confusing, so I will discuss it in slightly more detail. The
overall format of Mathematica's output is a list with two entries. The first
entry is a list of all the eigenvalues of A and the second entry is a list of
eigenvectors. The eigenvalues and eigenvectors are listed in corresponding
order, i.e., the first eigenvector listed has the first eigenvalue listed and so
on.

    In:  Eigensystem[{{1, 2}, {4, 3}}]
    Out: {{5, -1}, {{1, 2}, {-1, 1}}}

In the example above, the matrix has eigenvalues 5 and −1 and eigenspaces
E5 = Span{(1, 2)} and E−1 = Span{(−1, 1)}.
An eigenvalue which has multiplicity greater than one, i.e., is a multiple
root of det(A − λIn ), will be listed more than once. In fact, it will be listed
as many times as it is a root of det(A − λIn ). If the eigenspace for the
multiple eigenvalue has dimension large enough to match the multiplicity
of the eigenvalue, then Mathematica will list the basis eigenvectors of that
eigenspace.

    In:  Eigensystem[{{5, 0}, {0, 5}}]
    Out: {{5, 5}, {{0, 1}, {1, 0}}}

In the previous example, the matrix has the eigenvalue 5, which is a double
root of det(A − λIn ), and eigenspace E5 = Span{(0, 1), (1, 0)}.
If the eigenspace for the multiple eigenvalue doesn't have enough basis
vectors, Mathematica will put in zero vectors as placeholders.

    In:  Eigensystem[{{5, 1}, {0, 5}}]
    Out: {{5, 5}, {{1, 0}, {0, 0}}}

In the previous example, the matrix has the eigenvalue 5, which is a double
root of det(A − λIn ), and eigenspace E5 = Span{(1, 0)}.

If you’re interested in computing a basis for the eigenspace of a particular


eigenvalue, you can certainly use Eigensystem and then pick out the nonzero
vectors corresponding to your eigenvalue of interest. However, if your matrix
is large enough, it may be easier to use the command NullSpace[A − λIn ]
to find a basis for the null space of A − λIn since Eλ is the null space of
A − λIn .
• To find the dot product of two vectors, simply use the same · symbol that we
used for matrix multiplication. When used between two vectors it computes
the dot product.
• To compute the norm of a vector, use the command Norm[ ]. This will give
you the exact value of the norm, which will typically involve a square root. If
you’d prefer a decimal approximation, an easy fix is to put a decimal point
into one of your matrix entries. This will cause Mathematica to give you a
numeric approximation of your answer. (In fact, you can use this trick inside
most commands to get a decimal version of your answer.)
• Mathematica will also normalize a given vector, i.e., find a vector in the
same direction of your input whose length is 1. To do this, use the command
Normalize[ ]. As with the Norm command, the default output will be exact
and so usually involve square roots in the denominators of your matrix
entries. Use the “add a decimal point” trick to get decimal entries in your
normalized vector.
• Mathematica computes the orthogonal projections of a vector onto another
vector using the command Projection. If you want to compute the
orthogonal projection of ~x onto ~y , you’d use the Mathematica input
Projection[~x,~y ]. Remember, it matters for our computation of orthogonal
projections which vector is being projected and which is having the other
vector projected onto it. This means it also matters which order you enter
your two vectors into the Projection command. I remember the order by
inserting the word onto between my two vectors, i.e., Projection[~x, ~y ]
would be read “projection of ~x onto ~y .”
• Mathematica runs the Gram-Schmidt process we used to create an orthonor-
mal basis, via the command Orthogonalize[ ] and its input is the list
of vectors you want to use as inputs to the Gram-Schmidt process. (In
our discussion of this algorithm in 5.4, these were the vectors ~vi .) If these
input vectors are a basis for Rn , then the output is an orthonormal basis
for Rn . However, Mathematica will accept any list of vectors as the input
for Orthogonalize, and its output in this more general case will be an
orthonormal basis for the span of the input vectors.
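To tie the last few commands back to Chapter 5, here is a short sketch using the basis from 5.4's Example 6 together with a couple of vectors I have chosen for illustration:

    u = {1, 0, -1};
    Norm[u]                       (* Sqrt[2] *)
    Normalize[u]                  (* {1/Sqrt[2], 0, -1/Sqrt[2]} *)
    Projection[{3, 8, 5}, u]      (* {-1, 0, 1}, the projection of (3, 8, 5) onto u *)
    Orthogonalize[{{1, 0, -1}, {-3, 4, 1}, {-1, 7, -7}}]
                                  (* the orthonormal basis from 5.4's Example 6 *)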
Of course, Mathematica has many capabilities not discussed in this
appendix. For those uses, I encourage you to take advantage of both
Mathematica’s own help section and the universal solution method of googling
what you want to understand.

A.3 Solutions to Odd Exercises


1.1:
1. 4 
−2
3. −4
−6
 
−12
5.
−4
 
12
7.
−8
 
−3
0
9.  
 30 
−9
 
9
11. (a) impossible, (b) , (c) impossible
1
 
−2
13. (a) impossible, (b)  2 
9
 
14
15. (a) impossible, (b) impossible, (c)
3
 
6
17. ~v2 + ~v3 = −3
−4
4

� 2

-4 -2 2 4 6

-2

-� �
-4

-6

19. -8
   
    height 68.5
C 6    
South 2 weight  158 
21. (a) H  = 12, (b) = , (c)    
 temp.  = 98.7
W est 7
O 6
age 38
A.3 Solutions to Odd Exercises 369
 
0.75
 1 
23.  
1.625 where the entries are in gallons and the vector has the
0.125
 
red
 blue 
format  white 

yellow
25. Rotates 180◦ and doubles distance from the origin.
1.2:
 
−7
1.
−11
 
2x1 + 3x3 + 5x4
3.
x1 − 4x2 − x3
5. 2
       
1 0 0 1
7. x1 0 + x2 2 + x3 2 = x4 3
0 0 1 3
9. No
11. No
13. The line y = − 12 x.
15. The xy plane in R3 .
17. No
19. No
21. No n o
23. Smallest is ~0 , largest is Rn .
1.3:
1. linearly dependent
3. linearly independent
3
5. linearly dependent
4
2

1 -2 -1 1 2 3

-2 -1 1 2 3 4 -1

7. (a) (b)
-1

-2

-2
370 Appendices

9. (a) linearly independent, (b) dim = 3


11. dim = 1
13. Their span has dim ≤ 2 while dim(R3 ) = 3.
15. They could span
17. Need four linearly independent 4-vectors to get dim = 4.
2.1:
 
5
1.
2
 
0
8
3. 
2

6
5. Domain is R2 , codomain is R3
7. Domain is R5 , codomain is R4
   
x1 x
9. f = 2
x2 x1
        
x 4x1 − 2x2 4x1 − 2x2 x1
11. f 2 1 = 6= = 2f
x2 2x1 + 2 2x1 + 4 x2
         
x1 y1 x2 + y2 x1 y1
13. f + = =f +f
x2  y 2 x 1 + x2 + y1 + y2 x 2 y2
   
x1 rx2 x1
and f r = = rf
x2 rx1 + rx2 x2
15. f (~0n ) = f (~v − ~v ) = f (~v ) − f (~v ) = ~0m
17. Show f (2~v ) 6= 2f (~v ).
2.2:
1. (a) 6-vector, (b) 4-vector
3. (a) 4-vector, (b) 5-vector
5. (a) R4 , (b) R2
7. (a) R2 , (b) R3
9. (a) R3 , (b) R2 , (c) 2 × 3
11. (a) R4 , (b) R3 , (c) 3 × 4
 
2 −1 1
13.
−1 0 10
 
1 1 0
15. 2 1 0 
0 2 −1
A.3 Solutions to Odd Exercises 371
 
3 0 −7 1
17.  1 0 1 1
−1 0 8 0
 
0 1
19.
−1 0
 
0 −1
21.
−2 0
 
16
23.
5
 
11
25. (a) , (b) impossible
−9
 
11
27. A~v1 =
5
 
−7
29. A~v1 =
20
   
2 4 −2 4 −1
31. ~x =
1 2 −1 0 9
     
−10 3 8
33. (a) x1  1  + x2  0  = −2
7 −2 3
           √ 
1 cos(θ) 0 − sin(θ) 10 5 3
35. fθ = and fθ = , (b) , ,
0 sin(θ) 1 cos(θ) 0 5
       √     √     
5
√ 0 −5 −5 3 −10 −5 3 −5 0
, , √ , , , , √ , ,
5 3 10 5 3 5 0 −5 −5 3 −10
   √ 
5√ 5 3
,
−5 3 −5
2.3:
 
3 −3
1. 6 0 
9 −6
 
1 2
3. −3 0
1
2 4
 
−6 −14
5.
−2 2
 
8 8
7.
4 −16
 
1 2
9.
2 4
372 Appendices
 
2 4
11. A + C = −2 0
4 8
 
4 3 7
13. A + C = −4 −2 8
0 2 8
 
2 −2 7
15. A + C =
−1 0 5
17. (a) 9 × n, (b) m × 5
           
1 0 a b a b a b 1 0 a b
19. · = , · =
0 1 c d c d c d 0 1 c d
 
4 −6 −8
21.
1 9 19
 
35 −13
23. AB = −10 14 
−5 −2
 
3 11
25. AB =  18 6
−22 −4
 
8 −4 14
27. BC =
−11 8 7
   
14 −1 18 −7
29. AB = 6= = BA
23 8 9 4
 
1 1
31. B=
1 1
2.4:
 
−3 −17 0
1.
5 −2 8
3. Use the subspace test.
5. Yes
7. Not closed under multiplication by r < 0.
9. 0 doesn’t have an “additive” inverse
11. Show W is a subspace of M33 .
 
−a1 4a2 a1
13.
0 a1 − a2 a2
15. Yes
15. linearly independent
17. linearly independent
A.3 Solutions to Odd Exercises 373

19. (a) R2 , (b) M22


21. (a) M22 , (b) M23
 
0 0
23.
0 7
 
−1 −2 −3
25.
−1 0 4
    
a b a b
27. f r 6= rf
c d c d
       
x1 y1 x1 y1
29. f + =f +f and
x2 y2 x2 y2
    
x x1
f r 1 = rf
x2 x2
2.5:
1. No
 

 −2x3  
 
3. x3 − 2x4 
  x3 

 

x4
 
a 2a
5.
0 a
 
a a
7.
−a 0
 
 −3x3 
9.  x3 
 
x3
 

 x2 − 2x4 

 
 x2 

11.  3x4 
  

 x4  

 

0
13. A has more columns than rows.
15. Yes
17. No
 
 x1 
19.  x2 
 
−12x1 + 3x2
 
a b
21.
b+d d
23. M22
374 Appendices

25. No
 
 x 
27. 0
 
z
29. A has more rows than columns.
31. Yes
   
1 0 0 1 0 0
33. (a) , (b) , (c) A has more columns than rows
0 0 1 2 0 0
2.6:
 
3 −4 0 −5
1.
4 1 17 8
 
−1 3 5 0
3.  2 4 6 −6
0 9 −2 10
 
2 1 −11 5
5. −5 0 3 −1/2
10 −9 2 0
 
−4 0 −1 0 7
7.  0 1 1 2 9
1 0 0 5 13
 
2 4 −2 4 −1
9.
1 2 −1 0 9
 
3 −2 0
0 1 −5
11.  
−6 8 1
7 4 2
   
0 −1 4 2
13. (a) ~x = , (b) −x2 + 4x3 = 2, 6x1 − 5x3 = 11, (c)
6 0 −5 11
       
0 −1 4 2
x1 + x2 + x3 =
6 0 −5 11
 
1 −3 0 0
 
15. Yes,  0 0 1 0 
0 0 0 1
" #
1 0 1 4
17. Yes,
0 1 0 −3
 
1 −2 0 0 1
0 0 1 0 1
19.  
0 0 0 1 1
0 0 0 0 0
A.3 Solutions to Odd Exercises 375
 
1 0 0 −3
5
21. 0 1 0 − 2 
0 0 1 − 12
 
1 0 0
23. 0 1 0
0 0 1
 
−7
25.  2 
6
 
2
27. −1
5
 
5
29.  3 
−1
 
0
31.  1 
−1
2.7:
 
 5 + 3x3 

 
−3 − 2x3 
1.  
  x3 

 

0
 
11
3. −4
3
 
 x3 
5. x3 
 
x3
 

 3x3  
 
7. −2x3 
  x3 

 

0
9. No
11. Yes
13. linearly independent
15. linearly independent
17. (a) linearly dependent, (b) 2
376 Appendices
 
1 0 −4
0 1 2
19. (a) Their reduced echelon form is 
0 0
. (b) ~v3 = −4~v1 +2~v2 ,
0
0 0 0
(c) dim = 2
 
 x1 
21.  x2 
 
−8x1 + 4x2
23. R2
25. No
27. Yes
 
1 0 0
0 1 0
29. 
0

0 1
0 0 0
31. No
33. Yes
35. Yes
37. See if their reduced echelon form had a leading 1 in every row.
39. $284,639 of agriculture, $492,470 of manufacturing, $893,072 of
service
2.8:
1. One unique solution
3. No solutions
 
5
5.  3 
−1
7. x1 = 4k, x2 = 3k, x3 = 6k, x4 = 4k for some positive integer k
9. (a) No leading1 in rightmost
 column and leading 1’s in all variable’s
1 0 4
columns, (b) 0 1 −1
0 0 0
   

 3 −4 
    
1, 0 
11. Span  0  5 

 
 
0 1
A.3 Solutions to Odd Exercises 377
   

 2 −3  

   0 
 1
   

13. Span  0, 4 
   

 0 −1

 

 
0 1
   

 −2 3 
    
1  , 0
15. Span   0  1

 
 
0 1
   
 −2


1 

 0  −3
17. (a) Span   ,   , (b) It is the span from (a) plus any
 1   0 

 
 
0 1
solution to A~x = ~b.
19. The solution set of A~x = ~b is a shifted version of the solution set to
A~x = ~0.
21. No, because the matrix will have more columns than rows.
23. 3
25. 2
27. n = 3, rk(A) = 2, dim(Nul(A)) = 1, and 2 + 1 = 3
29. 2
2.9:
1. Horizontal cut between rows 1 and 2, vertical cut between columns
1 and 2
3. Horizontal cuts between rows 1 and 2 and rows 3 and 4, vertical
cuts between columns 1 and 2 and columns 3 and 4
 
5. 15 −10
 
−4 16
7.
8 −12
 
9. −3 13
 
7 −1
11.
−4 2
13. A: Horizontal cut between rows 2 and 3, B: vertical cut between
columns 3 and 4
15. A: Horizontal cuts between rows 1 and 2 and rows 2 and 3, B:
vertical cuts between columns 1 and 2 and columns 3 and 4
 
6
17.
4
378 Appendices
 
20 17
19.
25 9
 
4
21. −1
2
 
1
23. −2
−1
    
1 0 a b a b
25. =
0 −2 c d −2c −2d
 
1 0
27.
6 1
 
0 0 1
29. 0 1 0
1 0 0
1 
0
31. 4
0 1
 
1 0 0
33. 0 0 1
0 1 0
  
−1 0 0 1 −1 −3
35. −5 3 0  0 1 8
1 2 −10 0 0 1
  
1 0 0 1 −2 4
37. 0 2 0 0 1 −1
1 3 4 0 0 1
    
a 0 0 x 0 0 ax 0 0
39. (a)  b c 0   y z 0 =  bx + cw cz 0 , (b)
d e f w u v dx + ey + f w ez + f u fv
Use the associative property of matrix multiplication.
2.10:
1. AA−1 = A−1 A = I2
 
−18 −3 5
3. A−1 = −12 −2 3
−5 −1 1
5. Not invertible
 3 
− 2 −12 −2
7. A−1 =  0 −1 0
1 6 1
A.3 Solutions to Odd Exercises 379

9. Not invertible
 
5 −2 −1
11. A−1 =  3 −1 −1
−10 4 3
   
x1 −x1 + x2
13. f −1 x2  =  x1 − x2 + x3 
x3 −2x1 + 3x2 − 2x3
15. Not invertible
17. (A−1 )−1 = A
19. ~x = (I3 − A)−1~b
21. y = 22.124 − 0.274x1 + 0.034x2 where x1 is bikers and x2 is smokers
2.11:

1. If A is invertible then A~x = ~b has the unique solution ~x = A−1~b.


3. If f is onto then A’s reduced echelon form has a leading 1 in every
row.
5. 8 → 5 → 6 → 7 → 3 → 4
7. 5, A has a column of zeros
n o
11. Use a nonzero ~x in B’s kernel to show AB’s kernel isn’t ~0 .
3.1:
1. No
3. Yes
 
1 1
5. has reduced echelon form I2
1 −1
7. There are dim(M22 = 4 matrices, and they are linearly independent.
9. Yes
     
1 0 4
0 2  1 
11.      
0, 0, −1
0 1 2
   
2 0
13. ,
0 1
     
4 −1 0
1  0   0 
     
0  2   0 
15.      
0,  1 ,  1 
     
0  0  −5
0 0 1
380 Appendices
   
−2 −2
3 −5
   
17.  
 1 ,
0
 
0 −2
0 1
19. Yes
21. 36
23. 2
25. R12
   1 
5 5
27. (a)  28 , (b) −2
−2 1
   
5 −1
 10  1
29. (a)    
−14, (b)  1 
4 −2
1
  2
6 5 3
31. (a) , (b) 
1

4 8
3
 
  4
5 0 3
33. (a) , (b) 
−1

5 −5
1
35. linearly dependent
37. do not span M22
3.2:
1. The sum of two even numbers is even, the sum of two positive
numbers is positive, but a negative scalar times a positive number
is negative.
3. W is a subspace of P2
5. There are dim(P2 ) = 3 polynomials, and they are linearly indepen-
dent.
7. No 
2
9. (a) x2 + 3x + 3, (b) −1
3
 
−4
11. (a) 8x2 + x + 5, (b)  5 
−5
A.3 Solutions to Odd Exercises 381

13. It is in their span.


15. It is in their span.
17. linearly independent
19. linearly dependent
21. (a) M22 , (b) P2 (or any Pn with n ≥ 2)
 
0 b −b
23. (a) , (b) {ax4 + bx2 + c}
d e d
 
2 a b
25. (a) {ax + ax − a}, (b)
a + 2b 0
3.3:
1. Use the subspace test.
3. There are dim(C) = 2 complex numbers, and they are linearly
independent.
5. No 

2
7. (a) −3 + 2i, (b)
−8
" #
26
7
9. (a) −13 + 10i, (b)
− 20
7
11. linearly dependent
13. linearly dependent
15. they span C
17. they span C
19. (a) C, (b) M22
 
a −a
21. (a) 0, (b)
0 b
 
2b b
23. (a) , (b) C
0 d
25. Use the subspace test.
27. Show D is a subspace of C .
29. f is a linear map.
4.1:
 
81 0 0
1.  0 16 0
0 0 49
 
1 0 0
3. 0 0 0
0 0 64
382 Appendices
 
6 3 30
5. (a) 4 −3 −20, (b) Multiply each column of A by the
0 −21 10
corresponding diagonal entry of B.
7. Use the subspace test.
9. Not an eigenvector
11. Yes, λ = 3
 
9
13. 153
36
 
−128
15.  64 
64
 
−117
17. −315
45
 
−108
19.  54 
27
21. λ > 1 gives population growth, λ < 1 means population dies out
4.2:
1. −33
3. −14
5. Scales area by 3, doesn’t flip the plane
7. Doesn’t change area, flips the plane
9. −40
11. 16
 
2 −1
13.
1 0
 
−5 2
15.
3 1
17. Multiplies it by −1
19. Multiplies it by −4
21. 22
23. −160
25. Invertible
27. Invertible
A.3 Solutions to Odd Exercises 383
1
29. (a) −18, (b) 3
1
31. (a) −24, (b) 12
33. 48
35. a(−1)2 (d) + b(−1)3 (c) = ad − bc
4.3:
1
1. λ = 4 one time, λ = −1 two times, λ = 2 one time
3. λ = −2 one time, λ = 0 one time, λ = 3 one time
5. λ = −4, λ = −1, λ = 0
7. λ = −9, λ = 7, λ = 0
 
 0 
9. E−2 = Span 1
 
0
 
 −3 
11. E3 = Span  2 
 
1
   
 89   −1 
13. λ = −9, E−9 = Span −82 , λ = −3, E−3 = Span  2  ,
   
 66 0
 2 
λ = 2, E2 = Span 1
 
0
   
 −13   1 
15. λ = 9, E9 = Span  35  , λ = −6, E−6 = Span 0 , λ = 3,
   
  5 0
 −1 
E2 = Span −1
 
1
 
x 1
17. A = , x = ±1
1 x
19. 2
4.4:
1. 3
3. 2
5. No
7. Yes
9. Yes
11. Yes
384 Appendices

13. Yes
15. No
4.5:
 
−18
1.
28
 
9
3. 2
−1
 
3
5.
−7
 
−7
7.  −9 
−14
 
−1 5 2
9.  3 −1 1
4 0 −3
 
−1 5 3 2
1 0 0 3
11.  
 0 −2 −1 −1
2 4 2 −1
 
1 −4 3
13. −1 0 1
1 2 −1
 
3 1 5
15. 7 0 9
−1 −3 1
 
17. $\begin{bmatrix} 8 & -11 & 4 & -2 \\ 0 & 6 & -5 & 3 \\ 9 & 3 & -1 & 4 \\ 1 & -3 & 2 & 11 \end{bmatrix}$
 1 
− 4 − 34 −4 3
 1 
19. − 2 − 12 − 12 
1 3 3
2 2 2
 1 1 1 
−6 3 − 12 2
 1 0 1
− 12 
 2 2 
21.  1 1 
− 2 1 − 12 
1
6 − 13 1
2
1
2
   
23. $P = \begin{bmatrix} 4 & -1 & 3 \\ 1 & 0 & -4 \\ 0 & 6 & 2 \end{bmatrix}$, $D = \begin{bmatrix} -7 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}$
   
25. $P = \begin{bmatrix} -13 & 4 & 0 & -1 \\ 10 & -1 & 3 & 1 \\ 0 & 0 & 2 & 5 \\ 1 & 2 & 0 & -3 \end{bmatrix}$, $D = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 6 \end{bmatrix}$
27. Repeated multiplication of many vectors by the same matrix (see the sketch after these answers).
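Answer 27 refers to computing many products or high powers of one matrix; diagonalization makes this cheap because powers of a diagonal matrix are easy. A minimal Mathematica sketch with a made-up diagonalizable matrix:

A = {{4, 1}, {2, 3}};                  (* hypothetical diagonalizable matrix *)
{vals, vecs} = Eigensystem[A];
p = Transpose[vecs]; d = DiagonalMatrix[vals];
p.MatrixPower[d, 10].Inverse[p] == MatrixPower[A, 10]   (* True: A^10 = P D^10 P^-1 *)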
5.1:
1. −9
3. 0
5. 32

7. √21
9. √74
11. √17
13. √34
15. 9
 √1

2
17.  0 
− √12
 2 
− √5
 1 
19.  √ 
5
0
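Unit-vector answers like 17 and 19 match Mathematica's Normalize; the input vector below is a hypothetical one, not the exercise's.

v = {1, 0, -1};
Norm[v]         (* Sqrt[2], the length of v *)
Normalize[v]    (* {1/Sqrt[2], 0, -1/Sqrt[2]}, the unit vector in the direction of v *)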
5.2:
1. Yes
3. No
5. Yes
 
7. $(1, 1, 0)^T$
9. No
11. No
13. Yes
15. No
 1 
 − 5 x3 
17. − 3 x 
 5 3 
x3
 1 
 x − 2x
 3 3 3 4 
 
 1 x − 3 x 
19.  2 3 2 4
  x3 

 

x4
 

 x4  
 
 4x4 
21. −3x4 

 
 
x4
5.3:
   
1. ~x = $(-3, 4, -4)^T$, ~y = $(0, 2, -2)^T$
   
3. orthogonal projection: $(6, 1)^T$, component orthogonal: $(2, -2)^T$ (see the projection sketch at the end of these answers)
   
5. orthogonal projection: $(2, -2, 2)^T$, component orthogonal: $(1, 1, 0)^T$
7. Yes
9. Yes
   
11. orthogonal projection: $(-1, 8, 1)^T$, component orthogonal: $(4, 0, 4)^T$
   
13. orthogonal projection: $(7, -3, 4, 2)^T$, component orthogonal: $(3, 6, 0, -8)^T$

15. 5
17. (a) $(225/\sqrt{10},\ 675/\sqrt{10})^T$, (b) $\mathrm{Span}\{(1, 1)^T\}$, (c) $90\sqrt{5}$
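The projection answers in this section come from the formula proj_u(~v) = (~v·~u / ~u·~u)~u; here is a quick Mathematica check with made-up vectors.

v = {3, 1}; u = {1, 1};        (* hypothetical vectors *)
proj = (v.u/u.u) u             (* {2, 2}: orthogonal projection of v onto Span{u} *)
v - proj                       (* {1, -1}: component of v orthogonal to u; its dot product with u is 0 *)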

5.4:
1. No
3. Yes
5. −3/5
7. 11/5
 5
−2
9.  −1 
3
2
1
5
 
11.  13 
1
2
   
13. $(1, 3)^T,\ (6, -2)^T$
15. $\{(-1/\sqrt{5},\ 2/\sqrt{5})^T,\ (-2/\sqrt{5},\ -1/\sqrt{5})^T\}$ (see the Orthogonalize sketch after these answers)

17. dim(W) = 2, dim(W⊥) = 1, n = 3


19. 3
21. 6
23. The dot product of the ith and jth standard basis vectors (with i ≠ j) is always 0 because their 1s occur in different entries.
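Orthonormal-basis answers like 13 and 15 can be checked with Mathematica's Orthogonalize, which applies the Gram-Schmidt process; the starting vectors here are made up, not taken from the exercises.

Orthogonalize[{{1, 1}, {2, -1}}]
(* {{1/Sqrt[2], 1/Sqrt[2]}, {1/Sqrt[2], -1/Sqrt[2]}}: an orthonormal basis for the span *)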
Bibliography

[1] B. Kolman and D. Hill. Elementary Linear Algebra with Applications, Ninth Edition. Pearson, 2008.
[2] H. Caswell. Matrix Population Models, Second Edition. Sinauer Asso-
ciates, 2000.
[3] Commutative Diagrams with TikZ. CTAN Archive, http://ctan.math.
washington.edu/tex- archive/graphics/pgf/contrib/tikz- cd/
tikz-cd-doc.pdf.
[4] D. Lay, S. Lay, and J. McDonald. Linear Algebra and Its Applications, Fifth Edition. Pearson, 2016.
[5] G. Grätzer. Math into LaTeX, Third Edition. Birkhäuser Springer, 2000.
[6] Introductory Mathematical Economics, Second Edition. Oxford Univer-
sity Press, 2004.
[7] D. Lay. Linear Algebra and Its Applications, Third Edition. Pearson,
2006.
[8] S. Chatterjee and A. Hadi. Regression Analysis by Example, Fourth Edition. Wiley, 2006.
[9] S. Friedberg, A. Insel, and L. Spence. Linear Algebra, Third Edition. Prentice Hall, 1979.
[10] D. Taylor. The Mathematics of Games. CRC Press, 2015.
[11] TeX-LaTeX Stack Exchange. https://tex.stackexchange.com.

Index

1-1 function, 110, 112, 138

Aij , 256
addition
of vectors, 12
of complex numbers, 232, 356
of matrices, 82
in Mathematica, 364
aij , 60
augmented coefficient matrix, 120

B-coordinate map, 207
basis, 200, 214
basis of eigenvectors, 284, 288
in Mathematica, 366

C, 231, 355
C , 236
change of coordinates matrix, 293
inverse, 299
standard basis of Rn , 300
codomain, 54
column space, 114
complex numbers, 231, 355
component orthogonal
to a span, 336
to a vector, 331
coordinate map, 207
coordinate vector, 204
orthogonal basis, 344

demographic matrix, 62
determinant, 257, 258, 261, 268
after row operations, 268
in Mathematica, 364
of 2 × 2, 251
of a product, 271
of an inverse, 271
of lower triangular, 264
of upper triangular, 265
diagonal matrix, 242
diagonalizable matrix, 290
diagonalization of a matrix, 301
dimension, 211
of C, 235
of C , 238
of Mmn , 211
of Rn , 211
of P , 226
of Pn , 224
of span, 45
domain, 54
dot product, 307
in Mathematica, 367

~ei , 71
eigenspace, 279
in Mathematica, 366
eigenvalue, 245, 275
in Mathematica, 365
eigenvector, 245, 275, 279
in Mathematica, 366
elementary matrix, 174

free variable, 152
Fundamental Theorem of Algebra, 360

geometry
of complex numbers, 355
of determinant, 251
of scalar multiplication of vectors, 15, 16
of span, 28, 41
Gram-Schmidt process, 347
in Mathematica, 367

homogeneous equation, 154

identity matrix, 62
In , 62
inverse matrix, 183, 184
in Mathematica, 364
of 2 × 2, 251
invertible function, 182
invertible matrix, 183, 184, 270
Invertible Matrix Theorem, 192, 270

kernel, 108

leading 1, 123
length of a vector, 311
linear combination, 25
linear equation, 121
linear function, 54, 103
linear system, 121
linearly dependent, 43, 101, 140
linearly independent, 43, 101, 140, 284, 342
lower triangular matrix, 171
LU -factorization, 172

Mathematica, 362
matrix, 60
entry, 60
in Mathematica, 363
of a linear map, 61
size, 60
matrix equation, 68
in Mathematica, 364
Mmn , 95
multiplication
of complex numbers, 358
of matrices, 87, 89
in Mathematica, 364
of matrix and vector, 66
in Mathematica, 364
multiplicity, 288
in Mathematica, 366

nonhomogeneous equation, 154
norm of a vector, 311, 322
in Mathematica, 367
normalizing a vector, 314
in Mathematica, 367
null space, 108
in Mathematica, 367

onto function, 116, 142
orthogonal, 321, 322
orthogonal basis, 343
orthogonal complement, 324, 326, 327
orthogonal projection
onto a span, 336
onto a vector, 331
in Mathematica, 367
orthogonal set, 334, 342
orthonormal basis, 346
in Mathematica, 367

P , 220
partition, 166
PC←B , 293
perpendicular, 321
Pn , 221

range, 113
rank, 160, 161
Rank-Nullity Theorem, 161
reduced echelon form, 123
Rn , 11
row operations, 125, 174
row reduction algorithm, 127
in Mathematica, 364

scalar, 12
scalar multiplication
of complex numbers, 232
of matrices, 83
in Mathematica, 364
of vectors, 14
solution, 148
solution set, 148, 150, 153
as a span, 155
in Mathematica, 364
span, 28, 99
standard basis
of C, 235

of Mmn , 201
of Rn , 202
of P , 226
of Pn , 223
subspace, 38, 98, 100
subspace test, 98

transpose, 187

upper triangular matrix, 171

vector, 9
in Mathematica, 363
vector equation, 35
vector space, 97

W ⊥ , 324

zero vector
of C, 233
of C , 237
of Mmn , 95
of Rn , 17
of P , 221
