

Propose definitions for addition and scalar multiplication in V . Identify


the zero vector in V , and check that every matrix in V has an additive
inverse.

5. Let $P_3^{\mathbb{R}}$ be the set of polynomials with real coefficients of degree three or less.
(a) Propose a definition of addition and scalar multiplication to make $P_3^{\mathbb{R}}$ a vector space.
(b) Identify the zero vector, and find the additive inverse for the vector $3 - 2x + x^2$.
(c) Show that $P_3^{\mathbb{R}}$ is not a vector space over $\mathbb{C}$. Propose a small change to the definition of $P_3^{\mathbb{R}}$ to make it a vector space over $\mathbb{C}$. (Hint: Every little symbol in the instructions for part (c) is important!)

Hint

6. Let $V = \{x \in \mathbb{R} \mid x > 0\} =: \mathbb{R}_+$. For $x, y \in V$ and $\lambda \in \mathbb{R}$, define
$$x \oplus y = xy, \qquad \lambda \otimes x = x^{\lambda}.$$
Show that $(V, \oplus, \otimes, \mathbb{R})$ is a vector space.
7. The component in the $i$th row and $j$th column of a matrix can be labeled $m^i_j$. In this sense a matrix is a function of a pair of integers. For what set $S$ is the set of $2\times 2$ matrices the same as the set $\mathbb{R}^S$? Generalize to other size matrices.
8. Show that any function in $\mathbb{R}^{\{*,\star,\#\}}$ can be written as a sum of multiples of the functions $e_*, e_\star, e_\#$ defined by
$$e_*(k)=\begin{cases}1, & k=*\\ 0, & k=\star\\ 0, & k=\#\end{cases}, \qquad e_\star(k)=\begin{cases}0, & k=*\\ 1, & k=\star\\ 0, & k=\#\end{cases}, \qquad e_\#(k)=\begin{cases}0, & k=*\\ 0, & k=\star\\ 1, & k=\#.\end{cases}$$

9. Let $V$ be a vector space and $S$ any set. Show that the set $V^S$ of all functions $S \to V$ is a vector space. Hint: first decide upon a rule for adding functions whose outputs are vectors.

6  Linear Transformations
The main objects of study in any course in linear algebra are linear functions:

Definition A function $L \colon V \to W$ is linear if $V$ and $W$ are vector spaces and
$$L(ru + sv) = rL(u) + sL(v)$$
for all $u, v \in V$ and $r, s \in \mathbb{R}$.

Reading homework: problem 1

Remark We will often refer to linear functions by names like “linear map”, “linear
operator” or “linear transformation”. In some contexts you will also see the name
“homomorphism” which generally is applied to functions from one kind of set to the
same kind of set while respecting any structures on the sets; linear maps are from
vector spaces to vector spaces that respect scalar multiplication and addition, the two
structures on vector spaces. It is common to denote a linear function by capital L
as a reminder of its linearity, but sometimes we will use just f , after all we are just
studying very special functions.

The definition above coincides with the two part description in Chapter 1;
the case r = 1, s = 1 describes additivity, while s = 0 describes homogeneity.
We are now ready to learn the powerful consequences of linearity.


6.1 The Consequence of Linearity


Now that we have a sufficiently general notion of vector space it is time to
talk about why linear operators are so special. Think about what is required
to fully specify a real function of one variable. One output must be specified
for each input. That is an infinite amount of information.
By contrast, even though a linear function can have infinitely many ele-
ments in its domain, it is specified by a very small amount of information.

Example 69 (One output specifies infinitely many)
If you know that the function $L$ is linear and that
$$L\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}5\\3\end{pmatrix}$$
then you do not need any more information to figure out
$$L\begin{pmatrix}2\\0\end{pmatrix},\ L\begin{pmatrix}3\\0\end{pmatrix},\ L\begin{pmatrix}4\\0\end{pmatrix},\ L\begin{pmatrix}5\\0\end{pmatrix},\ \text{etc.},$$
because by homogeneity
$$L\begin{pmatrix}5\\0\end{pmatrix} = L\left[5\begin{pmatrix}1\\0\end{pmatrix}\right] = 5L\begin{pmatrix}1\\0\end{pmatrix} = 5\begin{pmatrix}5\\3\end{pmatrix} = \begin{pmatrix}25\\15\end{pmatrix}.$$
In this way an infinite number of outputs is specified by just one.

Example 70 (Two outputs in $\mathbb{R}^2$ specify all outputs)
Likewise, if you know that $L$ is linear and that
$$L\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}5\\3\end{pmatrix} \quad\text{and}\quad L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}2\\2\end{pmatrix}$$
then you don't need any more information to compute
$$L\begin{pmatrix}1\\1\end{pmatrix}$$
because by additivity
$$L\begin{pmatrix}1\\1\end{pmatrix} = L\left[\begin{pmatrix}1\\0\end{pmatrix}+\begin{pmatrix}0\\1\end{pmatrix}\right] = L\begin{pmatrix}1\\0\end{pmatrix}+L\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}5\\3\end{pmatrix}+\begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}7\\5\end{pmatrix}.$$
In fact, since every vector in $\mathbb{R}^2$ can be expressed as
$$\begin{pmatrix}x\\y\end{pmatrix} = x\begin{pmatrix}1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\end{pmatrix},$$
we know how $L$ acts on every vector from $\mathbb{R}^2$ by linearity based on just two pieces of information:
$$L\begin{pmatrix}x\\y\end{pmatrix} = L\left[x\begin{pmatrix}1\\0\end{pmatrix}+y\begin{pmatrix}0\\1\end{pmatrix}\right] = xL\begin{pmatrix}1\\0\end{pmatrix}+yL\begin{pmatrix}0\\1\end{pmatrix} = x\begin{pmatrix}5\\3\end{pmatrix}+y\begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}5x+2y\\3x+2y\end{pmatrix}.$$
Thus, the value of $L$ at infinitely many inputs is completely specified by its value at just two inputs. (We can see now that $L$ acts in exactly the way the matrix
$$\begin{pmatrix}5 & 2\\ 3 & 2\end{pmatrix}$$
acts on vectors from $\mathbb{R}^2$.)
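As an illustrative sketch (not part of the text, assuming numpy), the matrix built from the two given outputs reproduces $L$ on any input:

```python
import numpy as np

# Columns are the given outputs L(1,0) = (5,3) and L(0,1) = (2,2).
L_matrix = np.array([[5, 2],
                     [3, 2]])

# Any other input is handled by linearity.
print(L_matrix @ np.array([1, 1]))     # [7 5]
print(L_matrix @ np.array([10, -4]))   # [42 22] = (5*10+2*(-4), 3*10+2*(-4))
```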

Reading homework: problem 2

This is the reason that linear functions are so nice; they are secretly very
simple functions by virtue of two characteristics:

1. They act on vector spaces.

2. They act additively and homogeneously.

A linear transformation with domain $\mathbb{R}^3$ is completely specified by the way it acts on the three vectors
$$\begin{pmatrix}1\\0\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\0\end{pmatrix},\ \begin{pmatrix}0\\0\\1\end{pmatrix}.$$

Similarly, a linear transformation with domain $\mathbb{R}^n$ is completely specified by its action on the $n$ different $n$-vectors that have exactly one non-zero component, and its matrix form can be read off from this information. However, not all linear functions have such nice domains.


6.2 Linear Functions on Hyperplanes


It is not always so easy to write a linear operator as a matrix. Generally,
this will amount to solving a linear systems problem. Examining a linear
function whose domain is a hyperplane is instructive.

Example 71 Let
$$V = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\}$$
and consider $L \colon V \to \mathbb{R}^3$ to be a linear function that obeys
$$L\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}, \qquad L\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}.$$
By linearity this specifies the action of $L$ on any vector from $V$ as
$$L\left[c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix}\right] = (c_1 + c_2)\begin{pmatrix}0\\1\\0\end{pmatrix}.$$
The domain of $L$ is a plane and its range is the line through the origin in the $x_2$ direction.
It is not clear how to formulate $L$ as a matrix; since
$$L\begin{pmatrix}c_1\\c_1+c_2\\c_2\end{pmatrix} = \begin{pmatrix}0&0&0\\1&0&1\\0&0&0\end{pmatrix}\begin{pmatrix}c_1\\c_1+c_2\\c_2\end{pmatrix} = (c_1+c_2)\begin{pmatrix}0\\1\\0\end{pmatrix},$$
or
$$L\begin{pmatrix}c_1\\c_1+c_2\\c_2\end{pmatrix} = \begin{pmatrix}0&0&0\\0&1&0\\0&0&0\end{pmatrix}\begin{pmatrix}c_1\\c_1+c_2\\c_2\end{pmatrix} = (c_1+c_2)\begin{pmatrix}0\\1\\0\end{pmatrix},$$
you might suspect that $L$ is equivalent to one of these $3\times 3$ matrices. It is not. By the natural domain convention, all $3\times 3$ matrices have $\mathbb{R}^3$ as their domain, and the domain of $L$ is smaller than that. When we do realize this $L$ as a matrix it will be as a $3\times 2$ matrix. We can tell because the domain of $L$ is 2 dimensional and the codomain is 3 dimensional. (You probably already know that the plane has dimension 2, and a line is 1 dimensional, but the careful definition of "dimension" takes some work; this is tackled in Chapter 11.) This leads us to write
$$L\left[c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix}\right] = c_1\begin{pmatrix}0\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}\begin{pmatrix}c_1\\c_2\end{pmatrix}.$$
This makes sense, but requires a warning: the matrix
$$\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}$$
specifies $L$ so long as you also provide the information that you are labeling points in the plane $V$ by the two numbers $(c_1, c_2)$.
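As a quick numerical sketch (not part of the text, assuming numpy), the $3\times 2$ matrix above reproduces $L$ once inputs are labeled by $(c_1, c_2)$:

```python
import numpy as np

# Matrix of L with inputs labeled by (c1, c2) and outputs in the standard basis of R^3.
L_matrix = np.array([[0, 0],
                     [1, 1],
                     [0, 0]])

c1, c2 = 2.0, -0.5
print(L_matrix @ np.array([c1, c2]))   # [0.  1.5 0. ]  =  (c1 + c2) * e2
```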

6.3 Linear Differential Operators


Your calculus class became much easier when you stopped using the limit
definition of the derivative, learned the power rule, and started using linearity
of the derivative operator.
Example 72 Let $V$ be the vector space of polynomials of degree 2 or less with standard addition and scalar multiplication;
$$V := \{a_0 \cdot 1 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R}\}.$$
Let $\frac{d}{dx} \colon V \to V$ be the derivative operator. The following three equations, along with linearity of the derivative operator, allow one to take the derivative of any 2nd degree polynomial:
$$\frac{d}{dx}1 = 0, \qquad \frac{d}{dx}x = 1, \qquad \frac{d}{dx}x^2 = 2x.$$
In particular
$$\frac{d}{dx}(a_0 1 + a_1 x + a_2 x^2) = a_0\frac{d}{dx}1 + a_1\frac{d}{dx}x + a_2\frac{d}{dx}x^2 = 0 + a_1 + 2a_2 x.$$
Thus, the derivative acting on any of the infinitely many second order polynomials is determined by its action on just three inputs.

6.4 Bases (Take 1)


The central idea of linear algebra is to exploit the hidden simplicity of linear functions. It turns out that there is a lot of freedom in how to do this. That freedom is what makes linear algebra powerful.


You saw that a linear operator acting on $\mathbb{R}^2$ is completely specified by how it acts on the pair of vectors $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\end{pmatrix}$. In fact, any linear operator acting on $\mathbb{R}^2$ is also completely specified by how it acts on the pair of vectors $\begin{pmatrix}1\\1\end{pmatrix}$ and $\begin{pmatrix}1\\-1\end{pmatrix}$.

Example 73 If $L$ is a linear operator then it is completely specified by the two equalities
$$L\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}2\\4\end{pmatrix}, \quad\text{and}\quad L\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}6\\8\end{pmatrix}.$$
This is because any vector $\begin{pmatrix}x\\y\end{pmatrix}$ in $\mathbb{R}^2$ is a sum of multiples of $\begin{pmatrix}1\\1\end{pmatrix}$ and $\begin{pmatrix}1\\-1\end{pmatrix}$, which can be calculated via a linear systems problem as follows:
$$\begin{pmatrix}x\\y\end{pmatrix} = a\begin{pmatrix}1\\1\end{pmatrix} + b\begin{pmatrix}1\\-1\end{pmatrix}
\;\Leftrightarrow\; \begin{pmatrix}1&1\\1&-1\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}x\\y\end{pmatrix}
\;\Leftrightarrow\; \left(\begin{array}{cc|c}1&1&x\\1&-1&y\end{array}\right) \sim \left(\begin{array}{cc|c}1&0&\tfrac{x+y}{2}\\0&1&\tfrac{x-y}{2}\end{array}\right)
\;\Leftrightarrow\; \begin{cases}a = \tfrac{x+y}{2}\\[2pt] b = \tfrac{x-y}{2}.\end{cases}$$
Thus
$$\begin{pmatrix}x\\y\end{pmatrix} = \frac{x+y}{2}\begin{pmatrix}1\\1\end{pmatrix} + \frac{x-y}{2}\begin{pmatrix}1\\-1\end{pmatrix}.$$

We can then calculate how $L$ acts on any vector by first expressing the vector as a sum of multiples and then applying linearity;
$$\begin{aligned}
L\begin{pmatrix}x\\y\end{pmatrix} &= L\left[\frac{x+y}{2}\begin{pmatrix}1\\1\end{pmatrix} + \frac{x-y}{2}\begin{pmatrix}1\\-1\end{pmatrix}\right]\\
&= \frac{x+y}{2}L\begin{pmatrix}1\\1\end{pmatrix} + \frac{x-y}{2}L\begin{pmatrix}1\\-1\end{pmatrix}\\
&= \frac{x+y}{2}\begin{pmatrix}2\\4\end{pmatrix} + \frac{x-y}{2}\begin{pmatrix}6\\8\end{pmatrix}\\
&= \begin{pmatrix}x+y\\2(x+y)\end{pmatrix} + \begin{pmatrix}3(x-y)\\4(x-y)\end{pmatrix}\\
&= \begin{pmatrix}4x-2y\\6x-2y\end{pmatrix}.
\end{aligned}$$

Thus L is completely specified by its value at just two inputs.
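As an illustrative sketch (not part of the text, assuming numpy), one can recover the standard-basis matrix of $L$ by solving the same linear systems problem numerically:

```python
import numpy as np

# Columns of B are the basis vectors (1,1) and (1,-1); columns of Y are L of those vectors.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])
Y = np.array([[2.0, 6.0],
              [4.0, 8.0]])

# L(B) = Y, so the standard-basis matrix of L is Y B^{-1}.
L_matrix = Y @ np.linalg.inv(B)
print(L_matrix)                          # [[ 4. -2.] [ 6. -2.]]  i.e. L(x,y) = (4x-2y, 6x-2y)
print(L_matrix @ np.array([1.0, 1.0]))   # [2. 4.]
```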

It should not surprise you to learn there are infinitely many pairs of
vectors from R2 with the property that any vector can be expressed as a
linear combination of them; any pair that when used as columns of a matrix
gives an invertible matrix works. Such a pair is called a basis for R2 .
Similarly, there are infinitely many triples of vectors with the property
that any vector from R3 can be expressed as a linear combination of them:
these are the triples that used as columns of a matrix give an invertible
matrix. Such a triple is called a basis for R3 .
In a similar spirit, there are infinitely many pairs of vectors with the property that every vector in
$$V = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\}$$
can be expressed as a linear combination of them. Some examples are
$$V = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\2\\2\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\} = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}1\\3\\2\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\}.$$
Such a pair is called a basis for $V$.


You probably have some intuitive notion of what dimension means (the
careful mathematical definition is given in chapter 11). Roughly speaking,


dimension is the number of independent directions available. To figure out


the dimension of a vector space, I stand at the origin, and pick a direction.
If there are any vectors in my vector space that aren’t in that direction, then
I choose another direction that isn’t in the line determined by the direction I
chose. If there are any vectors in my vector space not in the plane determined
by the first two directions, then I choose one of them as my next direction. In
other words, I choose a collection of independent vectors in the vector space
(independent vectors are defined in Chapter 10). A minimal set of indepen-
dent vectors is called a basis (see Chapter 11 for the precise definition). The
number of vectors in my basis is the dimension of the vector space. Every
vector space has many bases, but all bases for a particular vector space have
the same number of vectors. Thus dimension is a well-defined concept.
The fact that every vector space (over R) has infinitely many bases is
actually very useful. Often a good choice of basis can reduce the time required
to run a calculation in dramatic ways!
In summary:

A basis is a set of vectors in terms of which it is possible to


uniquely express any other vector.

6.5 Review Problems


Webwork:
  Reading problems 1, 2
  Linear? 3
  Matrix × vector 4, 5
  Linearity 6, 7

1. Show that the pair of conditions
$$L(u + v) = L(u) + L(v), \qquad L(cv) = cL(v) \tag{1}$$
(valid for all vectors $u, v$ and any scalar $c$) is equivalent to the single condition
$$L(ru + sv) = rL(u) + sL(v), \tag{2}$$
(for all vectors $u, v$ and any scalars $r$ and $s$). Your answer should have two parts. Show that (1) $\Rightarrow$ (2), and then show that (2) $\Rightarrow$ (1).


2. If f is a linear function of one variable, then how many points on the


graph of the function are needed to specify the function? Give an
explicit expression for f in terms of these points. (You might want
to look up the definition of a graph before you make any assumptions
about the function.)
3. (a) If $p\begin{pmatrix}1\\2\end{pmatrix} = 1$ and $p\begin{pmatrix}2\\4\end{pmatrix} = 3$ is it possible that $p$ is a linear function?
(b) If $Q(x^2) = x^3$ and $Q(2x^2) = x^4$ is it possible that $Q$ is a linear function from polynomials to polynomials?
4. If $f$ is a linear function such that
$$f\begin{pmatrix}1\\2\end{pmatrix} = 0, \quad\text{and}\quad f\begin{pmatrix}2\\3\end{pmatrix} = 1,$$
then what is $f\begin{pmatrix}x\\y\end{pmatrix}$?
5. Let $P_n$ be the space of polynomials of degree $n$ or less in the variable $t$. Suppose $L$ is a linear transformation from $P_2 \to P_3$ such that $L(1) = 4$, $L(t) = t^3$, and $L(t^2) = t - 1$.
(a) Find $L(1 + t + 2t^2)$.
(b) Find $L(a + bt + ct^2)$.
(c) Find all values $a, b, c$ such that $L(a + bt + ct^2) = 1 + 3t + 2t^3$.

Hint

6. Show that the operator $I$ that maps $f$ to the function $If$ defined by $If(x) := \int_0^x f(t)\,dt$ is a linear operator on the space of continuous functions.

7. Let $z \in \mathbb{C}$. Recall that $z = x + iy$ for some $x, y \in \mathbb{R}$, and we can form the complex conjugate of $z$ by taking $\bar{z} = x - iy$. The function $c \colon \mathbb{R}^2 \to \mathbb{R}^2$ which sends $(x, y) \mapsto (x, -y)$ agrees with complex conjugation.
(a) Show that $c$ is a linear map over $\mathbb{R}$ (i.e. scalars in $\mathbb{R}$).
(b) Show that $\bar{z}$ is not linear over $\mathbb{C}$.

7  Matrices
Matrices are a powerful tool for calculations involving linear transformations.
It is important to understand how to find the matrix of a linear transforma-
tion and the properties of matrices.

7.1 Linear Transformations and Matrices


Ordered, finite-dimensional bases for vector spaces allow us to express linear operators as matrices.

7.1.1 Basis Notation

A basis allows us to efficiently label arbitrary vectors in terms of column


vectors. Here is an example.

Example 74 Let
$$V = \left\{\begin{pmatrix}a & b\\ c & d\end{pmatrix} \;\middle|\; a, b, c, d \in \mathbb{R}\right\}$$
be the vector space of $2\times 2$ real matrices, with addition and scalar multiplication defined componentwise. One choice of basis is the ordered set (or list) of matrices
$$B = \left(\begin{pmatrix}1&0\\0&0\end{pmatrix}, \begin{pmatrix}0&1\\0&0\end{pmatrix}, \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}0&0\\0&1\end{pmatrix}\right) =: (e_{11}, e_{12}, e_{21}, e_{22}).$$
Given a particular vector and a basis, your job is to write that vector as a sum of multiples of basis elements. Here an arbitrary vector $v \in V$ is just a matrix, so we write
$$\begin{aligned}
v = \begin{pmatrix}a&b\\c&d\end{pmatrix} &= \begin{pmatrix}a&0\\0&0\end{pmatrix} + \begin{pmatrix}0&b\\0&0\end{pmatrix} + \begin{pmatrix}0&0\\c&0\end{pmatrix} + \begin{pmatrix}0&0\\0&d\end{pmatrix}\\
&= a\begin{pmatrix}1&0\\0&0\end{pmatrix} + b\begin{pmatrix}0&1\\0&0\end{pmatrix} + c\begin{pmatrix}0&0\\1&0\end{pmatrix} + d\begin{pmatrix}0&0\\0&1\end{pmatrix}\\
&= a\,e_{11} + b\,e_{12} + c\,e_{21} + d\,e_{22}.
\end{aligned}$$
The coefficients $(a, b, c, d)$ of the basis vectors $(e_{11}, e_{12}, e_{21}, e_{22})$ encode the information of which matrix the vector $v$ is. We store them in a column vector by writing
$$v = a\,e_{11} + b\,e_{12} + c\,e_{21} + d\,e_{22} =: (e_{11}, e_{12}, e_{21}, e_{22})\begin{pmatrix}a\\b\\c\\d\end{pmatrix} =: \begin{pmatrix}a\\b\\c\\d\end{pmatrix}_B.$$
The 4-vector $\begin{pmatrix}a\\b\\c\\d\end{pmatrix} \in \mathbb{R}^4$ encodes the vector $\begin{pmatrix}a&b\\c&d\end{pmatrix} \in V$ but is NOT equal to it! (After all, $v$ is a matrix so could not equal a column vector.) Both notations on the right hand side of the above equation really stand for the vector obtained by multiplying the coefficients stored in the column vector by the corresponding basis element and then summing over them.
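A small sketch (not from the text, assuming numpy; the helper names encode/decode are ours) of storing a $2\times 2$ matrix as its column vector of coefficients in the basis $B$ and rebuilding it:

```python
import numpy as np

# The ordered basis B = (e11, e12, e21, e22) for 2x2 real matrices.
basis = [np.array([[1, 0], [0, 0]]), np.array([[0, 1], [0, 0]]),
         np.array([[0, 0], [1, 0]]), np.array([[0, 0], [0, 1]])]

def encode(v):
    """Column vector of v in the basis B (here just the entries of v read off one by one)."""
    return np.array([np.sum(v * e) for e in basis])

def decode(coeffs):
    """Rebuild the matrix as a sum of multiples of basis elements."""
    return sum(c * e for c, e in zip(coeffs, basis))

v = np.array([[3, -1], [4, 2]])
print(encode(v))            # [ 3 -1  4  2]
print(decode(encode(v)))    # recovers the original matrix
```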

Next, lets consider a tautological example showing how to label column


vectors in terms of column vectors:

Example 75 (Standard Basis of $\mathbb{R}^2$)
The vectors
$$e_1 = \begin{pmatrix}1\\0\end{pmatrix}, \qquad e_2 = \begin{pmatrix}0\\1\end{pmatrix}$$
are called the standard basis vectors of $\mathbb{R}^2 = \mathbb{R}^{\{1,2\}}$. Their description as functions of $\{1, 2\}$ is
$$e_1(k) = \begin{cases}1 & \text{if } k = 1\\ 0 & \text{if } k = 2\end{cases}, \qquad e_2(k) = \begin{cases}0 & \text{if } k = 1\\ 1 & \text{if } k = 2.\end{cases}$$
It is natural to assign these the order: $e_1$ is first and $e_2$ is second. An arbitrary vector $v$ of $\mathbb{R}^2$ can be written as
$$v = \begin{pmatrix}x\\y\end{pmatrix} = x e_1 + y e_2.$$
To emphasize that we are using the standard basis we define the list (or ordered set)
$$E = (e_1, e_2),$$
and write
$$\begin{pmatrix}x\\y\end{pmatrix}_E := (e_1, e_2)\begin{pmatrix}x\\y\end{pmatrix} := x e_1 + y e_2 = v.$$
You should read this equation by saying:
"The column vector of the vector $v$ in the basis $E$ is $\begin{pmatrix}x\\y\end{pmatrix}$."
Again, the first notation of a column vector with a subscript $E$ refers to the vector obtained by multiplying each basis vector by the corresponding scalar listed in the column and then summing these, i.e. $x e_1 + y e_2$. The second notation denotes exactly the same thing, but we first list the basis elements and then the column vector; a useful trick because this can be read in the same way as matrix multiplication of a row vector times a column vector, except that the entries of the row vector are themselves vectors!

You should already try to write down the standard basis vectors for Rn
for other values of n and express an arbitrary vector in Rn in terms of them.
The last example probably seems pedantic because column vectors are al-
ready just ordered lists of numbers and the basis notation has simply allowed
us to “re-express” these as lists of numbers. Of course, this objection does
not apply to more complicated vector spaces like our first matrix example.
Moreover, as we saw earlier, there are infinitely many other pairs of vectors
in R2 that form a basis.

Example 76 (A Non-Standard Basis of $\mathbb{R}^2 = \mathbb{R}^{\{1,2\}}$)
Let
$$b = \begin{pmatrix}1\\1\end{pmatrix}, \qquad \beta = \begin{pmatrix}1\\-1\end{pmatrix}.$$
As functions of $\{1, 2\}$ they read
$$b(k) = \begin{cases}1 & \text{if } k = 1\\ 1 & \text{if } k = 2\end{cases}, \qquad \beta(k) = \begin{cases}1 & \text{if } k = 1\\ -1 & \text{if } k = 2.\end{cases}$$
Notice something important: there is no reason to say that $\beta$ comes before $b$ or vice versa. That is, there is no a priori reason to give these basis elements one order or the other. However, it will be necessary to give the basis elements an order if we want to use them to encode other vectors. We choose one arbitrarily; let
$$B = (b, \beta)$$
be the ordered basis. Note that for an unordered set we use the $\{\}$ parentheses while for lists or ordered sets we use $()$.
As before we define
$$\begin{pmatrix}x\\y\end{pmatrix}_B := (b, \beta)\begin{pmatrix}x\\y\end{pmatrix} := x b + y \beta.$$
You might think that the numbers $x$ and $y$ denote exactly the same vector as in the previous example. However, they do not. Inserting the actual vectors that $b$ and $\beta$ represent we have
$$x b + y \beta = x\begin{pmatrix}1\\1\end{pmatrix} + y\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}x+y\\x-y\end{pmatrix}.$$
Thus, to contrast, we have
$$\begin{pmatrix}x\\y\end{pmatrix}_B = \begin{pmatrix}x+y\\x-y\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}x\\y\end{pmatrix}_E = \begin{pmatrix}x\\y\end{pmatrix}.$$
Only in the standard basis $E$ does the column vector of $v$ agree with the column vector that $v$ actually is!
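A brief numerical sketch (not from the text, assuming numpy): given the actual vector $v$, its column vector in the basis $B = (b, \beta)$ is found by solving a small linear system.

```python
import numpy as np

# Basis vectors b = (1,1) and beta = (1,-1) as the columns of a matrix.
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])

v = np.array([1.0, 1.0])              # the vector v = e1 + e2 = b
coords_B = np.linalg.solve(P, v)      # solve P @ coords = v
print(coords_B)                       # [1. 0.]  i.e. v = 1*b + 0*beta
```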

Based on the above example, you might think that our aim would be to find the "standard basis" for any problem. In fact, this is far from the truth. Notice, for example, that the vector
$$v = \begin{pmatrix}1\\1\end{pmatrix} = e_1 + e_2 = b$$
written in the standard basis $E$ is just
$$v = \begin{pmatrix}1\\1\end{pmatrix}_E,$$
which was easy to calculate. But in the basis $B$ we find
$$v = \begin{pmatrix}1\\0\end{pmatrix}_B,$$


which is actually a simpler column vector! The fact that there are many
bases for any given vector space allows us to choose a basis in which our
computation is easiest. In any case, the standard basis only makes sense
for $\mathbb{R}^n$. Suppose your vector space was the set of solutions to a differential equation; what would a standard basis then be?

Example 77 (A Basis For a Hyperplane)
Let's again consider the hyperplane
$$V = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\}.$$
One possible choice of ordered basis is
$$b_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}, \quad b_2 = \begin{pmatrix}0\\1\\1\end{pmatrix}, \qquad B = (b_1, b_2).$$
With this choice
$$\begin{pmatrix}x\\y\end{pmatrix}_B := x b_1 + y b_2 = x\begin{pmatrix}1\\1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}x\\x+y\\y\end{pmatrix}_E.$$
With the other choice of order $B' = (b_2, b_1)$
$$\begin{pmatrix}x\\y\end{pmatrix}_{B'} := x b_2 + y b_1 = x\begin{pmatrix}0\\1\\1\end{pmatrix} + y\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}y\\x+y\\x\end{pmatrix}_E.$$
We see that the order of basis elements matters.

Finding the column vector of a given vector in a given basis usually


amounts to a linear systems problem:

Example 78 (Pauli Matrices)
Let
$$V = \left\{\begin{pmatrix}z & u\\ v & -z\end{pmatrix} \;\middle|\; z, u, v \in \mathbb{C}\right\}$$
be the vector space of trace-free complex-valued matrices (over $\mathbb{C}$) with basis
$$B = (\sigma_x, \sigma_y, \sigma_z),$$
where
$$\sigma_x = \begin{pmatrix}0&1\\1&0\end{pmatrix}, \quad \sigma_y = \begin{pmatrix}0&-i\\i&0\end{pmatrix}, \quad \sigma_z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
These three matrices are the famous Pauli matrices; they are used to describe electrons in quantum theory, or qubits in quantum computation. Let
$$v = \begin{pmatrix}2+i & 1+i\\ 3-i & -2-i\end{pmatrix}.$$
Find the column vector of $v$ in the basis $B$.
For this we must solve the equation
$$\begin{pmatrix}2+i & 1+i\\ 3-i & -2-i\end{pmatrix} = \alpha^x\begin{pmatrix}0&1\\1&0\end{pmatrix} + \alpha^y\begin{pmatrix}0&-i\\i&0\end{pmatrix} + \alpha^z\begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
This gives four equations, i.e. a linear systems problem, for the $\alpha$'s
$$\begin{cases}\alpha^x - i\alpha^y = 1+i\\ \alpha^x + i\alpha^y = 3-i\\ \alpha^z = 2+i\\ -\alpha^z = -2-i\end{cases}$$
with solution
$$\alpha^x = 2, \qquad \alpha^y = -1-i, \qquad \alpha^z = 2+i.$$
Thus
$$v = \begin{pmatrix}2+i & 1+i\\ 3-i & -2-i\end{pmatrix} = \begin{pmatrix}2\\-1-i\\2+i\end{pmatrix}_B.$$
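A numerical sketch of the same computation (not part of the text, assuming numpy), solving the $4\times 3$ linear system by flattening each matrix into a vector:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

v = np.array([[2 + 1j, 1 + 1j],
              [3 - 1j, -2 - 1j]], dtype=complex)

# Columns of A are the flattened Pauli matrices; solve A @ alpha = flatten(v) by least squares.
A = np.column_stack([s.reshape(-1) for s in (sigma_x, sigma_y, sigma_z)])
alpha, *_ = np.linalg.lstsq(A, v.reshape(-1), rcond=None)
print(alpha)   # approximately [2, -1-1j, 2+1j]
```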

To summarize, the column vector of a vector $v$ in an ordered basis $B = (b_1, b_2, \ldots, b_n)$,
$$\begin{pmatrix}\alpha^1\\\alpha^2\\\vdots\\\alpha^n\end{pmatrix},$$
is defined by solving the linear systems problem
$$v = \alpha^1 b_1 + \alpha^2 b_2 + \cdots + \alpha^n b_n = \sum_{i=1}^n \alpha^i b_i.$$
The numbers $(\alpha^1, \alpha^2, \ldots, \alpha^n)$ are called the components of the vector $v$. Two useful shorthand notations for this are
$$v = \begin{pmatrix}\alpha^1\\\alpha^2\\\vdots\\\alpha^n\end{pmatrix}_B = (b_1, b_2, \ldots, b_n)\begin{pmatrix}\alpha^1\\\alpha^2\\\vdots\\\alpha^n\end{pmatrix}.$$

7.1.2 From Linear Operators to Matrices


Chapter 6 showed that linear functions are very special kinds of functions;
they are fully specified by their values on any basis for their domain. A
matrix records how a linear operator maps an element of the basis to a sum
of multiples in the target space basis.
More carefully, if $L$ is a linear operator from $V$ to $W$ then the matrix for $L$ in the ordered bases $B = (b_1, b_2, \ldots)$ for $V$ and $B' = (\beta_1, \beta_2, \ldots)$ for $W$ is the array of numbers $m^j_i$ specified by
$$L(b_i) = m^1_i\,\beta_1 + \cdots + m^j_i\,\beta_j + \cdots.$$
Remark To calculate the matrix of a linear transformation you must compute what the linear transformation does to every input basis vector and then write the answers in terms of the output basis vectors:
$$\big(L(b_1),\, L(b_2),\, \ldots,\, L(b_i),\, \ldots\big) = \left((\beta_1, \beta_2, \ldots, \beta_j, \ldots)\begin{pmatrix}m^1_1\\ m^2_1\\ \vdots\\ m^j_1\\ \vdots\end{pmatrix},\ (\beta_1, \beta_2, \ldots, \beta_j, \ldots)\begin{pmatrix}m^1_2\\ m^2_2\\ \vdots\\ m^j_2\\ \vdots\end{pmatrix},\ \cdots,\ (\beta_1, \beta_2, \ldots, \beta_j, \ldots)\begin{pmatrix}m^1_i\\ m^2_i\\ \vdots\\ m^j_i\\ \vdots\end{pmatrix},\ \cdots\right)$$
$$= (\beta_1, \beta_2, \ldots, \beta_j, \ldots)\begin{pmatrix}m^1_1 & m^1_2 & \cdots & m^1_i & \cdots\\ m^2_1 & m^2_2 & \cdots & m^2_i & \cdots\\ \vdots & \vdots & & \vdots & \\ m^j_1 & m^j_2 & \cdots & m^j_i & \cdots\\ \vdots & \vdots & & \vdots & \end{pmatrix}$$
Example 79 Consider $L \colon V \to \mathbb{R}^3$ (as in Example 71) defined by
$$L\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}, \qquad L\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}.$$


By linearity this specifies the action of $L$ on any vector from $V$ as
$$L\left[c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix}\right] = (c_1 + c_2)\begin{pmatrix}0\\1\\0\end{pmatrix}.$$
We had trouble expressing this linear operator as a matrix. Let's take input basis
$$B = \left(\begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}\right) =: (b_1, b_2),$$
and output basis
$$E = \left(\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right).$$
Then
$$Lb_1 = 0e_1 + 1e_2 + 0e_3, \qquad Lb_2 = 0e_1 + 1e_2 + 0e_3,$$
or
$$(Lb_1, Lb_2) = \left((e_1, e_2, e_3)\begin{pmatrix}0\\1\\0\end{pmatrix},\ (e_1, e_2, e_3)\begin{pmatrix}0\\1\\0\end{pmatrix}\right) = (e_1, e_2, e_3)\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}.$$
The matrix on the right is the matrix of $L$ in these bases. More succinctly we could write
$$L\begin{pmatrix}x\\y\end{pmatrix}_B = (x+y)\begin{pmatrix}0\\1\\0\end{pmatrix}_E$$
and thus see that $L$ acts like the matrix $\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}$.
Hence
$$L\begin{pmatrix}x\\y\end{pmatrix}_B = \left(\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}\right)_E;$$
given input and output bases, the linear operator is now encoded by a matrix.

This is the general rule for this chapter:


Linear operators become matrices when given


ordered input and output bases.

Reading homework: problem 1

Example 80 Let's compute a matrix for the derivative operator acting on the vector space of polynomials of degree 2 or less:
$$V = \{a_0\cdot 1 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R}\}.$$
In the ordered basis $B = (1, x, x^2)$ we write
$$\begin{pmatrix}a\\b\\c\end{pmatrix}_B = a\cdot 1 + bx + cx^2$$
and
$$\frac{d}{dx}\begin{pmatrix}a\\b\\c\end{pmatrix}_B = b\cdot 1 + 2cx + 0x^2 = \begin{pmatrix}b\\2c\\0\end{pmatrix}_B.$$
In the ordered basis $B$ for both domain and range
$$\frac{d}{dx} \mapsto \begin{pmatrix}0&1&0\\0&0&2\\0&0&0\end{pmatrix}.$$
Notice this last line makes no sense without explaining which bases we are using!
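As a small sketch (not from the text, assuming numpy), the matrix above really does differentiate coefficient vectors in the basis $B = (1, x, x^2)$:

```python
import numpy as np

# Matrix of d/dx on polynomials of degree <= 2, in the ordered basis (1, x, x^2).
D = np.array([[0, 1, 0],
              [0, 0, 2],
              [0, 0, 0]])

# p(x) = 3 + 5x + 7x^2 has coefficient vector (3, 5, 7); p'(x) = 5 + 14x.
p = np.array([3, 5, 7])
print(D @ p)    # [ 5 14  0]
```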

7.2 Review Problems


Webwork:
  Reading problem 1
  Matrix of a Linear Transformation 9, 10, 11, 12, 13

1. A door factory can buy supplies in two kinds of packages, f and g. The
package f contains 3 slabs of wood, 4 fasteners, and 6 brackets. The
package g contains 5 fasteners, 3 brackets, and 7 slabs of wood.
(a) Explain how to view the packages f and g as functions and list
their inputs and outputs.


(b) Choose an ordering for the 3 kinds of supplies and use this to
rewrite f and g as elements of R3 .

(c) Let L be a manufacturing process that takes as inputs supply


packages and outputs two products (doors, and door frames). Ex-
plain how it can be viewed as a function mapping one vector space
into another.
(d) Assuming that L is linear and Lf is 1 door and 2 frames, and Lg
is 3 doors and 1 frame, find a matrix for L. Be sure to specify
the basis vectors you used, both for the input and output vector
space.

2. You are designing a simple keyboard synthesizer with two keys. If you
push the first key with intensity a then the speaker moves in time as
a sin(t). If you push the second key with intensity b then the speaker
moves in time as b sin(2t). If the keys are pressed simultaneously,

(a) describe the set of all sounds that come out of your synthesizer.
(Hint: Sounds can be “added”.)
(b) Graph the function $\begin{pmatrix}3\\5\end{pmatrix} \in \mathbb{R}^{\{1,2\}}$.
(c) Let $B = (\sin(t), \sin(2t))$. Explain why $\begin{pmatrix}3\\5\end{pmatrix}_B$ is not in $\mathbb{R}^{\{1,2\}}$ but is still a function.
(d) Graph the function $\begin{pmatrix}3\\5\end{pmatrix}_B$.

3. (a) Find the matrix for $\frac{d}{dx}$ acting on the vector space $V$ of polynomials of degree 2 or less in the ordered basis $B = (x^2, x, 1)$.
(b) Use the matrix from part (a) to rewrite the differential equation $\frac{d}{dx}p(x) = x$ as a matrix equation. Find all solutions of the matrix equation. Translate them into elements of $V$.


(c) Find the matrix for $\frac{d}{dx}$ acting on the vector space $V$ in the ordered basis $B' = (x^2 + x, x^2 - x, 1)$.
(d) Use the matrix from part (c) to rewrite the differential equation $\frac{d}{dx}p(x) = x$ as a matrix equation. Find all solutions of the matrix equation. Translate them into elements of $V$.
(e) Compare and contrast your results from parts (b) and (d).
4. Find the "matrix" for $\frac{d}{dx}$ acting on the vector space of all power series in the ordered basis $(1, x, x^2, x^3, \ldots)$. Use this matrix to find all power series solutions to the differential equation $\frac{d}{dx}f(x) = x$. Hint: your "matrix" may not have finite size.

5. Find the matrix for $\frac{d^2}{dx^2}$ acting on $\{c_1\cos(x) + c_2\sin(x) \mid c_1, c_2 \in \mathbb{R}\}$ in the ordered basis $(\cos(x), \sin(x))$.

6. Find the matrix for $\frac{d}{dx}$ acting on $\{c_1\cosh(x) + c_2\sinh(x) \mid c_1, c_2 \in \mathbb{R}\}$ in the ordered basis
$(\cosh(x), \sinh(x))$
and in the ordered basis
$(\cosh(x) + \sinh(x), \cosh(x) - \sinh(x))$.

7. Let $B = (1, x, x^2)$ be an ordered basis for
$$V = \{a_0 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R}\},$$
and let $B' = (x^3, x^2, x, 1)$ be an ordered basis for
$$W = \{a_0 + a_1 x + a_2 x^2 + a_3 x^3 \mid a_0, a_1, a_2, a_3 \in \mathbb{R}\}.$$
Find the matrix for the operator $\mathcal{I} \colon V \to W$ defined by
$$\mathcal{I}p(x) = \int_1^x p(t)\,dt$$
relative to these bases.


8. This exercise is meant to show you a generalization of the procedure


you learned long ago for finding the function mx+b given two points on
its graph. It will also show you a way to think of matrices as members
of a much bigger class of arrays of numbers.

Find the

(a) constant function f : R ! R whose graph contains (2, 3).


(b) linear function h : R ! R whose graph contains (5, 4).
(c) first order polynomial function g : R ! R whose graph contains
(1, 2) and (3, 3).
(d) second order polynomial function p : R ! R whose graph contains
(1, 0), (3, 0) and (5, 0).
(e) second order polynomial function q : R ! R whose graph contains
(1, 1), (3, 2) and (5, 7).
(f) second order homogeneous polynomial function r : R ! R whose
graph contains (3, 2).

(g) number of points required to specify a third order polynomial


R ! R.
(h) number of points required to specify a third order homogeneous
polynomial R ! R.
(i) number of points required to specify a n-th order polynomial R !
R.
(j) number of points required to specify a n-th order homogeneous
polynomial R ! R.

(k) first order polynomial function $F \colon \mathbb{R}^2 \to \mathbb{R}$ whose graph contains $\left(\begin{pmatrix}0\\0\end{pmatrix}, 1\right)$, $\left(\begin{pmatrix}0\\1\end{pmatrix}, 2\right)$, $\left(\begin{pmatrix}1\\0\end{pmatrix}, 3\right)$, and $\left(\begin{pmatrix}1\\1\end{pmatrix}, 4\right)$.
(l) homogeneous first order polynomial function $H \colon \mathbb{R}^2 \to \mathbb{R}$ whose graph contains $\left(\begin{pmatrix}0\\1\end{pmatrix}, 2\right)$, $\left(\begin{pmatrix}1\\0\end{pmatrix}, 3\right)$, and $\left(\begin{pmatrix}1\\1\end{pmatrix}, 4\right)$.
(m) second order polynomial function $J \colon \mathbb{R}^2 \to \mathbb{R}$ whose graph contains $\left(\begin{pmatrix}0\\0\end{pmatrix}, 0\right)$, $\left(\begin{pmatrix}0\\1\end{pmatrix}, 2\right)$, $\left(\begin{pmatrix}0\\2\end{pmatrix}, 5\right)$, $\left(\begin{pmatrix}1\\0\end{pmatrix}, 3\right)$, $\left(\begin{pmatrix}2\\0\end{pmatrix}, 6\right)$, and $\left(\begin{pmatrix}1\\1\end{pmatrix}, 4\right)$.
(n) first order polynomial function $K \colon \mathbb{R}^2 \to \mathbb{R}^2$ whose graph contains $\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right)$, $\left(\begin{pmatrix}0\\1\end{pmatrix}, \begin{pmatrix}2\\2\end{pmatrix}\right)$, $\left(\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}3\\3\end{pmatrix}\right)$, and $\left(\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}4\\4\end{pmatrix}\right)$.

(o) How many points in the graph of a q-th order polynomial function
Rn ! Rn would completely determine the function?
(p) In particular, how many points of the graph of linear function
Rn ! Rn would completely determine the function? How does a
matrix (in the standard basis) encode this information?
(q) Propose a way to store the information required in 8g above in an
array of numbers.
(r) Propose a way to store the information required in 8o above in an
array of numbers.

7.3 Properties of Matrices


The objects of study in linear algebra are linear operators. We have seen that
linear operators can be represented as matrices through choices of ordered
bases, and that matrices provide a means of efficient computation.
We now begin an in depth study of matrices.

Definition An $r \times k$ matrix $M = (m^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, k$ is a rectangular array of real (or complex) numbers:
$$M = \begin{pmatrix}m^1_1 & m^1_2 & \cdots & m^1_k\\ m^2_1 & m^2_2 & \cdots & m^2_k\\ \vdots & \vdots & & \vdots\\ m^r_1 & m^r_2 & \cdots & m^r_k\end{pmatrix}.$$


The numbers $m^i_j$ are called entries. The superscript indexes the row of the matrix and the subscript indexes the column of the matrix in which $m^i_j$ appears.

An $r \times 1$ matrix $v = (v^r_1) = (v^r)$ is called a column vector, written
$$v = \begin{pmatrix}v^1\\v^2\\\vdots\\v^r\end{pmatrix}.$$
A $1 \times k$ matrix $v = (v^1_k) = (v_k)$ is called a row vector, written
$$v = \begin{pmatrix}v_1 & v_2 & \cdots & v_k\end{pmatrix}.$$

The transpose of a column vector is the corresponding row vector and vice
versa:

Example 81 Let
$$v = \begin{pmatrix}1\\2\\3\end{pmatrix}.$$
Then
$$v^T = \begin{pmatrix}1 & 2 & 3\end{pmatrix},$$
and $(v^T)^T = v$. This is an example of an involution, namely an operation which when performed twice does nothing.

A matrix is an efficient way to store information.

Example 82 In computer graphics, you may have encountered image files with a .gif
extension. These files are actually just matrices: at the start of the file the size of the
matrix is given, after which each number is a matrix entry indicating the color of a
particular pixel in the image.
This matrix then has its rows shuffled a bit: by listing, say, every eighth row, a web
browser downloading the file can start displaying an incomplete version of the picture
before the download is complete.
Finally, a compression algorithm is applied to the matrix to reduce the file size.

134
7.3 Properties of Matrices 135

Example 83 Graphs occur in many applications, ranging from telephone networks to


airline routes. In the subject of graph theory , a graph is just a collection of vertices
and some edges connecting vertices. A matrix can be used to indicate how many edges
attach one vertex to another.

For example, the graph pictured above would have the following matrix, where $m^i_j$ indicates the number of edges between the vertices labeled $i$ and $j$:
$$M = \begin{pmatrix}1 & 2 & 1 & 1\\ 2 & 0 & 1 & 0\\ 1 & 1 & 0 & 1\\ 1 & 0 & 1 & 3\end{pmatrix}$$

This is an example of a symmetric matrix, since $m^i_j = m^j_i$.

Adjacency Matrix Example

The set of all $r \times k$ matrices
$$M^r_k := \{(m^i_j) \mid m^i_j \in \mathbb{R};\ i \in \{1, \ldots, r\};\ j \in \{1, \ldots, k\}\},$$
is itself a vector space with addition and scalar multiplication defined as follows:
$$M + N = (m^i_j) + (n^i_j) = (m^i_j + n^i_j)$$
$$rM = r(m^i_j) = (rm^i_j)$$


In other words, addition just adds corresponding entries in two matrices,


and scalar multiplication multiplies every entry. Notice that M1n = Rn is just
the vector space of column vectors.
Recall that we can multiply an $r \times k$ matrix by a $k \times 1$ column vector to produce an $r \times 1$ column vector using the rule
$$(MV)^i = \sum_{j=1}^k m^i_j v^j.$$
This suggests the rule for multiplying an $r \times k$ matrix $M$ by a $k \times s$ matrix $N$: our $k \times s$ matrix $N$ consists of $s$ column vectors side-by-side, each of dimension $k \times 1$. We can multiply our $r \times k$ matrix $M$ by each of these $s$ column vectors using the rule we already know, obtaining $s$ column vectors each of dimension $r \times 1$. If we place these $s$ column vectors side-by-side, we obtain an $r \times s$ matrix $MN$.
That is, let
$$N = \begin{pmatrix}n^1_1 & n^1_2 & \cdots & n^1_s\\ n^2_1 & n^2_2 & \cdots & n^2_s\\ \vdots & \vdots & & \vdots\\ n^k_1 & n^k_2 & \cdots & n^k_s\end{pmatrix}$$
and call the columns $N_1$ through $N_s$:
$$N_1 = \begin{pmatrix}n^1_1\\n^2_1\\\vdots\\n^k_1\end{pmatrix}, \quad N_2 = \begin{pmatrix}n^1_2\\n^2_2\\\vdots\\n^k_2\end{pmatrix}, \quad \ldots, \quad N_s = \begin{pmatrix}n^1_s\\n^2_s\\\vdots\\n^k_s\end{pmatrix}.$$
Then
$$MN = M\begin{pmatrix}| & | & & |\\ N_1 & N_2 & \cdots & N_s\\ | & | & & |\end{pmatrix} = \begin{pmatrix}| & | & & |\\ MN_1 & MN_2 & \cdots & MN_s\\ | & | & & |\end{pmatrix}.$$

Concisely: If $M = (m^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, k$ and $N = (n^i_j)$ for $i = 1, \ldots, k$; $j = 1, \ldots, s$, then $MN = L$ where $L = (\ell^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, s$ is given by
$$\ell^i_j = \sum_{p=1}^k m^i_p n^p_j.$$


This rule obeys linearity.
Notice that in order for the multiplication to make sense, the columns and rows must match. For an $r \times k$ matrix $M$ and an $s \times m$ matrix $N$, to make the product $MN$ we must have $k = s$. Likewise, for the product $NM$, it is required that $m = r$. A common shorthand for keeping track of the sizes of the matrices involved in a given product is the following diagram.
$$\big(r \times k\big)\ \text{times}\ \big(k \times m\big)\ \text{is}\ \big(r \times m\big)$$

Reading homework: problem 2

Example 84 Multiplying a ($3\times 1$) matrix and a ($1\times 2$) matrix yields a ($3\times 2$) matrix.
$$\begin{pmatrix}1\\3\\2\end{pmatrix}\begin{pmatrix}2 & 3\end{pmatrix} = \begin{pmatrix}1\cdot 2 & 1\cdot 3\\ 3\cdot 2 & 3\cdot 3\\ 2\cdot 2 & 2\cdot 3\end{pmatrix} = \begin{pmatrix}2 & 3\\ 6 & 9\\ 4 & 6\end{pmatrix}.$$

Another way to view matrix multiplication is in terms of dot products:

The entries of M N are made from the dot products of the rows of
M with the columns of N .

Example 85 Let
$$M = \begin{pmatrix}1 & 3\\ 3 & 5\\ 2 & 6\end{pmatrix} =: \begin{pmatrix}u^T\\ v^T\\ w^T\end{pmatrix} \quad\text{and}\quad N = \begin{pmatrix}2 & 3 & 1\\ 0 & 1 & 0\end{pmatrix} =: \begin{pmatrix}a & b & c\end{pmatrix}$$
where
$$u = \begin{pmatrix}1\\3\end{pmatrix}, \quad v = \begin{pmatrix}3\\5\end{pmatrix}, \quad w = \begin{pmatrix}2\\6\end{pmatrix}, \quad a = \begin{pmatrix}2\\0\end{pmatrix}, \quad b = \begin{pmatrix}3\\1\end{pmatrix}, \quad c = \begin{pmatrix}1\\0\end{pmatrix}.$$
Then
$$MN = \begin{pmatrix}u\cdot a & u\cdot b & u\cdot c\\ v\cdot a & v\cdot b & v\cdot c\\ w\cdot a & w\cdot b & w\cdot c\end{pmatrix} = \begin{pmatrix}2 & 6 & 1\\ 6 & 14 & 3\\ 4 & 12 & 2\end{pmatrix}.$$
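A quick sketch (not part of the text, assuming numpy) checking this product and the column-by-column view of matrix multiplication:

```python
import numpy as np

M = np.array([[1, 3],
              [3, 5],
              [2, 6]])
N = np.array([[2, 3, 1],
              [0, 1, 0]])

print(M @ N)    # [[ 2  6  1] [ 6 14  3] [ 4 12  2]]

# Column-wise view: the j-th column of MN is M applied to the j-th column of N.
print(np.column_stack([M @ N[:, j] for j in range(N.shape[1])]))
```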


This fact has an obvious yet important consequence:


Theorem 7.3.1. Let M be a matrix and x a column vector. If

Mx = 0

then the vector x is orthogonal to the rows of M .

Remark Remember that the set of all vectors that can be obtained by adding up
scalar multiples of the columns of a matrix is called its column space . Similarly the
row space is the set of all row vectors obtained by adding up multiples of the rows
of a matrix. The above theorem says that if M x = 0, then the vector x is orthogonal
to every vector in the row space of M .

We know that $r \times k$ matrices can be used to represent linear transformations $\mathbb{R}^k \to \mathbb{R}^r$ via
$$(MV)^i = \sum_{j=1}^k m^i_j v^j,$$
which is the same rule used when we multiply an $r \times k$ matrix by a $k \times 1$ vector to produce an $r \times 1$ vector.
Likewise, we can use a matrix $N = (n^i_j)$ to define a linear transformation of a vector space of matrices. For example
$$L \colon M^s_k \to M^r_k,$$
$$L(M) = (l^i_k) \quad\text{where}\quad l^i_k = \sum_{j=1}^s n^i_j m^j_k.$$

This is the same as the rule we use to multiply matrices. In other words,
L(M ) = N M is a linear transformation.

Matrix Terminology Let $M = (m^i_j)$ be a matrix. The entries $m^i_i$ are called diagonal, and the set $\{m^1_1, m^2_2, \ldots\}$ is called the diagonal of the matrix.
Any $r \times r$ matrix is called a square matrix. A square matrix that is zero for all non-diagonal entries is called a diagonal matrix. An example of a square diagonal matrix is
$$\begin{pmatrix}2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 0\end{pmatrix}.$$


The $r \times r$ diagonal matrix with all diagonal entries equal to 1 is called the identity matrix, $I_r$, or just $I$. An identity matrix looks like
$$I = \begin{pmatrix}1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\end{pmatrix}.$$
The identity matrix is special because
$$I_r M = M I_k = M$$
for all $M$ of size $r \times k$.

Definition The transpose of an $r \times k$ matrix $M = (m^i_j)$ is the $k \times r$ matrix
$$M^T = (\hat{m}^i_j)$$
with entries that satisfy $\hat{m}^i_j = m^j_i$.
A matrix $M$ is symmetric if $M = M^T$.

Example 86
$$\begin{pmatrix}2 & 5 & 6\\ 1 & 3 & 4\end{pmatrix}^T = \begin{pmatrix}2 & 1\\ 5 & 3\\ 6 & 4\end{pmatrix},$$
and
$$\begin{pmatrix}2 & 5 & 6\\ 1 & 3 & 4\end{pmatrix}\begin{pmatrix}2 & 5 & 6\\ 1 & 3 & 4\end{pmatrix}^T = \begin{pmatrix}65 & 41\\ 41 & 26\end{pmatrix}$$
is symmetric.

Reading homework: problem 3

Observations

• Only square matrices can be symmetric.

• The transpose of a column vector is a row vector, and vice-versa.

139
140 Matrices

• Taking the transpose of a matrix twice does nothing. i.e., (M T )T = M .

Theorem 7.3.2 (Transpose and Multiplication). Let M, N be matrices such


that M N makes sense. Then

(M N )T = N T M T .

The proof of this theorem is left to Review Question 2.

7.3.1 Associativity and Non-Commutativity


Many properties of matrices follow from the same properties for real numbers. Here is an example.
Example 87 Associativity of matrix multiplication. We know for real numbers $x, y$ and $z$ that
$$x(yz) = (xy)z,$$
i.e., the order of multiplications does not matter. The same property holds for matrix multiplication; let us show why. Suppose $M = (m^i_j)$, $N = (n^j_k)$ and $R = (r^k_l)$ are, respectively, $m \times n$, $n \times r$ and $r \times t$ matrices. Then from the rule for matrix multiplication we have
$$MN = \left(\sum_{j=1}^n m^i_j n^j_k\right) \quad\text{and}\quad NR = \left(\sum_{k=1}^r n^j_k r^k_l\right).$$
So first we compute
$$(MN)R = \left(\sum_{k=1}^r\Big[\sum_{j=1}^n m^i_j n^j_k\Big] r^k_l\right) = \left(\sum_{k=1}^r\sum_{j=1}^n \Big[m^i_j n^j_k\Big] r^k_l\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j n^j_k r^k_l\right).$$
In the first step we just wrote out the definition for matrix multiplication, in the second step we moved the summation symbol outside the bracket (this is just the distributive property $x(y+z) = xy + xz$ for numbers) and in the last step we used the associativity property for real numbers to remove the square brackets. Exactly the same reasoning shows that
$$M(NR) = \left(\sum_{j=1}^n m^i_j\Big[\sum_{k=1}^r n^j_k r^k_l\Big]\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j\Big[n^j_k r^k_l\Big]\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j n^j_k r^k_l\right).$$
This is the same as above so we are done.¹

¹ As a fun remark, note that Einstein would simply have written $(MN)R = (m^i_j n^j_k)r^k_l = m^i_j n^j_k r^k_l = m^i_j(n^j_k r^k_l) = M(NR)$.


Sometimes matrices do not share the properties of regular numbers. In particular, for generic $n \times n$ square matrices $M$ and $N$,
$$MN \neq NM.$$

Do Matrices Commute?

Example 88 (Matrix multiplication does not commute.)
$$\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix} = \begin{pmatrix}2 & 1\\ 1 & 1\end{pmatrix}$$
while, on the other hand,
$$\begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}.$$
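A one-line numerical sketch (not part of the text, assuming numpy) of the same non-commutativity:

```python
import numpy as np

M = np.array([[1, 1], [0, 1]])
N = np.array([[1, 0], [1, 1]])

print(M @ N)                          # [[2 1] [1 1]]
print(N @ M)                          # [[1 1] [1 2]]
print(np.array_equal(M @ N, N @ M))   # False
```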

Since $n \times n$ matrices are linear transformations $\mathbb{R}^n \to \mathbb{R}^n$, we can see that the order of successive linear transformations matters. Here is an example of matrices acting on objects in three dimensions that also shows matrices not commuting.
Example 89 In Review Problem 3, you learned that the matrix
$$M = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}$$
rotates vectors in the plane by an angle $\theta$. We can generalize this, using block matrices, to three dimensions. In fact the following matrices built from a $2\times 2$ rotation matrix, a $1\times 1$ identity matrix and zeroes everywhere else
$$M = \begin{pmatrix}\cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{pmatrix} \quad\text{and}\quad N = \begin{pmatrix}1 & 0 & 0\\ 0 & \cos\theta & \sin\theta\\ 0 & -\sin\theta & \cos\theta\end{pmatrix},$$
perform rotations by an angle $\theta$ in the $xy$ and $yz$ planes, respectively. Because they rotate single vectors, you can also use them to rotate objects built from a collection of vectors like pretty colored blocks! Here is a picture of $M$ and then $N$ acting on such a block, compared with the case of $N$ followed by $M$. The special case of $\theta = 90°$ is shown.


Notice how the end products of $MN$ and $NM$ are different, so $MN \neq NM$ here.

7.3.2 Block Matrices


It is often convenient to partition a matrix $M$ into smaller matrices called blocks. For example
$$M = \begin{pmatrix}1 & 2 & 3 & 1\\ 4 & 5 & 6 & 0\\ 7 & 8 & 9 & 1\\ 0 & 1 & 2 & 0\end{pmatrix} = \begin{pmatrix}A & B\\ C & D\end{pmatrix}$$
where $A = \begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix}$, $B = \begin{pmatrix}1\\0\\1\end{pmatrix}$, $C = \begin{pmatrix}0 & 1 & 2\end{pmatrix}$, $D = (0)$.

• The blocks of a block matrix must fit together to form a rectangle. So $\begin{pmatrix}B & A\\ D & C\end{pmatrix}$ makes sense, but $\begin{pmatrix}C & B\\ D & A\end{pmatrix}$ does not.

Reading homework: problem 4

• There are many ways to cut up an n ⇥ n matrix into blocks. Often


context or the entries of the matrix will suggest a useful way to divide
the matrix into blocks. For example, if there are large blocks of zeros
in a matrix, or blocks that look like an identity matrix, it can be useful
to partition the matrix accordingly.


• Matrix operations on block matrices can be carried out by treating the blocks as matrix entries. In the example above,
$$M^2 = \begin{pmatrix}A & B\\ C & D\end{pmatrix}\begin{pmatrix}A & B\\ C & D\end{pmatrix} = \begin{pmatrix}A^2 + BC & AB + BD\\ CA + DC & CB + D^2\end{pmatrix}.$$
Computing the individual blocks, we get:
$$A^2 + BC = \begin{pmatrix}30 & 37 & 44\\ 66 & 81 & 96\\ 102 & 127 & 152\end{pmatrix}, \qquad AB + BD = \begin{pmatrix}4\\10\\16\end{pmatrix},$$
$$CA + DC = \begin{pmatrix}18 & 21 & 24\end{pmatrix}, \qquad CB + D^2 = (2).$$
Assembling these pieces into a block matrix gives:
$$\begin{pmatrix}30 & 37 & 44 & 4\\ 66 & 81 & 96 & 10\\ 102 & 127 & 152 & 16\\ 18 & 21 & 24 & 2\end{pmatrix}$$
This is exactly $M^2$.
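A short sketch (not from the text, assuming numpy) assembling $M^2$ from its blocks and comparing with the direct product:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1], [0], [1]])
C = np.array([[0, 1, 2]])
D = np.array([[0]])

M = np.block([[A, B], [C, D]])

# Multiply blocks as if they were entries, then reassemble.
M2_blocks = np.block([[A @ A + B @ C, A @ B + B @ D],
                      [C @ A + D @ C, C @ B + D @ D]])
print(np.array_equal(M2_blocks, M @ M))   # True
```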

7.3.3 The Algebra of Square Matrices


Not every pair of matrices can be multiplied. When multiplying two matrices, the number of columns in the left matrix must equal the number of rows in the right. For an $r \times k$ matrix $M$ and an $s \times l$ matrix $N$, we must have $k = s$.
This is not a problem for square matrices of the same size, though. Two $n \times n$ matrices can be multiplied in either order. For a single matrix $M \in M^n_n$, we can form $M^2 = MM$, $M^3 = MMM$, and so on. It is useful to define
$$M^0 = I,$$
the identity matrix, just like $x^0 = 1$ for numbers.
As a result, any polynomial can have square matrices in its domain.

Example 90 Let $f(x) = x - 2x^2 + 3x^3$ and
$$M = \begin{pmatrix}1 & t\\ 0 & 1\end{pmatrix}.$$
Then
$$M^2 = \begin{pmatrix}1 & 2t\\ 0 & 1\end{pmatrix}, \quad M^3 = \begin{pmatrix}1 & 3t\\ 0 & 1\end{pmatrix}, \quad \ldots$$
and so
$$f(M) = \begin{pmatrix}1 & t\\ 0 & 1\end{pmatrix} - 2\begin{pmatrix}1 & 2t\\ 0 & 1\end{pmatrix} + 3\begin{pmatrix}1 & 3t\\ 0 & 1\end{pmatrix} = \begin{pmatrix}2 & 6t\\ 0 & 2\end{pmatrix}.$$

Suppose $f(x)$ is any function defined by a convergent Taylor Series:
$$f(x) = f(0) + f'(0)x + \frac{1}{2!}f''(0)x^2 + \cdots.$$
Then we can define the matrix function by just plugging in $M$:
$$f(M) = f(0) + f'(0)M + \frac{1}{2!}f''(0)M^2 + \cdots.$$
There are additional techniques to determine the convergence of Taylor Series of matrices, based on the fact that the convergence problem is simple for diagonal matrices. It also turns out that the matrix exponential
$$\exp(M) = I + M + \frac{1}{2}M^2 + \frac{1}{3!}M^3 + \cdots$$
always converges.
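A small sketch (not part of the text, assuming numpy and scipy) comparing a truncated series with scipy's built-in matrix exponential:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

M = np.array([[0.0, 1.0],
              [0.0, 0.0]])

# Truncate exp(M) = I + M + M^2/2! + ... after a few terms.
series = sum(np.linalg.matrix_power(M, k) / factorial(k) for k in range(10))

print(series)     # [[1. 1.] [0. 1.]]  (the series terminates here since M^2 = 0)
print(expm(M))    # same result
```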

Matrix Exponential Example

144
7.3 Properties of Matrices 145

7.3.4 Trace
A large matrix contains a great deal of information, some of which often re-
flects the fact that you have not set up your problem efficiently. For example,
a clever choice of basis can often make the matrix of a linear transformation
very simple. Therefore, finding ways to extract the essential information of
a matrix is useful. Here we need to assume that $n < \infty$; otherwise there are subtleties with convergence that we'd have to address.

Definition The trace of a square matrix $M = (m^i_j)$ is the sum of its diagonal entries:
$$\operatorname{tr} M = \sum_{i=1}^n m^i_i.$$

Example 91
$$\operatorname{tr}\begin{pmatrix}2 & 7 & 6\\ 9 & 5 & 1\\ 4 & 3 & 8\end{pmatrix} = 2 + 5 + 8 = 15.$$

While matrix multiplication does not commute, the trace of a product of matrices does not depend on the order of multiplication:
$$\begin{aligned}
\operatorname{tr}(MN) &= \operatorname{tr}\Big(\sum_l M^i_l N^l_j\Big)\\
&= \sum_i\sum_l M^i_l N^l_i\\
&= \sum_l\sum_i N^l_i M^i_l\\
&= \operatorname{tr}\Big(\sum_i N^l_i M^i_j\Big)\\
&= \operatorname{tr}(NM).
\end{aligned}$$
Proof Explanation
Thus we have a Theorem:
Theorem 7.3.3. For any square matrices $M$ and $N$,
$$\operatorname{tr}(MN) = \operatorname{tr}(NM).$$


Example 92 Continuing from the previous example,
$$M = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}, \qquad N = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix},$$
so
$$MN = \begin{pmatrix}2 & 1\\ 1 & 1\end{pmatrix} \neq NM = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}.$$
However, $\operatorname{tr}(MN) = 2 + 1 = 3 = 1 + 2 = \operatorname{tr}(NM)$.
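A quick sketch (not from the text, assuming numpy) of the same cyclic property of the trace on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-5, 5, size=(3, 3))
N = rng.integers(-5, 5, size=(3, 3))

# MN and NM generally differ, but their traces agree.
print(np.trace(M @ N), np.trace(N @ M))               # equal values
print(np.trace(M @ N) == np.trace(N @ M))             # True
```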

Another useful property of the trace is that:
$$\operatorname{tr} M = \operatorname{tr} M^T.$$
This is true because the trace only uses the diagonal entries, which are fixed by the transpose. For example,
$$\operatorname{tr}\begin{pmatrix}1 & 1\\ 2 & 3\end{pmatrix} = 4 = \operatorname{tr}\begin{pmatrix}1 & 2\\ 1 & 3\end{pmatrix} = \operatorname{tr}\begin{pmatrix}1 & 2\\ 1 & 3\end{pmatrix}^T.$$
Finally, trace is a linear transformation from matrices to the real numbers. This is easy to check.

7.4 Review Problems


Webwork: Reading Problems 2 ,3 ,4

1. Compute the following matrix products


0 1
1
0 10 1
1 2 1 2 4 1 B2 C
3 3 B C
B CB 5 2C B C
@4 5 2A @ 2 3 3A
, 1 2 3 4 5 B3 C ,
B C
7 8 2 1 2 1 @4 A
5
0 1
1
0 10 10 1
B2 C 1 2 1 2 4 1
1 2 1
B C 3 3
B C B CB 5 2C B C
B3 C 1 2 3 4 5 , @4 5 2 A @ 2 3 3A @
4 5 2A ,
B C
@4 A 7 8 2 1 2 1 7 8 2
5


0 10 1
2 1 2 1 2 1 2 1 2 1
0 10 1 B0 C B
2 1 1 x B 2 1 2 1 C B0 1 2 1 2CC
B C B CB C
x y z @1 2 1A @ y A , B0 1 2 1 2 C B0 2 1 2 1C ,
B CB C
1 1 2 z @0 2 1 2 1 A @0 1 2 1 2A
0 0 0 0 2 0 0 0 0 1
0 4 1
10 2 2
10 1
2 3 3
4 3 3
1 2 1
B 5 2C B 5 2C B C
@ 2 3 3A @
6 3 3A @
4 5 2A .
16 10
1 2 1 12 3 3
7 8 2

2. Let’s prove the theorem (M N )T = N T M T .


Note: the following is a common technique for proving matrix identities.

(a) Let M = (mij ) and let N = (nij ). Write out a few of the entries of
each matrix in the form given at the beginning of section 7.3.
(b) Multiply out M N and write out a few of its entries in the same
form as in part (a). In terms of the entries of M and the entries
of N , what is the entry in row i and column j of M N ?
(c) Take the transpose (M N )T and write out a few of its entries in
the same form as in part (a). In terms of the entries of M and the
entries of N , what is the entry in row i and column j of (M N )T ?
(d) Take the transposes N T and M T and write out a few of their
entries in the same form as in part (a).
(e) Multiply out N T M T and write out a few of its entries in the same
form as in part a. In terms of the entries of M and the entries of
N , what is the entry in row i and column j of N T M T ?
(f) Show that the answers you got in parts (c) and (e) are the same.
✓ ◆
1 2 0
3. (a) Let A = . Find AAT and AT A and their traces.
3 1 4

(b) Let M be any m ⇥ n matrix. Show that M T M and M M T are


symmetric. (Hint: use the result of the previous problem.) What
are their sizes? What is the relationship between their traces?


4. Let $x = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}$ and $y = \begin{pmatrix}y_1\\\vdots\\y_n\end{pmatrix}$ be column vectors. Show that the dot product $x \cdot y = x^T I y$.

Hint

5. Above, we showed that left multiplication by an $r \times s$ matrix $N$ was a linear transformation $M^s_k \to M^r_k$. Show that right multiplication by a $k \times m$ matrix $R$ is a linear transformation $M^s_k \to M^s_m$. In other words, show that right matrix multiplication obeys linearity.

Hint

6. Let $V$ be a vector space where $B = (v_1, v_2)$ is an ordered basis. Suppose
$$L \colon V \xrightarrow{\text{linear}} V$$
and
$$L(v_1) = v_1 + v_2, \qquad L(v_2) = 2v_1 + v_2.$$
Compute the matrix of $L$ in the basis $B$ and then compute the trace of this matrix. Suppose that $ad - bc \neq 0$ and consider now the new basis
$$B' = (av_1 + bv_2, cv_1 + dv_2).$$

Compute the matrix of L in the basis B 0 . Compute the trace of this


matrix. What do you find? What do you conclude about the trace
of a matrix? Does it make sense to talk about the “trace of a linear
transformation” without reference to any bases?

7. Explain what happens to a matrix when:

(a) You multiply it on the left by a diagonal matrix.


(b) You multiply it on the right by a diagonal matrix.


Give a few simple examples before you start explaining.

8. Compute $\exp(A)$ for the following matrices:
• $A = \begin{pmatrix}\lambda & 0\\ 0 & \lambda\end{pmatrix}$
• $A = \begin{pmatrix}1 & \lambda\\ 0 & 1\end{pmatrix}$
• $A = \begin{pmatrix}0 & \lambda\\ 0 & 0\end{pmatrix}$

Hint

9. Let
$$M = \begin{pmatrix}
1&0&0&0&0&0&0&1\\
0&1&0&0&0&0&1&0\\
0&0&1&0&0&1&0&0\\
0&0&0&1&1&0&0&0\\
0&0&0&0&2&1&0&0\\
0&0&0&0&0&2&0&0\\
0&0&0&0&0&0&3&1\\
0&0&0&0&0&0&0&3
\end{pmatrix}.$$
Divide $M$ into named blocks, with one block the $4\times 4$ identity matrix, and then multiply blocks to compute $M^2$.

10. A matrix $A$ is called anti-symmetric (or skew-symmetric) if $A^T = -A$. Show that for every $n \times n$ matrix $M$, we can write $M = A + S$ where $A$ is an anti-symmetric matrix and $S$ is a symmetric matrix.
Hint: What kind of matrix is $M + M^T$? How about $M - M^T$?

11. An example of an operation which is not associative is the cross prod-


uct.

(a) Give a simple example of three vectors from 3-space $u, v, w$ such that $u \times (v \times w) \neq (u \times v) \times w$.


(b) We saw in Chapter 1 that the operator $B = u\times$ (cross product with a vector) is a linear operator. It can therefore be written as a matrix (given an ordered basis such as the standard basis). How is it that composing such linear operators is non-associative even though matrix multiplication is associative?

7.5 Inverse Matrix


Definition A square matrix $M$ is invertible (or nonsingular) if there exists a matrix $M^{-1}$ such that
$$M^{-1}M = I = MM^{-1}.$$
If $M$ has no inverse, we say $M$ is singular or non-invertible.

Inverse of a $2\times 2$ Matrix Let $M$ and $N$ be the matrices:
$$M = \begin{pmatrix}a & b\\ c & d\end{pmatrix}, \qquad N = \begin{pmatrix}d & -b\\ -c & a\end{pmatrix}.$$
Multiplying these matrices gives:
$$MN = \begin{pmatrix}ad - bc & 0\\ 0 & ad - bc\end{pmatrix} = (ad - bc)I.$$
Then $M^{-1} = \dfrac{1}{ad - bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}$, so long as $ad - bc \neq 0$.
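A tiny sketch (not from the text, assuming numpy; the helper name inverse_2x2 is ours) checking the $2\times 2$ inverse formula against numpy:

```python
import numpy as np

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via (1/(ad-bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(inverse_2x2(M))
print(np.allclose(inverse_2x2(M), np.linalg.inv(M)))   # True
```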

7.5.1 Three Properties of the Inverse

1. If $A$ is a square matrix and $B$ is the inverse of $A$, then $A$ is the inverse of $B$, since $AB = I = BA$. So we have the identity
$$(A^{-1})^{-1} = A.$$
2. Notice that $B^{-1}A^{-1}AB = B^{-1}IB = I = ABB^{-1}A^{-1}$ so
$$(AB)^{-1} = B^{-1}A^{-1}.$$


Figure 7.1: The formula for the inverse of a 2×2 matrix is worth memorizing!

Thus, much like the transpose, taking the inverse of a product reverses the order of the product.
3. Finally, recall that $(AB)^T = B^T A^T$. Since $I^T = I$, then $(A^{-1}A)^T = A^T(A^{-1})^T = I$. Similarly, $(AA^{-1})^T = (A^{-1})^T A^T = I$. Then:
$$(A^{-1})^T = (A^T)^{-1}.$$

2×2 Example

7.5.2 Finding Inverses (Redux)


Gaussian elimination can be used to find inverse matrices. This concept is
covered in chapter 2, section 2.3.2, but is presented here again as review in
more sophisticated terms.
Suppose $M$ is a square invertible matrix and $MX = V$ is a linear system. The solution must be unique because it can be found by multiplying the equation on both sides by $M^{-1}$, yielding $X = M^{-1}V$. Thus, the reduced row echelon form of the linear system has an identity matrix on the left:
$$\big(M \mid V\big) \sim \big(I \mid M^{-1}V\big)$$
Solving the linear system $MX = V$ then tells us what $M^{-1}V$ is.


To solve many linear systems with the same matrix at once,
$$MX = V_1, \quad MX = V_2,$$
we can consider augmented matrices with many columns on the right and then apply Gaussian row reduction to the left side of the matrix. Once the identity matrix is on the left side of the augmented matrix, then the solution of each of the individual linear systems is on the right.
$$\big(M \mid V_1\ V_2\big) \sim \big(I \mid M^{-1}V_1\ M^{-1}V_2\big)$$
To compute $M^{-1}$, we would like $M^{-1}$, rather than $M^{-1}V$, to appear on the right side of our augmented matrix. This is achieved by solving the collection of systems $MX = e_k$, where $e_k$ is the column vector of zeroes with a 1 in the $k$th entry. I.e., the $n \times n$ identity matrix can be viewed as a bunch of column vectors $I_n = (e_1\ e_2\ \cdots\ e_n)$. So, putting the $e_k$'s together into an identity matrix, we get:
$$\big(M \mid I\big) \sim \big(I \mid M^{-1}I\big) = \big(I \mid M^{-1}\big)$$

Example 93 Find $\begin{pmatrix}-1 & 2 & -3\\ 2 & 1 & 0\\ 4 & -2 & 5\end{pmatrix}^{-1}$.
We start by writing the augmented matrix, then apply row reduction to the left side.
$$\left(\begin{array}{ccc|ccc}-1 & 2 & -3 & 1 & 0 & 0\\ 2 & 1 & 0 & 0 & 1 & 0\\ 4 & -2 & 5 & 0 & 0 & 1\end{array}\right) \sim \left(\begin{array}{ccc|ccc}-1 & 2 & -3 & 1 & 0 & 0\\ 0 & 5 & -6 & 2 & 1 & 0\\ 0 & 6 & -7 & 4 & 0 & 1\end{array}\right)$$
$$\sim \left(\begin{array}{ccc|ccc}1 & 0 & \tfrac{3}{5} & -\tfrac{1}{5} & \tfrac{2}{5} & 0\\ 0 & 1 & -\tfrac{6}{5} & \tfrac{2}{5} & \tfrac{1}{5} & 0\\ 0 & 0 & \tfrac{1}{5} & \tfrac{8}{5} & -\tfrac{6}{5} & 1\end{array}\right) \sim \left(\begin{array}{ccc|ccc}1 & 0 & 0 & -5 & 4 & -3\\ 0 & 1 & 0 & 10 & -7 & 6\\ 0 & 0 & 1 & 8 & -6 & 5\end{array}\right)$$
At this point, we know $M^{-1}$, assuming we didn't goof up. However, row reduction is a lengthy and involved process with lots of room for arithmetic errors, so we should check our answer by confirming that $MM^{-1} = I$ (or if you prefer, $M^{-1}M = I$):
$$MM^{-1} = \begin{pmatrix}-1 & 2 & -3\\ 2 & 1 & 0\\ 4 & -2 & 5\end{pmatrix}\begin{pmatrix}-5 & 4 & -3\\ 10 & -7 & 6\\ 8 & -6 & 5\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
The product of the two matrices is indeed the identity matrix, so we're done.
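A short sketch (not from the text, assuming numpy) double-checking the inverse computed above:

```python
import numpy as np

M = np.array([[-1, 2, -3],
              [2, 1, 0],
              [4, -2, 5]])

M_inv = np.linalg.inv(M)
print(np.round(M_inv))                       # [[-5.  4. -3.] [10. -7.  6.] [ 8. -6.  5.]]
print(np.allclose(M @ M_inv, np.eye(3)))     # True
```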

Reading homework: problem 5

7.5.3 Linear Systems and Inverses


If M 1 exists and is known, then we can immediately solve linear systems
associated to M .
Example 94 Consider the linear system:

x +2y 3z = 1
2x + y =2
4x
2y +5z = 0
0 1
1
The associated matrix equation is M X = 2A , where M is the same as in the
@
0
previous section, so the system above is equivalent to the matrix equation

0 1 0 1 10 1 0 10 1 0 1
x 1 2 3 1 5 4 3 1 3
@yA = @ 2 1 0A @2A = @ 10 7 6A @2A = @ 4A .
z 4 2 5 0 8 6 5 0 4
0 1 0 1
x 3
That is, the system is equivalent to the equation @ y A = @ 4A, and it is easy to
z 4
see what the solution(s) to this equation are.
1
In summary, when M exists

1
Mx = v , x = M v.

Reading homework: problem 5


7.5.4 Homogeneous Systems


Theorem 7.5.1. A square matrix $M$ is invertible if and only if the homogeneous system
$$Mx = 0$$
has no non-zero solutions.
Proof. First, suppose that $M^{-1}$ exists. Then $Mx = 0 \Rightarrow x = M^{-1}0 = 0$. Thus, if $M$ is invertible, then $Mx = 0$ has no non-zero solutions.
On the other hand, $Mx = 0$ always has the solution $x = 0$. If no other solutions exist, then $M$ can be put into reduced row echelon form with every variable a pivot. In this case, $M^{-1}$ can be computed using the process in the previous section.

7.5.5 Bit Matrices


In computer science, information is recorded using binary strings of data.
For example, the following string contains an English word:
011011000110100101101110011001010110000101110010
A bit is the basic unit of information, keeping track of a single one or zero.
Computers can add and multiply individual bits very quickly.
In chapter 5, section 5.2 it is explained how to formulate vector spaces over fields other than real numbers. In particular, all of the properties of a vector space make sense with numbers $\mathbb{Z}_2 = \{0, 1\}$ with addition and multiplication given by the following tables.
$$\begin{array}{c|cc} + & 0 & 1\\\hline 0 & 0 & 1\\ 1 & 1 & 0\end{array} \qquad\qquad \begin{array}{c|cc} \times & 0 & 1\\\hline 0 & 0 & 0\\ 1 & 0 & 1\end{array}$$


Notice that $-1 = 1$, since $1 + 1 = 0$. Therefore, we can apply all of the linear algebra we have learned thus far to matrices with $\mathbb{Z}_2$ entries. A matrix with entries in $\mathbb{Z}_2$ is sometimes called a bit matrix.
Example 95 $\begin{pmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}$ is an invertible matrix over $\mathbb{Z}_2$;
$$\begin{pmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}^{-1} = \begin{pmatrix}0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 1\end{pmatrix}.$$
This can be easily verified by multiplying:
$$\begin{pmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}\begin{pmatrix}0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 1\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$$
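A brief sketch (not from the text, assuming numpy) verifying the bit-matrix inverse with arithmetic mod 2:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
A_inv = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])

# Multiply as integers, then reduce every entry mod 2.
print((A @ A_inv) % 2)    # the 3x3 identity matrix
```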
Application: Cryptography A very simple way to hide information is to use a sub-
stitution cipher, in which the alphabet is permuted and each letter in a message is
systematically exchanged for another. For example, the ROT-13 cypher just exchanges
a letter with the letter thirteen places before or after it in the alphabet. For example,
HELLO becomes URYYB. Applying the algorithm again decodes the message, turning
URYYB back into HELLO. Substitution ciphers are easy to break, but the basic idea
can be extended to create cryptographic systems that are practically uncrackable. For
example, a one-time pad is a system that uses a different substitution for each letter
in the message. So long as a particular set of substitutions is not used on more than
one message, the one-time pad is unbreakable.
English characters are often stored in computers in the ASCII format. In ASCII, a single character is represented by a string of eight bits, which we can consider as a vector in $\mathbb{Z}_2^8$ (which is like vectors in $\mathbb{R}^8$, where the entries are zeros and ones). One way to create a substitution cipher, then, is to choose an $8\times 8$ invertible bit matrix $M$, and multiply each letter of the message by $M$. Then to decode the message, each string of eight characters would be multiplied by $M^{-1}$.
To make the message a bit tougher to decode, one could consider pairs (or longer sequences) of letters as a single vector in $\mathbb{Z}_2^{16}$ (or a higher-dimensional space), and then use an appropriately-sized invertible matrix. For more on cryptography, see "The Code Book," by Simon Singh (1999, Doubleday).

7.6 Review Problems


Webwork: Reading Problems 6 ,7


1. Find formulas for the inverses of the following matrices, when they are
not singular:
(a) $\begin{pmatrix}1 & a & b\\ 0 & 1 & c\\ 0 & 0 & 1\end{pmatrix}$
(b) $\begin{pmatrix}a & b & c\\ 0 & d & e\\ 0 & 0 & f\end{pmatrix}$
When are these matrices singular?
2. Write down all 2⇥2 bit matrices and decide which of them are singular.
For those which are not singular, pair them with their inverse.
3. Let M be a square matrix. Explain why the following statements are
equivalent:
(a) M X = V has a unique solution for every column vector V .
(b) M is non-singular.
Hint: In general for problems like this, think about the key words:
First, suppose that there is some column vector V such that the equa-
tion M X = V has two distinct solutions. Show that M must be sin-
gular; that is, show that M can have no inverse.
Next, suppose that there is some column vector V such that the equa-
tion M X = V has no solutions. Show that M must be singular.
Finally, suppose that M is non-singular. Show that no matter what
the column vector V is, there is a unique solution to M X = V.

Hint

4. Left and Right Inverses: So far we have only talked about inverses of
square matrices. This problem will explore the notion of a left and
right inverse for a matrix that is not square. Let
    A = (0 1 1)
        (1 1 0)


(a) Compute:
     i.  AA^T ,
     ii. (AA^T)^{-1} ,
     iii. B := A^T (AA^T)^{-1} .
(b) Show that the matrix B above is a right inverse for A, i.e., verify
    that
        AB = I .

(c) Is BA defined? (Why or why not?)


(d) Let A be an n × m matrix with n > m. Suggest a formula for a
    left inverse C such that
        CA = I .
    Hint: you may assume that A^T A has an inverse.
(e) Test your proposal for a left inverse for the simple example

        A = (1)
            (2) ,

(f) True or false: Left and right inverses are unique. If false give a
counterexample.

Hint

5. Show that if the range (remember that the range of a function is the
set of all its outputs, not the codomain) of a 3 ⇥ 3 matrix M (viewed
as a function R3 ! R3 ) is a plane then one of the columns is a sum of
multiples of the other columns. Show that this relationship is preserved
under EROs. Show, further, that the solutions to M x = 0 describe this
relationship between the columns.
6. If M and N are square matrices of the same size such that M^{-1} exists
   and N^{-1} does not exist, does (MN)^{-1} exist?

7. If M is a square matrix which is not invertible, is e^M invertible?


8. Elementary Column Operations (ECOs) can be defined in the same 3
   types as EROs. Describe the 3 kinds of ECOs. Show that if maximal
   elimination using ECOs is performed on a square matrix and a column
   of zeros is obtained then that matrix is not invertible.


7.7 LU Redux
Certain matrices are easier to work with than others. In this section, we
will see how to write any square matrix M as the product of two simpler
matrices. We will write
M = LU ,
where:
• L is lower triangular. This means that all entries above the main
  diagonal are zero. In notation, L = (l^i_j) with l^i_j = 0 for all j > i.

      (l^1_1   0     0   ···)
  L = (l^2_1  l^2_2  0   ···)
      (l^3_1  l^3_2 l^3_3 ···)
      (  ⋮     ⋮     ⋮    ⋱ )

• U is upper triangular. This means that all entries below the main
  diagonal are zero. In notation, U = (u^i_j) with u^i_j = 0 for all j < i.

      (u^1_1 u^1_2 u^1_3 ···)
  U = (  0   u^2_2 u^2_3 ···)
      (  0     0   u^3_3 ···)
      (  ⋮     ⋮     ⋮    ⋱ )

M = LU is called an LU decomposition of M .
This is a useful trick for computational reasons; it is much easier to com-
pute the inverse of an upper or lower triangular matrix than general matrices.
Since inverses are useful for solving linear systems, this makes solving any lin-
ear system associated to the matrix much faster as well. The determinant—a
very important quantity associated with any square matrix—is very easy to
compute for triangular matrices.

Example 96 Linear systems associated to upper triangular matrices are very easy to
solve by back substitution.
    (a b | 1)
    (0 c | e)   ⇒   y = e/c ,   x = (1/a)(1 − be/c) .

(The case where M is not square is dealt with at the end of the section.)


    (1 0 0 | d)        { x = d                  { x = d
    (a 1 0 | e)   ⇒   { y = e − ax        ⇒   { y = e − ad
    (b c 1 | f)        { z = f − bx − cy        { z = f − bd − c(e − ad) .
For lower triangular matrices, forward substitution gives a quick solution; for upper
triangular matrices, back substitution gives the solution.

7.7.1 Using LU Decomposition to Solve Linear Systems


Suppose we have M = LU and want to solve the system
M X = LU X = V.
• Step 1: Set W = (u, v, w)^T = UX.
• Step 2: Solve the system LW = V . This should be simple by forward
substitution since L is lower triangular. Suppose the solution to LW =
V is W0 .
• Step 3: Now solve the system U X = W0 . This should be easy by
backward substitution, since U is upper triangular. The solution to
this system is the solution to the original system.
We can think of this as using the matrix L to perform row operations on the
matrix U in order to solve the system; this idea also appears in the study of
determinants.
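For the computationally inclined, here is a minimal Python sketch of steps 2 and 3 (an illustration, assuming L and U have nonzero diagonal entries); it uses the decomposition that appears in Example 97 below:

    import numpy as np

    def forward_substitution(L, v):
        n = len(v)
        w = np.zeros(n)
        for i in range(n):
            w[i] = (v[i] - L[i, :i] @ w[:i]) / L[i, i]
        return w

    def back_substitution(U, w):
        n = len(w)
        x = np.zeros(n)
        for i in reversed(range(n)):
            x[i] = (w[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    L = np.array([[3., 0, 0], [1, 6, 0], [2, 3, 1]])
    U = np.array([[2., 6, 1], [0, 1, 0], [0, 0, 1]])
    V = np.array([3., 19, 0])

    W0 = forward_substitution(L, V)   # Step 2: solve L W = V
    X  = back_substitution(U, W0)     # Step 3: solve U X = W0
    print(W0, X)                      # [ 1.  3. -11.]  [-3.  3. -11.]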

Reading homework: problem 7

Example 97 Consider the linear system:


6x + 18y + 3z = 3
2x + 12y + z = 19
4x + 15y + 3z = 0
An LU decomposition for the associated matrix M is
    (6 18 3)   (3 0 0)(2 6 1)
    (2 12 1) = (1 6 0)(0 1 0) .
    (4 15 3)   (2 3 1)(0 0 1)


• Step 1: Set W = (u, v, w)^T = UX.

• Step 2: Solve the system LW = V :

    (3 0 0)(u)   ( 3)
    (1 6 0)(v) = (19)
    (2 3 1)(w)   ( 0)

By substitution, we get u = 1, v = 3, and w = −11. Then

          (  1)
    W0 =  (  3)
          (−11)

• Step 3: Solve the system U X = W0 .


    (2 6 1)(x)   (  1)
    (0 1 0)(y) = (  3)
    (0 0 1)(z)   (−11)

Back substitution gives z = −11, y = 3, and x = −3.

              ( −3)
    Then X =  (  3) , and we're done.
              (−11)

Using an LU decomposition


7.7.2 Finding an LU Decomposition.


In chapter 2, section 2.3.4, Gaussian elimination was used to find LU matrix
decompositions. These ideas are presented here again as review.
For any given matrix, there are actually many di↵erent LU decomposi-
tions. However, there is a unique LU decomposition in which the L matrix
has ones on the diagonal. In that case L is called a lower unit triangular
matrix .
To find the LU decomposition, we’ll create two sequences of matrices
L1 , L2 , . . . and U1 , U2 , . . . such that at each step, Li Ui = M . Each of the Li
will be lower triangular, but only the last Ui will be upper triangular. The
main trick for this calculation is captured by the following example:
Example 98 (An Elementary Matrix)
Consider

    E = (1 0)        M = (a b c ···)
        (λ 1) ,          (d e f ···) .

Let's compute EM:

    EM = (   a       b       c    ···)
         (d + λa  e + λb  f + λc  ···) .

Something neat happened here: multiplying M by E performed the row operation
R2 → R2 + λR1 on M. Another interesting fact:

    E^{-1} := ( 1 0)
              (−λ 1)

obeys (check this yourself...)

    E^{-1} E = I .

Hence M = E^{-1}(EM) or, writing this out

    (a b c ···)   ( 1 0)(   a       b       c    ···)
    (d e f ···) = (−λ 1)(d + λa  e + λb  f + λc  ···) .

Here the matrix on the left is lower triangular, while the matrix on the right has had
a row operation performed on it.

We would like to use the first row of M to zero out the first entry of every
row below it. For our running example,
    M = (6 18 3)
        (2 12 1) ,
        (4 15 3)


so we would like to perform the row operations

    R2 → R2 − (1/3) R1   and   R3 → R3 − (2/3) R1 .

If we perform these row operations on M to produce

    U1 = (6 18 3)
         (0  6 0) ,
         (0  3 1)

we need to multiply this on the left by a lower triangular matrix L1 so that
the product L1 U1 = M still. The above example shows how to do this: Set
L1 to be the lower triangular matrix whose first column is filled with minus
the constants used to zero out the first column of M. Then

    L1 = ( 1  0 0)
         (1/3 1 0) .
         (2/3 0 1)

By construction L1 U1 = M , but you should compute this yourself as a double


check.
Now repeat the process by zeroing the second column of U1 below the
diagonal using the second row of U1, via the row operation R3 → R3 − (1/2) R2,
to produce

    U2 = (6 18 3)
         (0  6 0) .
         (0  0 1)

The matrix that undoes this row operation is obtained in the same way we
found L1 above and is:

    (1  0  0)
    (0  1  0) .
    (0 1/2 1)

Thus our answer for L2 is the product of this matrix with L1, namely

    L2 = ( 1  0 0)(1  0  0)   ( 1   0  0)
         (1/3 1 0)(0  1  0) = (1/3  1  0) .
         (2/3 0 1)(0 1/2 1)   (2/3 1/2 1)

Notice that it is lower triangular because


The product of lower triangular matrices is always lower triangular!

Moreover it is obtained by recording minus the constants used for all our
row operations in the appropriate columns (this always works this way).
Moreover, U2 is upper triangular and M = L2 U2, so we are done! Putting this
all together we have

    M = (6 18 3)   ( 1   0  0)(6 18 3)
        (2 12 1) = (1/3  1  0)(0  6 0) .
        (4 15 3)   (2/3 1/2 1)(0  0 1)

If the matrix you’re working with has more than three rows, just continue
this process by zeroing out the next column below the diagonal, and repeat
until there’s nothing left to do.
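Here is a short Python sketch of the procedure just described (an illustration only; it assumes the pivots encountered along the way are nonzero, so no row swaps are needed), applied to the running example:

    import numpy as np

    def lu_decompose(M):
        """Return (L, U) with L lower unit triangular, U upper triangular, M = L U."""
        U = np.array(M, dtype=float)
        n = U.shape[0]
        L = np.eye(n)
        for col in range(n - 1):
            for row in range(col + 1, n):
                factor = U[row, col] / U[col, col]
                U[row] -= factor * U[col]   # row operation R_row -> R_row - factor * R_col
                L[row, col] = factor        # record the multiplier (minus the constant used)
        return L, U

    M = [[6, 18, 3],
         [2, 12, 1],
         [4, 15, 3]]
    L, U = lu_decompose(M)
    print(L)                       # [[1, 0, 0], [1/3, 1, 0], [2/3, 1/2, 1]]
    print(U)                       # [[6, 18, 3], [0, 6, 0], [0, 0, 1]]
    print(np.allclose(L @ U, M))   # True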

Another LU decomposition example

The fractions in the L matrix are admittedly ugly. For two matrices
LU , we can multiply one entire column of L by a constant and divide the
corresponding row of U by the same constant without changing the product
of the two matrices. Then:

    LU = ( 1   0  0)     (6 18 3)
         (1/3  1  0)  I  (0  6 0)
         (2/3 1/2 1)     (0  0 1)

       = ( 1   0  0)(3 0 0)(1/3  0  0)(6 18 3)
         (1/3  1  0)(0 6 0)( 0  1/6  0)(0  6 0)
         (2/3 1/2 1)(0 0 1)( 0   0   1)(0  0 1)

       = (3 0 0)(2 6 1)
         (1 6 0)(0 1 0) .
         (2 3 1)(0 0 1)

The resulting matrix looks nicer, but isn’t in standard (lower unit triangular
matrix) form.


Reading homework: problem 7

For matrices that are not square, LU decomposition still makes sense.
Given an m ⇥ n matrix M , for example we could write M = LU with L
a square lower unit triangular matrix, and U a rectangular matrix. Then
L will be an m ⇥ m matrix, and U will be an m ⇥ n matrix (of the same
shape as M ). From here, the process is exactly the same as for a square
matrix. We create a sequence of matrices Li and Ui that is eventually the
LU decomposition. Again, we start with L0 = I and U0 = M .
Example 99 Let's find the LU decomposition of

    M = U0 = (2 1 3)
             (4 4 1) .

Since M is a 2 × 3 matrix, our decomposition will consist of a 2 × 2 matrix and a 2 × 3 matrix.
Then we start with

    L0 = I2 = (1 0)
              (0 1) .

The next step is to zero-out the first column of M below the diagonal. There is
only one row to cancel, then, and it can be removed by subtracting 2 times the first
row of M from the second row of M. Then:

    L1 = (1 0)        U1 = (2 1  3)
         (2 1) ,           (0 2 −5)
Since U1 is upper triangular, we’re done. With a larger matrix, we would just continue
the process.

7.7.3 Block LDU Decomposition


Let M be a square block matrix with square blocks X, Y, Z, W such that X^{-1}
exists. Then M can be decomposed as a block LDU decomposition, where
D is block diagonal, as follows:

    M = (X Y)
        (Z W) .

Then:

    M = (   I     0)(X        0       )(I  X^{-1}Y)
        (Z X^{-1} I)(0  W − Z X^{-1} Y)(0      I  ) .


This can be checked explicitly simply by block-multiplying these three ma-


trices.
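A quick numerical check of the block formula is easy to run; the following Python sketch uses randomly chosen blocks (the sizes and the seed are arbitrary choices for illustration; X is nudged by a multiple of the identity so that it is invertible):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2
    X = rng.random((n, n)) + 3 * np.eye(n)
    Y, Z, W = rng.random((n, n)), rng.random((n, n)), rng.random((n, n))

    M = np.block([[X, Y], [Z, W]])
    Xi = np.linalg.inv(X)
    I, O = np.eye(n), np.zeros((n, n))

    Lb = np.block([[I, O], [Z @ Xi, I]])
    Db = np.block([[X, O], [O, W - Z @ Xi @ Y]])
    Ub = np.block([[I, Xi @ Y], [O, I]])

    print(np.allclose(Lb @ Db @ Ub, M))   # True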

Block LDU Explanation

Example 100 For a 2 ⇥ 2 matrix, we can regard each entry as a 1 ⇥ 1 block.


    (1 2)   (1 0)(1  0)(1 2)
    (3 4) = (3 1)(0 −2)(0 1)

By multiplying the diagonal matrix by the upper triangular matrix, we get the standard
LU decomposition of the matrix.

You are now ready to attempt the first sample midterm.

7.8 Review Problems


Reading Problems 7 ,8
Webwork:
LU Decomposition 14

1. Consider the linear system:

    x_1                                 = v_1
    l^2_1 x_1 + x_2                     = v_2
      ⋮                                     ⋮
    l^n_1 x_1 + l^n_2 x_2 + · · · + x_n = v_n

(i) Find x1 .
(ii) Find x2 .
(iii) Find x3 .


(k) Try to find a formula or recursive method for finding xk . Don’t


worry about simplifying your answer.
2. Let M = (X Y)
           (Z W)  be a square n × n block matrix with W invertible.

   i. If W has r rows, what size are X, Y, and Z?
   ii. Find a UDL decomposition for M. In other words, fill in the stars
       in the following equation:

       (X Y)   (I ∗)(∗ 0)(I 0)
       (Z W) = (0 I)(0 ∗)(∗ I)

3. Show that if M is a square matrix which is not invertible then either
   the matrix U or the matrix L in the LU decomposition M = LU
   has a zero on its diagonal.

4. Describe what upper and lower triangular matrices do to the unit hy-
percube in their domain.

5. In chapter 3 we saw that, since in general row exchange matrices are


necessary to achieve upper triangular form, LDP U factorization is the
complete decomposition of an invertible matrix into EROs of various
kinds. Suggest a procedure for using LDP U decompositions to solve
linear systems that generalizes the procedure above.

6. Is there a reason to prefer LU decomposition to U L decomposition, or


is the order just a convention?

7. If M is invertible then what are the LU, LDU, and LDPU decompo-
   sitions of M^T in terms of the decompositions for M? Can you do the
   same for M^{-1}?

8. Argue that if M is symmetric then L = U T in the LDU decomposition


of M .

Determinants
8
Given a square matrix, is there an easy way to know when it is invertible?
Answering this fundamental question is the goal of this chapter.

8.1 The Determinant Formula


The determinant boils down a square matrix to a single number. That
number determines whether the square matrix is invertible or not. Let's see
how this works for small matrices first.

8.1.1 Simple Examples


For small cases, we already know when a matrix is invertible. If M is a 1 × 1
matrix, then M = (m) ⇒ M^{-1} = (1/m). Then M is invertible if and only if
m ≠ 0.
For M a 2 × 2 matrix, chapter 7 section 7.5 shows that if

    M = (m^1_1 m^1_2)
        (m^2_1 m^2_2) ,

then

    M^{-1} = 1/(m^1_1 m^2_2 − m^1_2 m^2_1) ( m^2_2  −m^1_2)
                                           (−m^2_1   m^1_1) .

Thus M is invertible if and only if
Thus M is invertible if and only if


Figure 8.1: Memorize the determinant formula for a 2⇥2 matrix!

    m^1_1 m^2_2 − m^1_2 m^2_1 ≠ 0 .

For 2 × 2 matrices, this quantity is called the determinant of M.

    det M = det (m^1_1 m^1_2) = m^1_1 m^2_2 − m^1_2 m^2_1 .
                (m^2_1 m^2_2)

Example 101 For a 3 × 3 matrix,

    M = (m^1_1 m^1_2 m^1_3)
        (m^2_1 m^2_2 m^2_3) ,
        (m^3_1 m^3_2 m^3_3)

then—see review question 1—M is non-singular if and only if:

    det M = m^1_1 m^2_2 m^3_3 − m^1_1 m^2_3 m^3_2 + m^1_2 m^2_3 m^3_1
            − m^1_2 m^2_1 m^3_3 + m^1_3 m^2_1 m^3_2 − m^1_3 m^2_2 m^3_1 ≠ 0 .

Notice that in the subscripts, each ordering of the numbers 1, 2, and 3 occurs exactly
once. Each of these is a permutation of the set {1, 2, 3}.

8.1.2 Permutations
Consider n objects labeled 1 through n and shuffle them. Each possible shuffle
is called a permutation. For example, here is an example of a permutation
of 1–5:

    σ = [1 2 3 4 5]
        [4 2 5 1 3]


We can consider a permutation σ as an invertible function from the set of
numbers [n] := {1, 2, . . . , n} to [n], so we can write σ(3) = 5 in the above
example. In general we can write

    [  1     2     3     4     5  ]
    [σ(1)  σ(2)  σ(3)  σ(4)  σ(5)] ,

but since the top line of any permutation is always the same, we can omit it
and just write:

    σ = [σ(1) σ(2) σ(3) σ(4) σ(5)]

and so our example becomes simply σ = [4 2 5 1 3].
The mathematics of permutations is extensive; there are a few key prop-
erties of permutations that we’ll need:

• There are n! permutations of n distinct objects, since there are n choices


for the first object, n − 1 choices for the second once the first has been
chosen, and so on.

• Every permutation can be built up by successively swapping pairs of
  objects. For example, to build up the permutation [3 1 2] from the
  trivial permutation [1 2 3], you can first swap 2 and 3, and then
  swap 1 and 3.

• For any given permutation σ, there is some number of swaps it takes to
  build up the permutation. (It's simplest to use the minimum number of
  swaps, but you don't have to: it turns out that any way of building up
  the permutation from swaps will have the same parity of swaps,
  either even or odd.) If this number happens to be even, then σ is
  called an even permutation; if this number is odd, then σ is an odd
  permutation. In fact, n! is even for all n ≥ 2, and exactly half of the
  permutations are even and the other half are odd. It's worth noting
  that the trivial permutation (which sends i → i for every i) is an even
  permutation, since it uses zero swaps.

Definition The sign function is a function sgn that sends permutations
to the set {−1, 1} with rule of correspondence defined by

    sgn(σ) =   1  if σ is even
              −1  if σ is odd.


Permutation Example

Reading homework: problem 1

We can use permutations to give a definition of the determinant.

Definition The determinant of an n × n matrix M is

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)} .

The sum is over all permutations σ of n objects; a sum over all elements
of {σ : {1, . . . , n} → {1, . . . , n}}. Each summand is a product of n entries
from the matrix with each factor from a different row. In different terms of
the sum the column numbers are shuffled by different permutations σ.
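As an aside, this definition can be transcribed directly into a short program. The Python sketch below (an illustration only, and hopelessly slow for large n since it visits all n! permutations) computes the sign by counting inversions:

    from itertools import permutations
    from math import prod

    def sign(perm):
        # parity of a permutation = parity of its inversion count
        inversions = sum(1 for i in range(len(perm))
                           for j in range(i + 1, len(perm)) if perm[i] > perm[j])
        return -1 if inversions % 2 else 1

    def det(M):
        n = len(M)
        return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
                   for p in permutations(range(n)))

    print(det([[1, 2], [3, 4]]))                    # -2
    print(det([[2, 6, 1], [0, 1, 0], [0, 0, 1]]))   # 2 (product of the diagonal)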
The last statement about the summands yields a nice property of the
determinant:

Theorem 8.1.1. If M = (m^i_j) has a row consisting entirely of zeros, then
m^i_{σ(i)} = 0 for every σ and some i. Moreover det M = 0.

Example 102 Because there are many permutations of n, writing the determinant
this way for a general matrix gives a very long sum. For n = 4, there are 24 = 4!
permutations, and for n = 5, there are already 120 = 5! permutations.
For a 4 × 4 matrix,

    M = (m^1_1 m^1_2 m^1_3 m^1_4)
        (m^2_1 m^2_2 m^2_3 m^2_4)
        (m^3_1 m^3_2 m^3_3 m^3_4) ,
        (m^4_1 m^4_2 m^4_3 m^4_4)

then det M is:

    det M = m^1_1 m^2_2 m^3_3 m^4_4 − m^1_1 m^2_3 m^3_2 m^4_4 − m^1_1 m^2_2 m^3_4 m^4_3
            − m^1_2 m^2_1 m^3_3 m^4_4 + m^1_1 m^2_3 m^3_4 m^4_2 + m^1_1 m^2_4 m^3_2 m^4_3
            + m^1_2 m^2_3 m^3_1 m^4_4 + m^1_2 m^2_1 m^3_4 m^4_3 ± 16 more terms.


This is very cumbersome.


Luckily, it is very easy to compute the determinants of certain matrices.
For example, if M is diagonal, meaning that m^i_j = 0 whenever i ≠ j, then
all summands of the determinant involving off-diagonal entries vanish and

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)} = m^1_1 m^2_2 · · · m^n_n .

The determinant of a diagonal matrix is
the product of its diagonal entries.

Since the identity matrix is diagonal with all diagonal entries equal to one,
we have
    det I = 1.
We would like to use the determinant to decide whether a matrix is in-
vertible. Previously, we computed the inverse of a matrix by applying row
operations. Therefore we ask what happens to the determinant when row
operations are applied to a matrix.
Swapping rows Let's swap rows i and j of a matrix M and then compute its determi-
nant. For the permutation σ, let σ̂ be the permutation obtained by swapping positions
i and j. Clearly

    sgn(σ̂) = −sgn(σ) .

Let M′ be the matrix M with rows i and j swapped. Then (assuming i < j):

    det M′ = Σ_σ sgn(σ) m^1_{σ(1)} · · · m^j_{σ(i)} · · · m^i_{σ(j)} · · · m^n_{σ(n)}
           = Σ_σ sgn(σ) m^1_{σ(1)} · · · m^i_{σ(j)} · · · m^j_{σ(i)} · · · m^n_{σ(n)}
           = Σ_σ (−sgn(σ̂)) m^1_{σ̂(1)} · · · m^i_{σ̂(i)} · · · m^j_{σ̂(j)} · · · m^n_{σ̂(n)}
           = −Σ_σ̂ sgn(σ̂) m^1_{σ̂(1)} · · · m^i_{σ̂(i)} · · · m^j_{σ̂(j)} · · · m^n_{σ̂(n)}
           = −det M.

The step replacing Σ_σ by Σ_σ̂ often causes confusion; it holds since we sum over all
permutations (see review problem 3). Thus we see that swapping rows changes the
sign of the determinant. I.e.,

    det M′ = −det M .


Figure 8.2: Remember what row swap does to determinants!

Reading homework: problem 8.2

Applying this result to M = I (the identity matrix) yields

    det E^i_j = −1 ,

where the matrix E^i_j is the identity matrix with rows i and j swapped. It is a row swap
elementary matrix.
This implies another nice property of the determinant. If two rows of the matrix
are identical, then swapping the rows changes the sign of the determinant, but leaves the
matrix unchanged. Then we see the following:

Theorem 8.1.2. If M has two identical rows, then det M = 0.

8.2 Elementary Matrices and Determinants


In chapter 2 we found the matrices that perform the row operations involved
in Gaussian elimination; we called them elementary matrices.
As a reminder, for any matrix M , and a matrix M 0 equal to M after a
row operation, multiplying by an elementary matrix E gave M 0 = EM .

Elementary Matrices

We now examine what the elementary matrices do to determinants.


8.2.1 Row Swap


Our first elementary matrix swaps rows i and j when it is applied to a matrix
M. Explicitly, let R^1 through R^n denote the rows of M, and let M′ be the
matrix M with rows i and j swapped. Then M and M′ can be regarded as
block matrices (where the blocks are rows);

        ( ⋮ )              ( ⋮ )
        (R^i)              (R^j)
    M = ( ⋮ )   and   M′ = ( ⋮ ) .
        (R^j)              (R^i)
        ( ⋮ )              ( ⋮ )

Then notice that

         ( ⋮ )   (1            )( ⋮ )
         (R^j)   (  ⋱          )(R^i)
    M′ = ( ⋮ ) = (    0   1    )( ⋮ ) .
         (R^i)   (    1   0    )(R^j)
         ( ⋮ )   (         ⋱  1)( ⋮ )

The matrix

    (1            )
    (  ⋱          )
    (    0   1    )
    (      ⋱      )  =: E^i_j
    (    1   0    )
    (            1)

is just the identity matrix with rows i and j swapped. The matrix E^i_j is an
elementary matrix and

    M′ = E^i_j M .

Because det I = 1 and swapping a pair of rows changes the sign of the
determinant, we have found that

    det E^i_j = −1 .


Now we know that swapping a pair of rows flips the sign of the determi-
nant so det M′ = −det M. But det E^i_j = −1 and M′ = E^i_j M so

    det (E^i_j M) = det E^i_j det M .

This result hints at a general rule for determinants of products of matrices.

8.2.2 Row Multiplication


The next row operation is multiplying a row by a scalar λ. Consider

        (R^1)
    M = ( ⋮ ) ,
        (R^n)

where R^i are row vectors. Let R^i(λ) be the identity matrix, with the ith
diagonal entry replaced by λ, not to be confused with the row vectors. I.e.,

             (1          )
             (  ⋱        )
    R^i(λ) = (     λ     ) .
             (        ⋱  )
             (          1)

Then:

                      (R^1 )
                      ( ⋮  )
    M′ = R^i(λ) M  =  (λR^i) ,
                      ( ⋮  )
                      (R^n )

equals M with one row multiplied by λ.
What effect does multiplication by the elementary matrix R^i(λ) have on
the determinant?

    det M′ = Σ_σ sgn(σ) m^1_{σ(1)} · · · λ m^i_{σ(i)} · · · m^n_{σ(n)}
           = λ Σ_σ sgn(σ) m^1_{σ(1)} · · · m^i_{σ(i)} · · · m^n_{σ(n)}
           = λ det M


Figure 8.3: Rescaling a row rescales the determinant.

Thus, multiplying a row by λ multiplies the determinant by λ. I.e.,

    det (R^i(λ) M) = λ det M .

Since R^i(λ) is just the identity matrix with a single row multiplied by λ,
then by the above rule, the determinant of R^i(λ) is λ. Thus

                     (1          )
                     (  ⋱        )
    det R^i(λ) = det (     λ     ) = λ ,
                     (        ⋱  )
                     (          1)

and once again we have a product of determinants formula

    det (R^i(λ) M) = det R^i(λ) det M .

8.2.3 Row Addition


The final row operation is adding µR^j to R^i. This is done with the elementary
matrix S^i_j(µ), which is an identity matrix but with an additional µ in the i, j
position;

               (1            )
               (  ⋱          )
               (    1   µ    )
    S^i_j(µ) = (      ⋱      ) .
               (        1    )
               (          ⋱  )
               (            1)

Then multiplying M by S^i_j(µ) performs a row addition;

    (1            )( ⋮ )   (    ⋮     )
    (  ⋱          )(R^i)   (R^i + µR^j)
    (    1   µ    )( ⋮ ) = (    ⋮     ) .
    (      ⋱      )(R^j)   (   R^j    )
    (            1)( ⋮ )   (    ⋮     )

What is the effect of multiplying by S^i_j(µ) on the determinant? Let M′ =
S^i_j(µ)M, and let M″ be the matrix M but with R^i replaced by R^j. Then

    det M′ = Σ_σ sgn(σ) m^1_{σ(1)} · · · (m^i_{σ(i)} + µ m^j_{σ(i)}) · · · m^n_{σ(n)}
           = Σ_σ sgn(σ) m^1_{σ(1)} · · · m^i_{σ(i)} · · · m^n_{σ(n)}
             + Σ_σ sgn(σ) m^1_{σ(1)} · · · µ m^j_{σ(i)} · · · m^j_{σ(j)} · · · m^n_{σ(n)}
           = det M + µ det M″

Since M″ has two identical rows, its determinant is 0 so

    det M′ = det M,

when M′ is obtained from M by adding µ times row j to row i.

Reading homework: problem 3


Figure 8.4: Adding one row to another leaves the determinant unchanged.

We have also learnt that

    det (S^i_j(µ) M) = det M .

Notice that if M is the identity matrix, then we have

    det S^i_j(µ) = det (S^i_j(µ) I) = det I = 1 .

8.2.4 Determinant of Products


In summary, the elementary matrices for each of the row operations obey

    E^i_j   = I with rows i, j swapped;       det E^i_j   = −1
    R^i(λ)  = I with λ in position i, i;      det R^i(λ)  = λ
    S^i_j(µ) = I with µ in position i, j;     det S^i_j(µ) = 1

Elementary Determinants

Moreover we found a useful formula for determinants of products:

Theorem 8.2.1. If E is any of the elementary matrices E^i_j, R^i(λ), S^i_j(µ),
then det(EM) = det E det M.


We have seen that any matrix M can be put into reduced row echelon form
via a sequence of row operations, and we have seen that any row operation can
be achieved via left matrix multiplication by an elementary matrix. Suppose
that RREF(M ) is the reduced row echelon form of M . Then

RREF(M ) = E1 E2 · · · Ek M ,

where each Ei is an elementary matrix. We know how to compute determi-


nants of elementary matrices and products thereof, so we ask:

What is the determinant of a square matrix in reduced row echelon form?

The answer has two cases:

1. If M is not invertible, then some row of RREF(M) contains only zeros.
   Then we can multiply the zero row by any constant λ without chang-
   ing M; by our previous observation, this scales the determinant of M
   by λ. Thus, if M is not invertible, det RREF(M) = λ det RREF(M),
   and so det RREF(M) = 0.

2. Otherwise, every row of RREF(M ) has a pivot on the diagonal; since


M is square, this means that RREF(M ) is the identity matrix. So if
M is invertible, det RREF(M ) = 1.

Notice that because det RREF(M ) = det(E1 E2 · · · Ek M ), by the theorem


above,
det RREF(M ) = det(E1 ) · · · det(Ek ) det M .
Since each Ei has non-zero determinant, then det RREF(M ) = 0 if and only
if det M = 0. This establishes an important theorem:

Theorem 8.2.2. For any square matrix M , det M 6= 0 if and only if M is


invertible.

Since we know the determinants of the elementary matrices, we can im-


mediately obtain the following:

Determinants and Inverses


Figure 8.5: Determinants measure if a matrix is invertible.

Corollary 8.2.3. Any elementary matrix E^i_j, R^i(λ), S^i_j(µ) is invertible, ex-
cept for R^i(0). In fact, the inverse of an elementary matrix is another ele-
mentary matrix.

To obtain one last important result, suppose that M and N are square
n ⇥ n matrices, with reduced row echelon forms such that, for elementary
matrices Ei and Fi ,

M = E1 E2 · · · Ek RREF(M ) ,

and
N = F1 F2 · · · Fl RREF(N ) .
If RREF(M ) is the identity matrix (i.e., M is invertible), then:

det(M N ) = det(E1 E2 · · · Ek RREF(M )F1 F2 · · · Fl RREF(N ))


= det(E1 E2 · · · Ek IF1 F2 · · · Fl RREF(N ))
= det(E1 ) · · · det(Ek ) det(I) det(F1 ) · · · det(Fl ) det RREF(N )
= det(M ) det(N )

Otherwise, M is not invertible, and det M = 0 = det RREF(M). Then there
exists a row of zeros in RREF(M), so R^n(λ) RREF(M) = RREF(M) for
any λ. Then:

    det(MN) = det(E1 E2 · · · Ek RREF(M) N)
            = det(E1) · · · det(Ek) det(RREF(M) N)
            = det(E1) · · · det(Ek) det(R^n(λ) RREF(M) N)
            = λ det(E1) · · · det(Ek) det(RREF(M) N)
            = λ det(MN) ,


Figure 8.6: “The determinant of a product is the product of determinants.”

which implies that det(MN) = 0 = det M det N.


Thus we have shown that for any matrices M and N ,

det(M N ) = det M det N

This result is extremely important; do not forget it!

Alternative proof

Reading homework: problem 4

8.3 Review Problems


Reading Problems 1 ,2 ,3 ,4
Webwork: 2 ⇥ 2 Determinant 7
Determinants and invertibility 8, 9, 10, 11

1. Let
       M = (m^1_1 m^1_2 m^1_3)
           (m^2_1 m^2_2 m^2_3) .
           (m^3_1 m^3_2 m^3_3)


Use row operations to put M into row echelon form. For simplicity,
assume that m^1_1 ≠ 0 ≠ m^1_1 m^2_2 − m^2_1 m^1_2.
Prove that M is non-singular if and only if:

    m^1_1 m^2_2 m^3_3 − m^1_1 m^2_3 m^3_2 + m^1_2 m^2_3 m^3_1
    − m^1_2 m^2_1 m^3_3 + m^1_3 m^2_1 m^3_2 − m^1_3 m^2_2 m^3_1 ≠ 0
2. (a) What does the matrix E^1_2 = (0 1)  do to M = (a b)  under
                                    (1 0)            (d c)
       left multiplication? What about right multiplication?
   (b) Find elementary matrices R^1(λ) and R^2(λ) that respectively mul-
       tiply rows 1 and 2 of M by λ but otherwise leave M the same
       under left multiplication.
   (c) Find a matrix S^1_2(λ) that adds λ times row 2 to row 1
       under left multiplication.
3. Let σ̂ denote the permutation obtained from σ by transposing the first
   two outputs, i.e. σ̂(1) = σ(2) and σ̂(2) = σ(1). Suppose the function
   f : {1, 2, 3, 4} → R. Write out explicitly the following two sums:

       Σ_σ f(σ(s))   and   Σ_σ f(σ̂(s)) .

   What do you observe? Now write a brief explanation why the following
   equality holds

       Σ_σ F(σ) = Σ_σ F(σ̂) ,

   where the domain of the function F is the set of all permutations of n
   objects and σ̂ is related to σ by swapping a given pair of objects.
4. Let M be a matrix and S^i_j M the same matrix with rows i and j
   switched. Explain every line of the series of equations proving that
   det M = −det(S^i_j M).
5. Let M′ be the matrix obtained from M by swapping two columns i
   and j. Show that det M′ = −det M.
6. The scalar triple product of three vectors u, v, w from R3 is u · (v ⇥ w).
Show that this product is the same as the determinant of the matrix
whose columns are u, v, w (in that order). What happens to the scalar
triple product when the factors are permuted?


7. Show that if M is a 3 ⇥ 3 matrix whose third row is a sum of multiples


of the other rows (R3 = aR2 + bR1 ) then det M = 0. Show that the
same is true if one of the columns is a sum of multiples of the others.

8. Calculate the determinant below by factoring the matrix into elemen-
   tary matrices times simpler matrices and using the trick

       det(M) = det(E^{-1} E M) = det(E^{-1}) det(EM) .

   Explicitly show each ERO matrix.

       det (2 1 0)
           (4 3 1)
           (2 2 2)
9. Let M = (a b)  and N = (x y) . Compute the following:
           (c d)          (z w)

   (a) det M.
   (b) det N.
   (c) det(MN).
   (d) det M det N.
   (e) det(M^{-1}) assuming ad − bc ≠ 0.
   (f) det(M^T)
   (g) det(M + N) − (det M + det N). Is the determinant a linear trans-
       formation from square matrices to real numbers? Explain.

10. Suppose M = (a b)  is invertible. Write M as a product of elemen-
                (c d)
    tary row matrices times RREF(M).

11. Find the inverses of each of the elementary matrices, E^i_j, R^i(λ), S^i_j(λ).
    Make sure to show that the elementary matrix times its inverse is ac-
    tually the identity.

12. Let e_{ij} denote the matrix with a 1 in the i-th row and j-th column
    and 0's everywhere else, and let A be an arbitrary 2 × 2 matrix. Com-
    pute det(A + t I_2). What is the first order term (the t^1 term)? Can you

express your results in terms of tr(A)? What about the first order term
in det(A + tIn ) for any arbitrary n ⇥ n matrix A in terms of tr(A)?
Note that the result of det(A + tI2 ) is a polynomial in the variable t
known as the characteristic polynomial.

13. (Directional) Derivative of the determinant:
    Notice that det : M_{nn} → R (where M_{nn} is the vector space of all n × n
    matrices), so det is a function of n² variables and we can take directional
    derivatives of det.
    Let A be an arbitrary n × n matrix, and for all i and j compute the
    following:

    (a)  lim_{t→0} [ det(I_2 + t e_{ij}) − det(I_2) ] / t

    (b)  lim_{t→0} [ det(I_3 + t e_{ij}) − det(I_3) ] / t

    (c)  lim_{t→0} [ det(I_n + t e_{ij}) − det(I_n) ] / t

    (d)  lim_{t→0} [ det(I_n + At) − det(I_n) ] / t

    Note, these are the directional derivatives in the e_{ij} and A directions.

14. How many functions are in the set

        { f : {1, . . . , n} → {1, . . . , n} | f^{-1} exists } ?

    What about the set

        {1, . . . , n}^{{1,...,n}} ?

    Which of these two sets correspond to the set of all permutations of n
    objects?


8.4 Properties of the Determinant


We now know that the determinant of a matrix is non-zero if and only if that
matrix is invertible. We also know that the determinant is a multiplicative
function, in the sense that det(M N ) = det M det N . Now we will devise
some methods for calculating the determinant.
Recall that:

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)} .

A minor of an n ⇥ n matrix M is the determinant of any square matrix


obtained from M by deleting one row and one column. In particular, any
entry mij of a square matrix M is associated to a minor obtained by deleting
the ith row and jth column of M .
It is possible to write the determinant of a matrix in terms of its minors
as follows:

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)}
          = m^1_1 Σ_{σ/1} sgn(σ/1) m^2_{σ/1(2)} · · · m^n_{σ/1(n)}
          + m^1_2 Σ_{σ/2} sgn(σ/2) m^2_{σ/2(1)} m^3_{σ/2(3)} · · · m^n_{σ/2(n)}
          + m^1_3 Σ_{σ/3} sgn(σ/3) m^2_{σ/3(1)} m^3_{σ/3(2)} m^4_{σ/3(4)} · · · m^n_{σ/3(n)}
          + · · ·

Here the symbol σ/k refers to the permutation σ with the input k removed.
The summand on the j'th line of the above formula looks like the determinant
of the minor obtained by removing the first row and the j'th column of M. However
we still need to replace the sum over σ/j by a sum over permutations of the column
numbers of the matrix entries of this minor. This costs a minus sign whenever
j − 1 is odd. In other words, to expand by minors we pick an entry m^1_j of the
first row, then add (−1)^{j−1} times the determinant of the matrix with row 1
and column j deleted. An example will probably help:


Example 103 Let’s compute the determinant of


    M = (1 2 3)
        (4 5 6)
        (7 8 9)

using expansion by minors:

    det M = 1 det (5 6)  − 2 det (4 6)  + 3 det (4 5)
                  (8 9)          (7 9)          (7 8)
          = 1(5 · 9 − 8 · 6) − 2(4 · 9 − 7 · 6) + 3(4 · 8 − 7 · 5)
          = 0

Here, M^{-1} does not exist because¹ det M = 0.
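Expansion by minors also translates naturally into a short recursive program. The Python sketch below (an illustration, not part of the text) expands along the first row exactly as described above:

    def det(M):
        n = len(M)
        if n == 1:
            return M[0][0]
        # expand along the first row: the minor deletes row 1 and column j+1
        return sum((-1) ** j * M[0][j] *
                   det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(n))

    print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, as in Example 103
    print(det([[1, 2], [3, 4]]))                    # -2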

Example 104 Sometimes the entries of a matrix allow us to simplify the calculation
of the determinant. Take

    N = (1 2 3)
        (4 0 0) .
        (7 8 9)

Notice that the second row has many zeros; then we can switch the first and second
rows of N before expanding in minors to get:

    det (1 2 3)        (4 0 0)
        (4 0 0) = −det (1 2 3)
        (7 8 9)        (7 8 9)

                = −4 det (2 3)
                         (8 9)

                = 24

Example

Since we know how the determinant of a matrix changes when you perform
row operations, it is often very beneficial to perform row operations before
computing the determinant by brute force.
¹A fun exercise is to compute the determinant of a 4 × 4 matrix filled in order, from
left to right, with the numbers 1, 2, 3, . . . , 16. What do you observe? Try the same for a
5 × 5 matrix with 1, 2, 3, . . . , 25. Is there a pattern? Can you explain it?


Example 105

    det (1 2 3)       (1 2 3)       (1 2 3)
        (4 5 6) = det (3 3 3) = det (3 3 3) = 0 .
        (7 8 9)       (6 6 6)       (0 0 0)

Try to determine which row operations we made at each step of this computation.

You might suspect that determinants have similar properties with respect
to columns as what applies to rows:

If M is a square matrix then det M^T = det M.

Proof. By definition,

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)} .

For any permutation σ, there is a unique inverse permutation σ^{-1} that
undoes σ. If σ sends i → j, then σ^{-1} sends j → i. In the two-line notation
for a permutation, this corresponds to just flipping the permutation over. For
example, if

    σ = [1 2 3]
        [2 3 1] ,

then we can find σ^{-1} by flipping the permutation and then putting the columns in order:

    σ^{-1} = [2 3 1] = [1 2 3]
             [1 2 3]   [3 1 2] .

Since any permutation can be built up by transpositions, one can also find
the inverse of a permutation σ by undoing each of the transpositions used to
build up σ; this shows that one can use the same number of transpositions
to build σ and σ^{-1}. In particular, sgn σ = sgn σ^{-1}.

Reading homework: problem 5


Figure 8.7: Transposes leave the determinant unchanged.

Then we can write out the above in formulas as follows:

    det M = Σ_σ sgn(σ) m^1_{σ(1)} m^2_{σ(2)} · · · m^n_{σ(n)}
          = Σ_σ sgn(σ) m^{σ^{-1}(1)}_1 m^{σ^{-1}(2)}_2 · · · m^{σ^{-1}(n)}_n
          = Σ_σ sgn(σ^{-1}) m^{σ^{-1}(1)}_1 m^{σ^{-1}(2)}_2 · · · m^{σ^{-1}(n)}_n
          = Σ_σ sgn(σ) m^{σ(1)}_1 m^{σ(2)}_2 · · · m^{σ(n)}_n
          = det M^T .

The second-to-last equality is due to the existence of a unique inverse permu-
tation: summing over permutations is the same as summing over all inverses
of permutations (see review problem 3). The final equality is by the definition
of the transpose.

Example 106 Because of this, we see that expansion by minors also works over
columns. Let

    M = (1 2 3)
        (0 5 6) .
        (0 8 9)

Then

    det M = det M^T = 1 det (5 8) = −3 .
                            (6 9)


8.4.1 Determinant of the Inverse


Let M and N be n × n matrices. We previously showed that

    det(MN) = det M det N ,   and   det I = 1.

Then 1 = det I = det(M M^{-1}) = det M det M^{-1}. As such we have:

Theorem 8.4.1.
    det M^{-1} = 1 / (det M)

8.4.2 Adjoint of a Matrix


Recall that for a 2 × 2 matrix

    ( d −b)(a b)       (a b)
    (−c  a)(c d) = det (c d)  I .

Or in a more careful notation: if

    M = (m^1_1 m^1_2)
        (m^2_1 m^2_2) ,

then

    M^{-1} = 1/(m^1_1 m^2_2 − m^1_2 m^2_1) ( m^2_2  −m^1_2)
                                           (−m^2_1   m^1_1) ,

so long as det M = m^1_1 m^2_2 − m^1_2 m^2_1 ≠ 0. The matrix

    ( m^2_2  −m^1_2)
    (−m^2_1   m^1_1)

that appears above is a special matrix, called the adjoint of M. Let's define the
adjoint for an n × n matrix.
The cofactor of M corresponding to the entry m^i_j of M is the product
of the minor associated to m^i_j and (−1)^{i+j}. This is written cofactor(m^i_j).


Definition For M = (m^i_j) a square matrix, the adjoint matrix adj M is
given by

    adj M = (cofactor(m^i_j))^T .

Example 107

        (3 −1 −1)   (  det(2 0; 1 1)    −det(1 0; 0 1)    det(1 2; 0 1)  )T
    adj (1  2  0) = ( −det(−1 −1; 1 1)   det(3 −1; 0 1)  −det(3 −1; 0 1) )
        (0  1  1)   (  det(−1 −1; 2 0)  −det(3 −1; 1 0)   det(3 −1; 1 2) )

(each det(a b; c d) above denotes the determinant of the 2 × 2 matrix with rows (a b) and (c d)).

Reading homework: problem 6

Let’s compute the product M adj M . For any matrix N , the i, j entry
of M N is given by taking the dot product of the ith row of M and the jth
column of N . Notice that the dot product of the ith row of M and the ith
column of adj M is just the expansion by minors of det M in the ith row.
Further, notice that the dot product of the ith row of M and the jth column
of adj M with j 6= i is the same as expanding M by minors, but with the
jth row replaced by the ith row. Since the determinant of any matrix with
a row repeated is zero, then these dot products are zero as well.
We know that the i, j entry of the product of two matrices is the dot
product of the ith row of the first by the jth column of the second. Then:

    M adj M = (det M) I .

Thus, when det M ≠ 0, the adjoint gives an explicit formula for M^{-1}.

Theorem 8.4.2. For M a square matrix with det M ≠ 0 (equivalently, if M
is invertible), then

    M^{-1} = (1 / det M) adj M
det M

The Adjoint Matrix


Example 108 Continuing with the previous example,

        (3 −1 −1)   ( 2  0  2)
    adj (1  2  0) = (−1  3 −1) .
        (0  1  1)   ( 1 −3  7)

Now, multiply:

    (3 −1 −1)( 2  0  2)   (6 0 0)
    (1  2  0)(−1  3 −1) = (0 6 0)
    (0  1  1)( 1 −3  7)   (0 0 6)

        (3 −1 −1)^{-1}        ( 2  0  2)
    ⇒   (1  2  0)      =  1/6 (−1  3 −1) .
        (0  1  1)             ( 1 −3  7)

This process for finding the inverse matrix is sometimes called Cramer’s Rule .
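The whole computation is easy to automate. The following Python sketch (an illustration only) builds adj M from cofactors and recovers the inverse of the matrix in Example 108 with exact fractions:

    from fractions import Fraction

    def det(M):
        if len(M) == 1:
            return M[0][0]
        return sum((-1) ** j * M[0][j] *
                   det([row[:j] + row[j + 1:] for row in M[1:]])
                   for j in range(len(M)))

    def adjoint(M):
        n = len(M)
        cofactor = [[(-1) ** (i + j) *
                     det([row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i])
                     for j in range(n)] for i in range(n)]
        return [list(col) for col in zip(*cofactor)]   # transpose of the cofactor matrix

    M = [[3, -1, -1], [1, 2, 0], [0, 1, 1]]
    A, d = adjoint(M), det(M)
    print(A, d)        # [[2, 0, 2], [-1, 3, -1], [1, -3, 7]]  6
    Minv = [[Fraction(a, d) for a in row] for row in A]
    print(Minv[0])     # [Fraction(1, 3), Fraction(0, 1), Fraction(1, 3)]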

8.4.3 Application: Volume of a Parallelepiped


Given three vectors u, v, w in R3 , the parallelepiped determined by the three
vectors is the “squished” box whose edges are parallel to u, v, and w as
depicted in Figure 8.8.
You probably learnt in a calculus course that the volume of this object is
|u · (v × w)|. This is the same as expansion by minors of the matrix whose
columns are u, v, w. Then:

    Volume = |det (u v w)|


Figure 8.8: A parallelepiped.

8.5 Review Problems


Reading Problems 5 ,6
Row of zeros 12
3 ⇥ 3 determinant 13
Webwork:
Triangular determinants 14,15,16,17
Expanding in a column 18
Minors and cofactors 19

1. Find the determinant via expanding by minors.


    (2 1 3 7)
    (6 1 4 4)
    (2 1 8 0)
    (1 0 2 0)

2. Even if M is not a square matrix, both M M^T and M^T M are square. Is
   it true that det(M M^T) = det(M^T M) for all matrices M? How about
   tr(M M^T) = tr(M^T M)?


3. Let σ^{-1} denote the inverse permutation of σ. Suppose the function
   f : {1, 2, 3, 4} → R. Write out explicitly the following two sums:

       Σ_σ f(σ(s))   and   Σ_σ f(σ^{-1}(s)) .

   What do you observe? Now write a brief explanation why the following
   equality holds

       Σ_σ F(σ) = Σ_σ F(σ^{-1}) ,

   where the domain of the function F is the set of all permutations of n
   objects.

4. Suppose M = LU is an LU decomposition. Explain how you would


efficiently compute det M in this case. How does this decomposition
allow you to easily see if M is invertible?

5. In computer science, the complexity of an algorithm is (roughly) com-


puted by counting the number of times a given operation is performed.
Suppose adding or subtracting any two numbers takes a seconds, and
multiplying two numbers takes m seconds. Then, for example, com-
puting 2 · 6 − 5 would take a + m seconds.

(a) How many additions and multiplications does it take to compute


the determinant of a general 2 ⇥ 2 matrix?
(b) Write a formula for the number of additions and multiplications it
takes to compute the determinant of a general n ⇥ n matrix using
the definition of the determinant as a sum over permutations.
Assume that finding and multiplying by the sign of a permutation
is free.
(c) How many additions and multiplications does it take to compute
the determinant of a general 3 ⇥ 3 matrix using expansion by
minors? Assuming m = 2a, is this faster than computing the
determinant from the definition?

Hint

Subspaces and Spanning Sets
9
It is time to study vector spaces more carefully and return to some funda-
mental questions:

1. Subspaces: When is a subset of a vector space itself a vector space?


(This is the notion of a subspace.)

2. Linear Independence: Given a collection of vectors, is there a way to


tell whether they are independent, or if one is a “linear combination”
of the others?

3. Dimension: Is there a consistent definition of how “big” a vector space


is?

4. Basis: How do we label vectors? Can we write any vector as a sum of


some basic set of vectors? How do we change our point of view from
vectors labeled one way to vectors labeled in another way?

Let’s start at the top!

9.1 Subspaces
Definition We say that a subset U of a vector space V is a subspace of V
if U is a vector space under the inherited addition and scalar multiplication
operations of V .


Example 109 Consider a plane P in R3 through the origin:


ax + by + cz = 0.

This equation can be expressed as the homogeneous system (a b c)(x, y, z)^T = 0, or
MX = 0 with M the matrix (a b c). If X1 and X2 are both solutions to MX = 0,
then, by linearity of matrix multiplication, so is µX1 + νX2:

    M(µX1 + νX2) = µMX1 + νMX2 = 0.
So P is closed under addition and scalar multiplication. Additionally, P contains the
origin (which can be derived from the above by setting µ = ⌫ = 0). All other vector
space requirements hold for P because they hold for all vectors in R3 .

Theorem 9.1.1 (Subspace Theorem). Let U be a non-empty subset of a
vector space V. Then U is a subspace if and only if µu1 + νu2 ∈ U for
arbitrary u1, u2 in U, and arbitrary constants µ, ν.
Proof. One direction of this proof is easy: if U is a subspace, then it is a vector
space, and so by the additive closure and multiplicative closure properties of
vector spaces, it has to be true that µu1 + νu2 ∈ U for all u1, u2 in U and all
constants µ, ν.
The other direction is almost as easy: we need to show that if µu1 + νu2 ∈
U for all u1, u2 in U and all constants µ, ν, then U is a vector space. That
is, we need to show that the ten properties of vector spaces are satisfied. We
already know that the additive closure and multiplicative closure properties
are satisfied. Further, U has all of the other eight properties because V has
them.


Note that the requirements of the subspace theorem are often referred to as
“closure”.
We can use this theorem to check if a set is a vector space. That is, if we
have some set U of vectors that come from some bigger vector space V , to
check if U itself forms a smaller vector space we need check only two things:

1. If we add any two vectors in U , do we end up with a vector in U ?

2. If we multiply any vector in U by any constant, do we end up with a


vector in U ?

If the answer to both of these questions is yes, then U is a vector space. If


not, U is not a vector space.

Reading homework: problem 1

9.2 Building Subspaces


Consider the set

    U = { (1, 0, 0)^T , (0, 1, 0)^T } ⊂ R³ .

Because U consists of only two vectors, it is clear that U is not a vector space,
since any constant multiple of these vectors should also be in U. For example,
the 0-vector is not in U, nor is U closed under vector addition.
But we know that any two vectors define a plane:


In this case, the vectors in U define the xy-plane in R3 . We can view the
xy-plane as the set of all vectors that arise as a linear combination of the two
vectors in U . We call this set of all linear combinations the span of U :
    span(U) = { x (1, 0, 0)^T + y (0, 1, 0)^T | x, y ∈ R } .

Notice that any vector in the xy-plane is of the form

    (x, y, 0)^T = x (1, 0, 0)^T + y (0, 1, 0)^T ∈ span(U).

Definition Let V be a vector space and S = {s1 , s2 , . . .} ⇢ V a subset of V .


Then the span of S, denoted span(S), is the set

    span(S) := { r1 s1 + r2 s2 + · · · + rN sN | ri ∈ R, N ∈ N }.

That is, the span of S is the set of all finite linear combinations1 of
elements of S. Any finite sum of the form “a constant times s1 plus a constant
times s2 plus a constant times s3 and so on” is in the span of S.2 .
Example 110 Let V = R³ and X ⊂ V be the x-axis. Let P = (0, 1, 0)^T, and set

    S = X ∪ {P} .

The vector (2, 3, 0)^T is in span(S), because (2, 3, 0)^T = (2, 0, 0)^T + 3 (0, 1, 0)^T.
Similarly, the vector (12, 17.5, 0)^T is in span(S), because
(12, 17.5, 0)^T = (12, 0, 0)^T + 17.5 (0, 1, 0)^T. Similarly, any vector
¹Usually our vector spaces are defined over R, but in general we can have vector spaces
defined over different base fields such as C or Z2. The coefficients ri should come from
whatever our base field is (usually R).
²It is important that we only allow finitely many terms in our linear combinations; in
the definition above, N must be a finite number. It can be any finite number, but it must
be finite. We can relax the requirement that S = {s1, s2, . . .} and just let S be any set of
vectors. Then we shall write span(S) := { r1 s1 + r2 s2 + · · · + rN sN | ri ∈ R, si ∈ S, N ∈ N }.


of the form

    (x, 0, 0)^T + y (0, 1, 0)^T = (x, y, 0)^T

is in span(S). On the other hand, any vector in span(S) must have a zero in the
z-coordinate. (Why?) So span(S) is the xy-plane, which is a vector space. (Try
drawing a picture to verify this!)

Reading homework: problem 2

Lemma 9.2.1. For any subset S ⇢ V , span(S) is a subspace of V .

Proof. We need to show that span(S) is a vector space.


It suffices to show that span(S) is closed under linear combinations. Let
u, v ∈ span(S) and λ, µ be constants. By the definition of span(S), there are
constants ci and di (some of which could be zero) such that:

    u = c1 s1 + c2 s2 + · · ·
    v = d1 s1 + d2 s2 + · · ·

 ⇒  λu + µv = λ(c1 s1 + c2 s2 + · · ·) + µ(d1 s1 + d2 s2 + · · ·)
            = (λc1 + µd1) s1 + (λc2 + µd2) s2 + · · ·

This last sum is a linear combination of elements of S, and is thus in span(S).


Then span(S) is closed under linear combinations, and is thus a subspace
of V .

Note that this proof, like many proofs, consisted of little more than just
writing out the definitions.

Example 111 For which values of a does


    span{ (1, 0, a)^T , (1, 2, −3)^T , (a, 1, 0)^T } = R³ ?

Given an arbitrary vector (x, y, z)^T in R³, we need to find constants r1, r2, r3 such that


    r1 (1, 0, a)^T + r2 (1, 2, −3)^T + r3 (a, 1, 0)^T = (x, y, z)^T .

We can write this as a linear system in the unknowns r1, r2, r3 as follows:

    (1  1 a)(r1)   (x)
    (0  2 1)(r2) = (y) .
    (a −3 0)(r3)   (z)

If the matrix

    M = (1  1 a)
        (0  2 1)
        (a −3 0)

is invertible, then we can find a solution

    M^{-1} (x, y, z)^T = (r1, r2, r3)^T

for any vector (x, y, z)^T ∈ R³.
Therefore we should choose a so that M is invertible:

    i.e., 0 ≠ det M = −2a² + a + 3 = −(2a − 3)(a + 1).

Then the span is R³ if and only if a ≠ −1, 3/2.

Linear systems as spanning sets

Some other very important ways of building subspaces are given in the
following examples.

Example 112 (The kernel of a linear map).
Suppose L : U → V is a linear map between vector spaces. Then if

    L(u) = 0 = L(u′) ,

linearity tells us that

    L(αu + βu′) = αL(u) + βL(u′) = α0 + β0 = 0 .

Hence, thanks to the subspace theorem, the set of all vectors in U that are mapped
to the zero vector is a subspace of U. It is called the kernel of L:

    ker L := {u ∈ U | L(u) = 0} ⊂ U.

Note that finding a kernel means finding a solution to a homogeneous linear equation.

Note that finding a kernel means finding a solution to a homogeneous linear equation.

Example 113 (The image of a linear map).
Suppose L : U → V is a linear map between vector spaces. Then if

    v = L(u) and v′ = L(u′) ,

linearity tells us that

    αv + βv′ = αL(u) + βL(u′) = L(αu + βu′) .

Hence, calling once again on the subspace theorem, the set of all vectors in V that
are obtained as outputs of the map L is a subspace. It is called the image of L:

    im L := {L(u) | u ∈ U} ⊂ V.

Example 114 (An eigenspace of a linear map).
Suppose L : V → V is a linear map and V is a vector space. Then if

    L(u) = λu and L(v) = λv ,

linearity tells us that

    L(αu + βv) = αL(u) + βL(v) = αλu + βλv = λ(αu + βv) .

Hence, again by the subspace theorem, the set of all vectors in V that obey the eigenvector
equation L(v) = λv is a subspace of V. It is called an eigenspace

    V_λ := {v ∈ V | L(v) = λv}.

For most scalars λ, the only solution to L(v) = λv will be v = 0, which yields the
trivial subspace {0}. When there are nontrivial solutions to L(v) = λv, the number λ
is called an eigenvalue, and carries essential information about the map L.

Kernels, images and eigenspaces are discussed in great depth in chap-


ters 16 and 12.
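For readers who want to compute these subspaces for a concrete map, the following Python sketch uses sympy (assumed to be installed) on a hypothetical 3 × 3 matrix; the matrix here is an arbitrary example, not one from the text:

    from sympy import Matrix

    L = Matrix([[1, 2, 0],
                [2, 1, 0],
                [0, 0, 0]])

    print(L.nullspace())       # a basis for ker L
    print(L.columnspace())     # a basis for im L
    for eigenvalue, multiplicity, vectors in L.eigenvects():
        print(eigenvalue, vectors)   # a basis for each eigenspace V_lambda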


9.3 Review Problems


Reading Problems 1 ,2
Webwork: Subspaces 3, 4, 5, 6
Spans 7, 8

1. Determine if x − x³ ∈ span{x², 2x + x², x + x³}.

2. Let U and W be subspaces of V . Are:

(a) U [ W
(b) U \ W

also subspaces? Explain why or why not. Draw examples in R3 .

Hint

3. Let L : R³ → R³ where

       L(x, y, z) = (x + 2y + z, 2x + y + z, 0) .

   Find ker L, im L and the eigenspaces R³_{−1}, R³_{3}. Your answers should be
   subsets of R³. Express them using span notation.

Linear Independence
10
Consider a plane P that includes the origin in R3 and non-zero vectors
{u, v, w} in P .

If no two of u, v and w are parallel, then P = span{u, v, w}. But any two
vectors determine a plane, so we should be able to span the plane using
only two of the vectors u, v, w. Then we could choose two of the vectors in
{u, v, w} whose span is P, and express the other as a linear combination of
those two. Suppose u and v span P. Then there exist constants d1, d2 (not
both zero) such that w = d1 u + d2 v. Since w can be expressed in terms of u
and v we say that it is not independent. More generally, the relationship

    c1 u + c2 v + c3 w = 0 ,   ci ∈ R, some ci ≠ 0 ,

expresses the fact that u, v, w are not all independent.


Definition We say that the vectors v1, v2, . . . , vn are linearly dependent
if there exist constants¹ c1, c2, . . . , cn not all zero such that

    c1 v1 + c2 v2 + · · · + cn vn = 0.

Otherwise, the vectors v1, v2, . . . , vn are linearly independent.

Remark The zero vector 0_V can never be on a list of independent vectors because
α 0_V = 0_V for any scalar α.

Example 115 Consider the following vectors in R3 :


    v1 = ( 4)      v2 = (−3)      v3 = ( 5)      v4 = (−1)
         (−1) ,         ( 7) ,         (12) ,         ( 1) .
         ( 3)           ( 4)           (17)           ( 0)

Are these vectors linearly independent?

No, since 3v1 + 2v2 − v3 + v4 = 0, the vectors are linearly dependent.

Worked Example

10.1 Showing Linear Dependence


In the above example we were given the linear combination 3v1 + 2v2 − v3 + v4
seemingly by magic. The next example shows how to find such a linear
combination, if it exists.

Example 116 Consider the following vectors in R3 :


    v1 = (0)      v2 = (1)      v3 = (1)
         (0) ,         (2) ,         (2) .
         (1)           (1)           (3)

Are they linearly independent?


We need to see whether the system

c1 v1 + c2 v2 + c3 v3 = 0
¹Usually our vector spaces are defined over R, but in general we can have vector spaces
defined over different base fields such as C or Z2. The coefficients ci should come from
whatever our base field is (usually R).


has any solutions for c1 , c2 , c3 . We can rewrite this as a homogeneous system by


building a matrix whose columns are the vectors v1 , v2 and v3 :
    (v1 v2 v3)(c1)
              (c2) = 0.
              (c3)

This system has solutions if and only if the matrix M = (v1 v2 v3) is singular, so
we should find the determinant of M:

    det M = det (0 1 1)       (1 1)
                (0 2 2) = det (2 2) = 0.
                (1 1 3)
1 1 3

Therefore nontrivial solutions exist. At this point we know that the vectors are
linearly dependent. If we need to, we can find coefficients that demonstrate linear
dependence by solving
    (0 1 1 0)   (1 1 3 0)   (1 0 2 0)
    (0 2 2 0) ∼ (0 1 1 0) ∼ (0 1 1 0) .
    (1 1 3 0)   (0 0 0 0)   (0 0 0 0)

The solution set {µ(−2, −1, 1) | µ ∈ R} encodes the linear combinations equal to zero;
any choice of µ will produce coefficients c1, c2, c3 that satisfy the linear homogeneous
equation. In particular, µ = 1 corresponds to the equation

    c1 v1 + c2 v2 + c3 v3 = 0  ⇒  −2v1 − v2 + v3 = 0.
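The same computation is easy to check by machine. Here is a Python sketch (using sympy, assumed available, and not part of the text's development): put the vectors in the columns of a matrix and read the dependence coefficients off the nullspace.

    from sympy import Matrix

    v1, v2, v3 = Matrix([0, 0, 1]), Matrix([1, 2, 1]), Matrix([1, 2, 3])
    M = Matrix.hstack(v1, v2, v3)

    print(M.det())           # 0, so the columns are linearly dependent
    print(M.nullspace())     # [Matrix([[-2], [-1], [1]])]
    print(-2*v1 - v2 + v3)   # the zero vector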

Reading homework: problem 1

Definition Any sum of vectors v1 , . . . , vk multiplied by scalars c1 , . . . , ck ,


namely
c1 v1 + · · · + ck vk ,
is called a linear combination of v1 , . . . , vk .

Theorem 10.1.1 (Linear Dependence). An ordered set of non-zero vectors


(v1 , . . . , vn ) is linearly dependent if and only if one of the vectors vk is ex-
pressible as a linear combination of the preceding vectors.

Proof. The theorem is an if and only if statement, so there are two things to
show.


i. First, we show that if vk = c1 v1 + · · · + c_{k−1} v_{k−1} then the set is linearly
   dependent.
   This is easy. We just rewrite the assumption:

       c1 v1 + · · · + c_{k−1} v_{k−1} − vk + 0 v_{k+1} + · · · + 0 vn = 0.

This is a vanishing linear combination of the vectors {v1 , . . . , vn } with


not all coefficients equal to zero, so {v1 , . . . , vn } is a linearly dependent
set.

ii. Now we show that linear dependence implies that there exists k for
    which vk is a linear combination of the vectors {v1, . . . , v_{k−1}}.
    The assumption says that

        c1 v1 + c2 v2 + · · · + cn vn = 0.

    Take k to be the largest number for which ck is not equal to zero. So:

        c1 v1 + c2 v2 + · · · + c_{k−1} v_{k−1} + ck vk = 0.

    (Note that k > 1, since otherwise we would have c1 v1 = 0 ⇒ v1 = 0,
    contradicting the assumption that none of the vi are the zero vector.)
    So we can rearrange the equation:

        c1 v1 + c2 v2 + · · · + c_{k−1} v_{k−1} = −ck vk

     ⇒  −(c1/ck) v1 − (c2/ck) v2 − · · · − (c_{k−1}/ck) v_{k−1} = vk .

    Therefore we have expressed vk as a linear combination of the previous
    vectors, and we are done.

Worked proof


Example 117 Consider the vector space P2 (t) of polynomials of degree less than or
equal to 2. Set:

v1 = 1 + t
v 2 = 1 + t2
v 3 = t + t2
v 4 = 2 + t + t2
v 5 = 1 + t + t2 .

The set {v1 , . . . , v5 } is linearly dependent, because v4 = v1 + v2 .

10.2 Showing Linear Independence


We have seen two di↵erent ways to show a set of vectors is linearly dependent:
we can either find a linear combination of the vectors which is equal to
zero, or we can express one of the vectors as a linear combination of the
other vectors. On the other hand, to check that a set of vectors is linearly
independent, we must check that every linear combination of our vectors
with non-vanishing coefficients gives something other than the zero vector.
Equivalently, to show that the set v1 , v2 , . . . , vn is linearly independent, we
must show that the equation c1 v1 + c2 v2 + · · · + cn vn = 0 has no solutions
other than c1 = c2 = · · · = cn = 0.

Example 118 Consider the following vectors in R3 :


    v1 = (0)      v2 = (2)      v3 = (1)
         (0) ,         (2) ,         (4) .
         (2)           (1)           (3)

Are they linearly independent?


We need to see whether the system

c1 v1 + c2 v2 + c3 v3 = 0

has any solutions for c1 , c2 , c3 . We can rewrite this as a homogeneous system:


0 11
c
v1 v2 v3 @ c2 A = 0.
c3

207
208 Linear Independence

This system has solutions if and only if the matrix M = v1 v2 v3 is singular, so


we should find the determinant of M :
    det M = det (0 2 1)         (2 1)
                (0 2 4) = 2 det (2 4) = 12.
                (2 1 3)

Since the matrix M has non-zero determinant, the only solution to the system of
equations

    (v1 v2 v3)(c1)
              (c2) = 0
              (c3)
is c1 = c2 = c3 = 0. So the vectors v1 , v2 , v3 are linearly independent.

Here is another example with bits:

Example 119 Let Z2³ be the space of 3 × 1 bit-valued matrices (i.e., column vectors).
Is the following subset linearly independent?

    { (1, 1, 0)^T , (1, 0, 1)^T , (0, 1, 1)^T }

If the set is linearly dependent, then we can find non-zero solutions to the system:

    c1 (1, 1, 0)^T + c2 (1, 0, 1)^T + c3 (0, 1, 1)^T = 0,

which becomes the linear system

    (1 1 0)(c1)
    (1 0 1)(c2) = 0.
    (0 1 1)(c3)

Solutions exist if and only if the determinant of the matrix is non-zero. But:

    det (1 1 0)         (0 1)         (1 1)
        (1 0 1) = 1 det (1 1) − 1 det (0 1) = −1 − 1 = 1 + 1 = 0
        (0 1 1)

Therefore non-trivial solutions exist, and the set is not linearly independent.

Reading homework: problem 2


10.3 From Dependent Independent


Now suppose vectors v1 , . . . , vn are linearly dependent,
c1 v1 + c2 v2 + · · · + cn vn = 0
with c1 ≠ 0. Then:
span{v1 , . . . , vn } = span{v2 , . . . , vn }
because any x ∈ span{v1 , . . . , vn } is given by
x = a1 v 1 + · · · + an v n
      = a1 ( −(c2/c1) v2 − · · · − (cn/c1) vn ) + a2 v2 + · · · + an vn
      = ( a2 − a1 c2/c1 ) v2 + · · · + ( an − a1 cn/c1 ) vn .
Then x is in span{v2 , . . . , vn }.
When we write a vector space as the span of a list of vectors, we would like
that list to be as short as possible (this idea is explored further in chapter 11).
This can be achieved by iterating the above procedure.
Example 120 In the above example, we found that v4 = v1 + v2 . In this case,
any expression for a vector as a linear combination involving v4 can be turned into a
combination without v4 by making the substitution v4 = v1 + v2 .
Then:
S = span{1 + t, 1 + t2 , t + t2 , 2 + t + t2 , 1 + t + t2 }
= span{1 + t, 1 + t2 , t + t2 , 1 + t + t2 }.
Now we notice that 1 + t + t2 = (1/2)(1 + t) + (1/2)(1 + t2 ) + (1/2)(t + t2 ). So the vector
1 + t + t2 = v5 is also extraneous, since it can be expressed as a linear combination of
the remaining three vectors, v1 , v2 , v3 . Therefore
S = span{1 + t, 1 + t2 , t + t2 }.
In fact, you can check that there are no (non-zero) solutions to the linear system
c1 (1 + t) + c2 (1 + t2 ) + c3 (t + t2 ) = 0.
Therefore the remaining vectors {1 + t, 1 + t2 , t + t2 } are linearly independent, and
span the vector space S. Then these vectors are a minimal spanning set, in the sense
that no more vectors can be removed since the vectors are linearly independent. Such
a set is called a basis for S.
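The pruning in Example 120 can be mechanized by encoding each polynomial as its coefficient vector in the monomial basis {1, t, t2 } and row reducing; the pivot columns indicate which vectors to keep. A sketch with sympy (the encoding is our illustration, not part of the text):

```python
from sympy import Matrix

# Coefficient vectors (constant, t, t^2) of the five polynomials spanning S.
polys = {
    "1+t":     [1, 1, 0],
    "1+t^2":   [1, 0, 1],
    "t+t^2":   [0, 1, 1],
    "2+t+t^2": [2, 1, 1],
    "1+t+t^2": [1, 1, 1],
}

# Put the coefficient vectors in as columns and row reduce.
M = Matrix(list(polys.values())).T
_, pivot_columns = M.rref()

names = list(polys.keys())
print([names[j] for j in pivot_columns])  # ['1+t', '1+t^2', 't+t^2']
```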


10.4 Review Problems


Webwork:
    Reading Problems 1, 2
    Testing for linear independence 3, 4
    Gaussian elimination 5
    Spanning and linear independence 6

1. Let B^n be the space of n × 1 bit-valued matrices (i.e., column vectors)


over the field Z2 . Remember that this means that the coefficients in
any linear combination can be only 0 or 1, with rules for adding and
multiplying coefficients given here.

(a) How many different vectors are there in B^n ?

(b) Find a collection S of vectors that span B^3 and are linearly inde-
pendent. In other words, find a basis of B^3 .
(c) Write each other vector in B^3 as a linear combination of the vectors
in the set S that you chose.
(d) Would it be possible to span B^3 with only two vectors?

Hint

2. Let ei be the vector in Rn with a 1 in the ith position and 0’s in every
other position. Let v be an arbitrary vector in Rn .

(a) Show that the collection {e1 , . . . , en } is linearly independent.


(b) Demonstrate that v = Σ_{i=1}^n (v · ei ) ei .
(c) The span{e1 , . . . , en } is the same as what vector space?

3. Consider the ordered set of vectors from R3


    ( (1, 2, 3)^T , (2, 4, 6)^T , (1, 0, 1)^T , (1, 4, 5)^T )

(a) Determine if the set is linearly independent by using the vectors


as the columns of a matrix M and finding RREF(M ).


(b) If possible, write each vector as a linear combination of the pre-


ceding ones.
(c) Remove the vectors which can be expressed as linear combinations
of the preceding vectors to form a linearly independent ordered set.
(Every vector in your set should be from the given set.)

4. Gaussian elimination is a useful tool to figure out whether a set of


vectors spans a vector space and if they are linearly independent.
Consider a matrix M made from an ordered set of column vectors
(v1 , v2 , . . . , vm ) ⊂ Rn and the three cases listed below:

(a) RREF(M ) is the identity matrix.


(b) RREF(M ) has a row of zeros.
(c) Neither case (a) nor (b) applies.

First give an explicit example for each case, and state whether the col-
umn vectors you use are linearly independent or spanning in each case.
Then, in general, determine whether (v1 , v2 , . . . , vm ) are linearly inde-
pendent and/or spanning Rn in each of the three cases. If they are
linearly dependent, does RREF(M ) tell you which vectors could be
removed to yield an independent set of vectors?

Basis and Dimension
11
In chapter 10, the notions of a linearly independent set of vectors in a vector
space V , and of a set of vectors that span V were established; any set of
vectors that span V can be reduced to some minimal collection of linearly
independent vectors; such a minimal set is called a basis of the subspace V .
Definition Let V be a vector space. Then a set S is a basis for V if S is
linearly independent and V = span S.
If S is a basis of V and S has only finitely many elements, then we say
that V is finite-dimensional. The number of vectors in S is the dimension
of V .
Suppose V is a finite-dimensional vector space, and S and T are two dif-
ferent bases for V . One might worry that S and T have a different number of
vectors; then we would have to talk about the dimension of V in terms of the
basis S or in terms of the basis T . Luckily this isn’t what happens. Later in
this chapter, we will show that S and T must have the same number of vec-
tors. This means that the dimension of a vector space is basis-independent.
In fact, dimension is a very important characteristic of a vector space.
Example 121 Pn (t) (polynomials in t of degree n or less) has a basis {1, t, . . . , tn },
since every vector in this space is a sum
    a0 1 + a1 t + · · · + an tn ,    ai ∈ R ,
so Pn (t) = span{1, t, . . . , tn }. This set of vectors is linearly independent: if the
polynomial p(t) = c0 1 + c1 t + · · · + cn tn is the zero polynomial, then c0 = c1 =
· · · = cn = 0. Thus Pn (t) is finite-dimensional, and dim Pn (t) = n + 1.


Theorem 11.0.1. Let S = {v1 , . . . , vn } be a basis for a vector space V .


Then every vector w 2 V can be written uniquely as a linear combination of
vectors in the basis S:
w = c1 v1 + · · · + cn vn .

Proof. Since S is a basis for V , then span S = V , and so there exist con-
stants ci such that w = c1 v1 + · · · + cn vn .
Suppose there exists a second set of constants di such that

w = d 1 v1 + · · · + d n vn .

Then

    0V = w − w
       = c1 v1 + · · · + cn vn − d1 v1 − · · · − dn vn
       = (c1 − d1 )v1 + · · · + (cn − dn )vn .

If it occurs exactly once that ci ≠ di , then the equation reduces to 0 =
(ci − di )vi , which is a contradiction since the vectors vi are assumed to be
non-zero.
If we have more than one i for which ci ≠ di , we can use this last equation
to write one of the vectors in S as a linear combination of other vectors in S,
which contradicts the assumption that S is linearly independent. Then for
every i, ci = di .

Proof Explanation

Remark This theorem is the one that makes bases so useful–they allow us to convert
abstract vectors into column vectors. By ordering the set S we obtain B = (v1 , . . . , vn )
and can write

    w = (v1 , . . . , vn ) (c1 , . . . , cn )^T = (c1 , . . . , cn )^T_B .
Remember that in general it makes no sense to drop the subscript B on the column
vector on the right; most vector spaces are not made from columns of numbers!
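Concretely, when the basis vectors themselves live in Rn , the column vector of w relative to an ordered basis B is found by solving a linear system: if M has the basis vectors as its columns, the coordinates solve M c = w. A small numpy sketch with an assumed basis of R2 :

```python
import numpy as np

# An ordered basis B = (b1, b2) of R^2, chosen purely for illustration.
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
M = np.column_stack([b1, b2])

w = np.array([3.0, 1.0])

# Coordinates of w relative to B: solve M c = w.
c = np.linalg.solve(M, w)
print(c)                      # [2. 1.], i.e. w = 2*b1 + 1*b2
print(np.allclose(M @ c, w))  # True
```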


Worked Example

Next, we would like to establish a method for determining whether a


collection of vectors forms a basis for Rn . But first, we need to show that
any two bases for a finite-dimensional vector space have the same number of
vectors.
Lemma 11.0.2. If S = {v1 , . . . , vn } is a basis for a vector space V and
T = {w1 , . . . , wm } is a linearly independent set of vectors in V , then m ≤ n.
The idea of the proof is to start with the set S and replace vectors in S
one at a time with vectors from T , such that after each replacement we still
have a basis for V .

Reading homework: problem 1

Proof. Since S spans V , then the set {w1 , v1 , . . . , vn } is linearly dependent.


Then we can write w1 as a linear combination of the vi ; using that equation,
we can express one of the vi in terms of w1 and the remaining vj with j ≠
i. Then we can discard one of the vi from this set to obtain a linearly
independent set that still spans V . Now we need to prove that S1 is a basis;
we must show that S1 is linearly independent and that S1 spans V .
The set S1 = {w1 , v1 , . . . , v_{i−1} , vi+1 , . . . , vn } is linearly independent: By
the previous theorem, there was a unique way to express w1 in terms of
the set S. Now, to obtain a contradiction, suppose there is some k and
constants ci such that

vk = c0 w 1 + c1 v1 + · · · + ci 1 vi 1 + ci+1 vi+1 + · · · + cn vn .

Then replacing w1 with its expression in terms of the collection S gives a way
to express the vector vk as a linear combination of the vectors in S, which
contradicts the linear independence of S. On the other hand, we cannot
express w1 as a linear combination of the vectors in {vj | j ≠ i}, since the
expression of w1 in terms of S was unique, and had a non-zero coefficient for
the vector vi . Then no vector in S1 can be expressed as a combination of
other vectors in S1 , which demonstrates that S1 is linearly independent.
The set S1 spans V : For any u ∈ V , we can express u as a linear com-
bination of vectors in S. But we can express vi as a linear combination of

vectors in the collection S1 ; rewriting vi as such allows us to express u as


a linear combination of the vectors in S1 . Thus S1 is a basis of V with n
vectors.
We can now iterate this process, replacing one of the vi in S1 with w2 ,
and so on. If m ≤ n, this process ends with the set Sm = {w1 , . . . , wm ,
v_{i_1} , . . . , v_{i_{n−m}} }, which is fine.
Otherwise, we have m > n, and the set Sn = {w1 , . . . , wn } is a basis
for V . But we still have some vector wn+1 in T that is not in Sn . Since Sn
is a basis, we can write wn+1 as a combination of the vectors in Sn , which
contradicts the linear independence of the set T . Then it must be the case
that m ≤ n, as desired.

Worked Example

Corollary 11.0.3. For a finite-dimensional vector space V , any two bases


for V have the same number of vectors.

Proof. Let S and T be two bases for V . Then both are linearly independent
sets that span V . Suppose S has n vectors and T has m vectors. Then by
the previous lemma, we have that m ≤ n. But (exchanging the roles of S
and T in application of the lemma) we also see that n ≤ m. Then m = n,
as desired.

Reading homework: problem 2

11.1 Bases in Rn.


In review question 2, chapter 10 you checked that
    Rn = span{ (1, 0, . . . , 0)^T , (0, 1, . . . , 0)^T , . . . , (0, 0, . . . , 1)^T } ,

and that this set of vectors is linearly independent. (If you didn’t do that
problem, check this before reading any further!) So this set of vectors is

a basis for Rn , and dim Rn = n. This basis is often called the standard
or canonical basis for Rn . The vector with a one in the ith position and
zeros everywhere else is written ei . (You could also view it as the function
{1, 2, . . . , n} → R where ei (j) = 1 if i = j and 0 if i ≠ j.) It points in the
direction of the ith coordinate axis, and has unit length. In multivariable
calculus classes, this basis is often written {î, ĵ, k̂} for R3 .
Note that it is often convenient to order basis elements, so rather than
writing a set of vectors, we would write a list. This is called an ordered
basis. For example, the canonical ordered basis for Rn is (e1 , e2 , . . . , en ). The
possibility to reorder basis vectors is not the only way in which bases are
non-unique.

Bases are not unique. While there exists a unique way to express a vector in terms
of any particular basis, bases themselves are far from unique. For example, both of
the sets

    { (1, 0)^T , (0, 1)^T }    and    { (1, 1)^T , (1, −1)^T }
are bases for R2 . Rescaling any vector in one of these sets is already enough to show
that R2 has infinitely many bases. But even if we require that all of the basis vectors
have unit length, it turns out that there are still infinitely many bases for R2 (see
review question 3).
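As a concrete illustration of that last claim (ours, not the text's): rotating the standard basis of R2 through any angle θ produces two unit-length vectors whose matrix has determinant 1, so every θ yields a basis.

```python
import numpy as np

for theta in np.linspace(0.0, np.pi, 5):
    u = np.array([np.cos(theta), np.sin(theta)])
    v = np.array([-np.sin(theta), np.cos(theta)])
    M = np.column_stack([u, v])
    # Both vectors have length 1 and det M = 1, so {u, v} is a basis of R^2.
    print(round(float(theta), 2), np.linalg.norm(u), np.linalg.norm(v), np.linalg.det(M))
```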

To see whether a set of vectors S = {v1 , . . . , vm } is a basis for Rn , we have


to check that the elements are linearly independent and that they span Rn .
From the previous discussion, we also know that m must equal n, so let's
assume S has n vectors. If S is linearly independent, then there is no non-
trivial solution of the equation

0 = x1 v1 + · · · + xn vn .

Let M be a matrix whose columns are the vectors vi and X the column
vector with entries xi . Then the above equation is equivalent to requiring
that there is a unique solution to

MX = 0 .

To see if S spans Rn , we take an arbitrary vector w and solve the linear


system
w = x1 v1 + · · · + xn vn


in the unknowns xi . For this, we need to find a unique solution for the linear
system M X = w.
Thus, we need to show that M^{−1} exists, so that

    X = M^{−1} w

is the unique solution we desire. Then we see that S is a basis for Rn if and
only if det M ≠ 0.
Theorem 11.1.1. Let S = {v1 , . . . , vm } be a collection of vectors in Rn .
Let M be the matrix whose columns are the vectors in S. Then S is a basis
for Rn if and only if m is the dimension of Rn and
det M ≠ 0.
Remark Also observe that S is a basis if and only if RREF(M ) = I.
Example 122 Let

    S = { (1, 0)^T , (0, 1)^T }    and    T = { (1, 1)^T , (1, −1)^T } .

Then set MS = ( 1 0
                0 1 ) . Since det MS = 1 ≠ 0, then S is a basis for R2 .

Likewise, set MT = ( 1  1
                     1 −1 ) . Since det MT = −2 ≠ 0, then T is a basis for R2 .
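Both criteria, det M ≠ 0 and RREF(M ) = I, are easy to verify by machine; here is a sympy sketch re-using the matrices MS and MT from Example 122:

```python
from sympy import Matrix, eye

MS = Matrix([[1, 0],
             [0, 1]])
MT = Matrix([[1,  1],
             [1, -1]])

for name, M in [("S", MS), ("T", MT)]:
    rref, _ = M.rref()
    print(name, "det =", M.det(), "RREF is identity:", rref == eye(2))
```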

11.2 Matrix of a Linear Transformation (Redux)


Not only do bases allow us to describe arbitrary vectors as column vectors,
they also permit linear transformations to be expressed as matrices. This
is a very powerful tool for computations, which is covered in chapter 7 and
reviewed again here.
Suppose we have a linear transformation L : V → W and ordered input
and output bases E = (e1 , . . . , en ) and F = (f1 , . . . , fm ) for V and W re-
spectively (of course, these need not be the standard basis–in all likelihood
V is not Rn ). Since for each ej , L(ej ) is a vector in W , there exist unique
numbers mij such that
    L(ej ) = f1 m1j + · · · + fm mmj = (f1 , . . . , fm ) (m1j , . . . , mmj )^T .
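Although the discussion continues beyond this excerpt, the recipe can already be illustrated. As an assumed example (not one worked in the text above), take L to be differentiation on P2 (t) with input and output basis (1, t, t2 ); the j-th column of the matrix collects the coordinates of L(ej ) in the output basis.

```python
from sympy import Matrix, symbols, diff, expand

t = symbols('t')
basis = [1, t, t**2]   # ordered basis E = F = (1, t, t^2) of P_2(t)

def coords(p):
    """Coordinates of a degree <= 2 polynomial with respect to (1, t, t^2)."""
    p = expand(p)
    return [p.coeff(t, k) for k in range(3)]

# Column j of the matrix of L = d/dt holds the coordinates of L(e_j).
columns = [coords(diff(e, t)) for e in basis]
M = Matrix(columns).T
print(M)   # Matrix([[0, 1, 0], [0, 0, 2], [0, 0, 0]])
```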
