Matrices Lecture Notes 2
III. Matrices
Definition III.1 An m × n matrix is a set of numbers arranged in a rectangular array having m rows and n columns. It is written
$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix}$$
There are two important special cases. A 1 × n matrix (that is, a matrix with 1 row) is called a row vector.
An m × 1 matrix (that is, a matrix with 1 column) is called a column vector. Our convention will be that
row indices are always written before column indices. As a memory aid, I think of matrices as being RC
(Roman Catholic or rows before columns).
Definitions
1. Equality. For any two matrices A and B
A = B ⇐⇒ (a) A and B have the same number of rows and the same number of columns and
(b) Aij = Bij for all i, j
2. Addition. For any two m × n matrices A and B, the sum A + B is the m × n matrix with entries
$$(A+B)_{ij} = A_{ij} + B_{ij}$$
That is, the entry in row i, column j of the matrix A + B is defined to be the sum of the corresponding entries in A and B. Note: The sum A + B is only defined if A and B have the same number of rows and the same number of columns.
3. Scalar multiplication. For any number s and any m × n matrix A, the product sA is the m × n matrix with entries
$$(sA)_{ij} = s A_{ij}$$
For example
$$2\begin{bmatrix}1&2\\0&3\end{bmatrix} + \begin{bmatrix}0&1\\1&1\end{bmatrix} = \begin{bmatrix}2\times 1+0 & 2\times 2+1\\ 2\times 0+1 & 2\times 3+1\end{bmatrix} = \begin{bmatrix}2&5\\1&7\end{bmatrix}$$
4. Matrix multiplication. For an m × p matrix A and a p × n matrix B, the product AB is the m × n matrix with entries
$$(AB)_{ik} = \sum_{j=1}^{p} A_{ij} B_{jk}$$
Note (a) AB is only defined if the number of columns of A is the same as the number of rows of B. If A is m × p and B is p × n, then AB is m × n.
(b) (AB)ik is the dot product of the ith row of A (viewed as a row vector) and the k th column of B
(c) Here is a memory aid. If you write the first factor of AB to the left of AB and the second factor above AB (so that, for example, A is m × 3, B is 3 × n and AB is m × n), then each entry (AB)_{ik} of AB is built from the entries A_{ij}, j = 1, 2, · · ·, of A that are directly to its left and the entries B_{jk}, j = 1, 2, · · ·, of B that are directly above it. These entries are multiplied in pairs, A_{ij}B_{jk}, j = 1, 2, · · ·, starting as far from AB as possible, and then the products are added up to yield $(AB)_{ik} = \sum_j A_{ij}B_{jk}$.
(d) If A is a square matrix, then AAA · · · A (n factors) makes sense and is denoted An .
(e) At this stage we have no idea why it is useful to define matrix multiplication in this way. We’ll
get some first hints shortly.
Example III.2 Here is an example of the product of a 2 × 3 matrix with a 3 × 2 matrix, yielding a 2 × 2 matrix
$$\begin{bmatrix}0&1&2\\3&4&5\end{bmatrix} \begin{bmatrix}1&3\\2&2\\3&1\end{bmatrix} = \begin{bmatrix}0\times 1+1\times 2+2\times 3 & 0\times 3+1\times 2+2\times 1\\ 3\times 1+4\times 2+5\times 3 & 3\times 3+4\times 2+5\times 1\end{bmatrix} = \begin{bmatrix}8&4\\26&22\end{bmatrix}$$
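The multiplication rule can be sketched in a few lines of Python, with matrices stored as lists of rows. This is my own illustrative function (not part of the notes), reproducing the numbers of Example III.2:

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows: (AB)[i][k] = sum_j A[i][j]*B[j][k]."""
    m, p, n = len(A), len(B), len(B[0])
    assert all(len(row) == p for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][j] * B[j][k] for j in range(p)) for k in range(n)]
            for i in range(m)]

A = [[0, 1, 2],
     [3, 4, 5]]        # 2 x 3
B = [[1, 3],
     [2, 2],
     [3, 1]]           # 3 x 2
print(matmul(A, B))    # the 2 x 2 product of Example III.2: [[8, 4], [26, 22]]
```

Note that the assertion enforces Note (a): the product is only defined when the inner dimensions match.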
Example III.3 Here is an example of the product of a 3 × 3 matrix with a 3 × 1 matrix, yielding a 3 × 1 matrix
$$\begin{bmatrix}1&1&1\\1&2&3\\2&3&1\end{bmatrix} \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}x_1+x_2+x_3\\ x_1+2x_2+3x_3\\ 2x_1+3x_2+x_3\end{bmatrix}$$
Hence we may very compactly write the system of equations
$$\begin{aligned} x_1+x_2+x_3&=4\\ x_1+2x_2+3x_3&=9\\ 2x_1+3x_2+x_3&=7 \end{aligned}$$
that we dealt with in Example II.2, as A~x = ~b where A is the matrix of coefficients, ~x is the column vector of unknowns and ~b is the column vector of right hand sides:
$$A = \begin{bmatrix}1&1&1\\1&2&3\\2&3&1\end{bmatrix} \qquad \vec x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \qquad \vec b = \begin{bmatrix}4\\9\\7\end{bmatrix}$$
These properties are all almost immediate consequences of the definitions. For example, to verify property 9, it suffices to write down the definitions of the two sides of property 9
$$[A(B+C)]_{ik} = \sum_j A_{ij}(B+C)_{jk} = \sum_j A_{ij}(B_{jk}+C_{jk})$$
$$[AB+AC]_{ik} = [AB]_{ik} + [AC]_{ik} = \sum_j A_{ij}B_{jk} + \sum_j A_{ij}C_{jk}$$
A consequence of this “unproperty” is that (A − B)(A + B) need not equal A² − B². Multiplying out (A − B)(A + B) carefully gives AA + AB − BA − BB. The middle two terms need not cancel. For example
$$A=\begin{bmatrix}0&0\\1&0\end{bmatrix} \qquad B=\begin{bmatrix}0&1\\0&0\end{bmatrix} \qquad A^2-B^2=\begin{bmatrix}0&0\\0&0\end{bmatrix}-\begin{bmatrix}0&0\\0&0\end{bmatrix}=\begin{bmatrix}0&0\\0&0\end{bmatrix}$$
$$A-B=\begin{bmatrix}0&-1\\1&0\end{bmatrix} \qquad A+B=\begin{bmatrix}0&1\\1&0\end{bmatrix} \qquad (A-B)(A+B)=\begin{bmatrix}-1&0\\0&1\end{bmatrix}$$
14. In general AB may be 0 even if A and B are both nonzero. For example
$$A=\begin{bmatrix}1&1\\1&1\end{bmatrix}\ne 0, \qquad B=\begin{bmatrix}1&1\\-1&-1\end{bmatrix}\ne 0, \qquad AB=\begin{bmatrix}0&0\\0&0\end{bmatrix}=0$$
A consequence of this is that AB = AC does not force B = C even if every entry of A is nonzero. For example
$$A=\begin{bmatrix}1&-1\\-2&2\end{bmatrix}, \qquad B=\begin{bmatrix}1&2\\1&2\end{bmatrix}\ne C=\begin{bmatrix}3&4\\3&4\end{bmatrix} \qquad\text{and yet}\qquad AB=AC=\begin{bmatrix}0&0\\0&0\end{bmatrix}$$
Example III.4 (The Form of the General Solution of Systems of Linear Equations Revisited)
We have just seen that any system of linear equations can be written
A~x = ~b
where A is the matrix of coefficients, ~x is the column vector of unknowns, ~b is the column vector of right
hand sides and A~x is the matrix product of A and ~x. We also saw in §II.3 that, if the ranks of [A] and [A | ~b]
are the same, the general solution to this system is of the form
~x = ~u + c1~v1 + · · · + cn−ρ~vn−ρ
where n is the number of unknowns (that is, the number of components of ~x, or equivalently, the number of
columns of A), ρ is the rank of A and c_1, · · ·, c_{n−ρ} are arbitrary constants. That ~u + c_1~v_1 + · · · + c_{n−ρ}~v_{n−ρ} is a solution of A~x = ~b for all values of c_1, · · ·, c_{n−ρ} means that
$$A(\vec u + c_1\vec v_1 + \cdots + c_{n-\rho}\vec v_{n-\rho}) = \vec b$$
for all values of c_1, · · ·, c_{n−ρ}. By properties 9 and 12 of matrix operations, this implies that
$$A\vec u + c_1 A\vec v_1 + \cdots + c_{n-\rho} A\vec v_{n-\rho} = \vec b$$
for all values of c_1, · · ·, c_{n−ρ}. Setting all of the c_i's to zero gives A~u = ~b; then varying one c_i at a time forces A~v_i = ~0. In other words ~u is a solution of A~x = ~b and each of ~v_1, · · ·, ~v_{n−ρ} is a solution of A~x = ~0.
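This structure is easy to check numerically. Below is a small sketch with a hypothetical rank-1 system of my own choosing (the matrix A, particular solution u and homogeneous solution v are not from the notes, just an illustration of the ~u + c~v form):

```python
def matvec(A, x):
    """Matrix-vector product: (Ax)_i = sum_j A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# A hypothetical rank-1 system A x = b (n = 2 unknowns, rank rho = 1,
# so the general solution has n - rho = 1 free parameter).
A = [[1, 1],
     [2, 2]]
b = [2, 4]
u = [2, 0]     # a particular solution: A u = b
v = [1, -1]    # a solution of the homogeneous system: A v = 0

assert matvec(A, u) == b
assert matvec(A, v) == [0, 0]
# u + c*v solves A x = b for every value of the constant c:
for c in [-3, 0, 1, 2.5]:
    x = [ui + c * vi for ui, vi in zip(u, v)]
    assert matvec(A, x) == b
```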
1) Define
$$A=\begin{bmatrix}1&2&3\\1&2&1\end{bmatrix} \qquad B=\begin{bmatrix}-1&2\\-3&1\\-2&1\end{bmatrix} \qquad C=\begin{bmatrix}2&-2&0\end{bmatrix} \qquad D=\begin{bmatrix}2\\-11\\2\end{bmatrix}$$
Compute all pairwise products (AA, AB, AC, AD, BA, · · ·) that are defined.
2) Compute A² = AA and A³ = AAA for
$$\text{a) } A=\begin{bmatrix}0&a&b\\0&0&c\\0&0&0\end{bmatrix} \qquad\qquad \text{b) } A=\begin{bmatrix}1&0&a\\0&1&0\\0&0&1\end{bmatrix}$$
3) Let $A = \begin{bmatrix}1&1\\0&1\end{bmatrix}$.
a) Find A², A³, A⁴.
b) Find A^k for all positive integers k.
c) Find eAt . (Part of this problem is to invent a reasonable definition of eAt .)
d) Find a square root of A. That is, find a matrix B obeying B 2 = A.
e) Find all square roots of A.
4) Compute A^k for k = 2, 3, 4 when
$$A=\begin{bmatrix}0&1&0&0\\0&0&1&0\\0&0&0&1\\0&0&0&0\end{bmatrix}$$
Definition III.5 A transformation (a.k.a. map, a.k.a. function) T from IRn to IRm is a rule which
assigns to each vector ~x in IRⁿ a vector T(~x) in IRᵐ. T should be thought of as a machine: if you put a vector ~x into the input hopper, it spits out a new vector T(~x). [Figure: a machine with input hopper ~x and output T(~x).] A transformation is said to be linear if
T (s~x + t~y ) = sT (~x) + tT (~y) for all numbers s, t and vectors ~x, ~y .
Later in this section, we shall see that for each linear transformation T (~x) there is a matrix MT such that
T (~x) is the matrix product of MT times the column vector ~x. In other words, such that T (~x) = MT ~x. First,
however, we look at a number of examples, both of transformations that are not linear and transformations
that are linear.
The map
$$T([x_1, x_2]) = [0, x_2^2]$$
is not linear, because the two quantities
$$T(2[x_1, x_2]) = [0, 4x_2^2] \qquad\text{and}\qquad 2\,T([x_1, x_2]) = [0, 2x_2^2]$$
are not equal whenever x₂ ≠ 0. Another example of a map that is not linear is
Example III.6 (Translation) Define tran~v(~x) to be the vector gotten by translating the head of the arrow ~x by ~v (while leaving the tail of the arrow fixed). In equations, tran~v(~x) = ~x + ~v. [Figure: the arrows ~x, ~v and tran~v(~x).] If translation were linear, the two expressions
$$\text{tran}_{\vec v}(s\vec x + t\vec y) = s\vec x + t\vec y + \vec v \qquad\text{and}\qquad s\,\text{tran}_{\vec v}(\vec x) + t\,\text{tran}_{\vec v}(\vec y) = s\vec x + t\vec y + (s+t)\vec v$$
would be equal for all s and t. But if ~v ≠ ~0 and s + t ≠ 1, the two expressions are not equal.
We have just seen an example of a geometric operation that is not linear. Many other geometric
operations are linear maps. As a result, linear maps play a big role in computer graphics. Here are some
examples.
Example III.7 (Projection) Define projφ(~x) to be the projection of the vector ~x on the line in IR² that passes through the origin at an angle φ from the x–axis. The vector b̂ = [cos φ, sin φ] is a unit vector that lies on the line. [Figure: the line at angle φ, a vector ~x and its projection projφ(~x) onto the line.] So projφ(~x) must have direction [cos φ, sin φ] (or its negative) and must have length ‖~x‖ cos θ = ~x · b̂, where θ is the angle between ~x and b̂. The unique vector with the right direction and length is
$$\text{proj}_\varphi(\vec x) = (\vec x\cdot\hat b)\,\hat b$$
It is easy to verify that this really is a linear transformation:
$$\text{proj}_\varphi(s\vec x + t\vec y) = \big((s\vec x + t\vec y)\cdot\hat b\big)\,\hat b = s(\vec x\cdot\hat b)\,\hat b + t(\vec y\cdot\hat b)\,\hat b = s\,\text{proj}_\varphi(\vec x) + t\,\text{proj}_\varphi(\vec y)$$
Writing both projφ(~x) and ~x as column vectors
$$\text{proj}_\varphi(\vec x) = (\vec x\cdot\hat b)\,\hat b = (x_1\cos\varphi + x_2\sin\varphi)\begin{bmatrix}\cos\varphi\\ \sin\varphi\end{bmatrix} = \begin{bmatrix}x_1\cos^2\varphi + x_2\sin\varphi\cos\varphi\\ x_1\sin\varphi\cos\varphi + x_2\sin^2\varphi\end{bmatrix}$$
$$= \begin{bmatrix}\cos^2\varphi & \sin\varphi\cos\varphi\\ \sin\varphi\cos\varphi & \sin^2\varphi\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = \frac{1}{2}\begin{bmatrix}1+\cos 2\varphi & \sin 2\varphi\\ \sin 2\varphi & 1-\cos 2\varphi\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
where we have used the double angle trig identities
$$\sin(2\varphi) = 2\sin\varphi\cos\varphi \qquad\qquad \cos(2\varphi) = \cos^2\varphi - \sin^2\varphi = 2\cos^2\varphi - 1 = 1 - 2\sin^2\varphi$$
Notice that this is the matrix product of a matrix that depends only on φ (not on ~x) times the column vector
~x.
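As a numerical sanity check, here is a short sketch (my own helper names) comparing the double-angle matrix form above with the direct formula projφ(~x) = (~x · b̂)b̂:

```python
import math

def proj_matrix(phi):
    """2x2 matrix of proj_phi, in the double-angle form derived above."""
    c2, s2 = math.cos(2 * phi), math.sin(2 * phi)
    return [[(1 + c2) / 2, s2 / 2],
            [s2 / 2, (1 - c2) / 2]]

def proj(phi, x):
    """Direct formula proj_phi(x) = (x . b)b with b = [cos phi, sin phi]."""
    b = [math.cos(phi), math.sin(phi)]
    d = x[0] * b[0] + x[1] * b[1]
    return [d * b[0], d * b[1]]

phi, x = 0.7, [3.0, -1.0]
M = proj_matrix(phi)
Mx = [M[0][0] * x[0] + M[0][1] * x[1],
      M[1][0] * x[0] + M[1][1] * x[1]]
assert all(abs(a - b) < 1e-12 for a, b in zip(Mx, proj(phi, x)))
```

The agreement is exactly the double angle identities at work: (1 + cos 2φ)/2 = cos²φ and (sin 2φ)/2 = sin φ cos φ.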
Example III.8 (Reflection) Define reflφ(~x) to be the reflection of the vector ~x in the line in IR² that passes through the origin at an angle φ from the x–axis. You can get from ~x to reflφ(~x) by first walking from ~x to projφ(~x) and continuing in the same direction an equal distance on the far side of the line. [Figure: ~x, its projection projφ(~x) on the line, and its reflection reflφ(~x).] In terms of vectors, to get from ~x to projφ(~x), you have to add the vector projφ(~x) − ~x to ~x. To continue an equal distance in the same direction, you have to add a second copy of projφ(~x) − ~x. So
$$\text{refl}_\varphi(\vec x) = \vec x + 2[\text{proj}_\varphi(\vec x) - \vec x] = 2\,\text{proj}_\varphi(\vec x) - \vec x$$
We may, once again, write this as the matrix product of a matrix that depends only on φ (not on ~x) times the column vector ~x.
$$\text{refl}_\varphi(\vec x) = 2\cdot\frac{1}{2}\begin{bmatrix}1+\cos 2\varphi & \sin 2\varphi\\ \sin 2\varphi & 1-\cos 2\varphi\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} - \begin{bmatrix}x_1\\x_2\end{bmatrix} = \left(\begin{bmatrix}1+\cos 2\varphi & \sin 2\varphi\\ \sin 2\varphi & 1-\cos 2\varphi\end{bmatrix} - \begin{bmatrix}1&0\\0&1\end{bmatrix}\right)\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}\cos 2\varphi & \sin 2\varphi\\ \sin 2\varphi & -\cos 2\varphi\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
Example III.9 (Rotation) Define rotφ(~x) to be the result of rotating the vector ~x by an angle φ about the origin. [Figure: ~x at angle θ from the x–axis and rotφ(~x) at angle θ + φ.] If we denote by r the length of ~x and by θ the angle between ~x and the x–axis, then
$$\vec x = \begin{bmatrix}r\cos\theta\\ r\sin\theta\end{bmatrix}$$
Rotating by φ leaves the length r unchanged and increases the angle from θ to θ + φ, so, by the trig addition formulae,
$$\text{rot}_\varphi(\vec x) = \begin{bmatrix}r\cos(\theta+\varphi)\\ r\sin(\theta+\varphi)\end{bmatrix} = \begin{bmatrix}r\cos\theta\cos\varphi - r\sin\theta\sin\varphi\\ r\cos\theta\sin\varphi + r\sin\theta\cos\varphi\end{bmatrix} = \begin{bmatrix}\cos\varphi & -\sin\varphi\\ \sin\varphi & \cos\varphi\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
Note that in each of examples 7, 8 and 9 there was a matrix A such that the map could be written
T (~x) = A~x, where A~x is the matrix product of the matrix A times the column vector ~x. Every map of the
form T (~x) = A~x is automatically linear, because A(s~x + t~y ) = s(A~x) + t(A~y ) by properties (9) and (12) of
matrix operations. We shall now show that, conversely, for each linear map T (~x) there is a matrix A such
that T (~x) = A~x.
First we consider the case that T (~x) is a linear map from IR2 (that is, two component vectors) to IR2 .
Any vector ~x in IR² can be written
$$\vec x = \begin{bmatrix}x_1\\x_2\end{bmatrix} = x_1\begin{bmatrix}1\\0\end{bmatrix} + x_2\begin{bmatrix}0\\1\end{bmatrix} = x_1\hat e_1 + x_2\hat e_2$$
where
$$\hat e_1 = \begin{bmatrix}1\\0\end{bmatrix} \qquad\qquad \hat e_2 = \begin{bmatrix}0\\1\end{bmatrix}$$
Because T is linear
$$T(\vec x) = T(x_1\hat e_1 + x_2\hat e_2) = x_1 T(\hat e_1) + x_2 T(\hat e_2)$$
Define the numbers a₁₁, a₁₂, a₂₁, a₂₂ by
$$T(\hat e_1) = \begin{bmatrix}a_{11}\\a_{21}\end{bmatrix} \qquad\qquad T(\hat e_2) = \begin{bmatrix}a_{12}\\a_{22}\end{bmatrix}$$
Then
$$T(\vec x) = x_1 T(\hat e_1) + x_2 T(\hat e_2) = x_1\begin{bmatrix}a_{11}\\a_{21}\end{bmatrix} + x_2\begin{bmatrix}a_{12}\\a_{22}\end{bmatrix} = \begin{bmatrix}a_{11}x_1 + a_{12}x_2\\ a_{21}x_1 + a_{22}x_2\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
The same construction works for linear transformations from IRᵐ to IRⁿ. Define êᵢ to be the column vector all of whose entries are zero, except for the i th, which is one. Note that every m component vector can be written
$$\vec x = \begin{bmatrix}x_1\\x_2\\\vdots\\x_m\end{bmatrix} = x_1\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix} + x_2\begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix} + \cdots + x_m\begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix} = x_1\hat e_1 + x_2\hat e_2 + \cdots + x_m\hat e_m$$
Example III.10 Let, as in Example III.9, rotφ be the linear transformation which rotates vectors in the plane by φ. From the figure [Figure: the unit vectors ê₁, ê₂ and their images; rotφ(ê₁) has components cos φ and sin φ, and rotφ(ê₂) has components −sin φ and cos φ.] we see that
$$\text{rot}_\varphi(\hat e_1) = \begin{bmatrix}\cos\varphi\\ \sin\varphi\end{bmatrix} \qquad\qquad \text{rot}_\varphi(\hat e_2) = \begin{bmatrix}-\sin\varphi\\ \cos\varphi\end{bmatrix}$$
Hence the matrix which implements rotation is
$$\big[\text{rot}_\varphi(\hat e_1)\ \ \text{rot}_\varphi(\hat e_2)\big] = \begin{bmatrix}\cos\varphi & -\sin\varphi\\ \sin\varphi & \cos\varphi\end{bmatrix}$$
This is exactly the matrix that we found in Example III.9. There is no need to memorize this matrix. This
example has shown how to rederive it very quickly.
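The rederivation recipe (the columns of the matrix are the images of ê₁ and ê₂) can be sketched mechanically. The helper below is my own illustration, applied to the rotation map:

```python
import math

def matrix_of(T):
    """The matrix of a linear map T: R^2 -> R^2 has columns T(e1) and T(e2)."""
    c1, c2 = T([1.0, 0.0]), T([0.0, 1.0])
    return [[c1[0], c2[0]],
            [c1[1], c2[1]]]

phi = 0.3

def rot(x):
    # rotate x by the angle phi about the origin
    c, s = math.cos(phi), math.sin(phi)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

M = matrix_of(rot)
expected = [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi),  math.cos(phi)]]
assert all(abs(M[i][j] - expected[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The same `matrix_of` call recovers the projection and reflection matrices of Examples III.7 and III.8 when given those maps instead.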
This formula T(~x) = A~x is the reason we defined matrix multiplication the way we did. More generally, if S and T are two linear transformations with associated matrices MS and MT respectively (meaning that T(~x) = MT~x and S(~y) = MS~y), then the map constructed by first applying T and then applying S obeys
$$S\big(T(\vec x)\big) = M_S\, T(\vec x) = M_S M_T\, \vec x$$
so that the matrix associated with the composite map S(T(~x)) is the matrix product MS MT of MS and MT. It is traditional to use the same symbol to stand for both a linear transformation and its associated matrix. For example, the matrix associated with the linear transformation T(~x) is traditionally denoted T as well, so that T(~x) = T~x.
Example III.11 Let, as in Example III.8, reflφ be the linear transformation which reflects vectors in the line through the origin that makes an angle φ with respect to the x–axis. Define the linear transformation
$$T(\vec x) = \text{refl}_{\pi/4}\big(\text{refl}_0(\vec x)\big)$$
From the figure [Figure: ê₁ and ê₂ with their images under refl₀ (reflection in the x–axis) and refl_{π/4} (reflection in the line y = x): refl₀(ê₁) = ê₁, refl₀(ê₂) = −ê₂, refl_{π/4}(ê₁) = ê₂, refl_{π/4}(ê₂) = ê₁.] we see that
$$\text{refl}_0(\hat e_1) = \begin{bmatrix}1\\0\end{bmatrix} \quad \text{refl}_0(\hat e_2) = \begin{bmatrix}0\\-1\end{bmatrix} \quad \text{refl}_{\pi/4}(\hat e_1) = \begin{bmatrix}0\\1\end{bmatrix} \quad \text{refl}_{\pi/4}(\hat e_2) = \begin{bmatrix}1\\0\end{bmatrix}$$
so that
$$\text{refl}_0(\vec x) = \big[\text{refl}_0(\hat e_1)\ \ \text{refl}_0(\hat e_2)\big]\,\vec x = \begin{bmatrix}1&0\\0&-1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} \qquad \text{refl}_{\pi/4}(\vec y) = \big[\text{refl}_{\pi/4}(\hat e_1)\ \ \text{refl}_{\pi/4}(\hat e_2)\big]\,\vec y = \begin{bmatrix}0&1\\1&0\end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix}$$
is a rotation in IR3 . What axis does it rotate about and what is the angle of rotation?
8) Find the matrix which first reflects about the line in IR2 that makes an angle φ with the x–axis and then
reflects about the line that makes an angle θ with the x–axis. Give another geometric interpretation of
this matrix.
“Random walks”, or more precisely “discrete random walks”, refer to a class of problems in which you
are given the following information.
(H1) The system of interest has a finite number of possible states, labelled 1, 2, 3, · · ·, S.
(H2) We are interested in the system at times t = 0, 1, 2, 3, · · ·.
(H3) If at some time n the system is in some state j, then, at time n + 1 the system is in state i with probability p_{i,j}. This is the case for each i = 1, 2, 3, · · ·, S. The p_{i,j}'s are given numbers that obey $\sum_{i=1}^{S} p_{i,j} = 1$.
(Here Hp stands for “hypothesis number p”.) Let the components of the column vector
$$\vec x_n = \begin{bmatrix}x_{n,1}\\ x_{n,2}\\ \vdots\\ x_{n,S}\end{bmatrix}$$
be the probabilities that, at time n, the system is in state 1, state 2, · · ·, state S, respectively. That is,
xn,j denotes the probability that, at time n, the system is in state j. Rather than using the language of
probability, you can imagine that the system consists of piles of sand located at sites 1 through S. There is
a total of one ton of sand. At time n, the amount of sand at site 1 is xn,1 , the amount of sand at site 2 is
xn,2 and so on. According to (H3 ), between time n and time n + 1, the fraction pi,j of the sand at site j is
moved to site i. So pi,j xn,j tons of sand are moved from site j to site i by time n + 1. The total amount of
sand at site i at time n + 1 is the sum, over j from 1 to S, of the amount pi,j xn,j of sand moved to site i
from site j. Hence
$$x_{n+1,i} = \sum_{j=1}^{S} p_{i,j}\, x_{n,j}$$
If we denote by P the S × S matrix whose (i, j) entry is p_{i,j}, this says exactly that ~x_{n+1} = P~x_n for every n, so that
$$\vec x_1 = P\vec x_0 \qquad \vec x_2 = P\vec x_1 = P^2\vec x_0 \qquad \vec x_3 = P\vec x_2 = P^3\vec x_0 \qquad\cdots\qquad \vec x_n = P^n\vec x_0 \qquad\cdots$$
Example III.12 (Gambler’s Ruin) I will give two descriptions of this random walk. The first description
motivates the name “Gambler’s Ruin”.
Imagine that you are a gambler. At time zero you walk into a casino with a stake of $d. At that time
the house has $(D − d). (So the total amount of money in play is $D.) At each time 1, 2, · · ·, you
play a game of chance in which you win $1 from the house with probability w and lose $1 to the house
with probability ℓ = 1 − w. This continues until either you have $0 (in which case you are broke) or
you have $D (in which case the house is broke). That is, if at time n you have $0, then at time n + 1
you again have $0 and if at time n you have $D, then at time n + 1 you again have $D. The transition
probabilities for Gambler’s Ruin are
$$p_{i,j} = \begin{cases} 1 & \text{if } j = 0 \text{ and } i = 0\\ 0 & \text{if } j = 0 \text{ and } i \ne 0\\ w & \text{if } 0 < j < D \text{ and } i = j+1\\ \ell = 1-w & \text{if } 0 < j < D \text{ and } i = j-1\\ 0 & \text{if } 0 < j < D \text{ and } i \ne j-1,\, j+1\\ 1 & \text{if } j = D \text{ and } i = D\\ 0 & \text{if } j = D \text{ and } i \ne D \end{cases}$$
[Figure: a number line marked with the sites 0, 1, 2, · · ·, d, · · ·, D.]
The second description, of the same mathematical system, motivates why it is called a “random walk”.
At time zero, a drunk is at a location d. Once each unit of time, the drunk staggers to the right one unit
with probability w and staggers to the left one unit with probability ℓ = 1 − w. This continues until the
drunk reaches either the bar at 0 or the bar at D. Once the drunk reaches a bar, he remains there for
ever.
Here is a table giving the time evolution of Gambler's Ruin, assuming that d = 2, D = 8, w = 0.49 and ℓ = 0.51. Each column is the vector ~x_n for the indicated time n. (Entries are rounded to three decimal places.)

         ~x0     ~x1     ~x2     ~x3     ~x4     ~x5     ~x6
  $0      0       0     0.260   0.260   0.390   0.390   0.471
  $1      0     0.51      0     0.255     0     0.159     0
  $2      1       0     0.500     0     0.312     0     0.218
  $3      0     0.49      0     0.367     0     0.275     0
  $4      0       0     0.240     0     0.240     0     0.210
  $5      0       0       0     0.118     0     0.147     0
  $6      0       0       0       0     0.058     0     0.086
  $7      0       0       0       0       0     0.028     0
  $8      0       0       0       0       0       0     0.014
  $9      0       0       0       0       0       0       0
  $10     0       0       0       0       0       0       0
Here, for example, ~x0 says that the gambler started with $2 at time 0. The vector ~x1 says that, at time 1,
he has $1 with probability 0.51 and $3 with probability 0.49. The vector ~x2 says that, at time 2, he has $0
with probability 0.51 × 0.51 = 0.2601, $2 with probability 0.51 × 0.49 + 0.49 × 0.51 = 0.4998 and $4 with
probability 0.49 × 0.49 = 0.2401.
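The evolution ~x_{n+1} = P~x_n is straightforward to simulate. The sketch below (my own helper names) builds the Gambler's Ruin transition matrix for D = 8 and checks the time-2 probabilities computed above:

```python
def step(P, x):
    """One time step of the random walk: x_{n+1,i} = sum_j p_{i,j} * x_{n,j}."""
    S = len(x)
    return [sum(P[i][j] * x[j] for j in range(S)) for i in range(S)]

D, w = 8, 0.49
l = 1 - w
# transition matrix for Gambler's Ruin: states $0, $1, ..., $D
P = [[0.0] * (D + 1) for _ in range(D + 1)]
P[0][0] = P[D][D] = 1.0       # $0 and $D are absorbing
for j in range(1, D):
    P[j + 1][j] = w           # win $1 with probability w
    P[j - 1][j] = l           # lose $1 with probability l = 1 - w

x = [0.0] * (D + 1)
x[2] = 1.0                    # the gambler starts with $2
for n in range(2):
    x = step(P, x)
# at time 2: $0 with prob 0.51^2, $2 with prob 2(0.51)(0.49), $4 with prob 0.49^2
assert abs(x[0] - 0.51 ** 2) < 1e-12
assert abs(x[2] - 2 * 0.51 * 0.49) < 1e-12
assert abs(x[4] - 0.49 ** 2) < 1e-12
```

Iterating `step` further reproduces the remaining columns of the table (up to rounding).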
Definition III.13 The transpose of the m × n matrix A is the n × m matrix Aᵗ with matrix elements
$$A^t_{ij} = A_{ji}$$
The rows of Aᵗ are the columns of A and the columns of Aᵗ are the rows of A.
Example III.14 The transpose of the 2 × 3 matrix A on the left below is the 3 × 2 matrix Aᵗ on the right below.
$$A = \begin{bmatrix}1&2&3\\4&5&6\end{bmatrix} \qquad\qquad A^t = \begin{bmatrix}1&4\\2&5\\3&6\end{bmatrix}$$
1. If A is an m × n matrix, ~x is an m component column vector and ~y is an n component column vector, then
$$\vec x\cdot(A\vec y) = (A^t\vec x)\cdot\vec y$$
To see this, we just compute the left and right hand sides
$$\vec x\cdot(A\vec y) = \sum_{i=1}^{m} x_i (A\vec y)_i = \sum_{i=1}^{m} x_i \sum_{j=1}^{n} A_{i,j}\, y_j = \sum_{\substack{1\le i\le m\\ 1\le j\le n}} x_i A_{i,j}\, y_j$$
$$(A^t\vec x)\cdot\vec y = \sum_{j=1}^{n} (A^t\vec x)_j\, y_j = \sum_{j=1}^{n}\sum_{i=1}^{m} A^t_{j,i}\, x_i\, y_j = \sum_{\substack{1\le i\le m\\ 1\le j\le n}} x_i A^t_{j,i}\, y_j$$
and observe that they are the same, because of the definition of Aᵗ.
2. If A is any ℓ × m and B is any m × n matrix, then
$$(AB)^t = B^t A^t$$
Be careful about the order of matrix multiplication here. To see this, we also just compute the left and right hand sides
$$(AB)^t_{i,j} = (AB)_{j,i} = \sum_{k=1}^{m} A_{j,k}\, B_{k,i}$$
$$(B^t A^t)_{i,j} = \sum_{k=1}^{m} B^t_{i,k}\, A^t_{k,j} = \sum_{k=1}^{m} B_{k,i}\, A_{j,k}$$
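The order reversal in (AB)ᵗ = BᵗAᵗ is easy to confirm numerically. A short sketch with my own helper functions:

```python
def matmul(A, B):
    """(AB)[i][k] = sum_j A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Rows of A^t are the columns of A: (A^t)[i][j] = A[j][i]."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]           # 3 x 2
# note the reversed order on the right hand side
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
```

With these shapes the un-reversed product transpose(A)·transpose(B) is 3 × 3, not even the same size as (AB)ᵗ, which is 2 × 2.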
Suppose A is a matrix. What is A−1 ? It’s the thing you multiply A by to get 1. What is 1?
Definition III.15 The m × m identity matrix Im (generally the subscript m is dropped from the notation)
is the m × m matrix whose (i, j) matrix element is 1 if i = j and 0 if i 6= j.
For example
$$I_2 = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
The reason we call this the identity matrix is that, for any m × n matrix A
$$I_m A = A I_n = A$$
It is easy to check that this is true. For example, fix any i and k with 1 ≤ i ≤ m, 1 ≤ k ≤ n. By the definition of I_{ij}, the only nonzero term in the sum $(IA)_{ik} = \sum_{j=1}^{m} I_{ij} A_{jk}$ is that with j = i. Furthermore the I_{ii} that appears in that term takes the value one, so (IA)_{ik} = I_{ii}A_{ik} = A_{ik}, as desired.
Let's start with a general 2 × 2 matrix
$$A = \begin{bmatrix}a&b\\c&d\end{bmatrix}$$
and see if we can find an inverse for it. Let's call the inverse
$$B = \begin{bmatrix}X&X'\\Y&Y'\end{bmatrix}$$
The matrix A is to be treated as a given matrix. At this stage, B is unknown. To help keep straight what
is known and what isn’t, I’ve made all of the knowns lower case and all of the unknowns upper case. To be
an inverse, B must obey
$$AB = \begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}X&X'\\Y&Y'\end{bmatrix} = \begin{bmatrix}aX+bY & aX'+bY'\\ cX+dY & cX'+dY'\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$
Matching the entries of the last two matrices gives the four equations
$$\begin{alignedat}{2} aX+bY&=1 &\qquad&(1)\\ cX+dY&=0 &&(2)\\ aX'+bY'&=0 &&(1')\\ cX'+dY'&=1 &&(2')\end{alignedat}$$
Taking the combinations d(1) − b(2) and a(2) − c(1) gives (ad − bc)X = d and (ad − bc)Y = −c, while d(1') − b(2') and a(2') − c(1') give (ad − bc)X' = −b and (ad − bc)Y' = a. If you go ahead and multiply out the matrices on the right, you see that this condition is indeed satisfied. We conclude that, if det A = ad − bc ≠ 0, the inverse of A exists and is
$$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$$
On the other hand, if det A = ad − bc = 0, equations d(1)−b(2), d(1’)−b(2’), c(1)−a(2) and c(1’)−a(2’)
force d = −b = c = −a = 0 and then the left hand side of equation (1) is zero for all values of X and Y so
that equation (1) cannot be satisfied and A cannot have an inverse.
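The 2 × 2 formula is short enough to code directly. Here is a minimal sketch (my own function name), using exact rational arithmetic so the check AB = I comes out exactly:

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2x2 integer matrix [[a, b], [c, d]], when det A = ad - bc != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0: the matrix is not invertible")
    f = Fraction(1, det)
    return [[ f * d, -f * b],
            [-f * c,  f * a]]

A = [[1, 2],
     [3, 4]]           # det A = 1*4 - 2*3 = -2
B = inverse_2x2(A)
# AB should be the identity matrix
AB = [[sum(A[i][j] * B[j][k] for j in range(2)) for k in range(2)] for i in range(2)]
assert AB == [[1, 0], [0, 1]]
```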
Properties of Inverses
3. If A is invertible, then so is Aᵗ and $(A^t)^{-1} = (A^{-1})^t$.
Proof: Let’s use B to denote the inverse of A (so there won’t be so many superscripts around.) By
definition
AB = BA = I
These three matrices are the same. So their transposes are the same. Since (AB)ᵗ = BᵗAᵗ, (BA)ᵗ = AᵗBᵗ and Iᵗ = I, we have
$$B^t A^t = A^t B^t = I^t = I$$
which is exactly the definition of “the inverse of At is B t ”.
4. Suppose that A is invertible. Then A~x = ~b ⇐⇒ ~x = A−1~b.
Proof: Multiplying the first equation by A−1 gives the second and multiplying the second by A gives
the first.
WARNING: This property is conceptually important. But it is usually computationally much more
efficient to solve A~x = ~b by Gaussian elimination than it is to find A−1 and then multiply A−1~b.
5. Only square matrices can be invertible.
Outline of Proof: Let A be an invertible m × n matrix. Then there exists an n × m matrix B such that AB = I, where I is the m × m identity matrix. We shall see in the next section that the j th column of B solves the system of equations A~x = êⱼ, where êⱼ is the j th column of I. Because the identity matrix I has rank m, there must exist some 1 ≤ j ≤ m such that the augmented matrix [A|êⱼ] also has rank m. By property 1 above, the corresponding system of equations must have a unique solution. Consequently, the number of unknowns, n, must equal the rank, m.
6. If A is an n × n matrix then the following are equivalent:
(i) A is invertible.
(ii) For each vector ~b, the system of equations A~x = ~b has exactly one solution.
(This argument is repeated, in more detail, with examples, in the next section.) Since A has rank n and AB⃗ᵢ = êᵢ is a system of n linear equations in n unknowns, it has a unique solution.
(v)⇒(vi): Assume AB = I. Then B~x = ~0 ⇒ AB~x = ~0 ⇒ ~x = ~0. So condition (ii) is applicable
to B, which implies that (iii) and subsequently (iv) are also applicable to B. This implies
that there is a matrix C obeying BC = I. But C = (AB)C = A(BC) = A so BA = I.
(vi)⇒(i): Assume that BA = I. Then A~x = ~0 ⇒ BA~x = ~0 ⇒ ~x = ~0. So (ii) applies to A. So (iv)
and (v) apply to A.
Items (ii), (iii), (iv), (v) and (vi) of property 6 are all tests for invertibility. We shall get another test, once we have generalized the definition of determinant to matrices larger than 3 × 3: a square matrix A is invertible if and only if det A ≠ 0.
Example III.18 Let A be the matrix which implements reflection in the line y = x, let B be the matrix that implements reflection in the x axis and let C be the matrix for the linear transformation that first reflects in the x axis and then reflects in the line y = x. We saw in Example III.11 that
$$A = \begin{bmatrix}0&1\\1&0\end{bmatrix} \qquad B = \begin{bmatrix}1&0\\0&-1\end{bmatrix} \qquad C = AB = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$$
Now the inverse of any reflection is itself. (That is, executing the same reflection twice returns every vector to its original location.) So
$$A^{-1} = A = \begin{bmatrix}0&1\\1&0\end{bmatrix} \qquad\qquad B^{-1} = B = \begin{bmatrix}1&0\\0&-1\end{bmatrix}$$
(Go ahead and check for yourself that the matrix products AA and BB are both I.) So
$$C^{-1} = (AB)^{-1} = B^{-1}A^{-1} = \begin{bmatrix}1&0\\0&-1\end{bmatrix}\begin{bmatrix}0&1\\1&0\end{bmatrix} = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$$
Suppose that we are given an n × n matrix A and we wish to find a matrix B obeying AB = I. Denote by B⃗ᵢ the (as yet unknown) i th column of B and by êᵢ the column vector having all entries zero except for a one in the i th row. For example, when n = 2
$$B = \begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{bmatrix} \qquad \vec B_1 = \begin{bmatrix}b_{11}\\b_{21}\end{bmatrix} \qquad \vec B_2 = \begin{bmatrix}b_{12}\\b_{22}\end{bmatrix} \qquad \hat e_1 = \begin{bmatrix}1\\0\end{bmatrix} \qquad \hat e_2 = \begin{bmatrix}0\\1\end{bmatrix}$$
In this notation, the requirement AB = I reads
$$A\big[\vec B_1\ \vec B_2\ \cdots\ \vec B_n\big] = \big[\hat e_1\ \hat e_2\ \cdots\ \hat e_n\big]$$
Matrix multiplication acts column by column:
$$A\big[\vec B_1\ \vec B_2\ \cdots\ \vec B_n\big] = \big[A\vec B_1\ A\vec B_2\ \cdots\ A\vec B_n\big]$$
That is, the first column of AB is AB⃗₁. For example, when n = 2, the first column of
$$AB = \begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{bmatrix} = \begin{bmatrix}a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22}\\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22}\end{bmatrix}$$
is indeed identical to
$$A\vec B_1 = \begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{11}\\b_{21}\end{bmatrix} = \begin{bmatrix}a_{11}b_{11}+a_{12}b_{21}\\ a_{21}b_{11}+a_{22}b_{21}\end{bmatrix}$$
For the two matrices $A[\vec B_1\ \vec B_2\ \cdots\ \vec B_n] = [A\vec B_1\ A\vec B_2\ \cdots\ A\vec B_n]$ and $[\hat e_1\ \hat e_2\ \cdots\ \hat e_n]$ to be equal, all of their columns must agree, so the requirement AB = I may be expressed as
$$A\vec B_i = \hat e_i \qquad\text{for } i = 1, \cdots, n$$
Recall that A and the êᵢ's are all given matrices and that the B⃗ᵢ's are all unknown. We must solve n different systems of linear equations. The augmented matrix for system number i is [A|êᵢ]. We could apply Gauss reduction separately to the n systems. But because the left hand sides of all n systems are the same, we can solve the n systems simultaneously. We just form one big augmented matrix [A|ê₁ ê₂ · · · êₙ]. This augmented matrix is just short hand notation for the n systems of equations AB⃗₁ = ê₁, AB⃗₂ = ê₂, · · ·, AB⃗ₙ = êₙ. Here are two examples of this technique.

Example III.19 Let us find the inverse, if it exists, of
$$A = \begin{bmatrix}1&1\\1&2\end{bmatrix}$$
The big augmented matrix is
$$\left[\begin{array}{cc|cc}1&1&1&0\\1&2&0&1\end{array}\right]$$
It is important to always remember that if we were to erase all columns to the right of the vertical line except for the i th, we would have precisely the augmented matrix appropriate for the linear system AB⃗ᵢ = êᵢ. So any row operation applied to the big augmented matrix [A|ê₁ ê₂] really is a simultaneous application of the same row operation to 2 different augmented matrices [A|êᵢ] at the same time. Row reduce in the usual way.
The row echelon (upper triangular) form of this augmented matrix is
$$\left[\begin{array}{cc|cc}1&1&1&0\\0&1&-1&1\end{array}\right] \qquad (2)-(1)$$
We could backsolve the two systems of equations separately. But it is easier to treat the two at the same time by further reducing the augmented matrix to reduced row echelon form.
$$\left[\begin{array}{cc|cc}1&0&2&-1\\0&1&-1&1\end{array}\right] \qquad (1)-(2)$$
What conclusion do we come to? Concentrate on, for example, the first column to the right of the vertical line. In fact, mentally erase the second column to the right of the vertical line in all of the above computations. Then the above row operations converted
$$\big[A\big|\hat e_1\big] = \left[\begin{array}{cc|c}1&1&1\\1&2&0\end{array}\right] \qquad\text{to}\qquad \left[\begin{array}{cc|c}1&0&2\\0&1&-1\end{array}\right]$$
Because row operations have no effect on the set of solutions of a linear system, we can conclude that
$$\vec B_1 \text{ obeys } A\vec B_1 = \hat e_1 \qquad\text{if and only if it obeys}\qquad I\vec B_1 = \begin{bmatrix}2\\-1\end{bmatrix}$$
Since $I\vec B_1 = \vec B_1$, we have that $\vec B_1 = \begin{bmatrix}2\\-1\end{bmatrix}$. Similarly, $\vec B_2 = \begin{bmatrix}-1\\1\end{bmatrix}$. Thus
$$A^{-1} = \begin{bmatrix}2&-1\\-1&1\end{bmatrix}$$
which is exactly the matrix to the right of the vertical bar in the row reduced echelon form.
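The whole [A | I] → [I | A⁻¹] procedure mechanizes well. The sketch below is my own minimal implementation (it assumes A is invertible and uses exact rational arithmetic), reproducing the inverse just found:

```python
from fractions import Fraction

def inverse(A):
    """Invert A by row reducing the big augmented matrix [A | I] to [I | A^{-1}]."""
    n = len(A)
    # build [A | e1 e2 ... en]
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)  # assumes A invertible
        M[col], M[pivot] = M[pivot], M[col]        # bring a nonzero pivot up
        M[col] = [entry / M[col][col] for entry in M[col]]        # scale pivot row to 1
        for r in range(n):                         # clear the rest of the column
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                  # the part right of the bar

A = [[1, 1],
     [1, 2]]
assert inverse(A) == [[2, -1], [-1, 1]]            # Example III.19
```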
Example III.20 Now let’s compute the inverse of a matrix for which we do not already have a canned
formula. Let
$$A = \begin{bmatrix}2&-3&2&5\\1&-1&1&2\\3&2&2&1\\1&1&-3&1\end{bmatrix}$$
Again note that if we were to erase all columns to the right of the vertical line except for the i th, we would have precisely the augmented matrix appropriate for the linear system AB⃗ᵢ = êᵢ, so that any row operation applied to the big augmented matrix [A|ê₁ ê₂ · · · êₙ] really is a simultaneous application of the same row operation to n different augmented matrices [A|êᵢ] at the same time. Row reduce in the usual way. (Each annotation names the row operation that produced the row beside it.)
$$\begin{matrix} \\ (2)-0.5(1)\\ (3)-1.5(1)\\ (4)-0.5(1)\end{matrix}\ \left[\begin{array}{cccc|cccc} 2&-3&2&5&1&0&0&0\\ 0&0.5&0&-0.5&-0.5&1&0&0\\ 0&6.5&-1&-6.5&-1.5&0&1&0\\ 0&2.5&-4&-1.5&-0.5&0&0&1 \end{array}\right]$$
$$\begin{matrix} \\ \\ (3)-13(2)\\ (4)-5(2)\end{matrix}\ \left[\begin{array}{cccc|cccc} 2&-3&2&5&1&0&0&0\\ 0&0.5&0&-0.5&-0.5&1&0&0\\ 0&0&-1&0&5&-13&1&0\\ 0&0&-4&1&2&-5&0&1 \end{array}\right]$$
$$\begin{matrix} \\ \\ \\ (4)-4(3)\end{matrix}\ \left[\begin{array}{cccc|cccc} 2&-3&2&5&1&0&0&0\\ 0&0.5&0&-0.5&-0.5&1&0&0\\ 0&0&-1&0&5&-13&1&0\\ 0&0&0&1&-18&47&-4&1 \end{array}\right]$$
Again, rather than backsolving the four systems individually, it is easier to do all four at the same time by applying more row operations chosen to turn the left hand side into the identity matrix.
$$\begin{matrix} \\ \\ -(3)\\ \\ \end{matrix}\ \left[\begin{array}{cccc|cccc} 2&-3&2&5&1&0&0&0\\ 0&0.5&0&-0.5&-0.5&1&0&0\\ 0&0&1&0&-5&13&-1&0\\ 0&0&0&1&-18&47&-4&1 \end{array}\right]$$
$$\begin{matrix} \\ 2(2)+(4)\\ \\ \\ \end{matrix}\ \left[\begin{array}{cccc|cccc} 2&-3&2&5&1&0&0&0\\ 0&1&0&0&-19&49&-4&1\\ 0&0&1&0&-5&13&-1&0\\ 0&0&0&1&-18&47&-4&1 \end{array}\right]$$
$$\begin{matrix} 0.5[(1)+3(2)-2(3)-5(4)]\\ \\ \\ \\ \end{matrix}\ \left[\begin{array}{cccc|cccc} 1&0&0&0&22&-57&5&-1\\ 0&1&0&0&-19&49&-4&1\\ 0&0&1&0&-5&13&-1&0\\ 0&0&0&1&-18&47&-4&1 \end{array}\right]$$
By exactly the same argument as we used at the end of Example III.19, the inverse is the matrix to the right of the vertical bar in the row reduced echelon form. That is,
$$A^{-1} = \begin{bmatrix}22&-57&5&-1\\ -19&49&-4&1\\ -5&13&-1&0\\ -18&47&-4&1\end{bmatrix}$$
For n > 2 (in fact n ≥ 2), the determinant of an n × n matrix A, whose ij entry is denoted a_{ij}, is defined by
$$\det A = \sum_{j=1}^{n} (-1)^{1+j}\, a_{1j} \det M_{1j}$$
where M1j is the (n − 1) × (n − 1) matrix formed by deleting, from the original matrix A, the row and column
containing a1j . This formula is called “expansion along the top row”. There is one term in the formula for
each entry in the top row. The term is a sign times the entry times the determinant of the (n − 1) × (n − 1)
matrix obtained by deleting the row and column that contains the entry. The sign alternates, starting with
a +.
$$\det\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} = a_{11}\det\begin{bmatrix}a_{22}&a_{23}\\a_{32}&a_{33}\end{bmatrix} - a_{12}\det\begin{bmatrix}a_{21}&a_{23}\\a_{31}&a_{33}\end{bmatrix} + a_{13}\det\begin{bmatrix}a_{21}&a_{22}\\a_{31}&a_{32}\end{bmatrix}$$
$$= a_{11}(a_{22}a_{33}-a_{23}a_{32}) - a_{12}(a_{21}a_{33}-a_{23}a_{31}) + a_{13}(a_{21}a_{32}-a_{22}a_{31})$$
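Expansion along the top row translates directly into a recursive function. This is my own sketch of the definition (fine for small matrices; it is the "very tedious procedure" noted below, costing on the order of n! operations):

```python
def det(A):
    """Determinant by expansion along the top row (the definition above)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # M_1j: delete row 1 and the column containing a_1j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)  # signs alternate, starting with +
    return total

assert det([[1, 2, 3],
            [1, 0, 2],
            [3, 2, 1]]) == 12    # Example III.21
```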
Example III.21
$$\det\begin{bmatrix}1&2&3\\1&0&2\\3&2&1\end{bmatrix} = 1\times\det\begin{bmatrix}0&2\\2&1\end{bmatrix} - 2\times\det\begin{bmatrix}1&2\\3&1\end{bmatrix} + 3\times\det\begin{bmatrix}1&0\\3&2\end{bmatrix}$$
$$= 1\times(0-4) - 2(1-6) + 3(2-0) = 12$$
Example III.22 In this example we compute, using the definition of the determinant,
$$\det\begin{bmatrix}1&2&3&4\\4&3&2&1\\2&3&3&3\\7&8&9&12\end{bmatrix} = \det\begin{bmatrix}3&2&1\\3&3&3\\8&9&12\end{bmatrix} - 2\det\begin{bmatrix}4&2&1\\2&3&3\\7&9&12\end{bmatrix} + 3\det\begin{bmatrix}4&3&1\\2&3&3\\7&8&12\end{bmatrix} - 4\det\begin{bmatrix}4&3&2\\2&3&3\\7&8&9\end{bmatrix}$$
As side computations, we evaluate the four 3 × 3 determinants
$$\det\begin{bmatrix}3&2&1\\3&3&3\\8&9&12\end{bmatrix} = 3\det\begin{bmatrix}3&3\\9&12\end{bmatrix} - 2\det\begin{bmatrix}3&3\\8&12\end{bmatrix} + \det\begin{bmatrix}3&3\\8&9\end{bmatrix} = 3(36-27) - 2(36-24) + (27-24) = 6$$
$$\det\begin{bmatrix}4&2&1\\2&3&3\\7&9&12\end{bmatrix} = 4\det\begin{bmatrix}3&3\\9&12\end{bmatrix} - 2\det\begin{bmatrix}2&3\\7&12\end{bmatrix} + \det\begin{bmatrix}2&3\\7&9\end{bmatrix} = 4(36-27) - 2(24-21) + (18-21) = 27$$
$$\det\begin{bmatrix}4&3&1\\2&3&3\\7&8&12\end{bmatrix} = 4\det\begin{bmatrix}3&3\\8&12\end{bmatrix} - 3\det\begin{bmatrix}2&3\\7&12\end{bmatrix} + \det\begin{bmatrix}2&3\\7&8\end{bmatrix} = 4(36-24) - 3(24-21) + (16-21) = 34$$
$$\det\begin{bmatrix}4&3&2\\2&3&3\\7&8&9\end{bmatrix} = 4\det\begin{bmatrix}3&3\\8&9\end{bmatrix} - 3\det\begin{bmatrix}2&3\\7&9\end{bmatrix} + 2\det\begin{bmatrix}2&3\\7&8\end{bmatrix} = 4(27-24) - 3(18-21) + 2(16-21) = 11$$
So
$$\det\begin{bmatrix}1&2&3&4\\4&3&2&1\\2&3&3&3\\7&8&9&12\end{bmatrix} = 6 - 2\times 27 + 3\times 34 - 4\times 11 = 10$$
This is clearly a very tedious procedure. We will develop a better one soon.
Property E
If two rows of an n × n matrix are exchanged, the determinant is multiplied by −1.
$$\det\begin{bmatrix}\vdots\\ \vec a_i\\ \vdots\\ \vec a_j\\ \vdots\end{bmatrix} = -\det\begin{bmatrix}\vdots\\ \vec a_j\\ \vdots\\ \vec a_i\\ \vdots\end{bmatrix}$$
where, for each 1 ≤ k ≤ n, ~a_k is the k th row of the matrix and is an n component row vector.
Property M
Multiplying any single row of a matrix by t multiplies the determinant by t as well.
$$\det\begin{bmatrix}\vec a_1\\ \vdots\\ t\vec a_i\\ \vdots\\ \vec a_n\end{bmatrix} = t\det\begin{bmatrix}\vec a_1\\ \vdots\\ \vec a_i\\ \vdots\\ \vec a_n\end{bmatrix}$$
To multiply every entry in an n × n matrix by t, we have to apply Property M once for each row in the matrix, so we end up with a factor tⁿ. For example
$$\det\begin{bmatrix}t&0\\0&t\end{bmatrix} = t\times\det\begin{bmatrix}1&0\\0&t\end{bmatrix} = t\times t\times\det\begin{bmatrix}1&0\\0&1\end{bmatrix} = t^2$$
Property A
Adding any multiple of any row to any other row has no effect on the determinant.
$$\det\begin{bmatrix}\vdots\\ \vec a_i + t\vec a_m\\ \vdots\\ \vec a_m\\ \vdots\end{bmatrix} = \det\begin{bmatrix}\vdots\\ \vec a_i\\ \vdots\\ \vec a_m\\ \vdots\end{bmatrix}$$
Property D
The determinant of any triangular matrix is the product of its diagonal entries.
$$\det\begin{bmatrix}a_{11}&*&\cdots&*&*\\ 0&a_{22}&\cdots&*&*\\ \vdots&&\ddots&&\vdots\\ 0&0&\cdots&a_{n-1\,n-1}&*\\ 0&0&\cdots&0&a_{nn}\end{bmatrix} = a_{11}a_{22}\cdots a_{nn}$$
Property P
$$\det(AB) = (\det A)(\det B)$$
Property T
det At = det A
The general, permutation-based definition of the determinant is
$$\det A = \sum_{\sigma\in P_n} \text{sgn}\,\sigma\ A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$$
Here Pn is the set of all orderings of (1, 2, · · ·, n). The symbol “P” stands for “permutation”, which is the mathematical name for a reordering. So
$$P_3 = \big\{(1,2,3),\ (1,3,2),\ (2,1,3),\ (2,3,1),\ (3,1,2),\ (3,2,1)\big\}$$
We use σ(i) to denote the ith entry of the permutation σ. For example, when σ = (2, 3, 1), σ(2) = 3. The
sign of a permutation, here denoted sgn σ is +1 if (1, 2, · · · , n) can be transformed into σ using an even
number of exchanges of pairs of numbers. Otherwise, the sign is −1. For example (1, 2, 3) can be transformed into (1, 3, 2) by just exchanging 2 and 3, so the sign of σ = (1, 3, 2) is −1. On the other hand, (1, 2, 3) can be transformed into (3, 1, 2) by first exchanging 1 and 3 to yield (3, 2, 1) and then exchanging 1 and 2 to yield
(3, 1, 2). So the sign of σ = (3, 1, 2) is +1. Of course (1, 2, 3) can also be transformed into (3, 1, 2) by first
exchanging 1 and 2 to yield (2, 1, 3), then exchanging 1 and 3 to yield (2, 3, 1), then exchanging 2 and 3 to
yield (3, 2, 1) and finally exchanging 1 and 2 to yield (3, 1, 2). This used four exchanges, which is still even.
It is possible to prove that the number of exchanges of pairs of numbers used to transform (1, 2, 3) into (3, 1, 2) is always even. It is also possible to prove that sgn σ is well–defined for all permutations σ.
If A is a 2 × 2 matrix, the above definition is
$$\det A = A_{11}A_{22} - A_{12}A_{21}$$
with the first term being the σ = (1, 2) contribution and the second term being the σ = (2, 1) contribution.
If A is a 3 × 3 matrix, the above definition is
$$\det A = A_{11}A_{22}A_{33} - A_{11}A_{23}A_{32} - A_{12}A_{21}A_{33} + A_{12}A_{23}A_{31} + A_{13}A_{21}A_{32} - A_{13}A_{22}A_{31}$$
with the terms being the contributions, in order, from σ = (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1).
To verify that the new and old definitions of determinant agree, it suffices to prove that they agree
for 1 × 1 matrices, which is trivial, and that the new definition obeys the “expansion along the first row”
formula.
with the exception of row (3) coincide with the corresponding row of A. Row (3) of the product is row (3)
of A plus 4 times row (1) of A. The matrices that implement row operations are called elementary matrices.
We shall denote them by Qj here. We have
Qh · · · Q1 A = I
Now suppose that A is not invertible. Then there is a sequence of row operations, that when applied to
A, convert it into a matrix Ã, whose last row is identically zero. Implementing these row operations using
multiplication by elementary matrices, we have
Any matrix that has at least one row identically zero, like Ã, has determinant zero. Applying “Q_h · · · Q_1 A = I ⟹ (det Q_h) · · · (det Q_1)(det A) = 1” with h = 1 and A replaced by the inverse of Q_1, we see that every elementary matrix has nonzero determinant. So we conclude that if A is not invertible, it necessarily has determinant zero. Finally, observe that if A fails to be invertible, the same is true for AB (otherwise B(AB)⁻¹ is an inverse for A) and so both det A = 0 and det AB = 0.
    det Aᵗ = Σ_{σ ∈ Pn} sgn σ Aσ(1) 1 Aσ(2) 2 · · · Aσ(n) n
Concentrate on one term in this sum. By the definition of permutation, each of the integers 1, 2, · · ·, n
appears exactly once in σ(1), σ(2), · · ·, σ(n). Reorder the factors in the product Aσ(1) 1 Aσ(2) 2 · · · Aσ(n) n so
that the first indices, rather than the second indices, are in increasing order. This can be implemented by
making the change of variables i = σ −1 (j) in the product.
    det Aᵗ = Σ_{σ ∈ Pn} sgn σ A1 σ⁻¹(1) A2 σ⁻¹(2) · · · An σ⁻¹(n)
On the right hand side, the two rows containing ~b have been interchanged.
2) Thanks to Property E, a determinant may be expanded along any row. That is, for any 1 ≤ i ≤ n,
    det A = Σ_{j=1}^{n} (−1)^(i+j) aij det Mij
where Mij is the (n − 1) × (n − 1) matrix formed by deleting from the original matrix A the row and column
containing aij . To get the signs (−1)i+j right, you just have to remember the checkerboard
+ − + ···
− + − ···
+ − + ···
⋮ ⋮ ⋮ ⋱
Example III.23 If we expand the matrix of Example III.21 along its second row, we get
        [ 1 2 3 ]
    det [ 1 0 2 ] = −1 × det [ 2 3 ] + 0 × det [ 1 3 ] − 2 × det [ 1 2 ]
        [ 3 2 1 ]            [ 2 1 ]           [ 3 1 ]           [ 3 2 ]

                  = −1 × (2 − 6) + 0 − 2 × (2 − 6) = 12
3) Thanks to Property T, a determinant may be expanded along any column too. That is, for any 1 ≤ j ≤ n,
    det A = Σ_{i=1}^{n} (−1)^(i+j) aij det Mij
Example III.24 If we expand the matrix of Example III.21 along its second column, we get
        [ 1 2 3 ]
    det [ 1 0 2 ] = −2 × det [ 1 2 ] + 0 × det [ 1 3 ] − 2 × det [ 1 3 ]
        [ 3 2 1 ]            [ 3 1 ]           [ 3 1 ]           [ 1 2 ]

                  = −2 × (1 − 6) + 0 − 2 × (2 − 3) = 12
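The claim that every row gives the same answer is also easy to test numerically. A minimal Python sketch (the function names are mine, not the notes'):

```python
def minor(A, i, j):
    # M_ij: delete row i and column j (0-indexed here)
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    # Recursive expansion along the first row
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def expand_along_row(A, i):
    # det A = sum_j (-1)^(i+j) a_ij det M_ij, for any row i
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(len(A)))

A = [[1, 2, 3], [1, 0, 2], [3, 2, 1]]
print([expand_along_row(A, i) for i in range(3)])  # [12, 12, 12]
```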
4) We can now use Gaussian elimination to evaluate determinants. Properties A, E, M say that if you take
a matrix U and apply to it a row operation, the resulting matrix V obeys
    det U = det V          if the row operation is (i) → (i) + k(j) for some j ≠ i
    det U = − det V        if the row operation is (i) ↔ (j) for some j ≠ i
    det U = (1/k) det V    if the row operation is (i) → k(i) for some k ≠ 0
These properties, combined with Gaussian elimination, allow us to relate the determinant of any given matrix
to the determinant of a triangular matrix, which is trivially computed using Property D.
Example III.25
        [ 1 2 1 2 ]       [ 1 2  1  2 ]   (1)
    det [ 2 4 0 3 ] = det [ 0 0 −2 −1 ]   (2) − 2(1)      (A)
        [ 1 3 5 6 ]       [ 0 1  4  4 ]   (3) − (1)
        [ 1 3 3 9 ]       [ 0 1  2  7 ]   (4) − (1)

                [ 1 2  1  2 ]   (1)
        = − det [ 0 1  4  4 ]   (3)                       (E)
                [ 0 0 −2 −1 ]   (2)
                [ 0 1  2  7 ]   (4)

                [ 1 2  1  2 ]   (1)
        = − det [ 0 1  4  4 ]   (2)                       (A)
                [ 0 0 −2 −1 ]   (3)
                [ 0 0 −2  3 ]   (4) − (2)

                [ 1 2  1  2 ]   (1)
        = − det [ 0 1  4  4 ]   (2)                       (A)
                [ 0 0 −2 −1 ]   (3)
                [ 0 0  0  4 ]   (4) − (3)

        = − 1 × 1 × (−2) × 4 = 8                          (D)
Example III.26 Let’s redo the computation of the determinant in Example III.22, using row operations
as well as the fact that we may expand along any row or column. Use Cj to denote expansion along column
j and Rj to denote expansion along row j.
        [ 1 2 3  4 ]       [ 1  2   3   4 ]   (1)
    det [ 4 3 2  1 ] = det [ 0 −5 −10 −15 ]   (2) − 4(1)      (A)
        [ 2 3 3  3 ]       [ 0 −1  −3  −5 ]   (3) − 2(1)
        [ 7 8 9 12 ]       [ 0 −6 −12 −16 ]   (4) − 7(1)

              [ −5 −10 −15 ]
        = det [ −1  −3  −5 ]                                  (C1)
              [ −6 −12 −16 ]

                 [  1   2   3 ]          [ 1  2  3 ]   (1)
        = −5 det [ −1  −3  −5 ] = −5 det [ 0 −1 −2 ]   (2) + (1)      (M, then A)
                 [ −6 −12 −16 ]          [ 0  0  2 ]   (3) + 6(1)

        = (−5){1 × (−1) × 2} = 10                             (D)
A is invertible ⇐⇒ det A ≠ 0
Proof: Let R be the triangular matrix that results from the application of Gaussian elimination to the
matrix A. Then, by properties E, M and A, det A is a nonzero constant times det R. So
    det A ≠ 0 ⇐⇒ det R ≠ 0
⇐⇒ the diagonal entries Rjj of R are all nonzero (Property D)
⇐⇒ rank R = n (See the definition of rank in §II.3)
⇐⇒ A is invertible (Property 6 of §III.5)
5) The formulae
    det(A⁻¹) = (det A)⁻¹        det(Aᵐ) = (det A)ᵐ
are easy consequences of property P.
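Both identities are quick to check on a small example (2 × 2 here, using the explicit 2 × 2 inverse formula):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 5]]
d = det2(A)                                  # det A = -1
Ainv = [[A[1][1] / d, -A[0][1] / d],
        [-A[1][0] / d, A[0][0] / d]]         # standard 2x2 inverse
A3 = mul2(A, mul2(A, A))                     # A^3

print(det2(Ainv), 1 / d)                     # -1.0 -1.0
print(det2(A3), d ** 3)                      # -1 -1
```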
2) Evaluate the determinant of each of the following matrices using row reduction

       [ 1 −1  1 −1 ]        [  0  1 −2  3 ]
    a) [ 1  2  4  8 ]     b) [ −1  0  1  2 ]
       [ 1 −2  4 −8 ]        [  2 −1  0  1 ]
       [ 1  1  1  1 ]        [ −3 −2 −1  0 ]

       [  3  1  2  0 ]       [ 1 1  1  1 ]
    c) [ −2 −1  5 −2 ]    d) [ 1 2  4  8 ]
       [  1 −3  1  1 ]       [ 1 3  9 27 ]
       [  4  1  2 −3 ]       [ 1 4 16 64 ]
have an inverse?
4) For which values of the parameter λ does the system of linear equations
2x1 − x2 = λx1
2x1 + 5x2 = λx2
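The statement of exercise 4 appears to be cut off in this copy; presumably it asks for which λ the system has a solution other than x1 = x2 = 0, which happens exactly when det(A − λI) = 0 for the coefficient matrix A with rows [2, −1] and [2, 5]. A quick check of the characteristic polynomial:

```python
# det(A - L*I) = (2 - L)(5 - L) - (-1)(2) = L^2 - 7L + 12 = (L - 3)(L - 4)
def char_poly(L):
    return (2 - L) * (5 - L) - (-1) * 2

print([L for L in range(-10, 11) if char_poly(L) == 0])  # [3, 4]
```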
Concise Formulae
a) If A is a square matrix with det A ≠ 0,
    (A⁻¹)ij = (−1)^(i+j) det Mji / det A
where Mji is the matrix gotten by deleting from A the row and column containing Aji.
WARNING: This formula is useful, for example, for studying the dependence of A−1 on matrix
elements of A. But it is usually computationally much more efficient to solve AB = I by Gaussian
elimination than it is to apply this formula.
Proof: The foundation for this formula is the expansion formula
    det A = Σ_{i} (−1)^(i+j) Aji det Mji
It is true for any j. This was implication number 2 in the last section. Now consider, for any row
number k, the sum
    Σ_{i} (−1)^(i+j) Aki det Mji
If k = j this is precisely the formula for det A given above. If k ≠ j, this is the expansion along row j
for the determinant of another matrix Ã. This other matrix is constructed from A by replacing row j
of A by row k of A. Row numbers j and k of à are identical so that det à = 0 by implication number
1 of Property E.
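The formula translates directly into code. The sketch below uses the matrix of the worked example that follows (its statement is lost in this copy, but the nine minors are consistent with the matrix A having rows [1, 1, 1], [1, 2, 3], [2, 3, 1], which the sketch assumes); exact arithmetic via fractions avoids rounding:

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    # (A^{-1})_ij = (-1)^(i+j) det M_ji / det A  -- note the transposed indices
    d = Fraction(det(A))
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) / d for j in range(n)]
            for i in range(n)]

A = [[1, 1, 1], [1, 2, 3], [2, 3, 1]]
B = inverse(A)
# multiplying back should give the identity matrix
print(all(sum(A[i][k] * B[k][j] for k in range(3)) == (1 if i == j else 0)
          for i in range(3) for j in range(3)))  # True
```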
Then
    det M11 = det [ 2 3 ] = −7     det M12 = det [ 1 3 ] = −5     det M13 = det [ 1 2 ] = −1
                  [ 3 1 ]                        [ 2 1 ]                        [ 2 3 ]

    det M21 = det [ 1 1 ] = −2     det M22 = det [ 1 1 ] = −1     det M23 = det [ 1 1 ] = 1
                  [ 3 1 ]                        [ 2 1 ]                        [ 2 3 ]

    det M31 = det [ 1 1 ] = 1      det M32 = det [ 1 1 ] = 2      det M33 = det [ 1 1 ] = 1
                  [ 2 3 ]                        [ 1 3 ]                        [ 1 2 ]
so that
    det A = 1 × det M11 − 1 × det M12 + 1 × det M13 = −7 + 5 − 1 = −3
and
    A⁻¹ = 1/(−3) [ −7  2  1 ]
                 [  5 −1 −2 ]
                 [ −1 −1  1 ]
WARNING: This formula is useful, for example, for studying the dependence of A−1~b on matrix
elements of A and ~b. But it is usually computationally much more efficient to solve A~x = ~b by Gaussian
elimination than it is to apply this formula.
Proof: Since ~x = A⁻¹~b,
    det A × xi = det A × (A⁻¹~b)i = det A Σ_{j} (A⁻¹)ij bj = Σ_{j} (−1)^(i+j) bj det Mji
The right hand side is the expansion along column i of the determinant of the matrix constructed from
A by replacing column number i with ~b.
x1 + x2 + x3 = 4
x1 + 2x2 + 3x3 = 9
2x1 + 3x2 + x3 = 7
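This system (presumably the one to be solved by Cramer's rule) has coefficient matrix A with rows [1, 1, 1], [1, 2, 3], [2, 3, 1] and right hand side ~b = (4, 9, 7). A Python sketch of the rule:

```python
from fractions import Fraction

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def cramer(A, b):
    # x_i = det(A with column i replaced by b) / det A
    d = det3(A)
    xs = []
    for i in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = b[r]
        xs.append(Fraction(det3(Ai), d))
    return xs

A = [[1, 1, 1], [1, 2, 3], [2, 3, 1]]
b = [4, 9, 7]
print(cramer(A, b))  # [Fraction(1, 1), Fraction(1, 1), Fraction(2, 1)]
```

So x1 = 1, x2 = 1, x3 = 2.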
2) Let
    A = [ 1 2 3 ]        B = [ −3 1 2 ]
        [ 4 5 6 ]            [ −3 2 0 ]
        [ 1 2 3 ]        [ 0 0 1 0 ]
    (f) [ 4 5 6 ]    (g) [ 0 0 0 1 ]
        [ 7 8 9 ]        [ 0 2 0 0 ]
                         [ 3 0 0 0 ]
19) Let
    B = [ −1  2 p ]
        [  0 −1 1 ]
        [  2  1 0 ]
(a) For which values of p does B have an inverse?
(b) Find B −1 , for those values of p.
20) Suppose that, for some square matrix A, the series Σ_{n=0}^{∞} Aⁿ = I + A + A² + A³ + · · · converges. (In
series notation, A⁰ is defined to be I.) Show that
    (I − A)⁻¹ = Σ_{n=0}^{∞} Aⁿ
21) Suppose that some square matrix A obeys Aⁿ = 0 for some positive integer n.
(a) Find the inverse of A.
(b) Find the inverse of I − A.
22) Suppose that L is a linear transformation from IRn to IRn . Prove that its inverse, if it exists, is also
linear.
23) Suppose that
        [ 3 2 p ]
    det [ 0 p 1 ] = 10
        [ 1 0 2 ]
What are the possible values of p?
24) Let
        [ 1 3 5 ∗ ]
    A = [ 0 4 0 6 ]
        [ 0 1 0 2 ]
        [ 3 ∗ 7 8 ]
where the ∗'s denote unknown entries. Find all possible values of det A.
25) Suppose that the 3 × 3 matrix A obeys det A = 5. Compute (a) det(4A) (b) det(A²) (c) det(4A²)
26) Suppose that the 6 × 6 matrix A obeys A⁴ = 2A. Find all possible values of det A.
27) Evaluate
        [ 1  a  a² a³ ]
    det [ a  a² a³ 1  ]
        [ a² a³ 1  a  ]
        [ a³ 1  a  a² ]
Solutions
1) Compute the following matrix products:

    (a) [ 1 0 ] [ 0 3 ]        (b) [ 1  2 0 ] [ 2 1 ]        (c) [ 1 2 3 ] [ 0 4 3 ]
        [ 3 5 ] [ 1 4 ]            [ 0 −1 1 ] [ 0 1 ]            [ 0 0 1 ] [ 2 1 1 ]
                                              [ 3 0 ]            [ 1 0 0 ] [ 8 0 4 ]

    (d) [ 1 0 ] [ 1 0 3 5 ]    (e) [ 3 1 0 ] [ 2 3 ]         (f) [ 2 3 ] [ 4 ]
        [ 2 3 ] [ 0 1 1 2 ]                  [ 3 0 ]                      [ 5 ]
        [ 5 0 ]                              [ 0 5 ]

    (g) [ 2 1 ] [ 0 1 ] [ −1 0 1 ]    (h) [ 2 −1 0 ] [ 3 3 5 ] [ 3 2 ]    (i) [ 1 3 ] [ a b ]
        [ 0 4 ] [ 1 2 ] [  2 1 0 ]        [ 1  0 0 ] [ 1 0 0 ] [ 1 0 ]        [ 5 4 ] [ c d ]
                                          [ 0  0 3 ] [ 0 2 0 ] [ 2 0 ]
                                          [ 0  2 0 ]

    (j) [ x y ] [  a b ]
        [ 2 3 ] [ −1 4 ]
Solution.
(a)
    [ 1 0 ] [ 0 3 ] = [ 1×0+0×1  1×3+0×4 ] = [ 0  3 ]
    [ 3 5 ] [ 1 4 ]   [ 3×0+5×1  3×3+5×4 ]   [ 5 29 ]
(b)
               [ 2 1 ]
    [ 1  2 0 ] [ 0 1 ] = [ 1×2+2×0+0×3  1×1+2×1+0×0 ] = [ 2  3 ]
    [ 0 −1 1 ] [ 3 0 ]   [ 0×2−1×0+1×3  0×1−1×1+1×0 ]   [ 3 −1 ]
(c)
    [ 1 2 3 ] [ 0 4 3 ]   [ 1×0+2×2+3×8  1×4+2×1+3×0  1×3+2×1+3×4 ]   [ 28 6 17 ]
    [ 0 0 1 ] [ 2 1 1 ] = [ 0×0+0×2+1×8  0×4+0×1+1×0  0×3+0×1+1×4 ] = [  8 0  4 ]
    [ 1 0 0 ] [ 8 0 4 ]   [ 1×0+0×2+0×8  1×4+0×1+0×0  1×3+0×1+0×4 ]   [  0 4  3 ]
(d)
    [ 1 0 ]               [ 1×1+0×0  1×0+0×1  1×3+0×1  1×5+0×2 ]   [ 1 0  3  5 ]
    [ 2 3 ] [ 1 0 3 5 ] = [ 2×1+3×0  2×0+3×1  2×3+3×1  2×5+3×2 ] = [ 2 3  9 16 ]
    [ 5 0 ] [ 0 1 1 2 ]   [ 5×1+0×0  5×0+0×1  5×3+0×1  5×5+0×2 ]   [ 5 0 15 25 ]
(e)
              [ 2 3 ]
    [ 3 1 0 ] [ 3 0 ] = [ 3×2+1×3+0×0  3×3+1×0+0×5 ] = [ 9 9 ]
              [ 0 5 ]
(f)
    [ 2 3 ] [ 4 ] = [ 2×4+3×5 ] = [ 23 ]
            [ 5 ]
(g)
    [ 2 1 ] [ 0 1 ] [ −1 0 1 ] = [ 1 4 ] [ −1 0 1 ] = [  7 4 1 ]
    [ 0 4 ] [ 1 2 ] [  2 1 0 ]   [ 4 8 ] [  2 1 0 ]   [ 12 8 4 ]
(h)
    [ 2 −1 0 ] [ 3 3 5 ] [ 3 2 ]   [ 5 6 10 ] [ 3 2 ]   [ 41 10 ]
    [ 1  0 0 ] [ 1 0 0 ] [ 1 0 ] = [ 3 3  5 ] [ 1 0 ] = [ 22  6 ]
    [ 0  0 3 ] [ 0 2 0 ] [ 2 0 ]   [ 0 6  0 ] [ 2 0 ]   [  6  0 ]
    [ 0  2 0 ]                     [ 2 0  0 ]           [  6  4 ]
(i)
    [ 1 3 ] [ a b ] = [ a + 3c   b + 3d  ]
    [ 5 4 ] [ c d ]   [ 5a + 4c  5b + 4d ]
(j)
    [ x y ] [  a b ] = [ ax − y  bx + 4y ]
    [ 2 3 ] [ −1 4 ]   [ 2a − 3  2b + 12 ]
2) Let
    A = [ 1 2 3 ]        B = [ −3 1 2 ]
        [ 4 5 6 ]            [ −3 2 0 ]
(a) Compute 2A, 3B, 2A + 3B and 3(2A + 3B).
(b) Compute 6A, 9B and 6A + 9B.
(c) Why are the last results in parts (a) and (b) the same?
Solution. (a)
    2A = [ 2  4  6 ]    3B = [ −9 3 6 ]
         [ 8 10 12 ]         [ −9 6 0 ]

    2A + 3B = [ 2  4  6 ] + [ −9 3 6 ] = [ −7  7 12 ]
              [ 8 10 12 ]   [ −9 6 0 ]   [ −1 16 12 ]

    3(2A + 3B) = 3 [ −7  7 12 ] = [ −21 21 36 ]
                   [ −1 16 12 ]   [  −3 48 36 ]
(b)
    6A = [  6 12 18 ]    9B = [ −27  9 18 ]
         [ 24 30 36 ]         [ −27 18  0 ]

    6A + 9B = [  6 12 18 ] + [ −27  9 18 ] = [ −21 21 36 ]
              [ 24 30 36 ]   [ −27 18  0 ]   [  −3 48 36 ]
(c) Applying properties 3 and 5 of the "Basic Properties of Matrix Operations" given in §III.1, we have
    3(2A + 3B) = 3(2A) + 3(3B) = (3 × 2)A + (3 × 3)B = 6A + 9B
3) Let
    A = [ 1  2 ]    B = [ 2  6 1 ]    C = [ 3 2 ]
        [ 3 −1 ]        [ 1 −1 0 ]        [ 1 2 ]
                                          [ 4 1 ]
(a) Compute AB, (AB)C, BC and A(BC) and verify that (AB)C = A(BC). So it is not necessary to
bracket ABC.
(b) Can the order of the factors in the product ABC be changed?
Solution. (a)
    AB = [ 4  4 1 ]    (AB)C = [ 4  4 1 ] [ 3 2 ] = [ 20 17 ]
         [ 5 19 3 ]            [ 5 19 3 ] [ 1 2 ]   [ 46 51 ]
                                          [ 4 1 ]

    BC = [ 16 17 ]     A(BC) = [ 1  2 ] [ 16 17 ] = [ 20 17 ]
         [  2  0 ]             [ 3 −1 ] [  2  0 ]   [ 46 51 ]
(b)
    A + B = [ −1  2 ] + [ 1 −1 ] = [ 0  1 ]
            [  3 −4 ]   [ 0  3 ]   [ 3 −1 ]

    (A + B)² = [ 0  1 ] [ 0  1 ] = [  3 −1 ]
               [ 3 −1 ] [ 3 −1 ]   [ −3  4 ]
(c) (A + B)2 = A(A + B) + B(A + B) = A2 + AB + BA + B 2 , so the answer to part a minus the answer
to part b ought to be
Then
    AM = [ 1  0 ] [ a b ] = [  a  b ]
         [ 0 −1 ] [ c d ]   [ −c −d ]

    MA = [ a b ] [ 1  0 ] = [ a −b ]
         [ c d ] [ 0 −1 ]   [ c −d ]
These are the same if and only if b = −b and c = −c, which in turn is the case if and only if b = c = 0.
So we need M to be of the form
    [ a 0 ]
    [ 0 d ]
for some numbers a and d.
(b) In order for both BM and M B to be defined, M must be a 2 × 2 matrix. Let
    M = [ a b ]
        [ c d ]
Then
    BM = [ 0 1 ] [ a b ] = [ c d ]
         [ 1 0 ] [ c d ]   [ a b ]

    MB = [ a b ] [ 0 1 ] = [ b a ]
         [ c d ] [ 1 0 ]   [ d c ]
These are the same if and only if c = b and a = d. So we need M to be of the form
    [ a b ]
    [ b a ]
for some numbers a and b.
(c) In order to satisfy the conditions of both part a and part b, we need b = c = 0 and a = d, so M must
be of the form
    a [ 1 0 ]
      [ 0 1 ]
for some number a.
So if ab = 0 (e.g. a = 1, b = 0), A² = 0.
(c) This statement is false . For example
    A = [ a 0 ]  =⇒  A² = [ a²  0 ]
        [ 0 b ]           [  0 b² ]
Of the x1 (n) cars that had their first birthday on January 1, .75x1 (n) remain with the company. During
year n + 1, these cars are all between one and two years old. So
x2 (n + 1) = .75x1 (n)
Similarly,
x3 (n + 1) = .50x2 (n)
In summary,
    (B~x(n))1 = x1(n + 1) = .25 x1(n) + .50 x2(n) + 1.0 x3(n)
    (B~x(n))2 = x2(n + 1) = .75 x1(n)
    (B~x(n))3 = x3(n + 1) = .50 x2(n)
So
    B = [ .25 .5 1 ]
        [ .75  0 0 ]
        [  0  .5 0 ]
so that
    ~x(2) = B~x(1) = [ .25 .5 1 ] [ 0 ]   [ q ]
                     [ .75  0 0 ] [ 0 ] = [ 0 ]
                     [  0  .5 0 ] [ q ]   [ 0 ]

    ~x(3) = B~x(2) = [ .25 .5 1 ] [ q ]   [ (1/4)q ]
                     [ .75  0 0 ] [ 0 ] = [ (3/4)q ]
                     [  0  .5 0 ] [ 0 ]   [   0    ]

    ~x(4) = B~x(3) = [ (7/16)q ]        ~x(5) = B~x(4) = [ (37/64)q ]
                     [ (3/16)q ]                         [ (21/64)q ]
                     [  (3/8)q ]                         [  (6/64)q ]

The proportions are 0, 0, 3/8, 3/32.
(c) In equilibrium (using I to denote the identity matrix)
~x(n + 1) = ~x(n)
B~x(n) = I~x(n)
(B − I)~x(n) = 0
The last equation forces β = 2γ and the second equation forces α = (4/3)β = (8/3)γ. In order to have
integer numbers of cars, γ has to be a positive multiple of 3. Let γ = 3p. The general solution is
    ~x(n) = p [ 8 ]
              [ 6 ] ,    p = 0, 1, 2, 3, · · ·
              [ 3 ]
(d) We wish to find a matrix C obeying ~x(n+2) = C~x(n). As ~x(n+1) = B~x(n) and ~x(n+2) = B~x(n+1),
we have ~x(n + 2) = B~x(n + 1) = B²~x(n). So
    C = B² = [ 1/4 2/4 1 ] [ 1/4 2/4 1 ]        [ 7 10  4 ]
             [ 3/4  0  0 ] [ 3/4  0  0 ] = 1/16 [ 3  6 12 ]
             [  0  2/4 0 ] [  0  2/4 0 ]        [ 6  0  0 ]
The last equation forces α = (8/3)γ, just as in part c. The last equation minus twice the second equation
forces 20β − 40γ = 0 or β = 2γ, just as in part c. The general solution is once again
    ~x(n) = p [ 8 ]
              [ 6 ] ,    p = 0, 1, 2, 3, · · ·
              [ 3 ]
So no fleet vector repeats itself every two years but not every year.
9) Determine whether or not each of the following functions is a linear transformation.
(a) f (x, y, z) = 3x − 2y + 5z    (b) f (x, y, z) = −2y + 9z − 12    (c) f (x, y, z) = x² + y² + z²
(d) f (x, y, z) = [x + y, x − y, 0]    (e) f (x, y, z) = [x + y, x − y, 1]    (f) f (x, y, z) = [x, y, z²]
Solution. We use the notations ~x to stand for [x, y, z], ~x′ to stand for [x′, y′, z′] and f (~x) to stand for
f (x, y, z). Observe that s~x + t~x′ = [sx + tx′, sy + ty′, sz + tz′]. In each part, the two expressions to be
compared are f (s~x + t~x′) and sf (~x) + tf (~x′).
(a) f (~x) = 3x − 2y + 5z is a function from IR3 to IR. For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. They are, so this f is linear .
(b) f (~x) = −2y + 9z − 12 is a function from IR3 to IR. For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. They aren’t, because when x = y = z = x′ = y ′ =
z ′ = 0 the first expression reduces to −12 and the second reduces to −12(s + t). These are equal only
when s + t = 1. So this f is not linear .
(c) f (~x) = x² + y² + z² is a function from IR3 to IR. For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. They aren't, because when y = z = x′ = y′ = z′ = 0
the first expression reduces to s²x² and the second reduces to sx². These are equal only when s² = s or
when x = 0. So this f is not linear .
(d) f (~x) = [x + y, x − y, 0] is a function from IR3 to IR3 . For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. They are, so this f is linear .
(e) f (~x) = [x + y, x − y, 1] is a function from IR3 to IR3 . For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. The third components are equal only for s + t = 1,
so this f is not linear .
(f) f (~x) = [x, y, z²] is a function from IR3 to IR3 . For it to be linear, the two expressions
have to be equal for all ~x, ~x′ in IR3 and all s, t in IR. If z′ = 0 the third component of the first expression
reduces to s²z² while that of the second expression reduces to sz². These agree only for z = 0 or s = 0, 1.
So this f is not linear .
10) Suppose that a linear transformation maps [1, 1] to [4, 7] and [1, −1] to [8, 3]. What vector does it map
[5, 14] to?
Solution 1. The vector
    [5, 14] = (19/2) [1, 1] − (9/2) [1, −1]
is mapped to
    (19/2) [4, 7] − (9/2) [8, 3] = [ (76 − 72)/2 , (133 − 27)/2 ] = [2, 53]
Solution 2. Write the vectors as column, rather than row vectors. Let the matrix of the linear transformation be
    [ a b ]
    [ c d ]
Then
    [ a b ] [ 1 ] = [ 4 ]        [ a b ] [  1 ] = [ 8 ]
    [ c d ] [ 1 ]   [ 7 ]        [ c d ] [ −1 ]   [ 3 ]
This consists of a system of two equations in the two unknowns a and b as well as a system of two
equations in the two unknowns c and d. Both systems are easy to solve: a = 6, b = −2, c = 5, d = 2.
Now that we know the matrix of the linear transformation, we just have to apply it to the specified
input vector.
    [ a b ] [  5 ] = [ 6 −2 ] [  5 ] = [ 30 − 28 ] = [  2 ]
    [ c d ] [ 14 ]   [ 5  2 ] [ 14 ]   [ 25 + 28 ]   [ 53 ]
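A two-line check of Solution 2's matrix, confirming both given input–output pairs as well as the final answer:

```python
M = [[6, -2], [5, 2]]                    # the matrix found above

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

print(apply(M, [1, 1]), apply(M, [1, -1]), apply(M, [5, 14]))
# [4, 7] [8, 3] [2, 53]
```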
11) Is it possible for a linear transformation to map [1, 2] to [1, 0, −1], [3, 4] to [1, 2, 3] and [5, 8] to [3, 1, 6]?
Solution. [5, 8] = 2[1, 2] + [3, 4]. So for the map to be linear, it is necessary that [3, 1, 6] = 2[1, 0, −1] +
[1, 2, 3] = [3, 2, 1], which is false. So, it is not possible .
12) Find the matrices representing the linear transformations
    (a) f (x, y) = [x + 2y, y − x, x + y]        (b) g(x, y, z) = [x − y − z, 2y + 5z − 3x]
    (c) f (g(x, y, z))                           (d) g(f (x, y))
Solution. Writing column, rather than row vectors.
(a)
    [ x + 2y ]   [ a b ]         [ ax + by ]
    [ y − x  ] = [ c d ] [ x ] = [ cx + dy ]
    [ x + y  ]   [ e f ] [ y ]   [ ex + fy ]
provided we choose a = 1, b = 2, c = −1, d = 1, e = 1, f = 1. So the matrix is
    [  1 2 ]
    [ −1 1 ]
    [  1 1 ]
(b)
                                 [ x ]
    [ x − y − z    ] = [ a b c ] [ y ] = [ ax + by + cz ]
    [ 2y + 5z − 3x ]   [ d e f ] [ z ]   [ dx + ey + fz ]
provided we choose a = 1, b = −1, c = −1, d = −3, e = 2, f = 5. So the matrix is
    [  1 −1 −1 ]
    [ −3  2  5 ]
(c) Substituting
    [ u ]   [  1 −1 −1 ] [ x ]
    [ v ] = [ −3  2  5 ] [ y ]
                         [ z ]
into
    f([ u ]) = [  1 2 ] [ u ]
      [ v ]    [ −1 1 ] [ v ]
               [  1 1 ]
gives
    f(g([ x ])) = [  1 2 ] [  1 −1 −1 ] [ x ]   [ −5 3 9 ] [ x ]
        [ y ]     [ −1 1 ] [ −3  2  5 ] [ y ] = [ −4 3 6 ] [ y ]
        [ z ]     [  1 1 ]              [ z ]   [ −2 1 4 ] [ z ]
(d) Substituting
    [ u ]   [  1 2 ]
    [ v ] = [ −1 1 ] [ x ]
    [ w ]   [  1 1 ] [ y ]
into
    g([ u ]) = [  1 −1 −1 ] [ u ]
      [ v ]    [ −3  2  5 ] [ v ]
      [ w ]                 [ w ]
gives
    g(f([ x ])) = [  1 −1 −1 ] [  1 2 ] [ x ]   [ 1 0 ] [ x ]
         [ y ]    [ −3  2  5 ] [ −1 1 ] [ y ] = [ 0 1 ] [ y ]
                               [  1 1 ]
(b) Perform the following little experiment. Take a book. Rotate the book, about its spine, by 45◦ .
Observe that the spine of the book does not move at all. Vectors lying on the axis of rotation, do not
change when the rotation is executed. To find the axis of rotation of T , we just need to find a nonzero
vector ~n obeying T ~n = ~n.
    [  0 0 −1 ] [ n1 ]   [ n1 ]        [ −n3 ]   [ n1 ]        [ n1 ]       [  1 ]
    [ −1 0  0 ] [ n2 ] = [ n2 ]  =⇒   [ −n1 ] = [ n2 ]  =⇒   [ n2 ] = c [ −1 ]  for any c
    [  0 1  0 ] [ n3 ]   [ n3 ]        [  n2 ]   [ n3 ]        [ n3 ]       [ −1 ]
(c) Rotate the book, about its spine, by 45◦ again. Observe that the bottom and top edges of the book
rotate by 45◦ . Under a rotation of θ◦ , vectors perpendicular to the axis of rotation rotate by θ◦ . Vectors
that are neither perpendicular to nor parallel to the axis of rotation, rotate by angles that are strictly
between 0◦ and θ◦ . (Repeat the book experiment a few times, concentrating on vectors that are almost
parallel to the spine and then on vectors that are almost perpendicular to the spine, to convince yourself
that this is true.) So to determine the angle of rotation, we select a vector, ~v , perpendicular to the axis
of rotation, [1, −1, −1], and compute the angle between ~v and T ~v. The vector ~v = ı̂ + ̂ is perpendicular
to the axis of rotation, because [1, 1, 0] · [1, −1, −1] = 0. It gets mapped to
    [  0 0 −1 ] [ 1 ]   [  0 ]
    [ −1 0  0 ] [ 1 ] = [ −1 ]
    [  0 1  0 ] [ 0 ]   [  1 ]
The angle, θ, between ı̂ + ̂ and the vector −̂ + k̂ that it is mapped to obeys
    (ı̂ + ̂) · (−̂ + k̂) = ‖ı̂ + ̂‖ ‖−̂ + k̂‖ cos θ =⇒ 2 cos θ = −1 =⇒ θ = 120◦
Remark. We have already seen, in part b, that every vector parallel to ı̂ − ̂ − k̂ gets mapped to itself.
That is, the linear transformation does not move it at all. If the linear transformation really is a rotation
then every vector perpendicular to ı̂− ̂− k̂, i.e. every vector cı̂+â+bk̂ obeying (cı̂+â+bk̂)·(ı̂− ̂− k̂) =
c − a − b = 0, i.e. every vector of the form (a + b)ı̂ + â + bk̂, should get mapped to a vector which is
perpendicular to ı̂ − ̂ − k̂, has the same length as (a + b)ı̂ + â + bk̂ and makes an angle 120◦ with respect
to (a + b)ı̂ + â + bk̂. The vector (a + b)ı̂ + â + bk̂ gets mapped to
    [  0 0 −1 ] [ a + b ]   [   −b   ]
    [ −1 0  0 ] [   a   ] = [ −a − b ]
    [  0 1  0 ] [   b   ]   [    a   ]
    ~x − 2 proj~n(~x) = [ x ] − (2(x + y + z)/3) [ 1 ]  =  (1/3) [ x − 2y − 2z ]  =  (1/3) [  1 −2 −2 ] [ x ]
                       [ y ]                    [ 1 ]           [ y − 2x − 2z ]           [ −2  1 −2 ] [ y ]
                       [ z ]                    [ 1 ]           [ z − 2x − 2y ]           [ −2 −2  1 ] [ z ]
Check: The vector ~n, which is perpendicular to the plane, should be mapped to −~n. On the other hand,
the vectors ı̂ − ̂ and ̂ − k̂, both of which are parallel to the plane, should be mapped to themselves.
    (1/3) [  1 −2 −2 ] [ 1 ]     [ 1 ]
          [ −2  1 −2 ] [ 1 ] = − [ 1 ]
          [ −2 −2  1 ] [ 1 ]     [ 1 ]

    (1/3) [  1 −2 −2 ] [  1 ]   [  1 ]
          [ −2  1 −2 ] [ −1 ] = [ −1 ]
          [ −2 −2  1 ] [  0 ]   [  0 ]

    (1/3) [  1 −2 −2 ] [  0 ]   [  0 ]
          [ −2  1 −2 ] [  1 ] = [  1 ]
          [ −2 −2  1 ] [ −1 ]   [ −1 ]
(b) The vector ~n = 2ı̂ − 2̂ − k̂ is perpendicular to the given plane. The projection of any vector ~x on ~n
is
    proj~n(~x) = (~n · ~x / ‖~n‖²) ~n = ((2x − 2y − z)/9) [  2 ]
                                                        [ −2 ]
                                                        [ −1 ]
The reflection is
    ~x − 2 proj~n(~x) = [ x ] − (2(2x − 2y − z)/9) [  2 ]  =  (1/9) [ x + 8y + 4z  ]  =  (1/9) [ 1  8  4 ] [ x ]
                       [ y ]                      [ −2 ]           [ y + 8x − 4z  ]           [ 8  1 −4 ] [ y ]
                       [ z ]                      [ −1 ]           [ 7z + 4x − 4y ]           [ 4 −4  7 ] [ z ]
Check: The vector ~n, which is perpendicular to the plane, should be mapped to −~n. On the other hand,
the vectors ı̂ + ̂ and ı̂ + 2k̂, both of which are parallel to the plane, should be mapped to themselves.
    (1/9) [ 1  8  4 ] [  2 ]     [  2 ]
          [ 8  1 −4 ] [ −2 ] = − [ −2 ]
          [ 4 −4  7 ] [ −1 ]     [ −1 ]

    (1/9) [ 1  8  4 ] [ 1 ]   [ 1 ]
          [ 8  1 −4 ] [ 1 ] = [ 1 ]
          [ 4 −4  7 ] [ 0 ]   [ 0 ]

    (1/9) [ 1  8  4 ] [ 1 ]   [ 1 ]
          [ 8  1 −4 ] [ 0 ] = [ 0 ]
          [ 4 −4  7 ] [ 2 ]   [ 2 ]
15) A solid body is rotating about an axis which passes through the origin and has direction Ω~ = Ω1 ı̂ +
Ω2 ̂ + Ω3 k̂. The rate of rotation is ‖Ω~‖ radians per second. Denote by ~x the coordinates, at some fixed
time, of a point fixed to the body and by ~v the velocity vector of the point at that time. Find a matrix
A such that ~v = A~x.
Solution. We saw in §I.7 that the velocity vector is
                  [ ı̂  ̂  k̂ ]   [ Ω2 z − Ω3 y ]   [   0  −Ω3  Ω2 ] [ x ]
    ~v = Ω~ × ~x = det [ Ω1 Ω2 Ω3 ] = [ Ω3 x − Ω1 z ] = [  Ω3    0 −Ω1 ] [ y ]
                  [ x  y  z  ]   [ Ω1 y − Ω2 x ]   [ −Ω2   Ω1   0 ] [ z ]
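A spot check that the antisymmetric matrix reproduces the cross product, for one arbitrary choice of Ω~ and ~x:

```python
def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

O = [1, 2, 3]                     # Omega_1, Omega_2, Omega_3 (arbitrary values)
A = [[0, -O[2], O[1]],
     [O[2], 0, -O[0]],
     [-O[1], O[0], 0]]
x = [4, 5, 6]
print(cross(O, x), matvec(A, x))  # [-3, 6, -3] [-3, 6, -3]
```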
In particular, using (m, n) to denote the equation that matches the matrix element in row m and column
n of the product of the two matrices on the left with the corresponding matrix element of the identity
matrix on the right
        [ 1 2 3 ]        [ 0 0 1 0 ]
    (f) [ 4 5 6 ]    (g) [ 0 0 0 1 ]
        [ 7 8 9 ]        [ 0 2 0 0 ]
                         [ 3 0 0 0 ]
Solution. We have a canned formula for the inverse of 2 × 2 matrices. But I’ll use row reduction, just
for practice.
(a)
    [ 1 4 | 1 0 ]   (1)
    [ 2 7 | 0 1 ]   (2)

    [ 1  4 |  1 0 ]   (1)
    [ 0 −1 | −2 1 ]   (2) − 2(1)

    [ 1 0 | −7  4 ]   (1) + 4(2)
    [ 0 1 |  2 −1 ]   −(2)
Check:
    [ 1 4 ] [ −7  4 ] = [ 1 0 ]
    [ 2 7 ] [  2 −1 ]   [ 0 1 ]
(b)
    [ 2 5 | 1 0 ]   (1)
    [ 4 8 | 0 1 ]   (2)

    [ 2  5 |  1 0 ]   (1)
    [ 0 −2 | −2 1 ]   (2) − 2(1)

    [ 1 0 | −2  5/4 ]   (1)/2 + 5(2)/4
    [ 0 1 |  1 −1/2 ]   −(2)/2
Check:
    [ 2 5 ] [ −2  5/4 ] = [ 1 0 ]
    [ 4 8 ] [  1 −1/2 ]   [ 0 1 ]
(c)
    [ 3  4 | 1 0 ]   (1)
    [ 4 −3 | 0 1 ]   (2)

    [ 1   4/3 |  1/3 0 ]   (1)/3
    [ 0 −25/3 | −4/3 1 ]   (2) − 4(1)/3

    [ 1 0 | 3/25  4/25 ]   (1) + 4(2)/25
    [ 0 1 | 4/25 −3/25 ]   −3(2)/25
Check:
    [ 3  4 ] [ 3/25  4/25 ] = [ 1 0 ]
    [ 4 −3 ] [ 4/25 −3/25 ]   [ 0 1 ]
(d)
    [ 1 2 −1 | 1 0 0 ]   (1)
    [ 2 5  3 | 0 1 0 ]   (2)
    [ 1 3  9 | 0 0 1 ]   (3)

    [ 1 2 −1 |  1 0 0 ]   (1)
    [ 0 1  5 | −2 1 0 ]   (2) − 2(1)
    [ 0 1 10 | −1 0 1 ]   (3) − (1)

    [ 1 2 −1 |  1  0 0 ]   (1)
    [ 0 1  5 | −2  1 0 ]   (2)
    [ 0 0  5 |  1 −1 1 ]   (3) − (2)

    [ 1 2 0 | 6/5 −1/5 1/5 ]   (1) + (3)/5
    [ 0 1 0 | −3    2   −1 ]   (2) − (3)
    [ 0 0 1 | 1/5 −1/5 1/5 ]   (3)/5

    [ 1 0 0 | 36/5 −21/5 11/5 ]   (1) − 2(2)
    [ 0 1 0 |  −3     2    −1 ]   (2)
    [ 0 0 1 |  1/5  −1/5  1/5 ]   (3)
Check:
    [ 1 2 −1 ] [ 36/5 −21/5 11/5 ]   [ 1 0 0 ]
    [ 2 5  3 ] [  −3     2    −1 ] = [ 0 1 0 ]
    [ 1 3  9 ] [  1/5  −1/5  1/5 ]   [ 0 0 1 ]
(e)
    [ 1 2 3 | 1 0 0 ]   (1)
    [ 2 3 4 | 0 1 0 ]   (2)
    [ 3 4 6 | 0 0 1 ]   (3)

    [ 1  2  3 |  1 0 0 ]   (1)
    [ 0 −1 −2 | −2 1 0 ]   (2) − 2(1)
    [ 0 −2 −3 | −3 0 1 ]   (3) − 3(1)

    [ 1  2  3 |  1  0 0 ]   (1)
    [ 0 −1 −2 | −2  1 0 ]   (2)
    [ 0  0  1 |  1 −2 1 ]   (3) − 2(2)

    [ 1 2 0 | −2  6 −3 ]   (1) − 3(3)
    [ 0 1 0 |  0  3 −2 ]   −(2) − 2(3)
    [ 0 0 1 |  1 −2  1 ]   (3)

    [ 1 0 0 | −2  0  1 ]   (1) − 2(2)
    [ 0 1 0 |  0  3 −2 ]   (2)
    [ 0 0 1 |  1 −2  1 ]   (3)
Check:
    [ 1 2 3 ] [ −2  0  1 ]   [ 1 0 0 ]
    [ 2 3 4 ] [  0  3 −2 ] = [ 0 1 0 ]
    [ 3 4 6 ] [  1 −2  1 ]   [ 0 0 1 ]
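The check in part (e) can also be done by machine:

```python
A = [[1, 2, 3], [2, 3, 4], [3, 4, 6]]
Ainv = [[-2, 0, 1], [0, 3, -2], [1, -2, 1]]   # result of the row reduction above

product = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```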
(f)
    [ 1 2 3 | 1 0 0 ]   (1)
    [ 4 5 6 | 0 1 0 ]   (2)
    [ 7 8 9 | 0 0 1 ]   (3)

    [ 1  2   3 |  1 0 0 ]   (1)
    [ 0 −3  −6 | −4 1 0 ]   (2) − 4(1)
    [ 0 −6 −12 | −7 0 1 ]   (3) − 7(1)

    [ 1  2  3 |  1  0 0 ]   (1)
    [ 0 −3 −6 | −4  1 0 ]   (2)
    [ 0  0  0 |  1 −2 1 ]   (3) − 2(2)
The last equation has a zero left hand side and nonzero right hand side and so cannot be satisfied. There
is no inverse . As a check, observe that
    [ 1 2 3 ] [  1 ]   [ 0 ]
    [ 4 5 6 ] [ −2 ] = [ 0 ]
    [ 7 8 9 ] [  1 ]   [ 0 ]
Check:
    [ 0 0 1 0 ] [ 0 0  0  1/3 ]   [ 1 0 0 0 ]
    [ 0 0 0 1 ] [ 0 0 1/2  0  ] = [ 0 1 0 0 ]
    [ 0 2 0 0 ] [ 1 0  0   0  ]   [ 0 0 1 0 ]
    [ 3 0 0 0 ] [ 0 1  0   0  ]   [ 0 0 0 1 ]
19) Let
    B = [ −1  2 p ]
        [  0 −1 1 ]
        [  2  1 0 ]
(a) For which values of p does B have an inverse?
(b) Find B −1 , for those values of p.
Solution.
    [ −1  2 p | 1 0 0 ]   (1)
    [  0 −1 1 | 0 1 0 ]   (2)
    [  2  1 0 | 0 0 1 ]   (3)

    [ −1  2  p | 1 0 0 ]   (1)
    [  0 −1  1 | 0 1 0 ]   (2)
    [  0  5 2p | 2 0 1 ]   (3) + 2(1)

    [ −1  2      p | 1 0 0 ]   (1)
    [  0 −1      1 | 0 1 0 ]   (2)
    [  0  0 2p + 5 | 2 5 1 ]   (3) + 5(2)

    [ 1 −2 −p | −1  0 0 ]                            −(1)
    [ 0  1 −1 |  0 −1 0 ]                            −(2)
    [ 0  0  1 | 2/(2p+5)  5/(2p+5)  1/(2p+5) ]       (3)/(2p + 5)

    [ 1 −2 0 | −5/(2p+5)   5p/(2p+5)  p/(2p+5) ]     (1) + p(3)
    [ 0  1 0 |  2/(2p+5)  −2p/(2p+5)  1/(2p+5) ]     (2) + (3)
    [ 0  0 1 |  2/(2p+5)    5/(2p+5)  1/(2p+5) ]     (3)

    [ 1 0 0 | −1/(2p+5)    p/(2p+5)  (p+2)/(2p+5) ]  (1) + 2(2)
    [ 0 1 0 |  2/(2p+5)  −2p/(2p+5)      1/(2p+5) ]  (2)
    [ 0 0 1 |  2/(2p+5)    5/(2p+5)      1/(2p+5) ]  (3)

(a) The reduction succeeds exactly when 2p + 5 ≠ 0, so B has an inverse for all p ≠ −5/2.
(b) For those p,
    B⁻¹ = 1/(2p + 5) [ −1   p  p + 2 ]
                     [  2 −2p     1  ]
                     [  2   5     1  ]
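A spot check of the row-reduction result at one value of the parameter, p = 1 (so 2p + 5 = 7), with exact arithmetic:

```python
from fractions import Fraction

p = 1
B = [[-1, 2, p], [0, -1, 1], [2, 1, 0]]
Binv = [[Fraction(v, 2 * p + 5) for v in row]
        for row in [[-1, p, p + 2], [2, -2 * p, 1], [2, 5, 1]]]

product = [[sum(B[i][k] * Binv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```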
20) Suppose that, for some square matrix A, the series Σ_{n=0}^{∞} Aⁿ = I + A + A² + A³ + · · · converges. (In
series notation, A⁰ is defined to be I.) Show that
    (I − A)⁻¹ = Σ_{n=0}^{∞} Aⁿ
21) Suppose that some square matrix A obeys Aⁿ = 0 for some positive integer n.
(a) Find the inverse of A.
(b) Find the inverse of I − A.
Solution. (a) Trick question!! A has no inverse. If A had an inverse then multiplying both sides of
An = 0 by (A−1 )n would give A−1 · · · A−1 A · · · A = 0 (with n A−1 ’s and n A’s) and then I = 0.
(b) From problem 20, we would guess (I − A)⁻¹ = I + A + A² + · · · + Aⁿ⁻¹. All the remaining terms
in the series I + A + A² + · · · vanish because Aⁿ = 0. To verify that the guess is correct we multiply
out:
    (I − A)(I + A + A² + · · · + Aⁿ⁻¹) = (I + A + · · · + Aⁿ⁻¹) − (A + A² + · · · + Aⁿ) = I − Aⁿ = I
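The telescoping can be watched on a concrete nilpotent example, say the strictly upper triangular A below, which obeys A³ = 0:

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]      # strictly upper triangular: A^3 = 0
A2 = matmul(A, A)

S = [[I[i][j] + A[i][j] + A2[i][j] for j in range(3)] for i in range(3)]  # I + A + A^2
ImA = [[I[i][j] - A[i][j] for j in range(3)] for i in range(3)]           # I - A

print(matmul(ImA, S))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```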
23) Expanding along the first row,
        [ 3 2 p ]
    det [ 0 p 1 ] = 3(2p − 0) − 2(0 − 1) + p(0 − p) = −p² + 6p + 2
        [ 1 0 2 ]
For this to be 10,
−p2 + 6p + 2 = 10 ⇐⇒ p2 − 6p + 8 = 0 ⇐⇒ p = 2, 4
24) Let
        [ 1 3 5 ∗ ]
    A = [ 0 4 0 6 ]
        [ 0 1 0 2 ]
        [ 3 ∗ 7 8 ]
where the ∗'s denote unknown entries. Find all possible values of det A.
Solution.
        [ 1 3 5 ∗ ]       [ 1 3  5 ∗ ]   (1)
    det [ 0 4 0 6 ] = det [ 0 4  0 6 ]   (2)
        [ 0 1 0 2 ]       [ 0 1  0 2 ]   (3)
        [ 3 ∗ 7 8 ]       [ 0 ∗ −8 ∗ ]   (4) − 3(1)

              [ 4  0 6 ]       [ 0  0 −2 ]   (1) − 4(2)
        = det [ 1  0 2 ] = det [ 1  0  2 ]   (2)
              [ ∗ −8 ∗ ]       [ ∗ −8  ∗ ]   (3)

        = −2 det [ 1  0 ] = 16
                 [ ∗ −8 ]
for all values of the ∗’s.
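That the ∗'s really drop out can be confirmed numerically, using a generic cofactor-expansion determinant:

```python
def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

for s1 in (0, 1, -5):
    for s2 in (0, 2, 7):
        A = [[1, 3, 5, s1], [0, 4, 0, 6], [0, 1, 0, 2], [3, s2, 7, 8]]
        assert det(A) == 16
print("det A = 16 for every tested choice of the *'s")
```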
25) Suppose that the 3 × 3 matrix A obeys det A = 5. Compute (a) det(4A) (b) det(A²) (c) det(4A²)
Solution. Since 4A = (4I)A,
    det(4A) = det(4I) det A = 4³ × 5 = 320
    det(A²) = det A det A = 5 × 5 = 25
    det(4A²) = det(4A) det A = 320 × 5 = 1600
26) Suppose that the 6 × 6 matrix A obeys A⁴ = 2A. Find all possible values of det A.
Solution.
    A⁴ = 2A =⇒ det(A⁴) = det(2A) =⇒ (det A)⁴ = det(2I) det A
As in the last question, det(2I) = 2⁶ = 64. So
    0 = (det A)⁴ − 64 det A = det A ((det A)³ − 64)
Consequently det A = 0 or (det A)³ = 64, so that det A = 0 or 4 , assuming that A has real matrix entries.
27) Evaluate
        [ 1  a  a² a³ ]
    det [ a  a² a³ 1  ]
        [ a² a³ 1  a  ]
        [ a³ 1  a  a² ]
Solution.
        [ 1  a  a² a³ ]       [ 1    a       a²       a³    ]   (1)
    det [ a  a² a³ 1  ] = det [ 0    0       0      1 − a⁴  ]   (2) − a(1)
        [ a² a³ 1  a  ]       [ 0    0     1 − a⁴   a − a⁵  ]   (3) − a²(1)
        [ a³ 1  a  a² ]       [ 0  1 − a⁴  a − a⁵   a² − a⁶ ]   (4) − a³(1)

              [   0       0      1 − a⁴  ]
        = det [   0     1 − a⁴   a − a⁵  ] = (1 − a⁴) det [   0     1 − a⁴ ]
              [ 1 − a⁴  a − a⁵   a² − a⁶ ]                [ 1 − a⁴  a − a⁵ ]

        = (1 − a⁴) × ( − (1 − a⁴)(1 − a⁴) ) = −(1 − a⁴)³ = (a⁴ − 1)³

We expanded along the first row to achieve each of the second, third and fourth equalities.
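The closed form is easy to confirm for specific values of a (exact rational arithmetic keeps the comparison honest):

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

for a in (Fraction(2), Fraction(-3), Fraction(1, 2)):
    A = [[1, a, a**2, a**3],
         [a, a**2, a**3, 1],
         [a**2, a**3, 1, a],
         [a**3, 1, a, a**2]]
    assert det(A) == (a**4 - 1) ** 3
print("det agrees with (a^4 - 1)^3")
```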