The Fundamental Theorem of Linear Algebra

Gilbert Strang

The American Mathematical Monthly, 100 (1993) 848-855.
This paper is about a theorem and the pictures that go with it. The theorem
describes the action of an m by n matrix. The matrix A produces a linear
transformation from R^n to R^m, but this picture by itself is too large. The "truth"
about Ax = b is expressed in terms of four subspaces (two of R^n and two of R^m).
The pictures aim to illustrate the action of A on those subspaces, in a way that
students won't forget.
The first step is to see Ax as a combination of the columns of A. Until then the
multiplication Ax is just numbers. This step raises the viewpoint to subspaces. We
see Ax in the column space. Solving Ax = b means finding all combinations of the
columns that produce b in the column space:
    Ax = x_1 (column 1 of A) + x_2 (column 2 of A) + ... + x_n (column n of A) = b.
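As a quick numerical check (the 3 by 2 matrix below is invented for illustration, not taken from the paper), NumPy confirms that Ax equals x_1 times column 1 plus x_2 times column 2:

    import numpy as np

    # A small 3 by 2 matrix and a vector x (made up for illustration).
    A = np.array([[1.0, 2.0],
                  [3.0, 6.0],
                  [0.0, 1.0]])
    x = np.array([5.0, -1.0])

    # Ax as a matrix-vector product, and as the same combination of columns.
    product = A @ x
    combination = x[0] * A[:, 0] + x[1] * A[:, 1]
    print(np.allclose(product, combination))   # True: Ax lies in the column space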
The column space is the range R(A), a subspace of R^m. This abstraction, from
entries in A or x or b to the picture based on subspaces, is absolutely essential.
Note how subspaces enter for a purpose. We could invent vector spaces and
construct bases at random. That misses the purpose. Virtually all algorithms and
all applications of linear algebra are understood by moving to subspaces.
The key algorithm is elimination. Multiples of rows are subtracted from other
rows (and rows are exchanged). There is no change in the row space. This subspace
contains all combinations of the rows of A, which are the columns of A^T. The row
space of A is the column space R(A^T).
The other subspace of R^n is the nullspace N(A). It contains all solutions to
Ax = 0. Those solutions are not changed by elimination, whose purpose is to
compute them. A by-product of elimination is to display the dimensions of these
subspaces, which is the first part of the theorem.
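In computational terms, those dimensions can be read off numerically. Here is a minimal sketch, assuming SciPy is available (the 2 by 3 matrix of rank 1 is invented for illustration):

    import numpy as np
    from scipy.linalg import null_space

    # A 2 by 3 matrix of rank 1 (made up for illustration).
    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])
    m, n = A.shape
    r = np.linalg.matrix_rank(A)

    print(r, n - r)                   # dim(row space) = r, dim(nullspace) = n - r
    print(r, m - r)                   # dim(column space) = r, dim(left nullspace) = m - r
    print(null_space(A).shape[1])     # n - r = 2 basis vectors for N(A)
    print(null_space(A.T).shape[1])   # m - r = 1 basis vector for N(A^T)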
The Fundamental Theorem of Linear Algebra has as many as four parts. Its
presentation often stops with Part 1, but the reader is urged to include Part 2.
(That is the only part we will prove; it is too valuable to miss. This is also as far as
we go in teaching.) The last two parts, at the end of this paper, sharpen the first
two. The complete picture shows the action of A on the four subspaces with the
right bases. Those bases come from the singular value decomposition.
The Fundamental Theorem begins with
Part 1. The dimensions of the subspaces.
Part 2. The orthogonality of the subspaces.
The fourth subspace is the left nullspace N(A^T), containing all solutions y of
A^T y = 0. That equation looks at y through the columns of A:

              [ (column 1 of A)^T ]        [ 0 ]
    A^T y  =  [         :         ] y   =  [ : ]
              [ (column n of A)^T ]        [ 0 ]
Since y is orthogonal to each column (producing each zero), y is orthogonal to the
whole column space. The point is that A^T is just as good a matrix as A. Nothing is
new, except A^T is n by m. Therefore the left nullspace has dimension m - r.
A^T y = 0 means the same as y^T A = 0^T. With the vector on the left, y^T A is a
combination of the rows of A. Contrast that with Ax = a combination of the
columns.
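A numerical sanity check of this orthogonality, with an invented rank-one matrix: a basis for N(A^T), obtained here from scipy.linalg.null_space, is perpendicular to every column of A.

    import numpy as np
    from scipy.linalg import null_space

    # A rank-one 3 by 2 matrix (made up), so m - r = 3 - 1 = 2.
    A = np.array([[1.0, 2.0],
                  [3.0, 6.0],
                  [2.0, 4.0]])

    Y = null_space(A.T)               # orthonormal basis for the left nullspace N(A^T)
    print(Y.shape[1])                 # 2 = m - r
    print(np.allclose(Y.T @ A, 0.0))  # True: every y is orthogonal to every column of A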
The First Picture: Linear Equations
Figure 1 shows how A takes x into the column space. The nullspace goes to the
zero vector. Nothing goes elsewhere; the left nullspace is waiting its
turn.
With b in the column space, Ax = b can be solved. There is a particular
solution x_r in the row space. The homogeneous solutions x_n form the nullspace.
The general solution is x_r + x_n. The particularity of x_r is that it is orthogonal to
every x_n.
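A minimal sketch of that splitting in NumPy: it uses the rank-one matrix from the paper's later example, chooses b in the column space, and relies on the fact that numpy.linalg.pinv returns the minimum-norm solution, which is exactly the row-space solution x_r.

    import numpy as np
    from scipy.linalg import null_space

    # The rank-one matrix from the paper's later example; b is built to lie
    # in its column space so that Ax = b is solvable.
    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])
    b = A @ np.array([1.0, 1.0])

    x_r = np.linalg.pinv(A) @ b       # particular solution in the row space
    x_n = null_space(A)[:, 0]         # any solution of Ax = 0

    print(np.allclose(A @ (x_r + x_n), b))  # True: x_r + x_n also solves Ax = b
    print(np.isclose(x_r @ x_n, 0.0))       # True: x_r is orthogonal to x_n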
May I add a personal note about this figure? Many readers of Linear Algebra
and Its Applications [4] have seen it as fundamental. It captures so much about
Ax = b. Some letters suggested other ways to draw the orthogonal subspaces;
artistically this is the hardest part. The four subspaces (and very possibly the figure
itself) are of course not original. But as a key to the teaching of linear algebra, this
illustration is a gold mine.
[Figure 1. The action of A: in R^n, the row space (dimension r) and the nullspace (dimension n - r); x = x_r + x_n goes to Ax = Ax_r in the column space, and x_n goes to 0.]
Other writers made a further suggestion. They proposed a lower level textbook,
recognizing that the range of students who need linear algebra (and the variety of
preparation) is enormous. That new book [5] contains Figures 1 and 2, and also Figure 0
to show the dimensions first. The explanation is much more gradual than in this
paper, but every course has to study subspaces! We should teach the important
ones.
The Second Figure: Least Squares Equations
If b is not in the column space, Ax = b cannot be solved. In practice we still
have to come up with a "solution." It is extremely common to have more equations
than unknowns: more output data than input controls, more measurements than
parameters to describe them. The data may lie close to a straight line b = C + Dt.
A parabola C + Dt + Et^2 would come closer. Whether we use polynomials or
sines and cosines or exponentials, the problem is still linear in the coefficients
C, D, E:

    C + Dt_1 = b_1                  C + Dt_1 + Et_1^2 = b_1
          :              or                    :
    C + Dt_m = b_m                  C + Dt_m + Et_m^2 = b_m
There are n = 2 or n = 3 unknowns, and m is larger. There is no x = (C, D) or
x = (C, D, E) that satisfies all m equations. Ax = b has a solution only when the
points lie exactly on a line or a parabola; then b is in the column space of the m
by 2 or m by 3 matrix A.
The solution is to make the error b - Ax as small as possible. Since Ax can
never leave the column space, choose the closest point to b in that subspace. This
point is the projection p. Then the error vector e = b - p has minimal length.
To repeat: The best combination p = Ax̂ is the projection of b onto the column
space. The error e is perpendicular to that subspace. Therefore e = b - Ax̂ is in
the left nullspace:

    A^T(b - Ax̂) = 0    or    A^T A x̂ = A^T b.
Calculus reaches the same linear equations by minimizing the quadratic ||b - Ax||^2.
The chain rule just multiplies both sides of Ax = b by A^T.
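A worked least-squares fit in NumPy (the four data points are invented for illustration): the normal equations give x̂ = (C, D), the error e is perpendicular to the columns, and the answer agrees with numpy.linalg.lstsq.

    import numpy as np

    # Fit a line b = C + Dt to four data points (the data are made up).
    t = np.array([0.0, 1.0, 2.0, 3.0])
    b = np.array([1.0, 2.1, 2.9, 4.2])
    A = np.column_stack([np.ones_like(t), t])    # m by 2: columns for C and D

    # Solve the normal equations A^T A x = A^T b for x = (C, D).
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)

    e = b - A @ x_hat                            # the error e = b - p
    print(np.allclose(A.T @ e, 0.0))             # True: e is in the left nullspace
    print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq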
[Figure 2. Least squares: b is projected to the nearest point p = Ax̂ in the column space. Ax = b is not possible; b = p + e, with the error e in the left nullspace.]
All these matrices (A and the symmetric matrices A^T A and AA^T) have rank r. The
r positive eigenvalues σ_i^2 give the diagonal entries σ_i of Σ.
Example

    A = [ 1  2 ]  =  UΣV^T   with   u_1 = (1, 3)/√10,   σ_1 = √50,   v_1 = (1, 2)/√5.
        [ 3  6 ]

The right bases obey Av_i = σ_i u_i.

[Figure 3. The SVD: the row space basis v_1, ..., v_r is carried to the column space basis u_1, ..., u_r, with Av_i = σ_i u_i; the nullspace of A goes to zero.]
The SVD gives an easy formula for A+, because it chooses the right bases. Since
Av_i = σ_i u_i, the inverse has to be A+ u_i = v_i/σ_i. Thus the pseudoinverse of Σ
contains the reciprocals 1/σ_i. The orthogonal matrices U and V^T are inverted by
U^T and V. All together, the pseudoinverse of A = UΣV^T is A+ = VΣ+U^T.
Example (continued)
    A+ = (1/√5)[ 1  -2 ] [ 1/√50  0 ] (1/√10)[  1  3 ]  =  (1/50)[ 1  3 ]
               [ 2   1 ] [   0    0 ]        [ -3  1 ]           [ 2  6 ]
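The same computation can be checked in NumPy: build Σ+ from the reciprocals of the nonzero singular values only, and compare VΣ+U^T with numpy.linalg.pinv.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])

    U, s, Vt = np.linalg.svd(A)       # s = [sqrt(50), 0] up to roundoff

    # Sigma^+ takes reciprocals of the nonzero singular values only.
    s_plus = np.zeros_like(s)
    s_plus[s > 1e-12] = 1.0 / s[s > 1e-12]
    A_plus = Vt.T @ np.diag(s_plus) @ U.T

    print(np.allclose(A_plus, np.array([[1.0, 3.0], [2.0, 6.0]]) / 50))  # True
    print(np.allclose(A_plus, np.linalg.pinv(A)))                        # True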
Always A+A is the identity matrix on the row space, and zero on the nullspace:
    A+A = (1/50)[ 10  20 ]  =  projection onto the line through [ 1 ]
                [ 20  40 ]                                      [ 2 ]

    AA+ = (1/50)[  5  15 ]  =  projection onto the line through [ 1 ]
                [ 15  45 ]                                      [ 3 ]
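Both projections can be verified numerically; this sketch recomputes A+A and AA+ for the example and checks that each one is idempotent, as a projection must be.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])
    A_plus = np.linalg.pinv(A)

    P_row = A_plus @ A                # projects onto the row space, the line through (1, 2)
    P_col = A @ A_plus                # projects onto the column space, the line through (1, 3)

    print(np.allclose(P_row, np.array([[10.0, 20.0], [20.0, 40.0]]) / 50))  # True
    print(np.allclose(P_col, np.array([[ 5.0, 15.0], [15.0, 45.0]]) / 50))  # True
    print(np.allclose(P_row @ P_row, P_row))  # True: a projection is idempotent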
A Summary of the Key Ideas
From its r-dimensional row space to its r-dimensional column space, A yields
an invertible linear transformation.
Proof: Suppose x and x' are in the row space, and Ax equals Ax' in the column
space. Then x - x' is in both the row space and the nullspace. It is perpendicular to
itself, so x - x' = 0. Therefore x = x' and the transformation is one-to-one.
The SVD chooses good bases for those subspaces. Compare with the Jordan form
for a real square matrix. There we are choosing the same basis for both domain
and range; our hands are tied. The best we can do is SAS^-1 = J or SA = JS. In
general J is not real. If real, then in general it is not diagonal. If diagonal, then in
general S is not orthogonal. By choosing two bases, not one, every matrix does as
well as a symmetric matrix. The bases are orthonormal and A is diagonalized.
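A short numerical sketch of that diagonalization (the 3 by 2 matrix is random, purely for illustration): with the two orthonormal bases from U and V, the product U^T A V recovers Σ up to roundoff.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))   # any matrix, square or not

    U, s, Vt = np.linalg.svd(A)
    Sigma = np.zeros_like(A)
    Sigma[:len(s), :len(s)] = np.diag(s)

    # With the two bases from U and V, the matrix becomes diagonal: U^T A V = Sigma.
    print(np.allclose(U.T @ A @ Vt.T, Sigma))   # True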
Some applications permit two bases and others don't. For powers A^n we need
S^-1 to cancel S. Only a similarity is allowed (one basis). In a differential equation
u' = Au, we can make one change of variable u = Sv. Then v' = S^-1ASv. But for
Ax = b, the domain and range are philosophically "not the same space." The row
and column spaces are isomorphic, but their bases can be different. And for least
squares the SVD is perfect.
This figure by Tom Hern and Cliff Long [2] shows the diagonalization of A.
Basis vectors go to basis vectors (principal axes). A circle goes to an ellipse. The
matrix is factored into UΣV^T. Behind the scenes are two symmetric matrices A^T A
and AA^T. So we reach two orthogonal matrices U and V.
[Figure 4 (Hern and Long). Unit vectors v_i on a circle in the row space go to vectors σ_i u_i on an ellipse in the column space, for 1 ≤ i ≤ r.]
The nullspaces go to zero. Linearity does the rest.
REFERENCES
1. Gene Golub and Charles Van Loan, Matrix Computations, 2nd ed., Johns Hopkins University Press
(1989).
2. Thomas Hern and Cliff Long, Viewing some concepts and applications in linear algebra, Visualization in Teaching and Learning Mathematics, MAA Notes 19 (1991) 173-190.
3. Roger Howe, Very basic Lie theory, American Mathematical Monthly, 90 (1983) 600-623.
4. Gilbert Strang, Linear Algebra and Its Applications, 3rd ed., Harcourt Brace Jovanovich (1988).
5. Gilbert Strang, Introduction to Linear Algebra, Wellesley-Cambridge Press (1993).
Department of Mathematics
Massachusetts Institute of Technology
Cambridge, MA 02139
gs@math.mit.edu
An Identity of Daubechies
The generalization of an identity of Daubechies using a probabilistic interpretation
by D. Zeilberger [100 (1993) 487] has already appeared in SIAM Review
Problem 85-10 (June, 1985), in a slightly more general context. In addition to a
similar probabilistic derivation there is also a direct algebraic proof. Incidentally,
Problem 10223 [99 (1992) 462] is the same as the identity of Daubechies, and a
slight generalization of this identity has appeared previously as Problem 183,
Crux Math. 3 (1977) 69-70; it came from a list of problems considered for the
Canadian Mathematical Olympiad. There was an inductive solution of the latter
by Mark Kleinman, a high school student at the time and one of the top students
in the U.S.A.M.O. and the I.M.O.
M. S. Klamkin
Department of Mathematics
University of Alberta
Edmonton, Alberta
CANADA T6G 2G1