Exercises For 8.5

Exercise 8.5.1 In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then start with x0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} and compute x4 and r3 using the power method.

a. A = \begin{bmatrix} 2 & -4 \\ -3 & 3 \end{bmatrix}   b. A = \begin{bmatrix} 5 & 2 \\ -3 & -2 \end{bmatrix}

c. A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}   d. A = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}

Exercise 8.5.2 In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.

a. A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}   b. A = \begin{bmatrix} 3 & 1 \\ 1 & 0 \end{bmatrix}

Exercise 8.5.4 If A is symmetric, show that each matrix Ak in the QR-algorithm is also symmetric. Deduce that they converge to a diagonal matrix.

Exercise 8.5.5 Apply the QR-algorithm to A = \begin{bmatrix} 2 & -3 \\ 1 & -2 \end{bmatrix}. Explain.

Exercise 8.5.6 Given a matrix A, let Ak, Qk, and Rk, k ≥ 1, be the matrices constructed in the QR-algorithm. Show that A^k = (Q1Q2 ··· Qk)(Rk ··· R2R1) for each k ≥ 1 and hence that this is a QR-factorization of A^k.
[Hint: Show that QkRk = Rk−1Qk−1 for each k ≥ 2, and use this equality to compute (Q1Q2 ··· Qk)(Rk ··· R2R1) "from the centre out." Use the fact that (AB)^{n+1} = A(BA)^n B for any square matrices A and B.]
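Both iterations in these exercises are a few lines of numpy. This is our own sketch, not part of the text: the function names, the step counts, and the use of Rayleigh quotients for the estimates rk are choices we made.

```python
import numpy as np

def power_method(A, x0, steps):
    """Iterate x_{k+1} = A x_k and return the final iterate together with
    the Rayleigh-quotient estimates r_k = (x_k . A x_k) / (x_k . x_k)."""
    x = np.asarray(x0, dtype=float)
    estimates = []
    for _ in range(steps):
        x = A @ x
        estimates.append((x @ (A @ x)) / (x @ x))
    return x, estimates

def qr_algorithm(A, steps):
    """Iterate A_{k+1} = R_k Q_k where A_k = Q_k R_k is a QR-factorization."""
    Ak = np.asarray(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak
```

For the matrix of 8.5.1(a), whose eigenvalues are 6 and −1, the estimates approach the dominant eigenvalue 6; for the symmetric matrix of 8.5.1(c) the QR iterates approach a diagonal matrix.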
When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is to find
a factorization A = Q−1 DQ where Q is invertible and D is diagonal. Unfortunately such a factorization
may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of
the form A = PDQ where P and Q are invertible and D is diagonal: the Smith Normal form (Theorem
2.5.3). Moreover, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such
a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in
applied linear algebra. In this section we show how to explicitly compute an SVD for any real matrix A,
and illustrate some of its many applications.
We need a fact about two subspaces associated with an m × n matrix A:
im A = {Ax | x ∈ Rn} and col A = span{a | a is a column of A}
Then im A is called the image of A (so named because of the linear transformation Rn → Rm with x ↦ Ax),
and col A is called the column space of A (Definition 5.10). Surprisingly, these spaces are equal:
Lemma 8.6.1
For any m × n matrix A, im A = col A.
Proof. Let A = [a1 a2 ··· an] in terms of its columns. Let x ∈ im A, say x = Ay, y in Rn. If
y = [y1 y2 ··· yn]T, then Ay = y1a1 + y2a2 + ··· + ynan ∈ col A by Definition 2.5. This shows that
im A ⊆ col A. For the other inclusion, each ak = Aek ∈ im A where ek is column k of In, so col A ⊆ im A.
We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem 5.5.7), and it is orthog-
onally diagonalizable by the Principal Axes Theorem (Theorem 8.2.2). So for any real matrix A (square
or not), the fact that both AT A and AAT are real and symmetric suggests that we can learn a lot about A by
studying them. This section shows just how true this is.
The following Lemma reveals some similarities between AT A and AAT which simplify the statement
and the proof of the SVD we are constructing.
Lemma 8.6.2
Let A be a real m × n matrix. Then:
1. The eigenvalues of AT A (and of AAT) are real and nonnegative.
2. AT A and AAT have the same set of positive eigenvalues.

Proof.
1. The matrix AT A is real and symmetric, so its eigenvalues are real by Theorem 5.5.7. If λ is an eigenvalue of AT A with eigenvector q ≠ 0 in Rn, then
λ‖q‖² = qT(λq) = qT(AT Aq) = (Aq)T(Aq) = ‖Aq‖² ≥ 0
so λ ≥ 0 because ‖q‖² > 0. Then (1.) follows for AT A, and the case AAT follows by replacing A by AT.
2. Write N(B) for the set of positive eigenvalues of a matrix B. We must show that N(AT A) = N(AAT).
If λ ∈ N(AT A) with eigenvector 0 ≠ q ∈ Rn, then Aq ∈ Rm and
(AAT)(Aq) = A(AT Aq) = A(λq) = λ(Aq)
Moreover Aq ≠ 0 because AT(Aq) = λq ≠ 0, so λ ∈ N(AAT). This shows N(AT A) ⊆ N(AAT), and replacing A by AT gives the other inclusion.
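Lemma 8.6.2 is easy to check numerically. The sketch below (our own illustration, with an arbitrary random 3 × 5 matrix) confirms that the eigenvalues of AT A are nonnegative and that AT A and AAT share the same positive eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # a random 3 x 5 matrix (almost surely rank 3)

# Eigenvalues of the two symmetric matrices A^T A (5x5) and A A^T (3x3)
evals_AtA = np.linalg.eigvalsh(A.T @ A)
evals_AAt = np.linalg.eigvalsh(A @ A.T)

# Keep only the positive eigenvalues (discarding numerical zeros)
pos_AtA = np.sort(evals_AtA[evals_AtA > 1e-10])
pos_AAt = np.sort(evals_AAt[evals_AAt > 1e-10])
```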
To analyze an m × n matrix A we have two symmetric matrices to work with: AT A and AAT . In view
of Lemma 8.6.2, we choose AT A (sometimes called the Gram matrix of A), and derive a series of facts
which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in
several steps:
1. The n × n matrix AT A is real and symmetric so, by the Principal Axes Theorem 8.2.2, let
{q1, q2, ..., qn} ⊆ Rn be an orthonormal basis of eigenvectors of AT A, with corresponding eigenvalues
λ1, λ2, ..., λn. By Lemma 8.6.2(1), each λi is real and λi ≥ 0. By re-ordering the qi we may (and do) assume that
λ1 ≥ λ2 ≥ ··· ≥ λr > 0 and λi = 0 if i > r    (i)
2. Even though the λi are the eigenvalues of AT A, the number r in (i) turns out to be rank A. To understand
why, consider the vectors Aqi ∈ im A. For all i, j:
Aqi · Aqj = (Aqi)T(Aqj) = qiT(AT Aqj) = qiT(λjqj) = λj(qi · qj)    (ii)
Taking j = i gives ‖Aqi‖² = λi, so ‖Aqi‖ = √λi for each i    (iii)
Hence Aqi ≠ 0 if 1 ≤ i ≤ r, while Aqi = 0 if i > r    (iv)
With this write U = span {Aq1 , Aq2 , . . . , Aqr } ⊆ im A; we claim that U = im A, that is im A ⊆ U .
For this we must show that Ax ∈ U for each x ∈ Rn . Since {q1 , . . . , qr , . . . , qn } is a basis of Rn (it is
orthonormal), we can write x = t1q1 + ··· + trqr + ··· + tnqn where each tj ∈ R. Then, using (iv) we
obtain
Ax = t1 Aq1 + · · · + tr Aqr + · · · + tn Aqn = t1 Aq1 + · · · + tr Aqr ∈ U
This shows that im A ⊆ U, so U = im A as claimed. Moreover, the vectors Aq1, Aq2, ..., Aqr are nonzero
(by (iii)) and pairwise orthogonal (by (ii)), hence linearly independent, so they are a basis of U. Thus
dim (im A) = dim U = r    (v)
But col A = im A by Lemma 8.6.1, and rank A = dim (col A) by Theorem 5.4.1, so
rank A = dim (col A) = dim (im A) = r    (vi)
Definition 8.7
The real numbers σi = √λi = ‖Aqi‖ (by (iii)) for i = 1, 2, ..., n, are called the singular values of the
matrix A.
With (vi) this makes the following definitions depend only upon A.
8 Of course they could all be positive (r = n) or all zero (so AT A = 0, and hence A = 0 by Exercise 5.3.9).
Definition 8.8
Let A be a real, m × n matrix of rank r, with positive singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and
σi = 0 if i > r. Define:
DA = diag(σ1, ..., σr) and ΣA = \begin{bmatrix} DA & 0 \\ 0 & 0 \end{bmatrix}_{m×n}
Here ΣA is in block form and is called the singular matrix of A.
The singular values σi and the matrices DA and ΣA will be referred to frequently below.
4. Returning to our narrative, normalize the vectors Aq1, Aq2, ..., Aqr, by defining
pi = (1/‖Aqi‖) Aqi ∈ Rm for each i = 1, 2, ..., r    (viii)
Then we compute:
PΣA = \begin{bmatrix} p1 & \cdots & pr & pr+1 & \cdots & pm \end{bmatrix}
\begin{bmatrix} σ1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & σr & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}
= \begin{bmatrix} σ1p1 & \cdots & σrpr & 0 & \cdots & 0 \end{bmatrix}
= \begin{bmatrix} Aq1 & \cdots & Aqr & Aqr+1 & \cdots & Aqn \end{bmatrix}    (by (viii) and (iv))
= AQ    (xii)
Theorem 8.6.1
Let A be a real m × n matrix, and let σ1 ≥ σ2 ≥ ··· ≥ σr > 0 be the positive singular values of A.
Then r is the rank of A and we have the factorization
A = PΣA QT where P and Q are orthogonal matrices
The factorization A = PΣA QT in Theorem 8.6.1, where P and Q are orthogonal matrices, is called a
Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example if r < m then
the vectors pr+1 , . . . , pm can be any extension of {p1 , . . ., pr } to an orthonormal basis of Rm , and each
will lead to a different matrix P in the decomposition. For a more dramatic example, if A = In then ΣA = In,
and A = PΣA PT is an SVD of A for any orthogonal n × n matrix P.
Example 8.6.1
Find a singular value decomposition for A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}.

Solution. We have AT A = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, so the characteristic polynomial is
cAT A(x) = det \begin{bmatrix} x-2 & 1 & -1 \\ 1 & x-1 & 0 \\ -1 & 0 & x-1 \end{bmatrix} = (x − 3)(x − 1)x
The eigenvalues of AT A (in decreasing order) are λ1 = 3, λ2 = 1, and λ3 = 0, with corresponding orthonormal eigenvectors
q1 = (1/√6)\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}, q2 = (1/√2)\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, q3 = (1/√3)\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}
Hence r = rank A = 2 and the positive singular values are σ1 = √3 and σ2 = 1, so
Q = [q1 q2 q3] and ΣA = \begin{bmatrix} √3 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
Normalizing Aq1 and Aq2 gives
p1 = (1/σ1)Aq1 = (1/√2)\begin{bmatrix} 1 \\ -1 \end{bmatrix} and p2 = (1/σ2)Aq2 = (1/√2)\begin{bmatrix} 1 \\ 1 \end{bmatrix}
In this case, {p1, p2} is already a basis of R2 (so the Gram-Schmidt algorithm is not needed), and
we have the 2 × 2 orthogonal matrix
P = [p1 p2] = (1/√2)\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
Finally (by Theorem 8.6.1) the singular value decomposition for A is
A = PΣA QT = (1/√2)\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} √3 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} (1/√6)\begin{bmatrix} 2 & -1 & 1 \\ 0 & √3 & √3 \\ -√2 & -√2 & √2 \end{bmatrix}
Of course this can be confirmed by direct matrix multiplication.
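The confirmation is quick with numpy; the matrices below simply transcribe the P, ΣA, and QT found above:

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [-1., 1., 0.]])

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
P = (1 / s2) * np.array([[1., 1.],
                         [-1., 1.]])
Sigma = np.array([[s3, 0., 0.],
                  [0., 1., 0.]])
Qt = (1 / s6) * np.array([[2., -1., 1.],
                          [0., s3, s3],
                          [-s2, -s2, s2]])

recon = P @ Sigma @ Qt   # should reproduce A
```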
Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic
procedure for doing so.
SVD Algorithm
Given a real m × n matrix A, find an SVD A = PΣA QT as follows:
1. Use the Diagonalization Algorithm (see page 181) to find the (real and non-negative)
eigenvalues λ1, λ2, ..., λn of AT A with corresponding (orthonormal) eigenvectors
q1, q2, ..., qn. Reorder the qi (if necessary) to ensure that the nonzero eigenvalues are
λ1 ≥ λ2 ≥ ··· ≥ λr > 0 and λi = 0 if i > r.

2. The positive singular values are σi = √λi for i = 1, 2, ..., r, and r = rank A. Use them to
form DA = diag(σ1, ..., σr) and ΣA as in Definition 8.8.

3. Take Q = [q1 q2 ··· qn]; it is orthogonal.

4. For i = 1, ..., r let pi = (1/σi)Aqi; then {p1, ..., pr} is orthonormal in Rm. Extend it (using
the Gram-Schmidt algorithm if necessary) to an orthonormal basis {p1, ..., pm} of Rm, and
take P = [p1 p2 ··· pm]. Then P is orthogonal and A = PΣA QT.
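The whole construction can be sketched in numpy, using its symmetric eigensolver for the diagonalization step. This is a naive textbook transcription (the function name and the rank tolerance are our choices), not a production SVD routine:

```python
import numpy as np

def svd_via_gram_matrix(A, tol=1e-10):
    """Construct an SVD A = P Sigma Q^T from the eigenvectors of the
    Gram matrix A^T A, following the textbook construction."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A^T A, eigenvalues in decreasing order
    lam, Q = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, Q = lam[order], Q[:, order]
    # Step 2: singular values sigma_i = sqrt(lambda_i); r = rank A
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    r = int(np.sum(sigma > tol))
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Steps 3-4: p_i = (1/sigma_i) A q_i for i <= r, extended to an
    # orthonormal basis of R^m (the QR factorization does the extension)
    Pr = A @ Q[:, :r] / sigma[:r]
    ext, _ = np.linalg.qr(np.hstack([Pr, np.eye(m)]))
    P = np.hstack([Pr, ext[:, r:]])
    return P, Sigma, Q
```

On the matrix of Example 8.6.1 it reproduces the singular values √3 and 1.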
In practice the singular values σi, the matrices P and Q, and even the rank of an m × n matrix are not
calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of
accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem 8.6.1 is that it provides a way of constructing an SVD for every real
matrix A. In particular it shows that every real matrix A has a singular value decomposition9 in the
following, more general, sense:
Definition 8.9
A Singular Value Decomposition (SVD) of an m × n matrix A of rank r is a factorization
A = UΣV T where U and V are orthogonal and Σ = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}_{m×n} in block form, where
D = diag(d1, d2, ..., dr) with each di > 0, and r ≤ m and r ≤ n.
Note that for any SVD A = U ΣV T we immediately obtain some information about A:
Lemma 8.6.3
If A = UΣV T is any SVD for A as in Definition 8.9, then:
1. r = rank A.
2. d1, d2, ..., dr are the singular values of A in some order.

Proof. Since U and V are orthogonal, we have
AT A = (V ΣTU T)(UΣV T) = V (ΣTΣ)V T = V (ΣTΣ)V −1
so ΣTΣ and AT A are similar n × n matrices (Definition 5.11). Hence r = rank A by Corollary 5.4.3, proving
(1.). Furthermore, ΣTΣ and AT A have the same eigenvalues by Theorem 5.5.1; that is (using (1.)):
{d1², d2², ..., dr²} = {σ1², σ2², ..., σr²}
as sets of positive numbers, and taking positive square roots proves (2.).
We note in passing that more is true. Let A be m × n of rank r, and let A = U ΣV T be any SVD for A.
Using the proof of Lemma 8.6.3 we have di = στ(i) for some permutation τ of {1, 2, ..., r}. In fact, it can
be shown that there exist orthogonal matrices U1 and V1 obtained from U and V by τ -permuting columns
and rows respectively, such that A = U1 ΣAV1T is an SVD of A.
9 In fact every complex matrix has an SVD [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]
It turns out that any singular value decomposition contains a great deal of information about an m ×
n matrix A and the subspaces associated with A. For example, in addition to Lemma 8.6.3, the set
{p1 , p2 , . . . , pr } of vectors constructed in the proof of Theorem 8.6.1 is an orthonormal basis of col A
(by (v) and (viii) in the proof). There are more such examples, which is the thrust of this subsection.
In particular, there are four subspaces associated to a real m × n matrix A that have come to be called
fundamental:
Definition 8.10
The fundamental subspaces of an m × n matrix A are:
row A = span{x | x is a row of A}
col A = span{x | x is a column of A}
null A = {x ∈ Rn | Ax = 0}
null AT = {x ∈ Rm | AT x = 0}
If A = UΣV T is any SVD for the real m × n matrix A, the columns of U and V provide orthonormal
bases for each of these fundamental subspaces. We are going to prove this, but first we need three
properties related to the orthogonal complement U ⊥ of a subspace U of Rn , where (Definition 8.1):
U ⊥ = {x ∈ Rn | u · x = 0 for all u ∈ U }
The orthogonal complement plays an important role in the Projection Theorem (Theorem 8.1.3), and we
return to it in Section 10.2. For now we need:
Lemma 8.6.4
If A is any matrix then:
1. null A = (row A)⊥ and null AT = (col A)⊥.
2. If U is any subspace of Rn, then U⊥⊥ = U.
3. If {f1, ..., fm} is an orthonormal basis of Rm and U = span{f1, ..., fk}, then
U⊥ = span{fk+1, ..., fm}
Proof.
1. Let x ∈ Rn. Then Ax = 0 if and only if r · x = 0 for every row r of A, that is, if and only if
x ∈ (row A)⊥. Hence null A = (row A)⊥. Now replace A by AT to get null AT = (row AT)⊥ = (col A)⊥, which is
the other identity in (1).
3. We have span{fk+1, ..., fm} ⊆ U⊥ because {f1, ..., fm} is orthogonal. For the other inclusion, let
x ∈ U⊥ so fi · x = 0 for i = 1, 2, ..., k. By the Expansion Theorem 5.3.6:
x = (f1 · x)f1 + ··· + (fm · x)fm = (fk+1 · x)fk+1 + ··· + (fm · x)fm ∈ span{fk+1, ..., fm}
With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four
fundamental subspaces of A.
Theorem 8.6.2
Let A be an m × n real matrix, let A = UΣV T be any SVD for A where U and V are orthogonal of
size m × m and n × n respectively, and let
Σ = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}_{m×n} where D = diag(λ1, λ2, ..., λr), with each λi > 0
Write U = [u1 ··· ur ··· um] and V = [v1 ··· vr ··· vn], so {u1, ..., ur, ..., um}
and {v1, ..., vr, ..., vn} are orthonormal bases of Rm and Rn respectively. Then
1. r = rank A, and the singular values of A are λ1, λ2, ..., λr. Moreover:
a. {u1, ..., ur} is an orthonormal basis of col A.
b. {ur+1, ..., um} is an orthonormal basis of null AT.
c. {vr+1, ..., vn} is an orthonormal basis of null A.
d. {v1, ..., vr} is an orthonormal basis of row A.
Proof.
(a.)
b. We have ( col A)⊥ = ( span {u1 , . . . , ur })⊥ = span {ur+1 , . . . , um } by Lemma 8.6.4(3). This
proves (b.) because ( col A)⊥ = null AT by Lemma 8.6.4(1).
c. We have dim (null A) + dim (im A) = n by the Dimension Theorem 7.2.4, applied to
T : Rn → Rm where T(x) = Ax. Since also im A = col A by Lemma 8.6.1, we obtain
dim (null A) = n − dim (col A) = n − r
So to prove (c.) it is enough to show that vj ∈ null A whenever j > r. To this end, observe that
AV = UΣ, so Avj is column j of UΣ; this column is zero when j > r because column j of Σ is zero.
Hence vj ∈ null A for each j > r, as required.
Example 8.6.2
Consider the homogeneous linear system
Ax = 0 of m equations in n variables
Then the set of all solutions is null A. Hence if A = U ΣV T is any SVD for A then (in the notation
of Theorem 8.6.2) {vr+1 , . . . , vn } is an orthonormal basis of the set of solutions for the system. As
such they are a set of basic solutions for the system, the most basic notion in Chapter 1.
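numpy's SVD returns exactly the U, the singular values, and V T of Theorem 8.6.2, so orthonormal bases of all four fundamental subspaces can be read off by column slicing (the example matrix and the rank tolerance below are our own):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [-1., 1., 0.],
              [0., 1., 1.]])   # rank 2: row 3 = row 1 + row 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))      # r = rank A

col_A   = U[:, :r]    # orthonormal basis of col A
null_At = U[:, r:]    # orthonormal basis of null A^T
row_A   = Vt[:r].T    # orthonormal basis of row A
null_A  = Vt[r:].T    # orthonormal basis of null A
```

For a homogeneous system Ax = 0, the columns of `null_A` are exactly an orthonormal set of basic solutions, as in Example 8.6.2.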
The Polar Decomposition of a Real Square Matrix

If A is real and n × n, the factorization in the title is related to the polar decomposition of A. Unlike the SVD,
in this case the decomposition is uniquely determined by A.
Recall (Section 8.3) that a symmetric matrix A is called positive definite if and only if xT Ax > 0 for
every column x ≠ 0 in Rn. Before proceeding, we must explore the following weaker notion:
Definition 8.11
A real n × n matrix G is called positive if it is symmetric and
xT Gx ≥ 0 for all x ∈ Rn
Clearly every positive definite matrix is positive, but the converse fails. Indeed, A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} is positive
because, if x = [a b]T in R2, then xT Ax = (a + b)² ≥ 0. But yT Ay = 0 if y = [1 −1]T, so A is not
positive definite.
Lemma 8.6.5
Let G denote an n × n positive matrix.
1. If A is any m × n matrix, then AGAT is an m × m positive matrix.
2. If G = diag(d1, d2, ..., dn) where each di ≥ 0, then G is positive.

Proof.
1. AGAT is symmetric because (AGAT)T = AGTAT = AGAT, and it is positive because
xT(AGAT)x = (ATx)TG(ATx) ≥ 0 for all x ∈ Rm.
2. G is clearly symmetric, and if x = [x1 x2 ··· xn]T then xTGx = d1x1² + d2x2² + ··· + dnxn² ≥ 0.
Definition 8.12
If A is a real n × n matrix, a factorization A = GQ, where G is positive and Q is orthogonal, is
called a polar form of A.
Any SVD for a real square matrix A yields a polar form for A.
Theorem 8.6.3
Every square real matrix has a polar form.
Proof. Let A = UΣV T be an SVD for A with Σ as in Definition 8.9 and m = n. Since U TU = In here we
have
A = U ΣV T = (U Σ)(U T U )V T = (U ΣU T )(UV T )
So if we write G = U ΣU T and Q = UV T , then Q is orthogonal, and it remains to show that G is positive.
But this follows from Lemma 8.6.5.
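The proof is constructive, so a polar form can be computed from any SVD. A sketch (the function name is ours; numpy returns V T, hence the transposes):

```python
import numpy as np

def polar_form(A):
    """Return (G, Q) with A = G @ Q, where G is positive (symmetric with
    x^T G x >= 0) and Q is orthogonal, following the proof of Theorem 8.6.3."""
    U, s, Vt = np.linalg.svd(A)
    G = U @ np.diag(s) @ U.T    # G = U Sigma U^T
    Q = U @ Vt                  # Q = U V^T
    return G, Q
```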
The SVD for a square matrix A is not unique (In = PIn PT for any orthogonal matrix P). But given the
proof of Theorem 8.6.3 it is surprising that the polar decomposition is unique.11 We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix A). This
is no coincidence. To see why, we represent the complex numbers as real 2 × 2 matrices. Write M2 (R) for
the set of all real 2 × 2 matrices, and define
σ : C → M2(R) by σ(a + bi) = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} for all a + bi in C
One verifies that σ preserves addition and multiplication in the sense that
σ(z + w) = σ(z) + σ(w) and σ(zw) = σ(z)σ(w)
for all complex numbers z and w. Since σ is one-to-one we may identify each complex number a + bi with
the matrix σ(a + bi), that is we write
a + bi = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} for all a + bi in C
Thus 0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, 1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I2, i = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, and r = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix} if r is real.
If z = a + bi is nonzero then the absolute value r = |z| = √(a² + b²) ≠ 0. If θ is the angle of z in standard
position, then cos θ = a/r and sin θ = b/r. Observe:
\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix} \begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix} = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix} \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} = GQ    (xiii)

where G = \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix} is positive and Q = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} is orthogonal. But in C we have G = r and
Q = cos θ + i sin θ, so (xiii) reads z = r(cos θ + i sin θ) = re^{iθ}, which is the classical polar form for the
complex number a + bi. This is why (xiii) is called the polar form of the matrix \begin{bmatrix} a & -b \\ b & a \end{bmatrix}; Definition
8.12 simply adopts the terminology for n × n matrices.
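The identification σ is easy to experiment with numerically; the helper below (our own naming) checks that σ preserves sums and products and reproduces the factorization (xiii):

```python
import numpy as np

def sigma(z):
    """Represent the complex number z = a + bi as the 2x2 real matrix [[a,-b],[b,a]]."""
    a, b = z.real, z.imag
    return np.array([[a, -b], [b, a]])

z, w = 3 + 4j, 1 - 2j
# sigma preserves addition and multiplication
ok_add = np.allclose(sigma(z + w), sigma(z) + sigma(w))
ok_mul = np.allclose(sigma(z * w), sigma(z) @ sigma(w))

# polar form of sigma(z): G = |z| I and Q the rotation through the angle of z
r, theta = abs(z), np.angle(z)
G = r * np.eye(2)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
```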
11 See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.
It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition 2.11). Nonetheless, one candidate for an "inverse" of A is an n × m matrix B such that
ABA = A and BAB = B
Such a matrix B is called a middle inverse for A. If A is invertible then A−1 is the unique middle inverse
for A, but a middle inverse is not unique in general, even for square matrices. For example, if
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} then B = \begin{bmatrix} 1 & 0 & 0 \\ b & 0 & 0 \end{bmatrix} is a middle inverse for A for any b.
If ABA = A and BAB = B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger
Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.
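Both observations are easy to verify numerically with the A and B of the example above (the value b = 5 is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 0.],
              [0., 0.]])

def is_middle_inverse(A, B):
    """Check the middle-inverse conditions ABA = A and BAB = B."""
    return np.allclose(A @ B @ A, A) and np.allclose(B @ A @ B, B)

b = 5.0
B = np.array([[1., 0., 0.],
              [b,  0., 0.]])

AB, BA = A @ B, B @ A
# AB and BA are idempotent; AB is symmetric, but BA is not (for b != 0),
# so B fails Penrose's symmetry condition and is not the pseudoinverse.
```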
Definition 8.13
Let A be a real m × n matrix. The pseudoinverse of A is the unique n × m matrix A+ such that A
and A+ satisfy P1 and P2, that is:
P1. AA+A = A and A+AA+ = A+.
P2. Both AA+ and A+A are symmetric.
If A is invertible then A+ = A−1 as expected. In general, the symmetry in conditions P1 and P2 shows
that A is the pseudoinverse of A+ , that is A++ = A.
12 R. Penrose, A generalized inverse for matrices, Proceedings of the Cambridge Philosophical Society 51 (1955), 406-413.
In fact Penrose proved this for any complex matrix, where AB and BA are both required to be hermitian (see Definition 8.18 in
the following section).
13 Penrose called the matrix A+ the generalized inverse of A, but the term pseudoinverse is now commonly used. The matrix
A+ is also called the Moore-Penrose inverse after E.H. Moore who had the idea in 1935 as part of a larger work on "General
Analysis". Penrose independently re-discovered it 20 years later.
Theorem 8.6.5
Let A be an m × n matrix.
1. If rank A = m, then AAT is invertible and A+ = AT(AAT)−1.
2. If rank A = n, then AT A is invertible and A+ = (AT A)−1AT.

Proof. Here AAT (respectively AT A) is invertible by Theorem 5.4.4 (respectively Theorem 5.4.3). The rest
is a routine verification.
In general, given an m × n matrix A, the pseudoinverse A+ can be computed from any SVD for A. To
see how, we need some notation. Let A = UΣV T be an SVD for A (as in Definition 8.9) where U and V
are orthogonal and Σ = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}_{m×n} in block form, where D = diag(d1, d2, ..., dr) with each di > 0.
Hence D is invertible, so we make:
Definition 8.14
Σ′ = \begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix}_{n×m}
Lemma 8.6.6
• ΣΣ′Σ = Σ
• Σ′ΣΣ′ = Σ′
• ΣΣ′ = \begin{bmatrix} Ir & 0 \\ 0 & 0 \end{bmatrix}_{m×m}
• Σ′Σ = \begin{bmatrix} Ir & 0 \\ 0 & 0 \end{bmatrix}_{n×n}
Given A = UΣV T as above, write B = V Σ′U T. Then
ABA = (UΣV T)(V Σ′U T)(UΣV T) = U(ΣΣ′Σ)V T = UΣV T = A
by Lemma 8.6.6. Similarly BAB = B. Moreover AB = U(ΣΣ′)U T and BA = V (Σ′Σ)V T are both symmetric,
again by Lemma 8.6.6. This proves
Theorem 8.6.6
Let A be real and m × n, and let A = UΣV T be any SVD for A as in Definition 8.9. Then
A+ = V Σ′U T.
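This is essentially how library pseudoinverses are computed. A sketch that builds Σ′ explicitly (the function name and the rank tolerance are our choices) and can be checked against numpy's built-in pinv:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-10):
    """Compute A+ = V Sigma' U^T as in Theorem 8.6.6."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A)
    Sp = np.zeros((n, m))               # Sigma' is n x m
    r = int(np.sum(s > tol))            # only the positive singular values...
    Sp[:r, :r] = np.diag(1.0 / s[:r])   # ...are inverted
    return Vt.T @ Sp @ U.T
```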
Of course we can always use the SVD constructed in Theorem 8.6.1 to find the pseudoinverse. If
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, we observed above that B = \begin{bmatrix} 1 & 0 & 0 \\ b & 0 & 0 \end{bmatrix} is a middle inverse for A for any b. Furthermore
AB is symmetric but BA is not (when b ≠ 0), so B ≠ A+.
Example 8.6.3
Find A+ if A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.

Solution. AT A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} with eigenvalues λ1 = 1 and λ2 = 0 and corresponding eigenvectors
q1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} and q2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. Hence Q = [q1 q2] = I2. Also A has rank 1 with singular values
σ1 = 1 and σ2 = 0, so ΣA = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = A and Σ′A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = AT in this case.

Since Aq1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} and Aq2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, we have p1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, which extends to an orthonormal
basis {p1, p2, p3} of R3 where (say) p2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} and p3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. Hence
P = [p1 p2 p3] = I3, so the SVD for A is A = PΣA QT. Finally, the pseudoinverse of A is
A+ = QΣ′A PT = Σ′A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. Note that A+ = AT in this case.
The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse.
The verifications are left as exercises.
Lemma 8.6.7
Let A be an m × n matrix of rank r.
1. A++ = A.
3. (AT )+ = (A+ )T .
Exercises For 8.6

Exercise 8.6.2 For any matrix A show that
ΣAT = (ΣA)T

Exercise 8.6.3 If A is m × n with all singular values positive, what is rank A?

Exercise 8.6.4 If A has singular values σ1, ..., σr, what are the singular values of:

Exercise 8.6.11 If A = UΣV T is an SVD for A, find an SVD for AT.

Exercise 8.6.12 Let A be a real, m × n matrix with positive singular values σ1, σ2, ..., σr, and write
s(x) = (x − σ1²)(x − σ2²) ··· (x − σr²)
a. Show that cAT A(x) = s(x)x^{n−r} and cAAT(x) = s(x)x^{m−r}.
8.7 Complex Matrices

If A is an n × n matrix, the characteristic polynomial cA(x) is a polynomial of degree n and the eigenvalues
of A are just the roots of cA (x). In most of our examples these roots have been real numbers (in fact,
the examples have been carefully chosen so this will be the case!); but it need not happen, even when
the characteristic polynomial has real coefficients. For example, if A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} then cA(x) = x² + 1
has roots i and −i, where i is a complex number satisfying i² = −1. Therefore, we have to deal with the
possibility that the eigenvalues of a (real) square matrix might be complex numbers.
In fact, nearly everything in this book would remain true if the phrase real number were replaced by
complex number wherever it occurs. Then we would deal with matrices with complex entries, systems
of linear equations with complex coefficients (and complex solutions), determinants of complex matrices,
and vector spaces with scalar multiplication by any complex number allowed. Moreover, the proofs of
most theorems about (the real version of) these concepts extend easily to the complex case. It is not our
intention here to give a full treatment of complex linear algebra. However, we will carry the theory far
enough to give another proof that the eigenvalues of a real symmetric matrix A are real (Theorem 5.5.7)
and to prove the spectral theorem, an extension of the principal axes theorem (Theorem 8.2.2).
The set of complex numbers is denoted C. We will use only the most basic properties of these numbers
(mainly conjugation and absolute values), and the reader can find this material in Appendix A.
If n ≥ 1, we denote the set of all n-tuples of complex numbers by Cn. As with Rn, these n-tuples will
be written either as row or column matrices and will be referred to as vectors. We define vector operations
on Cn componentwise, as follows:
(z1, z2, ..., zn) + (w1, w2, ..., wn) = (z1 + w1, z2 + w2, ..., zn + wn)
u(z1, z2, ..., zn) = (uz1, uz2, ..., uzn) for u in C
With these definitions, Cn satisfies the axioms for a vector space (with complex scalars) given in Chapter 6.
Thus we can speak of spanning sets for Cn , of linearly independent subsets, and of bases. In all cases,
the definitions are identical to the real case, except that the scalars are allowed to be complex numbers. In
particular, the standard basis of Rn remains a basis of Cn , called the standard basis of Cn .
A matrix A = [aij] is called a complex matrix if every entry aij is a complex number. The notion of
conjugation for complex numbers extends to matrices as follows: Define the conjugate of A = [aij] to be
the matrix
Ā = [āij]
obtained from A by conjugating every entry. Then (using Appendix A)