Iterative Methods For Solving Linear Systems
The goal of the series is to promote, through short, inexpensive, expertly written
monographs, cutting edge research poised to have a substantial impact on the solutions
of problems that advance science and technology. The volumes encompass a broad
spectrum of topics important to the applied mathematical areas of education,
government, and industry.
EDITORIAL BOARD
H.T. Banks, Editor-in-Chief, North Carolina State University
Lewis, F. L.; Campos, J.; and Selmic, R., Neuro-Fuzzy Control of Industrial Systems with Actuator
Nonlinearities
Bao, Gang; Cowsar, Lawrence; and Masters, Wen, editors, Mathematical Modeling in Optical
Science
Banks, H.T.; Buksas, M. W.; and Lin, T., Electromagnetic Material Interrogation Using Conductive
Interfaces and Acoustic Wavefronts
Bank, Randolph E., PLTMG: A Software Package for Solving Elliptic Partial Differential Equations.
Users' Guide 7.0
Rüde, Ulrich, Mathematical and Computational Techniques for Multilevel Adaptive Methods
Van Loan, Charles, Computational Frameworks for the Fast Fourier Transform
Bank, R. E., PLTMG: A Software Package for Solving Elliptic Partial Differential Equations.
Users' Guide 6.0
McCormick, Stephen F., Multilevel Adaptive Methods for Partial Differential Equations
Anne Greenbaum
University of Washington
Seattle, Washington
All rights reserved. Printed in the United States of America. No part of this book may be
reproduced, stored, or transmitted in any manner without the written permission of the publisher.
For information, write to the Society for Industrial and Applied Mathematics, 3600 University City
Science Center, Philadelphia, PA 19104-2688.
Greenbaum, Anne.
Iterative methods for solving linear systems / Anne Greenbaum.
p. cm. -- (Frontiers in applied mathematics ; 17)
Includes bibliographical references and index.
ISBN 0-89871-396-X (pbk.)
1. Iterative methods (Mathematics) 2. Equations, Simultaneous
-Numerical solutions. I. Title. II. Series.
QA297.8.G74 1997
519.4-dc21 97-23271
Exercise 3.2 is reprinted with permission from K.-C. Toh, GMRES vs. ideal GMRES, SIAM
Journal on Matrix Analysis and Applications, 18 (1997), pp. 30-36. Copyright 1997 by the Society
for Industrial and Applied Mathematics. All rights reserved.
Exercise 5.4 is reprinted with permission from N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen,
How fast are nonsymmetric matrix iterations?, SIAM Journal on Matrix Analysis and Applications, 13
(1992), pp. 778-795. Copyright 1992 by the Society for Industrial and Applied Mathematics. All
rights reserved.
Contents
List of Algorithms xi
Preface xiii
CHAPTER 1. Introduction 1
1.1 Brief Overview of the State of the Art 3
1.1.1 Hermitian Matrices 3
1.1.2 Non-Hermitian Matrices 5
1.1.3 Preconditioners 6
1.2 Notation 6
1.3 Review of Relevant Linear Algebra 7
1.3.1 Vector Norms and Inner Products 7
1.3.2 Orthogonality 8
1.3.3 Matrix Norms 9
1.3.4 The Spectral Radius 11
1.3.5 Canonical Forms and Decompositions 13
1.3.6 Eigenvalues and the Field of Values 16
II Preconditioners 117
CHAPTER 8. Overview and Preconditioned Algorithms 119
References 205
Index 213
List of Algorithms
Preface
In recent years much research has focused on the efficient solution of large
sparse or structured linear systems using iterative methods. A language full
of acronyms for a thousand different algorithms has developed, and it is often
difficult for the nonspecialist (or sometimes even the specialist) to identify the
basic principles involved. With this book, I hope to discuss a few of the most
useful algorithms and the mathematical principles behind their derivation and
analysis. The book does not constitute a complete survey. Instead I have
tried to include the most useful algorithms from a practical point of view and
the most interesting analysis from both a practical and mathematical point of
view.
The material should be accessible to anyone with graduate-level knowledge
of linear algebra and some experience with numerical computing. The relevant
linear algebra concepts are reviewed in a separate section and are restated
as they are used, but it is expected that the reader will already be familiar
with most of this material. In particular, it may be appropriate to review
the QR decomposition using the modified Gram-Schmidt algorithm or Givens
rotations, since these form the basis for a number of algorithms described here.
Part I of the book, entitled Krylov Subspace Approximations, deals with
general linear systems, although it is noted that the methods described are
most often useful for very large sparse or structured matrices, for which
direct methods are too costly in terms of computer time and/or storage. No
specific applications are mentioned there. Part II of the book deals with
Preconditioners, and here applications must be described in order to define and
analyze some of the most efficient preconditioners, e.g., multigrid methods. It
is assumed that the reader is acquainted with the concept of finite difference
approximations, but no detailed knowledge of finite difference or finite element
methods is assumed. This means that the analysis of preconditioners must
generally be limited to model problems, but, in most cases, the proof techniques
carry over easily to more general equations. It is appropriate to separate the
study of iterative methods into these two parts because, as the reader will
see, the tools of analysis for Krylov space methods and for preconditioners are
really quite different. The field of preconditioners is a much broader one, since
• A brief overview of the state of the art in section 1.1. This gives the
reader an understanding of what has been accomplished and what open
problems remain in this field, without going into the details of any
particular algorithm.
• Analysis of the effect of rounding errors on the convergence rate of the
conjugate gradient method in Chapter 4 and discussion of how this
problem relates to some other areas of mathematics. In particular, the
analysis is presented as a matrix completion result or as a result about
orthogonal polynomials.
• Discussion of open problems involving error bounds for GMRES in
section 3.2, along with exercises in which some recently proved results
are derived (with many hints included).
• Discussion of the transport equation as an example problem in section
9.2. This important equation has received far less attention from nu-
merical analysts than the more commonly studied diffusion equation
of section 9.1, yet it serves to illustrate many of the principles of non-
Hermitian matrix iterations.
• Inclusion of multigrid methods in the part of the book on preconditioners
(Chapter 12). Multigrid methods have proved extremely effective for
solving the linear systems that arise from differential equations, and they
should not be omitted from a book on iterative methods. Other recent
books on iterative methods have also included this topic; see, e.g., [77].
• A small set of recommended algorithms and implementations. These are
enclosed in boxes throughout the text.
This last item should prove helpful to those interested in solving particular
problems as well as those more interested in general properties of iterative
methods. Most of these algorithms have been implemented in the Templates for
the Solution of Linear Systems: Building Blocks for Iterative Methods [10], and
the reader is encouraged to experiment with these or other iterative routines for
solving" linear systems. This book could serve as a supplement to the Templates
documentation, providing a deeper look at the theory behind these algorithms.
I would like to thank the graduate students and faculty at Cornell
University who attended my seminar on iterative methods during the fall of
1994 for their many helpful questions and comments. I also wish to thank a
number of people who read through earlier drafts or sections of this manuscript
and made important suggestions for improvement. This list includes Michele
Benzi, Jim Ferguson, Martin Gutknecht, Paul Holloway, Zdeněk Strakoš, and
Nick Trefethen.
Finally, I wish to thank the Courant Institute for providing me the
opportunity for many years of uninterrupted research, without which this book
might not have developed. I look forward to further work at the University of
Washington, where I have recently joined the Mathematics Department.
Anne Greenbaum
Seattle
Chapter 1
Introduction
The subject of this book is what at first appears to be a very simple problem—
how to solve the system of linear equations Ax = b, where A is an n-by-n
nonsingular matrix and b is a given n-vector. One well-known method is
Gaussian elimination. In general, this requires storage of all n^2 entries of
the matrix A as well as approximately 2n^3/3 arithmetic operations (additions,
subtractions, multiplications, and divisions). Matrices that arise in practice,
however, often have special properties that can be only partially exploited by
Gaussian elimination. For example, the matrices that arise from differencing
partial differential equations are often sparse, having only a few nonzeros per
row. If the (i, j)-entry of matrix A is zero whenever |i − j| > m, then a banded
version of Gaussian elimination can solve the linear system by storing only the
approximately 2mn entries inside the band (i.e., those with |i − j| ≤ m) and
performing about 2m2n operations. The algorithm cannot take advantage of
any zeros inside the band, however, as these fill in with nonzeros during the
process of elimination.
In contrast, sparseness and other special properties of matrices can often be
used to great advantage in matrix-vector multiplication. If a matrix has just
a few nonzeros per row, then the number of operations required to compute
the product of that matrix with a given vector is just a small multiple of n, instead
of the 2n^2 operations required for a general dense matrix-vector multiplication. The
storage required is only that for the nonzeros of the matrix, and, if these
are sufficiently simple to compute, even this can be avoided. For certain
special dense matrices, a matrix-vector product can also be computed with
just O(n) operations. For example, a Cauchy matrix is one whose (i, j)-entry
is 1/(z_i − z_j) for i ≠ j, where z_1, ..., z_n are some complex numbers. The
product of this matrix with a given vector can be computed in time O(n)
using the fast multipole method [73], and the actual matrix entries need never
be computed or stored. This leads one to ask whether the system of linear
equations Ax = b can be solved (or an acceptably good approximate solution
obtained) using matrix-vector multiplications. If this can be accomplished
with a moderate number of matrix-vector multiplications and little additional
work, then the iterative procedure that does this may far outperform Gaussian elimination.
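To make the operation count concrete, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a vector; the cost is one multiply-add per stored nonzero rather than the 2n^2 operations of a dense product. This is an illustration only; the storage scheme and function name are not taken from the text.

```python
import numpy as np

def csr_matvec(val, col_ind, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row (CSR) form.

    val     -- nonzero entries of A, listed row by row
    col_ind -- column index of each entry in val
    row_ptr -- row_ptr[i]:row_ptr[i+1] indexes the entries of row i
    """
    n = len(row_ptr) - 1
    y = np.zeros(n, dtype=val.dtype)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_ind[k]]   # one multiply-add per nonzero
    return y

# 3x3 example with 5 nonzeros: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
val = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
col_ind = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(csr_matvec(val, col_ind, row_ptr, np.array([1.0, 1.0, 1.0])))  # [3. 3. 9.]
```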
One then computes the product Ab and takes the next approximation to be
some linear combination of b and Ab. Continuing in this way, the approximation
x_k at step k is taken to lie in the Krylov space

x_k ∈ span{b, Ab, ..., A^{k-1}b}.     (1.1)

The space represented on the right in (1.1) is called a Krylov subspace for the
matrix A and initial vector b.
Given that x_k is to be taken from the Krylov space in (1.1), one must ask
the following two questions:
(i) How good an approximate solution is contained in the space (1.1)?
(ii) How can a good (optimal) approximation from this space be computed
with a moderate amount of work and storage?
These questions are the subject of Part I of this book.
If it turns out that the space (1.1) does not contain a good approximate
solution for any moderate size k or if such an approximate solution cannot be
computed easily, then one might consider modifying the original problem to
obtain a better Krylov subspace. For example, one might use a preconditioner
M and effectively solve the modified problem
approximations are (for the worst possible right-hand side vector b) in terms of the
eigenvalues of the matrix. Consider the 2-norm of the residual in the MINRES
algorithm. It follows from (1.1) that the residual r_k = b − Ax_k can be written
in the form
and since this holds for any kth-degree polynomial p_k with p_k(0) = 1, we have
It turns out that the bound (1.5) on the size of the MINRES residual at step
k is sharp—that is, for each k there is a vector b for which this bound will be
attained [63, 68, 85]. Thus the question of the size of the MINRES residual
at step k is reduced to a problem in approximation theory—how well can one
approximate zero on the set of eigenvalues of A using a kth-degree polynomial
with value 1 at the origin? One can answer this question precisely with a
complicated expression involving all of the eigenvalues of A, or one can give
simple bounds in terms of just a few of the eigenvalues of A. The important
point is that the norm of the residual (for the worst-case right-hand side vector)
is completely determined by the eigenvalues of the matrix, and we have at least
intuitive ideas of what constitutes good and bad eigenvalue distributions. The
same reasoning shows that the A-norm of the error e_k = A^{-1}b − x_k in the CG
algorithm satisfies
anything close to it! In fact, the CG algorithm originally lost favor partly
because it did not behave the way exact arithmetic theory predicted [43]. More
recent work [65, 71, 35, 34] has gone a long way toward explaining the behavior
of the MINRES and CG algorithms in finite precision arithmetic, although
open problems remain. This work is discussed in Chapter 4.
where κ(V) = ||V|| · ||V^{-1}|| is the condition number of the eigenvector matrix.
The scaling of the columns of V can be chosen to minimize this condition
number. When A is a normal matrix, so that V can be taken to have condition
number one, it turns out that the bound (1.7) is sharp, just as for the Hermitian
case. Thus for normal matrices, the analysis of GMRES again reduces to a
question of polynomial approximation—how well can one approximate zero on
1.2. Notation.
We assume complex matrices and vectors throughout this book. The results
for real problems are almost always the same, and we point out any differences
that might be encountered. The symbol i is used for √−1, and a superscript H
denotes the Hermitian transpose ((A^H)_{ij} = Ā_{ji}, where the overbar denotes the
complex conjugate). The symbol || · || will always denote the 2-norm for vectors
and the induced spectral norm for matrices. An arbitrary norm will be denoted
||| · |||.
The linear system (or sometimes the preconditioned linear system) under
consideration is denoted Ax = b, where A is an n-by-n nonsingular matrix and
b is a given n-vector. If x_k is an approximate solution, then the residual b − Ax_k
is denoted r_k and the error A^{-1}b − x_k is denoted e_k. The symbol ξ_j denotes the jth
unit vector, i.e., the vector whose jth entry is 1 and whose other entries are 0,
with the size of the vector being determined by the context.
A number of algorithms are considered, and these are first stated in the
form most suitable for presentation. This does not always correspond to
the best computational implementation. Algorithms enclosed in boxes are
the recommended computational procedures, although details of the actual
implementation (such as how to carry out a matrix-vector multiplication or
how to solve a linear system with the preconditioner as coefficient matrix)
are not included. Most of these algorithms are implemented in the Templates
[10], with MATLAB, Fortran, and C versions. To see what is available in this
package, type
mail netlib@ornl.gov
send index for templates
mail netlib@ornl.gov
send mltemplates.shar from templates
or download the appropriate file from the web. The reader is encouraged to
experiment with the iterative methods described in this book, either through
use of the Templates or another software package or through coding the
algorithms directly.
By definition we have ||v||^2 = (v, v), and it follows that ||v||^2_{G^H G} = (Gv, Gv) =
(v, v)_{G^H G}. If ||| · ||| is any norm associated with an inner product ⟨⟨·, ·⟩⟩, then
there is a nonsingular matrix G such that
The (i, j)-entry of G^H G is ⟨⟨ξ_i, ξ_j⟩⟩, where ξ_i and ξ_j are the unit vectors with a
one in position i and j, respectively, and zeros elsewhere.
and to minimize the G^H G-norm of v in the direction w, one subtracts off the
G^H G-projection of v onto w. That is, if
then of all vectors of the form v − αw, where α is a scalar, this vector has the
smallest G^H G-norm.
An n-by-n complex matrix with orthonormal columns is called a unitary
matrix. For a unitary matrix Q, we have Q^H Q = QQ^H = I, where I is the
n-by-n identity matrix. If the matrix Q is real, then it can also be called an
orthogonal matrix.
Given a set of linearly independent vectors {v_1, ..., v_n}, one can construct
an orthonormal set {u_1, ..., u_n} using the Gram-Schmidt procedure:
Set
Here, instead of computing the projection of v_k onto each of the basis vectors
u_i, i = 1, ..., k − 1, the next basis vector is formed by first subtracting off the
projection of v_k in the direction of one of the basis vectors and then subtracting
off the projection of the new vector in the direction of another basis vector,
etc. The modified Gram-Schmidt procedure forms the core of many iterative
methods for solving linear systems.
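A minimal sketch of the modified Gram-Schmidt procedure just described, assuming the columns of V are linearly independent (illustrative code, not the book's pseudocode):

```python
import numpy as np

def modified_gram_schmidt(V):
    """Orthonormalize the linearly independent columns of V, one column at a
    time, subtracting projections onto the already-computed basis vectors."""
    n, m = V.shape
    U = np.zeros((n, m))
    for k in range(m):
        u = V[:, k].astype(float)
        for i in range(k):
            # subtract the projection of the *current*, already-updated vector onto u_i
            u -= np.dot(U[:, i], u) * U[:, i]
        U[:, k] = u / np.linalg.norm(u)
    return U

V = np.random.rand(6, 3)
U = modified_gram_schmidt(V)
print(np.allclose(U.T @ U, np.eye(3)))  # orthonormal columns (up to roundoff)
```

In exact arithmetic this produces the same basis as classical Gram-Schmidt, but it behaves much better in finite precision, which is one reason it appears again in the Arnoldi and GMRES algorithms later in the book.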
If U_k is the matrix whose columns are the orthonormal vectors u_1, ..., u_k,
then the closest vector to a given vector v from the space span{u_1, ..., u_k} is
the projection of v onto this space,
1.3.3. Matrix Norms. Let M_n denote the set of n-by-n complex matrices.
DEFINITION 1.3.1. A function ||| · ||| : M_n → R is called a matrix norm if,
for all A, B ∈ M_n and all complex scalars c,
The "max" in the above definition could be replaced by "sup." The two are
equivalent since |||-Ay||| is a continuous function of y and the unit ball in C n ,
being a compact set, contains the vector for which the sup is attained. Another
equivalent definition is
the maximal absolute column sum of A. To see this, write A in terms of its
columns, A = [a_1, ..., a_n]. Then
Thus max_{||y||_1=1} ||Ay||_1 ≤ ||A||_1. But if y is the unit vector with a 1 in the
position of the column of A having the greatest 1-norm and zeros elsewhere,
then ||Ay||_1 = ||A||_1, so we also have max_{||y||_1=1} ||Ay||_1 ≥ ||A||_1.
The norm || · ||_∞ induced on M_n by the ∞-norm for vectors is
the largest absolute row sum of A. To see this, first note that
and hence max_{||y||_∞=1} ||Ay||_∞ ≤ ||A||_∞. On the other hand, suppose the kth
row of A has the largest absolute row sum. Take y to be the vector whose
jth entry is ā_{kj}/|a_{kj}| if a_{kj} ≠ 0 and 1 if a_{kj} = 0. Then the kth entry of Ay
is the sum of the absolute values of the entries in row k of A, so we have
||Ay||_∞ ≥ ||A||_∞.
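The column-sum and row-sum characterizations can be checked numerically; in the sketch below (illustrative only) they are compared with the induced norms as computed by numpy.

```python
import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [0.5,  4.0, -1.0],
              [2.0,  0.0, -6.0]])

col_sum = np.abs(A).sum(axis=0).max()   # maximal absolute column sum
row_sum = np.abs(A).sum(axis=1).max()   # maximal absolute row sum

print(col_sum, np.linalg.norm(A, 1))       # both give ||A||_1   = 10.0
print(row_sum, np.linalg.norm(A, np.inf))  # both give ||A||_inf =  8.0
```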
The norm || · ||_2 induced on M_n by the 2-norm for vectors is
This matrix norm will also be denoted, simply, as || • || from here on.
THEOREM 1.3.1. If ||| · ||| is a matrix norm on M_n and if G ∈ M_n is
nonsingular, then
is a matrix norm. If ||| · ||| is induced by a vector norm ||| · |||, then ||| · |||_{G^H G}
is induced by the vector norm ||| · |||_{G^H G}.
Proof. Axioms 1-3 of Definition 1.3.1 are easy to verify, and axiom 4 follows
from
If |||A||| = max_{y≠0} |||Ay|||/|||y|||, then
so ||| · |||_{G^H G} is the matrix norm induced by the vector norm ||| · |||_{G^H G}.
Proof. The proof of this theorem will use the Schur triangularization, which
is stated as a theorem in the next section. According to this theorem, there is
a unitary matrix Q and an upper triangular matrix U whose diagonal entries
are the eigenvalues of A such that A = QUQ^H. Set D_t = diag(t, t^2, ..., t^n)
and note that
For t sufficiently large, the sum of the absolute values of all off-diagonal elements
in a column is less than ε, so ||D_t U D_t^{-1}||_1 < ρ(A) + ε. Thus if we define the
matrix norm ||| · ||| by
for any matrix B ∈ M_n, then we will have |||A||| = ||D_t U D_t^{-1}||_1 < ρ(A) + ε. It
follows from Theorem 1.3.1 that ||| · ||| is a matrix norm since
where
It can be shown by induction that the kth power of a j-by-j Jordan block
corresponding to the eigenvalue λ is given by
where the symbol ~ means that, asymptotically, the left-hand side behaves
like the right-hand side. The 2-norm of an arbitrary matrix power A^k satisfies
where j is the largest order of all diagonal submatrices J_r of the Jordan form
with ρ(J_r) = ρ(A) and ν is a positive constant.
THEOREM 1.3.6 (Schur form). Let A be an n-by-n matrix with eigenvalues
λ_1, ..., λ_n in any prescribed order. There is a unitary matrix Q such that
A = QUQ^H, where U is an upper triangular matrix and U_{ii} = λ_i.
Note that while the transformation S taking a matrix to its Jordan form
may be extremely ill conditioned (that is, S in Theorem 1.3.5 may be nearly
singular), the transformation to upper triangular form is perfectly conditioned
(Q in Theorem 1.3.6 is unitary). Consequently, the Schur form often proves
more useful in numerical analysis.
The Schur form is not unique, since the diagonal entries of U may appear in
any order, and the entries of the upper triangle may be very different depending
on the ordering of the diagonal entries. For example, the upper triangular
matrices
are two Schur forms of the same matrix, since they are unitarily equivalent via
Proof. From the definition of the Frobenius norm, it is seen that ||A||_F^2 =
tr(A^H A), where tr(·) denotes the trace, i.e., the sum of the diagonal entries.
If A = VΣW^H is the singular value decomposition of A, then
1.3.6. Eigenvalues and the Field of Values. We will see later that the
eigenvalues of a normal matrix provide all of the essential information about
that matrix, as far as iterative linear system solvers are concerned. There is no
corresponding simple set of characteristics of a nonnormal matrix that provide
such complete information, but the field of values captures certain important
properties.
We begin with the useful theorem of Gerschgorin for locating the eigenval-
ues of a matrix.
THEOREM 1.3.11 (Gerschgorin). Let A be an n-by-n matrix and let
denote the sum of the absolute values of all off-diagonal entries in row i. Then
all eigenvalues of A are located in the union of disks
or, equivalently,
Since |v_p| > 0, it follows that |λ − a_{pp}| ≤ R_p(A); that is, the eigenvalue λ lies
in the Gerschgorin disk for the row corresponding to its eigenvector's largest
entry, and hence all of the eigenvalues lie in the union of the Gerschgorin disks.
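As a quick illustration of the theorem (not code from the book), the sketch below forms the disk centers a_ii and radii R_i(A) for a small matrix and checks that every eigenvalue lies in the union of the disks.

```python
import numpy as np

def gerschgorin_disks(A):
    """Return (center, radius) pairs: a_ii and R_i(A), the off-diagonal
    absolute row sum, for each row i."""
    R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), R))

A = np.array([[ 4.0, 1.0, 0.2],
              [ 0.5, 3.0, 0.5],
              [-0.1, 0.3, 1.0]])

disks = gerschgorin_disks(A)
for lam in np.linalg.eigvals(A):
    # each eigenvalue lies in the union of the disks |z - a_ii| <= R_i(A)
    assert any(abs(lam - c) <= r + 1e-12 for c, r in disks)
print(disks)
```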
denote the sum of the absolute values of all off-diagonal entries in column j.
Then all eigenvalues of A are located in the union of disks
The field of values is a compact set in the complex plane, since it is the
continuous image of a compact set—the surface of the Euclidean ball. It can
also be shown to be a convex set. This is known as the Toeplitz-Hausdorff
theorem. See, for example, [81] for a proof. The numerical radius ν(A) is the
largest absolute value of an element of F(A):
since
Also,
since
it follows that F(A) is just the set of all convex combinations of the eigenvalues
λ_1, ..., λ_n.
For a general matrix A, let H(A) = (A + A^H)/2 denote the Hermitian part
of A. Then
Thus each point in F(H(A)) is of the form Re(z) for some z ∈ F(A) and vice
versa.
The analogue of Gerschgorin's theorem for the field of values is as follows.
Let G_F(A) denote the set in (1.19). If G_F(A) is contained in the open right
half-plane {z : Re(z) > 0}, then Re(a_{ii}) > (R_i(A) + C_i(A))/2 for all i, and
hence the set on the right in (1.20) is contained in the open right half-plane.
Since F(A) is convex, it follows that F(A) lies in the open right half-plane.
Now suppose only that G_F(A) is contained in some open half-plane
about the origin. Since G_F(A) is convex, this is equivalent to the condition
0 ∉ G_F(A). Then there is some θ ∈ [0, 2π) such that e^{iθ} G_F(A) = G_F(e^{iθ} A) is
contained in the open right half-plane. It follows from the previous argument
that F(e^{iθ} A) = e^{iθ} F(A) lies in the open right half-plane, and hence 0 ∉ F(A).
Finally, for any complex number α, if α ∉ G_F(A) then 0 ∉ G_F(A − αI),
and the previous argument implies that 0 ∉ F(A − αI). Using (1.16), it follows
that α ∉ F(A). Therefore, F(A) ⊆ G_F(A).
The following procedure can be used to approximate the field of values
numerically. First note that since F(A) is convex and compact it is necessary
only to compute the boundary. If many well-spaced points are computed
around the boundary, then the convex hull of these points is a polygon p(A)
that is contained in F(A), while the intersection of the half-planes determined
by the support lines at these points is a polygon P(A) that contains F(A).
To compute points around the boundary of F(A), first note from (1.18)
that the rightmost point in F(A) has real part equal to the rightmost point in
F(H(A)), which is the largest eigenvalue of H(A). If we compute the largest
eigenvalue λ_max of H(A) and the corresponding unit eigenvector v, then v^H A v
is a boundary point of F(A) and the vertical line {λ_max + ti : t ∈ R} is a
support line for F(A); that is, F(A) is contained in the half-plane to the left
of this line.
Note also that since e^{−iθ} F(e^{iθ} A) = F(A), we can use this same procedure
for the rotated matrices e^{iθ} A, θ ∈ [0, 2π). If λ_θ denotes the largest eigenvalue of
H(e^{iθ} A) and v_θ the corresponding unit eigenvector, then v_θ^H A v_θ is a boundary
point of F(A) and the line {e^{−iθ}(λ_θ + ti) : t ∈ R} is a support line. By choosing
values of θ throughout the interval [0, 2π), the approximating polygons p(A)
and P(A) can be made arbitrarily close to the true field of values F(A).
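The procedure just described translates directly into a short program. The sketch below (an illustration based on the description above; the function name and the uniform choice of angles are mine) computes boundary points v_θ^H A v_θ of F(A).

```python
import numpy as np

def fov_boundary_points(A, num_angles=60):
    """Boundary points of the field of values F(A).

    For each angle theta, the largest eigenvalue/eigenvector of the Hermitian
    part of exp(i*theta)*A gives one boundary point of F(A)."""
    points = []
    for theta in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False):
        B = np.exp(1j * theta) * A
        H = (B + B.conj().T) / 2           # Hermitian part H(e^{i*theta} A)
        w, V = np.linalg.eigh(H)           # eigenvalues in ascending order
        v = V[:, -1]                       # unit eigenvector for the largest one
        points.append(v.conj() @ A @ v)    # v^H A v is a boundary point of F(A)
    return np.array(points)

A = np.array([[1.0, 2.0], [0.0, 3.0]])
pts = fov_boundary_points(A)
print(pts.real.min(), pts.real.max())      # extent of F(A) along the real axis
```

Taking the convex hull of the returned points gives the inner polygon p(A); intersecting the corresponding support half-planes gives the outer polygon P(A).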
The numerical radius v(A) also has a number of interesting properties. For
any two matrices A and B, it is clear that
Although the numerical radius is not itself a matrix norm (since the
requirement ν(AB) ≤ ν(A) · ν(B) does not always hold), it is closely related
to the 2-norm:
The second inequality in (1.21) follows from the fact that for any vector y with
||y|| = 1, we have
The first inequality in (1.21) is derived as follows. First note that ν(A) =
ν(A^H). Writing A in the form A = H(A) + N(A), where N(A) = (A − A^H)/2,
and noting that both H(A) and N(A) are normal matrices, we observe that
values does not have this property: F(p(A)) ≠ p(F(A)). We will see in later
sections, however, that the weaker property (1.22) can still be useful in deriving
error bounds for iterative methods. From (1.22) it follows that if the field of
values of A is contained in a disk of radius r centered at the origin, then the
field of values of A^m is contained in a disk of radius r^m centered at the origin.
There are many interesting generalizations of the field of values. One that
is especially relevant to the analysis of iterative methods is as follows.
DEFINITION 1.3.7. The generalized field of values of a set of matrices
{A_1, ..., A_k} in M_n is the subset of C^k defined by
Note that for k = 1, this is just the ordinary field of values. One can also
define the conical generalized field of values as
It is clear that this object is a cone, in the sense that if z ∈ F̂_k({A_i}_{i=1}^k) and
a > 0, then az ∈ F̂_k({A_i}_{i=1}^k). Note also that the conical generalized field of
values is preserved by simultaneous congruence transformation: for P ∈ M_n
nonsingular, F̂_k({A_i}_{i=1}^k) = F̂_k({P^H A_i P}_{i=1}^k).
Krylov Subspace
Approximations
Chapter 2
Some Iteration Methods
This procedure of starting with an initial guess x_0 for the solution and
generating successive approximations using (2.1) for k = 0, 1, ... is sometimes
called simple iteration, but more often it goes by different names according to
the choice of M. For M equal to the diagonal of A, it is called Jacobi iteration;
Given an initial guess x_0, compute r_0 = b − Ax_0, and solve Mz_0 = r_0.
For k = 1, 2, ...,
Set
Compute
Solve
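A sketch of the simple iteration (2.1), written with a generic preconditioner M; choosing M to be the diagonal of A gives Jacobi iteration, as noted above. The stopping test and the dense solve with M are illustrative choices, not part of the algorithm as stated in the box.

```python
import numpy as np

def simple_iteration(A, b, M, x0, max_iter=200, tol=1e-10):
    """x_{k+1} = x_k + M^{-1}(b - A x_k).  M should approximate A but be
    easy to solve with; here we simply call a dense solver for M."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                  # residual r_k
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + np.linalg.solve(M, r)  # z_k = M^{-1} r_k, then x_{k+1} = x_k + z_k
    return x

# Jacobi iteration: M is the diagonal of A
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = simple_iteration(A, b, np.diag(np.diag(A)), np.zeros(2))
print(x, np.linalg.solve(A, b))
```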
where ||| · ||| can be any vector norm, provided that the matrix norm is taken
to be the one induced by the vector norm, |||B||| = max_{|||y|||=1} |||By|||. In this
case, the bound in (2.3) is sharp, since, for each k, there is an initial error e_0
for which equality holds.
LEMMA 2.1.1. The norm of the error in iteration (2.1) will approach zero
and x_k will approach A^{-1}b for every initial error e_0 if and only if
many values of k. The vectors e_{0,k} with norm 1 for which equality holds in (2.3)
form a bounded infinite set in C^n, so, by the Bolzano-Weierstrass theorem,
they contain a convergent subsequence. Let e_0 be the limit of this subsequence.
Then for k sufficiently large in this subsequence, we have |||e_0 − e_{0,k}||| < ε < 1,
and
Since this holds for infinitely many values of k, it follows that lim_{k→∞} |||(I −
M^{-1}A)^k e_0|||, if it exists, is greater than 0.
It was shown in section 1.3.1 that, independent of the matrix norm used
in (2.3), the quantity |||(I − M^{-1}A)^k|||^{1/k} approaches the spectral radius
ρ(I − M^{-1}A) as k → ∞. Thus we have the following result.
THEOREM 2.1.1. The iteration (2.1) converges to A^{-1}b for every initial
error e_0 if and only if ρ(I − M^{-1}A) < 1.
Proof. If ρ(I − M^{-1}A) < 1, then
while if ρ(I − M^{-1}A) ≥ 1, then lim_{k→∞} |||(I − M^{-1}A)^k|||, if it exists, must be
greater than or equal to 1. In either case the result then follows from Lemma
2.1.1.
Having established necessary and sufficient conditions for convergence, we
must now consider the rate of convergence. How many iterations will be
required to obtain an approximation that is within, say, δ of the true solution?
In general, this question is not so easy to answer.
Taking norms on each side in (2.2), we can write
from which it follows that if |||I − M^{-1}A||| < 1, then the error is reduced by
at least this factor at each iteration. The error will satisfy |||e_k|||/|||e_0||| ≤ δ
provided that
It was shown in section 1.3.1 that for any ε > 0, there is a matrix norm such
that |||I − M^{-1}A||| ≤ ρ(I − M^{-1}A) + ε. Hence if ρ(I − M^{-1}A) < 1, then
there is a norm for which the error is reduced monotonically, and convergence
is at least linear with a reduction factor approximately equal to ρ(I − M^{-1}A).
Unfortunately, however, this norm is sometimes a very strange one (as might
be deduced from the proof of Theorem 1.3.3, since the matrix D_t involved an
exponential scaling), and it is unlikely that one would really want to measure
convergence in terms of this norm!
It is usually the 2-norm or the ∞-norm or some closely related norm of
the error that is of interest. For the class of normal matrices (diagonalizable
and M was taken to be the lower triangle of A. For problem size n = 30, the
spectral radius of I − M^{-1}A is about .74, while the 2-norm of I − M^{-1}A is
about 1.4. As Figure 2.1 shows, the 2-norm of the error increases by about
four orders of magnitude over its original value before starting to decrease.
While the spectral radius generally does not determine the convergence
rate of early iterations, it does describe the asymptotic convergence rate of
(2.1). We will prove this only in the case where M^{-1}A is diagonalizable
and has a single eigenvalue of largest absolute value. Then, writing the
eigendecomposition of M^{-1}A as VΛV^{-1}, where Λ = diag(λ_1, ..., λ_n) and,
say,
Assuming that the first component of V^{-1}e_0 is nonzero, for k sufficiently large,
the largest component of V^{-1}e_k will be the first, (V^{-1}e_k)_1. At each subsequent
iteration, this dominant component is multiplied by the factor (1 − λ_1), and
so we have
Instead of considering the error ratios at successive steps as defining the asymptotic
convergence rate, one might consider ratios of errors (|||e_{k+j}|||/|||e_k|||)^{1/j}
for any k and for j sufficiently large. Then a more general proof that this
quantity approaches the spectral radius ρ(I − M^{-1}A) as j → ∞ can be carried
out using the Jordan form, as described in section 1.3.2. Note in Figure 2.1
that eventually the error decreases by about the factor ρ(I − M^{-1}A) ≈ .74 at
each step, since this matrix is diagonalizable with a single eigenvalue of largest
absolute value.
Since the residual satisfies r_{k+1} = r_k − a_k A r_k, one can minimize the 2-norm of
r_{k+1} by choosing
If the matrix A is Hermitian and positive definite, one might instead minimize
the A-norm of the error, ||e_{k+1}||_A = (e_{k+1}, A e_{k+1})^{1/2}. Since the error satisfies
e_{k+1} = e_k − a_k r_k, the coefficient that minimizes this error norm is
For Hermitian positive definite problems, the iteration (2.5) with coefficient
formula (2.7) is called the method of steepest descent because if the problem
of solving the linear system is identified with that of minimizing the quadratic
form x^H A x − 2b^H x (which has its minimum where Ax = b), then the
negative gradient, or direction of steepest descent, of this function at x = x_k is
r_k = b − Ax_k. The coefficient formula (2.6), which can be used with arbitrary
nonsingular matrices A, does not have a special name like steepest descent but
is a special case of a number of different methods. In particular, it can be
called Orthomin(l).
By choosing a_k as in (2.6), the Orthomin(1) method produces a residual
r_{k+1} that is equal to r_k minus its projection onto Ar_k. It follows that
||r_{k+1}|| ≤ ||r_k||, with equality if and only if r_k is already orthogonal to Ar_k.
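The two coefficient choices discussed above lead to the following sketch (an illustration consistent with the description; the coefficient formulas in the comments are the standard ones that result from the two minimizations described in the text):

```python
import numpy as np

def orthomin1(A, b, x0, steps):
    """Orthomin(1): a_k = (r_k, A r_k)/(A r_k, A r_k) minimizes ||r_{k+1}||_2."""
    x = x0.copy()
    for _ in range(steps):
        r = b - A @ x
        Ar = A @ r
        x = x + (np.dot(r, Ar) / np.dot(Ar, Ar)) * r
    return x

def steepest_descent(A, b, x0, steps):
    """For Hermitian positive definite A:
    a_k = (r_k, r_k)/(r_k, A r_k) minimizes the A-norm of e_{k+1}."""
    x = x0.copy()
    for _ in range(steps):
        r = b - A @ x
        x = x + (np.dot(r, r) / np.dot(r, A @ r)) * r
    return x

A = np.array([[3.0, 1.0], [-1.0, 2.0]])   # Hermitian part is positive definite,
b = np.array([1.0, 1.0])                  # so the field of values avoids the origin
print(orthomin1(A, b, np.zeros(2), 50), np.linalg.solve(A, b))
```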
Recall the definition of the field of values F(B) of a matrix B as the set of all
complex numbers of the form y^H B y/y^H y, where y is any complex vector other
than the zero vector.
THEOREM 2.2.1. The 2-norm of the residual in iteration (2.5) with
coefficient formula (2.6) decreases strictly monotonically for every initial vector
r_0 if and only if 0 ∉ F(A^H).
Proof. If 0 ∈ F(A^H) and r_0 is a nonzero vector satisfying (r_0, Ar_0) =
r_0^H A^H r_0 = 0, then ||r_1|| = ||r_0||. On the other hand, if 0 ∉ F(A^H), then
(r_k, Ar_k) cannot be 0 for any k and ||r_{k+1}|| < ||r_k||.
Since the field of values of A^H is just the complex conjugate of the field of
values of A, the condition in the theorem can be replaced by 0 ∉ F(A).
Suppose 0 ∉ F(A^H). To show that the method (2.5)-(2.6) converges to the
solution A^{-1}b, we will show not only that the 2-norm of the residual is reduced
at each step but that it is reduced by at least some fixed factor, independent of
k. Since the field of values is a closed set, if 0 ∉ F(A^H) then there is a positive
number d—the distance of F(A^H) from the origin—such that
|y^H A^H y| / (y^H y) ≥ d for all y ≠ 0.
Bounding the last two factors in (2.8) independently, in terms of d and ||A||,
we have
for all k, where d is the distance from the origin to the field of values of A^H.
In the special case where A is real and the Hermitian part of A, H(A) =
(A + A^H)/2, is positive definite, the distance d in (2.9) is just the smallest
eigenvalue of H(A). This is because the field of values of H(A) is the real part
of the field of values of A^H, which is convex and symmetric about the real axis
and hence has its closest point to the origin on the real axis.
The bound (2.9) on the rate at which the 2-norm of the residual is reduced
is not necessarily sharp, since the vectors r_k for which the first factor in (2.8) is
equal to d^2 are not necessarily the ones for which the second factor is 1/||A||^2.
Sometimes a stronger bound can be obtained by noting that, because of the
choice of a_k,
for any coefficient a. In the special case where A is Hermitian and positive
definite, consider a = 2/(λ_n + λ_1), where λ_n is the largest and λ_1 the smallest
eigenvalue of A. Inequality (2.10) then implies that
The same argument applied to the steepest descent method for Hermitian
positive definite problems shows that for that algorithm,
Using relation (1.21) between the numerical radius and the norm of a matrix,
we conclude that for this choice of a
and hence
This estimate may be stronger or weaker than that in (2.9), depending on the
exact size and shape of the field of values. For example, if F(A) is contained
in a disk of radius s = (||A|| − d)/2 centered at c = (||A|| + d)/2, then (2.13)
implies
This is smaller than the bound in (2.9) if d/||A|| is greater than about .37;
otherwise, (2.9) is smaller.
Stronger results have recently been proved by Eiermann [38, 39]. Suppose
F(A) ⊆ Ω, where Ω is a compact convex set with 0 ∉ Ω. Eiermann has
shown that if φ_m(z) = F_m(z)/F_m(0), where F_m(z) is the mth-degree Faber
polynomial for the set Ω (the analytic part of (Φ(z))^m, where Φ(z) maps the
exterior of Ω to the exterior of the unit disk), then
where the minimum is over all mth-degree polynomials with value one at the
origin and the constant C_m depends on Ω but is independent of A. For m = 1,
as in (2.10), inequality (2.14) is of limited use because the constant C_m may be
larger than the inverse of the norm of the first-degree minimax polynomial on
Ω. We will see later, however, that for methods such as (restarted) GMRES
involving higher degree polynomials, estimate (2.14) can sometimes lead to
very good bounds on the norm of the residual.
Orthomin(1) can be generalized by using different inner products in (2.6).
That is, if B is a Hermitian positive definite matrix, then, instead of minimizing
the 2-norm of the residual, one could minimize the B-norm of the residual by
taking
it is clear that for this variant the B-norm of the residual decreases strictly
monotonically for all r_0 if and only if 0 ∉ F((B^{1/2} A B^{-1/2})^H). Using arguments
similar to those used to establish Theorem 2.2.2, it can be seen that
where d_B is the distance from the origin to the field of values of (B^{1/2} A B^{-1/2})^H.
If 0 ∈ F(A^H), it may still be possible to find a Hermitian positive definite
matrix B such that 0 ∉ F((B^{1/2} A B^{-1/2})^H).
where the direction vector p_k is equal to the residual r_k. The new residual and
error vectors then satisfy
Now the residual norm is minimized in the plane spanned by Ar_k and Ap_{k−1},
since we can write
and the coefficients force orthogonality between r_{k+1} and span{Ar_k, Ap_{k−1}}.
The new algorithm, known as Orthomin(2), is the following.
Compute
Set
The new algorithm can also fail. If (r_0, Ar_0) = 0, then r_1 will be equal to
r_0, and p_1 will be 0. An attempt to compute the coefficient a_1 will result
in dividing 0 by 0. As for Orthomin(1), however, it can be shown that
Orthomin(2) cannot fail if 0 ∉ F(A^H). If an Orthomin(2) step does succeed,
then the 2-norm of the residual is reduced by at least as much as it would
be reduced by an Orthomin(1) step from the same point. This is because
the residual norm is minimized over a larger space—r_k + span{Ar_k, Ap_{k−1}}
instead of r_k + span{Ar_k}. It follows that the bound (2.9) holds also for
Orthomin(2) when 0 ∉ F(A^H). Unfortunately, no stronger a priori bounds
on the residual norm are known for Orthomin(2) applied to a general matrix
whose field of values does not contain the origin, although, in practice, it may
perform significantly better than Orthomin(1).
In the special case when A is Hermitian, if the vectors at steps 1 through
k + 1 of the Orthomin(2) algorithm are defined, then r_{k+1} is minimized not just
over the two-dimensional space r_k + span{Ar_k, Ap_{k−1}} but over the (k + 1)-
dimensional space r_0 + span{Ap_0, ..., Ap_k}.
THEOREM 2.3.1. Suppose that A is Hermitian, the coefficients a_0, ..., a_{k−1}
are nonzero, and the vectors r_1, ..., r_{k+1} and p_1, ..., p_{k+1} in the Orthomin(2)
algorithm are defined. Then
r_{k+1} has the smallest Euclidean norm. It also follows that if a_0, ..., a_{n−2} are
nonzero and r_1, ..., r_n and p_1, ..., p_n are defined, then r_n = 0.
Proof. By construction, we have (r_1, Ap_0) = (Ap_1, Ap_0) = 0. Assume that
(r_k, Ap_j) = (Ap_k, Ap_j) = 0 for all j ≤ k − 1. The coefficients at step k + 1 are chosen
to force (r_{k+1}, Ap_k) = (Ap_{k+1}, Ap_k) = 0. For j < k − 1, we have
to span{Ap_0, ..., Ap_k}, it follows that r_{k+1} is the vector in the space (2.16)
with minimal Euclidean norm. For k = n − 1, this implies that r_n = 0.
The assumption in Theorem 2.3.1 that r_1, ..., r_{k+1} and p_1, ..., p_{k+1} are
defined is actually implied by the other hypothesis. It can be shown that these
vectors are defined provided that a_0, ..., a_{k−1} are defined and nonzero and r_k
is nonzero.
An algorithm that approximates the solution of a Hermitian linear system
Ax = b by minimizing the residual over the affine space (2.16) is known as
the MINRES algorithm. It should not be implemented in the form given
here, however, unless the matrix is positive (or negative) definite because, as
noted, this iteration can fail if 0 ∈ F(A). An appropriate implementation of
the MINRES algorithm for Hermitian indefinite linear systems is derived in
section 2.5.
In a similar way, the steepest descent method for Hermitian positive definite
matrices can be modified so that it eliminates the A-projection of the error in
a direction that is already A-orthogonal to the previous direction vector, i.e.,
in the direction
Then we have
and the A-norm of the error is minimized over the two-dimensional affine space
e_k + span{r_k, p_{k−1}}. The algorithm that does this is called the CG method.
It is usually implemented with slightly different (but equivalent) coefficient
formulas, as shown in Algorithm 2.
Compute
Set
Compute
Set
It is left as an exercise for the reader to prove that these coefficient formulas
are equivalent to the more obvious expressions
Since the CG algorithm is used only with positive definite matrices, the
coefficients are always defined, and it can be shown, analogous to the MINRES
method, that the A-norm of the error is actually minimized over the much
larger affine space e_0 + span{p_0, p_1, ..., p_k}.
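For reference, a sketch of the CG iteration in the form it is usually programmed (an illustration of Algorithm 2 with the standard coefficient formulas; it is not copied from the boxed algorithm):

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_iter=None, tol=1e-10):
    """CG for Hermitian positive definite A, using the standard coefficients
    a_k = (r_k, r_k)/(p_k, A p_k) and b_k = (r_{k+1}, r_{k+1})/(r_k, r_k)."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = np.dot(r, r)
    for _ in range(max_iter or len(b)):
        Ap = A @ p
        a = rr / np.dot(p, Ap)
        x = x + a * p
        r = r - a * Ap
        rr_new = np.dot(r, r)
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))
```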
THEOREM 2.3.2. Assume that A is Hermitian and positive definite. The
CG algorithm generates the exact solution to the linear system Ax = b in at
most n steps. The error, residual, and direction vectors generated before the
exact solution is obtained are well defined and satisfy
where the last equality holds because (r_1, r_0) = 0 and a_0^{-1} is real. Assume that
For we have
This defines the Orthomin(j) procedure. Unfortunately, the algorithm can still
fail if 0 ∈ F(A^H), and again the only proven a priori bound on the residual
norm is estimate (2.9), although this bound is often pessimistic.
It turns out that the possibility of failure can be eliminated by replacing
r_k in formula (2.19) with Ap_{k−1}. This algorithm, known as Orthodir, generally
has worse convergence behavior than Orthomin for j < n, however. The bound
(2.9) can no longer be established because the space over which the norm of
r_{k+1} is minimized may not contain the vector Ar_k.
An exception is the case of Hermitian matrices, where it can be shown that
for j = 3, the Orthodir(j) algorithm minimizes the 2-norm of the residual over
the entire affine space
Set where
Compute
Set For
A difficulty with this algorithm is that in finite precision arithmetic, the vectors
s_k, which are supposed to be equal to Ap_k, may differ from this if there is
much cancellation in the computation of s_k. This could be corrected with an
occasional extra matrix-vector multiplication to explicitly set s_k = Ap_k at the
end of an iteration. Another possible implementation is given in section 2.5.
For general non-Hermitian matrices, if j = n, then the Orthodir(n)
algorithm minimizes the 2-norm of the residual at each step k over the affine
space in (2.20). It follows that the exact solution is obtained in n or fewer steps
(assuming exact arithmetic) but at the cost of storing up to n search directions
p_k (as well as auxiliary vectors s_k = Ap_k) and orthogonalizing against k
direction vectors at each step k = 1, ..., n. If the full n steps are required,
then Orthodir(n) requires O(n^2) storage and O(n^3) work, just as would be
required by a standard dense Gaussian elimination routine. The power of the
method lies in the fact that at each step the residual norm is minimized over
the space (2.20) so that, hopefully, an acceptably good approximate solution
can be obtained in far fewer than n steps.
There is another way to compute the approximation x_k for which the norm
of r_k is minimized over the space (2.20). This method requires about half
the storage of Orthodir(n) (no auxiliary vectors) and has better numerical
properties. It is the GMRES method.
The GMRES method uses the modified Gram-Schmidt process to construct
an orthonormal basis for the Krylov space span{r_0, Ar_0, ..., A^k r_0}. When the
modified Gram-Schmidt process is applied to this space in the form given
below, it is called Arnoldi's method.
Arnoldi Algorithm.
Here H_k is the k-by-k upper Hessenberg matrix with (i, j)-element equal to
h_{ij} for j = 1, ..., k, i = 1, ..., min{j + 1, k}, and all other elements zero. The
vector ξ_k is the kth unit k-vector, (0, ..., 0, 1)^T. The (k + 1)-by-k matrix H_{k+1,k}
is the matrix whose top k-by-k block is H_k and whose last row is zero except
for the (k + 1, k)-element, which is h_{k+1,k}. Pictorially, the matrix equation
(2.21) looks like
where β = ||r_0||, ξ_1 is the first unit (k + 1)-vector (1, 0, ..., 0)^T, and the second
equality is obtained by using the fact that Q_{k+1} ξ_1, the first orthonormal basis
vector, is just r_0/β.
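A sketch of the Arnoldi process with modified Gram-Schmidt, producing Q_{k+1} and H_{k+1,k} and verifying the relation A Q_k = Q_{k+1} H_{k+1,k} of (2.21) (illustrative code, not the book's statement of the algorithm):

```python
import numpy as np

def arnoldi(A, r0, k):
    """Return Q (n-by-(k+1), orthonormal columns) and H ((k+1)-by-k upper
    Hessenberg) with A @ Q[:, :k] = Q @ H and Q[:, 0] = r0/||r0||."""
    n = len(r0)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = np.dot(Q[:, i], w)
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # (near) breakdown: Krylov space is A-invariant
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

A = np.random.rand(8, 8)
r0 = np.random.rand(8)
Q, H = arnoldi(A, r0, 4)
print(np.allclose(A @ Q[:, :4], Q @ H))        # verifies the Arnoldi relation
```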
The basic steps of the GMRES algorithm are as follows.
A standard method for solving the least squares problem min_y ||βξ_1 −
H_{k+1,k} y|| is to factor the (k+1)-by-k matrix H_{k+1,k} into the product of a (k+1)-by-
(k+1) unitary matrix F^H and a (k+1)-by-k upper triangular matrix R (that is, the
top k-by-k block is upper triangular and the last row is 0). This factorization,
known as the QR factorization, can be accomplished using plane rotations.
The solution y_k is then obtained by solving the upper triangular system
where R_{k×k} is the top k-by-k block of R and (Fξ_1)_{k×1} is the top k entries of
the first column of F.
where c_j = cos(θ_j) and s_j = sin(θ_j). The dimension of the matrix F_j, that is,
the size of the second identity block, will depend on the context in which it is
used. Assume that the rotations F_i, i = 1, ..., k have previously been applied
to H_{k+1,k} so that
where the x's denote nonzeros. In order to obtain R^{(k+1)}, the upper triangular
factor for H_{k+2,k+1}, first premultiply the last column of H_{k+2,k+1} by the
previous rotations to obtain
and βFξ_1 − Ry_k is zero except for its bottom entry, which is just the bottom
entry of βFξ_1.
Compute Q_{k+1} and h_{i,k} = H(i, k), i = 1, ..., k + 1, using the Arnoldi algorithm.
Compute the kth rotation, c_k and s_k, to annihilate the (k + 1, k)-entry of H.¹
¹The formula is c_k = |H(k,k)|/√(|H(k,k)|² + |H(k+1,k)|²), s_k = c_k H(k+1,k)/H(k,k), but a
more robust implementation should be used. See, for example, BLAS routine DROTG [32].
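Putting the pieces together, the sketch below forms x_k = x_0 + Q_k y_k, where y_k solves the small least squares problem min ||βξ_1 − H_{k+1,k} y||. For brevity it calls a dense least squares routine and reuses the arnoldi function from the sketch above, whereas the boxed algorithm updates a QR factorization of H with one plane rotation per step; the function name and test matrix are illustrative.

```python
import numpy as np

def gmres_sketch(A, b, x0, k):
    """One GMRES cycle of k steps: x_k = x_0 + Q_k y_k with y_k solving
    min || beta*xi_1 - H_{k+1,k} y ||.  A production code would instead
    update a QR factorization of H with one plane rotation per step."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q, H = arnoldi(A, r0, k)            # from the Arnoldi sketch above
    m = H.shape[1]
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta                       # beta * xi_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    x = x0 + Q[:, :m] @ y
    print("residual norm:", np.linalg.norm(b - A @ x))
    return x

A = np.diag(np.arange(1.0, 9.0)) + 0.1 * np.random.rand(8, 8)
b = np.random.rand(8)
x = gmres_sketch(A, b, np.zeros(8), 6)
```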
Given
To see that the vectors constructed by this algorithm are the same as those
constructed by the Arnoldi algorithm when the matrix A is Hermitian, we must
show that they form an orthonormal basis for the Krylov space formed from A
and q_1. It is clear that the vectors lie in this Krylov space and each vector has
norm one because of the choice of the β_j's. From the formula for α_j, it follows
that (q_{j+1}, q_j) = 0. Suppose (q_k, q_i) = 0 for i ≠ k whenever k, i ≤ j. Then
Thus the vectors q_1, ..., q_{j+1} form an orthonormal basis for the Krylov space
span{q_1, Aq_1, ..., A^j q_1}.
The Lanczos algorithm can be written in matrix form as
where Q_k is the n-by-k matrix whose columns are the orthonormal basis
vectors q_1, ..., q_k, ξ_k is the kth unit k-vector, and T_k is the k-by-k Hermitian
tridiagonal matrix of recurrence coefficients:
The (k + 1)-by-k matrix T_{k+1,k} has T_k as its upper k-by-k block and β_k ξ_k^T as its
last row.
It was shown in section 2.3 that the MINRES and CG algorithms generate
the Krylov space approximations x_k for which the 2-norm of the residual
and the A-norm of the error, respectively, are minimal. That is, if q_1 in the
where a_{k−1} is the kth entry of β(Fξ_1). This leads to the following
implementation of the MINRES algorithm.
Compute the kth rotation, c_k and s_k, to annihilate the (k + 1, k)-entry of T.²
Compute
where undefined terms are zero
Set where
that is, y_k is the solution to the k-by-k linear system T_k y = βξ_1. While the
least squares problem (2.25) always has a solution, the linear system (2.26) has
a unique solution if and only if T_k is nonsingular. When A is positive definite,
it follows from the minimax characterization of eigenvalues of a Hermitian
²The formula is c_k = |T(k,k)|/√(|T(k,k)|² + |T(k+1,k)|²), s_k = c_k T(k+1,k)/T(k,k), but a
more robust implementation should be used. See, for example, BLAS routine DROTG [32].
matrix that the tridiagonal matrix T_k = Q_k^H A Q_k is also positive definite, since
its eigenvalues lie between the smallest and largest eigenvalues of A.
If T_k is positive definite, and sometimes even if it is not, then one way to
solve (2.26) is to factor T_k in the form
where b_{k−1} is the (k, k − 1)-entry of L_k. It is not difficult to see that the
columns of P_k are, to within constant factors, the direction vectors from the
CG algorithm of section 2.3. The Lanczos vectors are normalized versions of the
CG residuals, with opposite signs at every other step. With this factorization,
then, x_k is given by
where a_{k−1} = d_k^{-1} β (L_k^{-1})_{k,1} and d_k is the (k, k)-entry of D_k. The coefficient
a_{k−1} is defined, provided that L_k is invertible and d_k ≠ 0.
With this interpretation it can be seen that if the CG algorithm of section
2.3 were applied to a Hermitian indefinite matrix, then it would fail at step k
if and only if the LDL^H factorization of T_k does not exist. If this factorization
exists for T_1, ..., T_{k−1}, then it can fail to exist at step k only if T_k is singular.
For indefinite problems, it is possible that T_k will be singular, but subsequent
tridiagonal matrices, e.g., T_{k+1}, will be nonsingular. The CG algorithm of
section 2.3 cannot recover from a singular intermediate matrix T_k. To overcome
this difficulty, Paige and Saunders [111] proposed a CG-like algorithm based
Exercises.
2.1. Use the Jordan form discussed in section 1.3.2 to describe the asymptotic
convergence of simple iteration for a nondiagonalizable matrix.
2.2. Show that if A and b are real and 0 ∈ F(A^H), then there is a real initial
vector for which the Orthomin iteration fails.
2.4. Verify that the coefficient formulas in (2.17) are equivalent to those in
Algorithm 2.
2.5. Show that for Hermitian matrices, the MINRES algorithm given in
section 2.4 minimizes ||r_{k+1}|| over the space in (2.20).
2.6. Express the entries of the tridiagonal matrix generated by the Lanczos
algorithm in terms of the coefficients in the CG algorithm (Algorithm
2). (Hint: Write down the 3-term recurrence for q_{k+1} = (−1)^k r_k/||r_k|| in
terms of q_k and q_{k−1}.)
2.7. Derive a necessary and sufficient condition for the convergence of the
restarted GMRES algorithm, GMRES(j), in terms of the generalized
field of values defined in section 1.3.6; that is, show that GMRES(j)
converges to the solution of the linear system for all initial vectors if and
only if the zero vector is not contained in the set
2.8. Show that for a normal matrix whose eigenvalues have real parts greater
than or equal to the imaginary parts in absolute value, the GMRES(2)
iteration converges for all initial vectors. (Hint: Since r_2 = p_2(A) r_0
for a certain second-degree polynomial p_2 with p_2(0) = 1 and since p_2
minimizes the 2-norm of the residual over all such polynomials, we have
||r_2|| ≤ min_{p_2} ||p_2(A)|| · ||r_0||. If a second-degree polynomial p_2 with value
1 at the origin can be found which satisfies ||p_2(A)|| < 1, then this will
show that each GMRES(2) cycle reduces the residual norm by at least a
constant factor. Since A is normal, its eigendecomposition can be written
in the form A = UΛU^H, where U is unitary and Λ = diag(λ_1, ..., λ_n);
it follows that ||p_2(A)|| = max_{i=1,...,n} |p_2(λ_i)|. Hence, find a polynomial
p_2(z) that is strictly less than 1 in absolute value throughout a region
It was shown in Chapter 2 that the CG, MINRES, and GMRES algorithms
each generate the optimal approximate solution from a Krylov subspace, where
"optimal" is taken to mean having an error with minimal A-norm in the case
of CG or having a residual with minimal 2-norm in the case of MINRES and
GMRES. In this chapter we derive bounds on the appropriate error norm for
the optimal approximation from a Krylov subspace.
A goal is to derive a sharp upper bound on the reduction in the A-norm
of the error for CG or in the 2-norm of the residual for both MINRES and
GMRES—that is, an upper bound that is independent of the initial vector but
that is actually attained for certain initial vectors. This describes the worst-
case behavior of the algorithms (for a given matrix A). It can sometimes
be shown that the "typical" behavior of the algorithms is not much different
from the worst-case behavior. That is, if the initial vector is random, then
convergence may be only moderately faster than for the worst initial vector.
For certain special initial vectors, however, convergence may be much faster
than the worst-case analysis would suggest. Still, it is usually the same analysis
that enables one to identify these "special" initial vectors, and it is often clear
how the bounds must be modified to account for special properties of the initial
vector.
For normal matrices, a sharp upper bound on the appropriate error norm is
known. This is not the case for nonnormal matrices, and a number of possible
approaches to this problem are discussed in section 3.2.
respectively. It follows that the CG error vector and the MINRES residual
where the minimum is taken over all polynomials p_k of degree k or less with
p_k(0) = 1.
In this section we derive bounds on the expressions in the right-hand sides
of (3.2) and (3.3) that are independent of the direction of the initial error e_0
or residual r_0, although they do depend on the size of these quantities. A
sharp upper bound is derived involving all of the eigenvalues of A, and then
simpler (but nonsharp) bounds are given based on knowledge of just a few of
the eigenvalues of A.
Let an eigendecomposition of A be written as A = UΛU^H, where U is a
unitary matrix and Λ = diag(λ_1, ..., λ_n) is a diagonal matrix of eigenvalues.
If A is positive definite, define A^{1/2} to be UΛ^{1/2}U^H. Then the A-norm of a
vector v is just the 2-norm of the vector A^{1/2}v. Equalities (3.2) and (3.3) imply
that
for any vector w. Of course, the polynomial that minimizes the expressions in
the equalities of (3.4) and (3.5) is not necessarily the same one that minimizes
||p_k(A)|| in the inequalities. The MINRES and CG polynomials depend on the
initial vector, while this polynomial does not. Hence it is not immediately
obvious that the bounds in (3.4) and (3.5) are sharp, that is, that they can
actually be attained for certain initial vectors. It turns out that this is the case,
however. See, for example, [63, 68, 85]. For each k there is an initial vector
e_0 for which the CG polynomial at step k is the polynomial that minimizes
||p_k(A)|| and for which equality holds in (3.4). An analogous result holds for
MINRES.
The sharp upper bounds (3.4) and (3.5) can be written in the form
where T_k(z) is the Chebyshev polynomial of the first kind on the interval [−1, 1]
satisfying
In the interval [−1, 1], we have T_k(z) = cos(k cos^{-1}(z)), so |T_k(z)| ≤ 1 and the
absolute value of the numerator in (3.9) is bounded by 1 for z in the interval
[λ_min, λ_max]. It attains this bound at the endpoints of the interval and at k − 1
interior points. To determine the size of the denominator in (3.9), note that
outside the interval [−1, 1], we have
Since the second factor is zero at λ_n and less than one in absolute value at
each of the other eigenvalues, the maximum absolute value of this polynomial
on {λ_1, ..., λ_n} is less than the maximum absolute value of the first factor on
{λ_1, ..., λ_{n−1}}. Using arguments like those in Theorem 3.1.1, it follows that
Similarly, if the matrix A has just a few large outlying eigenvalues, say,
λ_1 ≤ ... ≤ λ_{n−ℓ} ≪ λ_{n−ℓ+1} ≤ ... ≤ λ_n (i.e., λ_{n−ℓ+1}/λ_{n−ℓ} ≫ 1), one can
consider a polynomial p_k that is the product of an ℓth-degree factor that is
zero at each of the outliers (and less than one in magnitude at each of the other
eigenvalues) and a scaled and shifted Chebyshev polynomial of degree k − ℓ on
the interval [λ_1, λ_{n−ℓ}]. Bounding the size of this polynomial gives
Analogous results hold for the 2-norm of the residual in the MINRES algorithm applied to a Hermitian positive definite linear system, and the proofs are identical. For example, for any i > 0, we have
where ℓ = [k/2], [·] denotes the integer part, and T_ℓ is the ℓth Chebyshev polynomial. Note that the function q(z) maps each of the intervals [a, b] and [c, d] to the interval [-1,1]. It follows that for z ∈ [a, b] ∪ [c, d], the absolute value of the numerator in (3.13) is bounded by 1. The size of the denominator is determined in the same way as before: if q(0) = ½(y + y^{-1}), then T_ℓ(q(0)) = ½(y^ℓ + y^{-ℓ}). To determine y, we must solve the equation
In the special case when a = — d and b = — c (so the two intervals are placed
symmetrically about the origin), the bound in (3.14) becomes
This is the bound one would obtain at step [k/2] for a Hermitian positive definite matrix with condition number (d/c)². It is as difficult to approximate
zero on two intervals situated symmetrically about the origin as it is to
approximate zero on a single interval lying on one side of the origin whose
ratio of largest to smallest point is equal to the square of that in the 2-
interval problem. Remember, however, that estimate (3.14) implies better
approximation properties for intervals not symmetrically placed about the
origin. For further discussion of approximation problems on two intervals,
see [50, secs. 3.3-3.4].
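To make this comparison concrete, the following minimal Python sketch (an illustration, not from the text; the values c = 1, d = 10, k = 20 are arbitrary) evaluates the composed polynomial T_ℓ(q(z))/T_ℓ(q(0)) described above, with ℓ = [k/2] and q(z) = (2z² - (c² + d²))/(d² - c²), on the symmetric set [-d,-c] ∪ [c,d], and compares its maximum there with the familiar single-interval Chebyshev bound for condition number (d/c)².

```python
import numpy as np

def cheb(ell, x):
    """Chebyshev polynomial T_ell, valid both inside and outside [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.empty_like(x)
    inside = np.abs(x) <= 1.0
    t[inside] = np.cos(ell * np.arccos(x[inside]))
    t[~inside] = np.sign(x[~inside]) ** ell * np.cosh(ell * np.arccosh(np.abs(x[~inside])))
    return t

c, d, k = 1.0, 10.0, 20            # spectrum in [-d, -c] U [c, d]
ell = k // 2                       # degree [k/2] of the Chebyshev factor

q = lambda z: (2.0 * z**2 - (c**2 + d**2)) / (d**2 - c**2)   # maps both intervals onto [-1, 1]

z = np.concatenate([np.linspace(-d, -c, 2000), np.linspace(c, d, 2000)])
p = cheb(ell, q(z)) / cheb(ell, q(0.0))       # polynomial of degree 2*ell <= k with value 1 at 0

two_interval = np.max(np.abs(p))              # equals 1/|T_ell(q(0))|
kappa = (d / c) ** 2
one_interval = 2.0 * ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** ell

print(two_interval, one_interval)  # two_interval <= one_interval; both decay like ((d-c)/(d+c))**ell
```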
where κ(V) = ||V|| · ||V^{-1}|| is the condition number of the eigenvector matrix V. We will assume that the columns of V have been scaled to make this condition number as small as possible. As in the Hermitian case, the polynomial that minimizes ||V p_k(Λ)V^{-1} r_0|| is not necessarily the one that minimizes ||p_k(Λ)||, and it is not clear whether the bound in (3.16) is sharp. It turns out that if A is a normal matrix (a diagonalizable matrix with a complete set of orthonormal eigenvectors), then κ(V) = 1 and the bound in (3.16) is sharp [68, 85]. In
this case, as in the Hermitian case, the problem of describing the convergence
of GMRES reduces to a problem in approximation theory—how well can
one approximate zero on the set of complex eigenvalues using a kth-degree
polynomial with value 1 at the origin? We do not have simple estimates, such
as that obtained in Theorem 3.1.1 based on the ratio of largest to smallest
eigenvalue, but one's intuition about good and bad eigenvalue distributions in
the complex plane still applies. Eigenvalues tightly clustered about a single
point (away from the origin) are good, since the polynomial (1 - z/c)^k is
small at all points close to c in the complex plane. Eigenvalues all around the
origin are bad because (by the maximum principle) it is impossible to have a
polynomial that is 1 at the origin and less than 1 everywhere on some closed
curve around the origin. Similarly, a low-degree polynomial cannot be 1 at the
origin and small in absolute value at many points distributed all around the
origin.
If the matrix A is nonnormal but has a fairly well-conditioned eigenvector
matrix V, then the bound (3.16), while not necessarily sharp, gives a reasonable
estimate of the actual size of the residual. In this case again, it is A's eigenvalue
distribution that essentially determines the behavior of GMRES.
In general, however, the behavior of GMRES cannot be determined from
eigenvalues alone. In fact, it is shown in [72, 69] that any nonincreasing curve
represents a plot of residual norm versus iteration number for the GMRES
method applied to some problem; moreover, that problem can be taken to
have any desired eigenvalues. Thus, for example, eigenvalues tightly clustered
around 1 are not necessarily good for nonnormal matrices, as they are for
normal ones.
A simple way to see this is to consider a matrix A with the following
sparsity pattern:
where the *'s represent any values and the other entries are 0. If the initial residual r_0 is a multiple of the first unit vector ξ_1 = (1, 0, ..., 0)^T, then Ar_0 is a multiple of ξ_n, A²r_0 is a linear combination of ξ_n and ξ_{n-1}, etc. All vectors A^k r_0, k = 1, ..., n-1, are orthogonal to r_0, so the optimal approximation from the space x_0 + span{r_0, Ar_0, ..., A^{k-1}r_0} is simply x_k = x_0 for k = 1, ..., n-1; i.e., GMRES makes no progress until step n! Now, the class of matrices of the
The eigenvalues of this matrix are the roots of the polynomial z^n - Σ_{j=0}^{n-1} c_j z^j, and the coefficients c_0, ..., c_{n-1} can be chosen to make this matrix have any desired eigenvalues. If GMRES is applied to (3.17) with a different initial residual, say, a random r_0, then, while some progress will be made before step n, it is likely that a significant residual component will remain until that final step.
Of course, one probably would not use the GMRES algorithm to solve a
linear system with the sparsity pattern of that in (3.17), but the same result
holds for any matrix that is unitarily similar to one of the form (3.17). Note
that (3.17) is simply a permuted lower triangular matrix. Every matrix is
unitarily similar to a lower triangular matrix, but, fortunately, most matrices
are not unitarily similar to one of the form (3.17)!
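The stagnation argument is easy to reproduce numerically. The sketch below (Python; the companion-type matrix used here is one concrete instance with the properties just described, not necessarily the exact pattern (3.17)) builds such a matrix with prescribed eigenvalues clustered near 1 and computes the GMRES residual norms directly as least squares problems over the Krylov spaces; the residual stays at its initial value until step n.

```python
import numpy as np

n = 12
eigs = 1.0 + 0.01 * np.arange(n)            # any desired eigenvalues; here tightly clustered near 1
coeffs = np.poly(eigs)                      # z^n + a_{n-1} z^{n-1} + ... + a_0
c = -coeffs[1:][::-1]                       # so that z^n - sum_j c_j z^j has the roots 'eigs'

# companion-type matrix: superdiagonal of ones plus a full last row (c_0, ..., c_{n-1})
A = np.diag(np.ones(n - 1), k=1)
A[-1, :] = c

r0 = np.zeros(n)
r0[0] = 1.0                                 # initial residual r_0 = xi_1

# Krylov basis r_0, A r_0, ..., A^{n-1} r_0
K = np.zeros((n, n))
K[:, 0] = r0
for j in range(1, n):
    K[:, j] = A @ K[:, j - 1]

# GMRES residual norm at step k = min over the Krylov space of || r_0 - A z ||
for k in range(1, n + 1):
    AK = A @ K[:, :k]
    y, *_ = np.linalg.lstsq(AK, r0, rcond=None)
    print(k, np.linalg.norm(r0 - AK @ y))   # stays at 1.0 for k < n, drops to ~0 at k = n
```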
When the eigenvector matrix V is extremely ill-conditioned, the bound
(3.16) is less useful. It may be greater than 1 for all k < n, but we know from other arguments that ||r_k||/||r_0|| ≤ 1 for all k. In such cases, it is not
clear whether GMRES converges poorly or whether the bound (3.16) is simply
a large overestimate of the actual residual norm. Attempts have been made
to delineate those cases in which GMRES actually does converge poorly from
those for which GMRES converges well and the bound (3.16) is just a large
overestimate.
Different bounds on the residual norm can be obtained based on the field of values of A, provided 0 ∉ F(A). For example, suppose F(A) is contained in a disk D = {z ∈ C : |z - c| ≤ s} which does not contain the origin. Consider the polynomial p_k(z) = (1 - z/c)^k. It follows from (1.16-1.17) that
and hence that ν(I - (1/c)A) ≤ s/|c|. The power inequality (1.22) implies that ν((I - (1/c)A)^k) ≤ (s/|c|)^k and hence, by (1.21),
and this bound holds for the restarted GMRES algorithm GMRES (j) provided
that j ≥ k. It is somewhat stronger than the bound (2.13) for GMRES(1) (which is the same as Orthomin(1)), because the factor 2 does not have to be raised to the kth power. Still, (3.18) sometimes gives a significant overestimate of the actual GMRES residual norm. In many cases, a disk D may have to be much larger than F(A) itself in order to contain F(A).
Using the recent result (2.14), however, involving the Faber polynomials for an arbitrary convex set S that contains F(A) and not the origin, one can more closely fit F(A) while choosing S so that the Faber polynomials for S are close to the minimax polynomials. For example, suppose F(A) is contained in the ellipse
where
and the branch of the square root is chosen so that |κ| < 1. We assume here that s < |κ|^{-1}. For further details, see [38, 39].
Inequality (2.14) still does not lead to a sharp bound on the residual norm in most cases, and it can be applied only when 0 ∉ F(A). Another approach to estimating ||p(A)|| in terms of the size of p(z) in some region of the complex plane has been suggested by Trefethen [129]. It is the idea of pseudo-eigenvalues.
For any polynomial p, the matrix p(A) can be written as a Cauchy integral
where Γ is any simple closed curve or union of simple closed curves containing the spectrum of A. Taking norms on each side in (3.19) and replacing the norm of the integral by the length L(Γ) of the curve times the maximum norm of the integrand gives
The curve on which ||(zI - A)^{-1}|| = ε^{-1} is referred to as the boundary of the ε-pseudospectrum of A:
for any choice of the parameter ε. For certain problems, and with carefully chosen values of ε, the bound (3.22) may be much smaller than that in (3.16). Still, the bound (3.22) is not sharp, and for some problems there is no choice of ε that yields a realistic estimate of the actual GMRES residual [72]. It is easy to see where the main overestimate occurs. In going from (3.19) to (3.20) and replacing the norm of the integral by the length of the curve times the maximum norm of the integrand, one may lose important cancellation properties of the integral.
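The quantities involved are straightforward to compute for small matrices. The following sketch (an illustration with a Jordan block as the nonnormal example; the grid and the value of ε are arbitrary choices) evaluates the resolvent norm ||(zI - A)^{-1}|| = 1/σ_min(zI - A) on a grid, so that the set where it is at least ε^{-1} is the ε-pseudospectrum whose boundary curve appears in the bound above.

```python
import numpy as np

def resolvent_norm(A, z):
    """||(zI - A)^{-1}||_2 = 1 / sigma_min(zI - A)."""
    n = A.shape[0]
    s = np.linalg.svd(z * np.eye(n) - A, compute_uv=False)
    return 1.0 / s[-1]

# illustrative nonnormal example: a 20-by-20 Jordan block with eigenvalue 1
n = 20
A = np.eye(n) + np.diag(np.ones(n - 1), k=1)

# points where the resolvent norm is at least 1/eps lie in the eps-pseudospectrum;
# its boundary is the level curve on which the resolvent norm equals exactly 1/eps
eps = 1e-4
xs = np.linspace(-0.5, 2.5, 61)
ys = np.linspace(-1.5, 1.5, 61)
inside = np.array([[resolvent_norm(A, x + 1j * y) >= 1.0 / eps for x in xs] for y in ys])
print("fraction of the sampled grid inside the 1e-4 pseudospectrum:", inside.mean())
```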
Each of the inequalities (3.16), (3.18), and (3.22) provides bounds on the GMRES residual by bounding the quantity min_{p_k} ||p_k(A)||. Now, the worst-case behavior of GMRES is given by
It is known that the right-hand sides of (3.23) and (3.24) are equal if A is a
normal matrix or if the dimension of A is less than or equal to 3 or if k = 1,
and many numerical experiments have shown that these two quantities are
equal (to within the accuracy limits of the computation) for a wide variety of
matrices and values of k. Recently, however, it has been shown that the two
quantities may differ. Faber et al. [44] constructed an example in which the
right-hand side of (3.24) is 1, while that of (3.23) is .9995. Subsequently, Toh
[127] generated examples in which the ratio of the right-hand side of (3.24)
to that of (3.23) can be made arbitrarily large by varying a parameter in the
matrix. Thus neither of the approaches leading to inequalities (3.16) and (3.22)
can be expected to yield a sharp bound on the size of the GMRES residual,
and it remains an open problem to describe the convergence of GMRES in
terms of some simple characteristic properties of the coefficient matrix.
Exercises.
3.1. Suppose a positive definite matrix has a small, well-separated eigenvalue, λ_1 ≪ λ_2 ≤ ··· ≤ λ_n (that is, λ_1/λ_2 ≪ 1). Derive an error bound for
(a) Show that the polynomial of degree 3 or less with value one at the
origin that minimizes ||p(A)|| over all such polynomials is
In the previous chapter, error bounds were derived for the CG, MINRES,
and GMRES algorithms, using the fact that these methods find the optimal
approximation from a Krylov subspace. In the Arnoldi algorithm, on which
the GMRES method is based, all of the Krylov space basis vectors are retained,
and a new vector is formed by explicitly orthogonalizing against all previous
vectors using the modified Gram-Schmidt procedure. The modified Gram-
Schmidt procedure is known to yield nearly orthogonal vectors if the vectors
being orthogonalized are not too nearly linearly dependent. In the special
case where the vectors are almost linearly dependent, the modified Gram-
Schmidt procedure can be replaced by Householder transformations, at the cost
of some extra arithmetic [139]. In this case, one would expect the basis vectors
generated in the GMRES method to be almost orthogonal and the approximate
solution obtained to be nearly optimal, at least in the space spanned by these
vectors. For discussions of the effect of rounding errors on the GMRES method,
see [33, 70, 2].
This is not the case for the CG and MINRES algorithms, which use short
recurrences to generate orthogonal basis vectors for the Krylov subspace.
The proof of orthogonality, and hence of the optimality of the approximate
solution, relies on induction (e.g., Theorems 2.3.1 and 2.3.2 and the arguments
after the Lanczos algorithm in section 2.5), and such arguments may be
destroyed by the effects of finite precision arithmetic. In fact, the basis vectors
generated by the Lanczos algorithm (or the residual vectors generated by
the CG algorithm) in finite precision arithmetic frequently lose orthogonality
completely and may even become linearly dependent! In such cases, the
approximate solutions generated by the CG and MINRES algorithms are not
the optimal approximations from the Krylov subspace, and it is not clear that
any of the results from Chapter 3 should hold.
In this chapter we show why the nonorthogonal vectors generated by the
Lanczos algorithm can still be used effectively for solving linear systems and
which of the results from Chapter 3 can and cannot be expected to hold (to a
close approximation) in finite precision arithmetic. It is shown that for both
the MINRES and CG algorithms, the 2-norm of the residual is essentially
FIG. 4.1. CG convergence curves for (a) finite precision arithmetic and (b) exact arithmetic. ρ = 1 solid, ρ = .8 dash-dot, ρ = .6 dotted, ρ = .4 dashed.
Note that although the theory of Chapter 3 guarantees that the exact solution is obtained after n = 24 steps, the computations with ρ = .6 and ρ = .8 do not generate good approximate solutions by step 24. It is at about step 31 that the error in the ρ = .8 computation begins to decrease rapidly. The ρ = .4 computation has reduced the A-norm of the error to 1.e-12 by step 24, but the corresponding exact arithmetic calculation would have reduced it to this level after just 14 steps. In contrast, for ρ = 1 (the case with equally spaced eigenvalues), the exact and finite precision computations behave very similarly. In all cases, the finite precision computation eventually finds a good approximate solution, but it is clear that estimates of the number of iterations required to do so cannot be based on the error bounds of Chapter 3. In this chapter we develop error bounds that hold in finite precision arithmetic.
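Experiments of this kind are easy to reproduce. The sketch below runs plain CG in double precision on a diagonal matrix whose eigenvalues are spread according to a parameter ρ. The particular distribution λ_i = λ_1 + ((i-1)/(n-1))(λ_n - λ_1)ρ^{n-i} and the endpoint values used here are an assumed, commonly used test form and not necessarily the ones behind Figure 4.1 (note, though, that ρ = 1 gives equally spaced eigenvalues, matching the description above).

```python
import numpy as np

n, rho = 24, 0.8
lam1, lamn = 1e-3, 1.0
i = np.arange(1, n + 1)
lam = lam1 + (i - 1) / (n - 1) * (lamn - lam1) * rho ** (n - i)   # assumed test distribution
A = np.diag(lam)

rng = np.random.default_rng(0)
x_true = rng.standard_normal(n)
b = A @ x_true

def cg_a_norm_errors(A, b, x_true, maxit):
    """Plain CG; records the A-norm of the error at every step."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    errs = []
    for _ in range(maxit):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        e = x_true - x
        errs.append(np.sqrt(e @ (A @ e)))
    return errs

errs = cg_a_norm_errors(A, b, x_true, 60)
for k in (14, 24, 31, 40, 60):
    print(k, errs[k - 1])   # for rho < 1 the step-24 error is far from the exact-arithmetic prediction
```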
where the columns of F_k represent the rounding errors at each step. Let ε denote the machine precision and define
and ignoring higher-order terms in ε, Paige [109] showed that the rounding error matrix F_k satisfies
Paige also showed that the coefficient formulas in the Lanczos algorithm can
be implemented sufficiently accurately to ensure that
We will assume throughout that the inequalities (4.4) and hence (4.5-4.7) hold.
Although the individual roundoff terms are tiny, their effect on the
recurrence (4.2) may be great. The Lanczos vectors may lose orthogonality
and even become linearly dependent. The recurrence coefficients generated
in finite precision arithmetic may be quite different from those that would be
generated in exact arithmetic.
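A minimal sketch of the plain three-term Lanczos recurrence (no reorthogonalization) makes the loss of orthogonality easy to observe; the diagonal test matrix below is an arbitrary illustrative choice.

```python
import numpy as np

def lanczos(A, q1, m):
    """Plain three-term Lanczos recurrence, no reorthogonalization."""
    n = len(q1)
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q_prev = np.zeros(n)
    q = q1 / np.linalg.norm(q1)
    b = 0.0
    for j in range(m):
        Q[:, j] = q
        v = A @ q - b * q_prev
        alpha[j] = q @ v
        v = v - alpha[j] * q
        b = np.linalg.norm(v)
        beta[j] = b
        q_prev, q = q, v / b
    return Q, alpha, beta

# a diagonal matrix with a wide eigenvalue range brings the effect out quickly
n, m = 200, 80
rng = np.random.default_rng(1)
A = np.diag(np.logspace(0, 6, n))
Q, _, _ = lanczos(A, rng.standard_normal(n), m)

# in exact arithmetic Q would have orthonormal columns; here orthogonality is lost
print("max deviation from orthogonality:", np.abs(Q.T @ Q - np.eye(m)).max())
```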
for the CG algorithm. Of course, in practice, one does not first compute the
Lanczos vectors and then apply formulas (4.8-4.10), since this would require
saving all of the Lanczos vectors. Still, it is reasonable to try and separate the
effects of roundoff on the three-term Lanczos recurrence from that on other
aspects of the (implicit) evaluation of (4.8-4.10). It is the effect of using
the nonorthogonal vectors produced by a finite precision Lanczos computation
that is analyzed here, so from here on we assume that formulas (4.8-4.10) hold
exactly, where Q_k, T_k, and T_{k+1,k} satisfy (4.2).
The residual in the CG algorithm, which we denote here as r_k^C, then satisfies
where y_k^C denotes the solution to (4.10). The 2-norm of the residual satisfies
It follows that at steps k where √k ε_1 ||A|| ||y_k^C||/β is much smaller than the residual norm, the 2-norm of the residual is essentially determined by the tridiagonal matrix T_k and the next recurrence coefficient β_k.
The residual in the MINRES algorithm, which we denote here as r_k^M, satisfies
where y_k^M denotes the solution to (4.9). The 2-norm of the residual satisfies
It follows that at steps k where √k ε_1 ||A|| ||y_k^M||/β is tiny compared to the residual norm, the 2-norm of the residual is essentially bounded, to within a possible factor of √(k+1) (which is usually an overestimate), by an expression involving only the (k+1)-by-k tridiagonal matrix T_{k+1,k}.
Thus, for both the MINRES and CG algorithms, the 2-norm of the residual
(or at least a realistic bound on the 2-norm of the residual) is essentially
determined by the recurrence coefficients computed in the finite precision
whose eigenvalues are related to those of A in such a way that the exact
arithmetic error bounds of Chapter 3 yield useful results about the convergence
of the exact CG or MINRES algorithms applied to linear systems with
coefficient matrix T. In this section we state such results but refer the reader
to the literature for their proofs.
4.4.1. Paige's Theorem. The following result of Paige [110] shows that the eigenvalues of T_{k+1}, the tridiagonal matrix generated at step k+1 of a finite precision Lanczos computation, lie essentially between the largest and smallest eigenvalues of A.
THEOREM 4.4.1 (Paige). The eigenvalues θ_i^(j), i = 1, ..., j, of the tridiagonal matrix T_j satisfy
where
Since the corresponding expression in (4.12) is the size of the residual at step k of the exact CG algorithm applied to a linear system with coefficient matrix T_{k+1} and right-hand side ξ_1, and since the eigenvalues of T_{k+1} satisfy (4.15), it follows from (3.8) that
where κ is given by (4.18). Here we have used the fact that since the expression in (3.8) bounds the reduction in the T_{k+1}-norm of the error in the exact CG iterate, the reduction in the 2-norm of the residual for this exact CG iterate is bounded by √κ times the expression in (3.8); i.e., ||Bv||/||Bw|| ≤ √κ(B) ||v||_B/||w||_B for any vectors v and w and any positive definite matrix B. Making these substitutions into (4.12) gives the desired result (4.16).
For the MINRES algorithm, it can be seen from the Cauchy interlace theorem (Theorem 1.3.12) applied to T_{k+1,k}^H T_{k+1,k} that the smallest singular value of T_{k+1,k} is greater than or equal to the smallest eigenvalue of T_k. Consequently, we have
so, similar to the CG algorithm, the second term on the right-hand side of (4.14) satisfies
Since the expression ||ξ_1 - T_{k+1,k} y_k^M/β|| in (4.14) is the size of the residual at step k of the exact MINRES algorithm applied to the linear system T_{k+1}x = ξ_1, where the eigenvalues of T_{k+1} satisfy (4.15), it follows from (3.12) that
Making these substitutions into (4.14) gives the desired result (4.17).
Theorem 4.4.2 shows that, at least to a close approximation, the exact
arithmetic residual bounds based on the size of the Chebyshev polynomial
on the interval from the smallest to the largest eigenvalue of A hold in finite
precision arithmetic as well. Exact arithmetic bounds such as (3.11) and (3.12)
for i > 0, based on approximation on discrete subsets of the eigenvalues of
A may fail, however, as may the sharp bounds (3.6) and (3.7). This was
illustrated in section 4.1. Still, stronger bounds than (4.16) and (4.17) may
hold in finite precision arithmetic, and such bounds are derived in the next
subsection.
in [65] involves a number of constants as well as a factor of the form n³k² √ε ||A|| or, in some cases, n³k ε^{1/4} ||A||, but better bounds are believed possible.
Suppose the eigenvalues of such a matrix T have been shown to lie in intervals of width δ about the eigenvalues of A. One can then relate the size of the residual at step k of a finite precision computation to the maximum value of the minimax polynomial on the union of tiny intervals containing the eigenvalues of T, using the same types of arguments as given in Theorem 4.4.2.
THEOREM 4.4.3. Let A be a Hermitian matrix with eigenvalues λ_1 ≤ ··· ≤ λ_n and let T_{k+1,k} be the (k+1)-by-k tridiagonal matrix generated by a finite precision Lanczos computation. Assume that there exists a Hermitian tridiagonal matrix T, with T_{k+1,k} as its upper left (k+1)-by-k block, whose eigenvalues all lie in the intervals
where none of the intervals contains the origin. Let d denote the distance from the origin to the set S. Then the MINRES residual r_k^M satisfies
Proof. Since the expression ||ξ_1 - T_{k+1,k} y_k^M/β|| in (4.14) is the size of the residual at step k of the exact MINRES algorithm applied to the linear system Tx = ξ_1, where the eigenvalues of T lie in S, it follows from (3.7) that
To bound the second term in (4.14), note that the approximate solution generated at step k of this corresponding exact MINRES calculation is of the form x_k = Q_k y_k^M/β, where the columns of Q_k are orthonormal and the vector y_k^M is the same one generated by the finite precision computation. It follows that ||y_k^M||/β = ||x_k||. Since the 2-norm of the residual decreases monotonically in the exact algorithm, we have
where the factor √κ(T) = √((λ_n + δ)/d) must be included, since this gives a bound on the 2-norm of the residual instead of the T-norm of the error. The second term in (4.12) can be bounded as in Theorem 4.4.2. Since y_k^C = β T_k^{-1}ξ_1 and since, by the Cauchy interlace theorem, the smallest eigenvalue of T_k is greater than or equal to that of T, we have
FIG. 4.2. Exact arithmetic error bound (dotted), finite precision arithmetic error bound (assuming δ = 1.e-15) (dashed), and actual error in a finite precision computation (solid).
of the initial vector in the direction of each eigenvector of A. To see this, let A = UΛU^H be an eigendecomposition of A and let q̃_j = U^H q_j, where the vectors q_j, j = 1, 2, ..., are the Lanczos vectors generated by the algorithm in section 2.5. Then, following the algorithm of section 2.5, we have
where
where q̃_{i1} denotes the ith component of q̃_1, then the coefficients in the Lanczos algorithm are given by
exact formulas and the computed values can be included in the perturbation
term Cj(z).)
It is possible that some coefficient β_j in a finite precision Lanczos computation will be exactly 0 and that the recurrence will terminate, but this is unlikely. If β_j is not 0, then it is positive because of formula (4.27). It follows from a theorem due to Favard [48] that the recurrence coefficients constructed in a finite precision Lanczos computation are the exact recurrence coefficients for the orthonormal polynomials corresponding to some nonnegative measure. That is, if we define p_{-1}(z) ≡ 0, p_0(z) ≡ 1, and
for j = 1, 2, ..., where α_j and β_j are defined by (4.26-4.28), then we have the following theorem.
THEOREM 4.5.1 (Favard). If the coefficients β_j in (4.29) are all positive and the α_j's are real, then there is a measure dω(z) such that
algorithm. Still, a more global approach was needed to explain why the CG
algorithm converges so much faster than the method of steepest descent; e.g.,
it converges at least as fast as the Chebyshev algorithm. Paige's work on the
Lanczos algorithm [109] provided a key in this direction. A number of analyses
were developed to explain the behavior of the CG algorithm using information
from the entire computation (i.e., the matrix equation (2.23)), instead of just
one or two steps (e.g., [35, 62, 65, 121]). The analogy developed in this chapter,
identifying the finite precision computation with the exact algorithm applied to
a different matrix, appears to be very effective in explaining and predicting the
behavior of the CG algorithm in finite precision arithmetic [71]. The numerical
examples presented in section 4.1 were first presented in [126].
Exercises.
4.1. Show that the orthonormal polynomials defined by (4.24) are the
characteristic polynomials of the tridiagonal matrices generated by the
Lanczos algorithm.
4.2. How must the error bound you derived in Exercise 3.1 for a matrix
with a small, well-separated eigenvalue be modified for finite precision
arithmetic? Does the finite precision error bound differ more from that
of exact arithmetic in the case when a positive definite coefficient matrix
has one eigenvalue much smaller than the others or in the case when it
has one eigenvalue much larger than the others? (This comparison can
be used to explain why one preconditioner might be considered better
based on exact arithmetic theory, but a different preconditioner might
perform better in actual computations. See [133] for a comparison of
incomplete Cholesky and modified incomplete Cholesky decompositions,
which will be discussed in Chapter 11.)
Chapter 5
BiCG and Related Methods
Set α_j = (Av_j, w_j).
Compute
Set
Set
Here we have given the non-Hermitian Lanczos formulation that scales so that each basis vector v_j has norm 1 and (w_j, v_j) = 1. The scaling of the basis vectors can be chosen differently. Another formulation of the algorithm uses the ordinary transpose A^T instead of A^H.
Letting V_k be the matrix with columns v_1, ..., v_k and W_k be the matrix with columns w_1, ..., w_k, this pair of recurrences can be written in matrix form as
The (k+1)-by-k matrices T_{k+1,k} and T̃_{k+1,k} have T_k and T̃_k, respectively, as their top k-by-k blocks, and their last rows consist of zeros except for the last entry, which is γ_k and β_k, respectively. The biorthogonality condition implies that
Proof. Assume that (5.4) holds for i, j ≤ k. The choice of the coefficients β_j and γ_j assures that for all j, (w_j, v_j) = 1 and ||v_j|| = 1. By construction of the coefficient α_k, we have, using the induction hypothesis,
Using the recurrences for ṽ_{k+1} and w_k along with the induction hypothesis, we have
and, similarly, it is seen that (w̃_{k+1}, v_j) = 0. Since v_{k+1} and w_{k+1} are just multiples of ṽ_{k+1} and w̃_{k+1}, the result (5.4) is proved.
The vectors generated by the two-sided Lanczos process can become undefined in two different situations. First, if ṽ_{j+1} = 0 or w̃_{j+1} = 0, then the Lanczos algorithm has found an invariant subspace. If ṽ_{j+1} = 0, then the right Lanczos vectors v_1, ..., v_j form an A-invariant subspace. If w̃_{j+1} = 0, then the left Lanczos vectors w_1, ..., w_j form an A^H-invariant subspace. This is referred to as regular termination.
The second case, referred to as serious breakdown, occurs when (ṽ_{j+1}, w̃_{j+1}) = 0 but neither ṽ_{j+1} = 0 nor w̃_{j+1} = 0. In this case, nonzero vectors v_{j+1} ∈ K_{j+1}(A, r_0) and w_{j+1} ∈ K_{j+1}(A^H, r̂_0) satisfying (v_{j+1}, w_i) = (w_{j+1}, v_i) = 0 for all i ≤ j simply do not exist. Note, however, that while such vectors may not exist at step j+1, at some later step j+ℓ there may be nonzero vectors v_{j+ℓ} ∈ K_{j+ℓ}(A, r_0) and w_{j+ℓ} ∈ K_{j+ℓ}(A^H, r̂_0) such that v_{j+ℓ} is orthogonal to K_{j+ℓ-1}(A^H, r̂_0) and w_{j+ℓ} is orthogonal to K_{j+ℓ-1}(A, r_0). Procedures that simply skip steps at which the Lanczos vectors are undefined and construct the Lanczos vectors for the steps at which they are defined are referred to as look-ahead Lanczos methods. We will not discuss look-ahead Lanczos methods here but refer the reader to [101, 113, 20] for details.
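For concreteness, here is a minimal sketch of a two-sided Lanczos recurrence with the scaling used above (||v_j|| = 1, (w_j, v_j) = 1). The coefficients are formed by explicit projection against the two most recent vectors, which is equivalent to the three-term recurrence in exact arithmetic; no look-ahead is attempted, and the variable names are mine rather than the text's.

```python
import numpy as np

def two_sided_lanczos(A, r0, rhat0, m):
    """Two-sided Lanczos sketch with ||v_j|| = 1 and (w_j, v_j) = 1.  Coefficients are
    formed by explicit projection against the two most recent vectors (equivalent to the
    three-term recurrence in exact arithmetic).  No look-ahead is attempted."""
    n = len(r0)
    V = np.zeros((n, m), dtype=complex)
    W = np.zeros((n, m), dtype=complex)
    V[:, 0] = r0 / np.linalg.norm(r0)
    W[:, 0] = rhat0 / np.conj(np.vdot(rhat0, V[:, 0]))      # enforce w_1^H v_1 = 1
    for j in range(m - 1):
        Av = A @ V[:, j]
        Ahw = A.conj().T @ W[:, j]
        vt, wt = Av.copy(), Ahw.copy()
        for i in (j - 1, j):
            if i < 0:
                continue
            vt = vt - np.vdot(W[:, i], Av) * V[:, i]        # make vt orthogonal to w_{j-1}, w_j
            wt = wt - np.vdot(V[:, i], Ahw) * W[:, i]       # make wt orthogonal to v_{j-1}, v_j
        gamma = np.linalg.norm(vt)
        if gamma == 0.0:
            break                                           # regular termination (invariant subspace)
        v_new = vt / gamma
        delta = np.vdot(wt, v_new)
        if abs(delta) < 1e-14:
            raise RuntimeError("serious breakdown: (v_{j+1}, w_{j+1}) is (numerically) zero")
        V[:, j + 1] = v_new
        W[:, j + 1] = wt / np.conj(delta)                   # enforce w_{j+1}^H v_{j+1} = 1
    return V, W

# biorthogonality check on a random nonsymmetric matrix
rng = np.random.default_rng(2)
n, m = 60, 15
A = rng.standard_normal((n, n))
V, W = two_sided_lanczos(A, rng.standard_normal(n), rng.standard_normal(n), m)
print(np.abs(W.conj().T @ V - np.eye(m)).max())             # small, though it grows with m
```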
then there are several natural ways to choose the vector y_k. One choice is to force r_k = r_0 - AV_k y_k to be orthogonal to w_1, ..., w_k. This leads to the equation
It follows from (5.1) and the biorthogonality condition (5.3) that W_k^H A V_k = T_k and that W_k^H r_0 = βξ_1, β = ||r_0||, so the equation for y_k becomes
Compute
Set where
Since the columns of V_{k+1} are not orthogonal, it would be difficult to choose y_k to minimize ||r_k||, but y_k can easily be chosen to minimize the second factor in (5.8). Since the columns of V_{k+1} each have norm one, the first factor in (5.8) satisfies ||V_{k+1}|| ≤ √(k+1). In the QMR method, y_k solves the least squares problem
which always has a solution, even if the tridiagonal matrix T_k is singular. Thus the QMR iterates are defined provided that the underlying Lanczos recurrence does not break down.
The norm of the QMR residual can be related to that of the optimal
GMRES residual as follows.
THEOREM 5.3.1 (Nachtigal [101]). If r_k^G denotes the GMRES residual at step k and r_k^Q denotes the QMR residual at step k, then
where V_{k+1} is the matrix of basis vectors for the space K_{k+1}(A, r_0) constructed by the Lanczos algorithm and κ(·) denotes the condition number.
Proof. The GMRES residual is also of the form (5.7), but the vector y_k^G is chosen to minimize the 2-norm of the GMRES residual. It follows that
where σ_min(V_{k+1}) is the smallest singular value. Combining this with inequality (5.8) for the QMR residual gives the desired result (5.10).
Unfortunately, the condition number of the basis vectors V_{k+1} produced by the non-Hermitian Lanczos algorithm cannot be bounded a priori. This matrix may be ill conditioned, even if the Lanczos vectors are well defined. If
one could devise a short recurrence that would generate well-conditioned basis
vectors, then one could use the quasi-minimization strategy (5.9) to solve the
problem addressed in Chapter 6.
The actual implementation of the QMR algorithm, without saving all of the Lanczos vectors, is similar to that of the MINRES algorithm described in section 2.5. The least squares problem (5.9) is solved by factoring the (k+1)-by-k matrix T_{k+1,k} into the product of a (k+1)-by-(k+1) unitary matrix F^H
where the ×'s denote nonzeros and where the (k+1, k)-entry, h, is just γ_k, since this entry is unaffected by the previous rotations. The next rotation, F_k, is chosen to annihilate this entry by setting c_k = |d|/√(|d|² + |h|²), s_k = c_k h/d if d ≠ 0, and c_k = 0, s_k = 1 if d = 0. To solve the least squares problem, the successive rotations are also applied to the right-hand side vector βξ_1 to obtain g = F_k ··· F_1 βξ_1. Clearly, g differs from the corresponding vector at step k-1 only in positions k and k+1. If R_{k×k} denotes the top k-by-k block of R and g_{k×1} denotes the first k entries of g, then the solution to the least squares problem is the solution of the triangular linear system
Then since
and
we can write
where a_{k-1} is the kth entry of g. Finally, from the equation P_k R_{k×k} = V_k, we can update the auxiliary vectors using
Compute the kth rotation, c_k and s_k, to annihilate the (k+1, k) entry of T.¹
Compute
where undefined terms are zero.
¹The formula is c_k = |T(k,k)|/√(|T(k,k)|² + |T(k+1,k)|²), s_k = c_k T(k+1,k)/T(k,k), but a more robust implementation should be used. See, for example, BLAS routine DROTG [32].
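The following sketch carries out this Givens-rotation solution of the small least squares problem for a generic (k+1)-by-k upper Hessenberg matrix, of which the tridiagonal T_{k+1,k} is a special case. It uses the plain rotation formula in the spirit of the footnote rather than a robust DROTG-style implementation, and it is a generic illustration rather than a transcription of the QMR update.

```python
import numpy as np

def hessenberg_least_squares(H, beta):
    """Solve min_y || beta*e_1 - H y || for a (k+1)-by-k upper Hessenberg H by applying
    successive plane rotations; a robust code would use DROTG-style scaling."""
    kp1, k = H.shape
    R = H.astype(float)
    g = np.zeros(kp1)
    g[0] = beta
    for j in range(k):
        a, bsub = R[j, j], R[j + 1, j]
        denom = np.hypot(a, bsub)
        c, s = a / denom, bsub / denom
        G = np.array([[c, s], [-s, c]])                 # annihilates the subdiagonal entry
        R[[j, j + 1], j:] = G @ R[[j, j + 1], j:]
        g[[j, j + 1]] = G @ g[[j, j + 1]]
    y = np.linalg.solve(R[:k, :k], g[:k])               # back substitution on the triangular part
    return y, abs(g[k])                                 # |last entry of g| = least squares residual

# check against a direct least squares solve
rng = np.random.default_rng(3)
k, beta = 6, 2.0
H = np.triu(rng.standard_normal((k + 1, k)), -1)        # random upper Hessenberg
y, rnorm = hessenberg_least_squares(H, beta)
rhs = np.zeros(k + 1); rhs[0] = beta
print(rnorm, np.linalg.norm(rhs - H @ y))               # the two agree
```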
Note that the first k-1 sines and cosines s_i, c_i, i = 1, ..., k-1, are those used in the factorization of H_{k,k-1}.
Let β > 0 be given and assume that H_k is nonsingular. Let y_k denote the solution of the linear system H_k y = βξ_1, and let ỹ_k denote the solution of the least squares problem min_y ||H_{k+1,k} y - βξ_1||. Finally, let
LEMMA 5.4.1. Using the above notation, the norms of v_k and ṽ_k are related to the sines and cosines of the Givens rotations by
It follows that
Proof. The least squares problem with the extended Hessenberg matrix H_{k+1,k} can be written in the form
which is zero except for the last entry, which is βh_{k+1,k} times the (k,1)-entry of H_k^{-1}. Now H_k can be factored in the form F^H R, where F = F̂_{k-1} ··· F̂_1 and F̂_i is the k-by-k principal submatrix of F_i. The matrix H_{k+1,k}, after applying the first k-1 plane rotations, has the form
where r is the (k,k)-entry of R and h = h_{k+1,k}. The kth rotation is chosen to annihilate the nonzero entry in the last row:
TABLE 5.1. Relation between QMR quasi-residual norm reduction and ratio of BiCG residual norm to QMR quasi-residual norm.
THEOREM 5.4.1. Assume that the Lanczos vectors at steps 1 through k are
defined and that the tridiagonal matrix generated by the Lanczos algorithm at
step k is nonsingular. Then the BiCG residual r_k^B and the QMR quasi-residual z_k^Q are related by
Proof. From (5.1), (5.5), and (5.6), it follows that the BiCG residual can
be written in the form
The quantity in parentheses has only one nonzero entry (in its (k + l)st
position), and since ||v_{k+1}|| = 1, we have
The desired result now follows from Lemma 5.4.1 and the definition (5.15) of
In most cases, the quasi-residual norms and the actual residual norms in the QMR algorithm are of the same order of magnitude. Inequality (5.16) shows that the latter can exceed the former by at most a factor of √(k+1), and a bound in the other direction is given by
where σ_min denotes the smallest singular value. While it is possible that σ_min(V_{k+1}) is very small (especially in finite precision arithmetic), it is unlikely that ||r_k^Q|| would be much smaller than ||z_k^Q||. The vector y_k^Q is chosen to satisfy the least squares problem (5.9), without regard to the matrix V_{k+1}.
FIG. 5.1. BiCG residual norms (dashed), QMR residual norms (dotted), and QMR quasi-residual norms (solid).
Theorem 5.4.1 shows that if the QMR quasi-residual norm is reduced by a significant factor at step k, then the BiCG residual norm will be approximately equal to the QMR quasi-residual norm at step k, since the denominator in the right-hand side of (5.17) will be close to 1. If the QMR quasi-residual norm remains almost constant, however, then the denominator in the right-hand side of (5.17) will be close to 0, and the BiCG residual norm will be much larger. Table 5.1 shows the relation between the QMR quasi-residual norm reduction and the ratio of BiCG residual norm to QMR quasi-residual norm. Note that the QMR quasi-residual norm must be very flat before the BiCG residual norm is orders of magnitude larger.
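The relation behind Table 5.1 is easy to tabulate. Assuming the usual form of (5.17), in which the BiCG residual norm equals the QMR quasi-residual norm divided by √(1 - (||z_k^Q||/||z_{k-1}^Q||)²) (the equation itself is not reproduced above, so treat this formula as an assumption), the following lines print the ratio as a function of the quasi-residual reduction factor.

```python
import numpy as np

# ratio of BiCG residual norm to QMR quasi-residual norm as a function of the
# quasi-residual reduction factor f = ||z_k|| / ||z_{k-1}||
for f in (0.5, 0.9, 0.99, 0.999, 0.9999):
    print(f, 1.0 / np.sqrt(1.0 - f**2))
# even a modest 10% reduction (f = 0.9) keeps the BiCG residual within a factor of about 2.3;
# only a nearly flat quasi-residual (f close to 1) makes it orders of magnitude larger
```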
Figure 5.1 shows a plot of the logarithms of the norms of the BiCG residuals (dashed line), the QMR residuals (dotted line), and the QMR quasi-residuals (solid line) versus iteration number for a simple example problem. The matrix A, a real 103-by-103 matrix, was taken to have 50 pairs of complex conjugate eigenvalues, randomly distributed in the rectangle [1,2] × [-i,i], and 3 additional real eigenvalues at 4, .5, and -1. A random matrix V was generated and A was set equal to VDV^{-1}, where D is a block-diagonal matrix with 3 1-by-1 blocks corresponding to the separated real eigenvalues of A and 50 2-by-2 blocks of the form
for certain kth-degree polynomials φ_k and ψ_k. If the algorithm is converging well, then ||φ_k(A)r_0|| is small and one might expect that ||φ_k²(A)r_0|| would be even smaller. If φ_k²(A)r_0 could be computed with about the same amount of work as φ_k(A)r_0, then this would likely result in a faster converging algorithm. This is the idea of CGS.
Rewriting the BiCG recurrence in terms of these polynomials, we see that
where
Note that the coefficients can be computed if we know r̂_0 and φ_j²(A)r_0 and φ_jψ_j(A)r_0, j = 1, 2, ....
From (5.19-5.20), it can be seen that the polynomials φ_k(z) and ψ_k(z) satisfy the recurrences
Defining
Set
Then
Compute where
Set and
where φ_k is again the BiCG polynomial but χ_k is chosen to try to keep the residual norm small at each step while retaining the rapid overall convergence of CGS. For example, if χ_k(z) is of the form
It also follows from these same biorthogonality and recurrence relations that the BiCGSTAB vectors satisfy
Algorithm 6. BiCGSTAB.
Compute Ap_{k-1}.
Set where
Compute
Compute
Set x where
Compute
Compute where
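As a concrete illustration, here is a minimal sketch of the BiCGSTAB iteration in its standard (van der Vorst) form, the method Algorithm 6 refers to; it is a generic textbook formulation rather than a transcription of Algorithm 6, and breakdown safeguards are omitted.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, maxit=500):
    """Minimal BiCGSTAB sketch (no breakdown safeguards); rhat_0 is taken equal to r_0."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x
    rhat = r.copy()
    rho_old, alpha, omega = 1.0, 1.0, 1.0
    v = np.zeros(n)
    p = np.zeros(n)
    bnorm = np.linalg.norm(b)
    for k in range(1, maxit + 1):
        rho = rhat @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (rhat @ v)
        s = r - alpha * v
        t = A @ s
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
    return x, maxit

# example on a small, well-conditioned nonsymmetric system
rng = np.random.default_rng(4)
n = 100
A = 2.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, its = bicgstab(A, b)
print(its, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```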
FIG. 5.2. Performance of full GMRES (solid with o's), GMRES(10) (dashed), QMR (solid), CGS (dotted), BiCGSTAB (dash-dot), and CGNE (solid with x's).
GMRES, GMRES(10) (that is, GMRES restarted after every 10 steps), QMR, CGS, BiCGSTAB, and CGNE.
The problem is the one described in section 5.4—a real 103-by-103 matrix
A with random eigenvectors and with 50 pairs of complex conjugate eigenvalues
randomly distributed in [1,2] × [-i,i] and 3 additional real eigenvalues at 4, .5, and -1. Results are shown in Figure 5.2. The full GMRES
algorithm necessarily requires the fewest matrix-vector multiplications to
achieve a given residual norm. In terms of floating point operations, however,
when matrix-vector multiplication requires only 9n operations, full GMRES
is the most expensive method. The QMR algorithm uses two matrix-
vector multiplications per step (one with A and one with A^H) to generate
an approximation whose residual lies in the same Krylov space as the
GMRES residual. Hence QMR requires at least twice as many matrix-vector
multiplications to reduce the residual norm to a given level, and for this
problem it requires only slightly more than this. A transpose-free variant
of QMR would likely be more competitive. Since the CGS and BiCGSTAB
methods construct a residual at step k that comes from the Krylov space
of dimension 2k (using two matrix-vector multiplications per step), these
methods could conceivably require as few matrix-vector multiplications as
GMRES. For this example they require a moderate number of additional
matrix-vector multiplications, but these seem to be the most efficient in terms
of floating point operations. The CGNE method proved very inefficient for this
problem, hardly reducing the error at all over the first 52 steps (104 matrix-
Exercises.
5.1. Use Lemma 5.4.1 to show that for Hermitian matrices A, the CG residual r_k^C is related to the MINRES residual r_k^M by
5.4. The following examples are taken from [102]. They demonstrate that
the performance of various iterative methods can differ dramatically for
a given problem and that the best method for one problem may be the
worst for another.
and b is the first unit vector ξ_1. How many iterations will the full GMRES method need to solve Ax = b, with a zero initial guess? What is a lower bound on the number of matrix-vector multiplications required by CGS? How many iterations are required if one applies CG to the normal equations AA^H y = b, x = A^H y?
(b) CGNE loses. Suppose A is the block diagonal matrix
that is, an n-by-n block diagonal matrix with 2-by-2 blocks. Show that this matrix is normal and has eigenvalues ±i and singular value 1. How many steps are required to solve a linear system Ax = b using CGNE? GMRES? Show, however, that for any real initial residual r_0, if r̂_0 = r_0, then CGS breaks down with a division by 0 at the first step.
5.5. When the two-sided Lanczos algorithm is used in the solution of linear systems, the right starting vector is always the initial residual, r_0/||r_0||, but the left starting vector r̂_0 is not specified. Consider an arbitrary 3-term recurrence:
where
The γ's are chosen so that the vectors have norm 1, but the α's and β's can be anything. Show that if this recurrence is run for no more than [(n+2)/2] steps, then there is a nonzero vector w_1 such that
i.e., assuming there is no exact breakdown with (v_j, w_j) = 0, the arbitrary recurrence is the two-sided Lanczos algorithm for a certain left starting vector w_1. (Hint: The condition (5.24) is equivalent to (w_1, A^i v_j) = 0 for all i ≤ j - 1, j = 2, ..., [(n+2)/2]. Show that there are only n - 1 linearly independent vectors to which w_1 must be orthogonal.)
This somewhat disturbing result suggests that some assumptions must
be made about the left starting vector, if we are to have any hope of
establishing good a priori error bounds for the Lanczos-based linear
system solvers [67]. In practice, however, it is observed that the
convergence behavior of these methods is about the same for most
randomly chosen left starting vectors or for r̂_0 = r_0, which is sometimes
recommended.
Chapter 6
Is There a Short Recurrence for a Near-Optimal Approximation?
and that the direction vectors p_0, ..., p_{k-1} form a basis for the Krylov space
algorithm we have
If ⟨⟨·,·⟩⟩ denotes the inner product in which the norm of e_k is minimized, then e_k must be the unique vector of the form (6.5) satisfying
for all vectors v and w, where ⟨·,·⟩ denotes the standard Euclidean inner product. The B-adjoint of A, denoted A* in the theorem, is the unique matrix satisfying
where the superscript H denotes the adjoint in the Euclidean inner product, A^H = Ā^T. The matrix A is said to be B-normal if and only if A*A = AA*. If B^{1/2} denotes the Hermitian positive definite square root of B, then this is equivalent to the condition that
Condition (ii') still may seem obscure, but the following theorem, also from [45], shows that matrices A with B-normal degree η greater than 1 but less than √n also have minimal polynomials of degree less than n. These matrices belong to a subspace of C^{n×n} of dimension less than n², so they might just be considered anomalies. The more interesting case is η = 1, or the B-normal(1) matrices in (ii').
THEOREM 6.1.2 (Faber and Manteuffel [45]). If A has B-normal degree η > 1, then the minimal polynomial of A has degree less than or equal to η².
How many distinct complex numbers z can satisfy q(z) = z̄? Note that q(z) = z̄ implies q(q(z)) = z. The expression q(q(z)) - z is a polynomial of degree exactly η² if q has degree η > 1. (If the degree of q were 1, this expression could be identically zero.) It follows that there are at most η² distinct roots, so d(A) ≤ η².
The B-normal(1) matrices, for which a 3-term CG method exists, are characterized in the following theorem.
THEOREM 6.1.3 (Faber and Manteuffel [45]). If A is B-normal(1), then d(A) = 1, A* = A, or
There is more than one root λ only if the expression on the left is identically zero, which means that a = -b̄/b. Let b = re^{iθ}, i = √-1. Then
If q(z) = z̄, then
which yields
6.2. Implications.
The class of B-normal(1) matrices of the previous section consists of matrices for which CG methods are already known. They are diagonalizable matrices whose spectrum is contained in a line segment in the complex plane. See [26, 142].
Theorems 6.1.1-6.1.3 imply that for most non-Hermitian problems, one
cannot expect to find a short recurrence that generates the optimal approxi-
mation from successive Krylov spaces, if "optimality" is defined in terms of an
inner product norm that is independent of the initial vector. It turns out that
most non-Hermitian iterative methods actually do find the optimal approxi-
mation in some norm [11] (see Exercise 6.3). Unfortunately, however, it is a
norm that cannot be related easily to the 2-norm or the oo-norm or any other
norm that is likely to be of interest. For example, the BiCG approximation is optimal in the P_n^{-H}P_n^{-1}-norm, where the columns of P_n are the biconjugate direction vectors. The QMR approximation is optimal in the A^H V_n^{-H} V_n^{-1} A-norm, where the columns of V_n are the biorthogonal basis vectors.
The possibility of a short recurrence that would generate optimal approxi-
mations in some norm that depends on the initial vector but that can be shown
to differ from, say, the 2-norm by no more than some moderate size factor re-
mains. This might be the best hope for developing a clear "method of choice"
for non-Hermitian linear systems.
It should also be noted that the Faber and Manteuffel result deals only
with a single recurrence. It is still an open question whether coupled short
recurrences can generate optimal approximations. For some preliminary
results, see [12].
It remains a major open problem to find a method that generates provably
"near-optimal" approximations in some standard norm while still requiring
only O(n) work and storage (in addition to the matrix-vector multiplication) at each iteration, or to prove that such a method does not exist.
Exercises.
6.1. Show that a matrix A is of the form (6.4) if and only if it is of the form B^{-1/2}CB^{1/2}, where C is either Hermitian or of the form e^{iθ}(dI + F), with d real and F^H = -F.
6.2. Show that a matrix A is normal if and only if A^H = q(A) for some polynomial q. (Hint: If A is normal, write A in the form A = UΛU^H, where Λ is diagonal and U is unitary, and determine a polynomial q for which q(Λ) = Λ̄.)
6.3. The following are special instances of results due to Barth and Manteuffel [11]:
(a) Assume that the BiCG iteration does not break down or find the
exact solution before step n. Use the fact that the BiCG error at
6.4. Write down a CG method for matrices of the form I - F, where F = -F^H, which minimizes the 2-norm of the residual at each step. (Hint: Note that one can use a 3-term recurrence to construct an orthonormal basis for the Krylov space span{q_1, (I - F)q_1, ..., (I - F)^{k-1}q_1}, when F is skew-Hermitian.)
Chapter 7
Miscellaneous Issues
Compute Ap_{k-1}.
Set where
Compute
Compute
Set where
The CGNR algorithm minimizes the A^H A-norm of the error, which is the 2-norm of the residual b - Ax_k, over the affine space
The CGNE algorithm minimizes the AA^H-norm of the error in y_k, which is the 2-norm of the error x - x_k, over the affine space
Note that these two spaces are the same, and both involve powers of the symmetrized matrix A^H A or AA^H.
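A minimal sketch of the two approaches, for real matrices, is to feed the appropriate operator and right-hand side to an ordinary CG routine; the helper names below are illustrative, and a practical code would organize the products with A and A^T more carefully.

```python
import numpy as np

def cg(matvec, b, tol=1e-10, maxit=500):
    """Plain CG for a symmetric positive definite operator given as a matvec."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * bnorm:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

rng = np.random.default_rng(5)
m = 80
A = 3.0 * np.eye(m) + rng.standard_normal((m, m)) / np.sqrt(m)   # well-conditioned, nonsymmetric
b = rng.standard_normal(m)

# CGNR: CG applied to A^T A x = A^T b (minimizes the 2-norm of the residual b - Ax)
x_cgnr = cg(lambda v: A.T @ (A @ v), A.T @ b)

# CGNE: CG applied to A A^T y = b, then x = A^T y (minimizes the 2-norm of the error)
y = cg(lambda v: A @ (A.T @ v), b)
x_cgne = A.T @ y

print(np.linalg.norm(b - A @ x_cgnr), np.linalg.norm(b - A @ x_cgne))
```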
Numerical analysts sometimes cringe at the thought of solving the normal
equations for two reasons. First, since the condition number of AHA or AAH
is the square of the condition number of A, if there were an iterative method
for solving Ax = b whose convergence rate was governed by the condition
number of A, then squaring this condition number would significantly degrade
the convergence rate. Unfortunately, however, for a non-Hermitian matrix
A, there is no iterative method whose convergence rate is governed by the
condition number of A.
The other objection to solving the normal equations is that one cannot expect to achieve as high a level of accuracy when solving a linear system Cy = d as when solving Ax = b, if the condition number of C is greater than that of A. This statement can be based on a simple perturbation argument. Since the entries of C and d probably cannot be represented exactly on the computer (or, in the case of iterative methods, the product of C with a given vector cannot be computed exactly), the best approximation ỹ to y that one can hope to find numerically is one that satisfies a nearby system (C + δC)ỹ = d + δd, where the sizes of δC and δd are determined by the machine precision. If ỹ is the solution of such a perturbed system, then it can be shown, for sufficiently small perturbations, that
If C and d were the only data available for the problem, then the terms ||δC||/||C|| and ||δd||/||d|| on the right-hand side of this inequality could not be expected to be less than about ε, the machine precision. For the CGNR and CGNE methods, however, not only is C = A^H A available, but the matrix A itself is available. (That is, one can apply A to a given vector.) As a result, it
same as that for the original linear system.
Thus, neither of the standard arguments against solving the normal
equations is convincing. There are problems for which the CGNR and CGNE
methods are best (Exercise 5.4a), and there are other problems for which one
of the non-Hermitian matrix iterations far outperforms these two (Exercise
5.4b). In practice, the latter situation seems to be more common. There is
little theory characterizing problems for which the normal equations approach
is or is not to be preferred to a non-Hermitian iterative method. (See Exercise
7.1, however.)
where κ(A) = |||A||| · |||A^{-1}||| and ||| · ||| represents any vector norm and its induced matrix norm. To see this, note that since b - Ax_k = A(A^{-1}b - x_k), we have
Since we also have |||A^{-1}b||| ≤ |||A^{-1}||| · |||b|||, combining this with (7.3) gives the first inequality in (7.2). Using the inequality |||b||| ≤ |||A||| · |||A^{-1}b||| with (7.4) gives the second inequality in (7.2). To obtain upper and lower bounds on the desired error norm, one might therefore attempt to estimate the condition number of A in this norm.
It was noted at the end of Chapter 4 that the eigenvalues of the tridiagonal matrix T_k generated by the Lanczos algorithm (and, implicitly, by the CG and MINRES algorithms for Hermitian matrices) provide estimates of some of the eigenvalues of A. Hence, if A is Hermitian and the norm in (7.2) is the 2-norm, then the eigenvalues of T_k can be used to estimate κ(A). It is easy to show (Exercise 7.2) that the ratio of largest to smallest eigenvalue of T_k gives a lower bound on the condition number of A, but in practice it is usually a very good estimate, even for moderate size values of k. Hence one might stop the iteration when
where tol is the desired tolerance for the 2-norm of the relative error. This could cause the iteration to terminate too soon, since κ(T_k) ≤ κ(A), but, more
The A-norm of the difference x_k - x_{k-1} = a_{k-1}p_{k-1} gives a lower bound on the error at step k-1:
To obtain an upper bound, one can use the fact that the A-norm of the error is reduced by at least the factor (κ - 1)/(κ + 1) at every step, or, more generally, that the A-norm of the error is reduced by at least the factor
after every d steps. Here κ denotes the condition number of A in the 2-norm and is the ratio of largest to smallest eigenvalue of A. This error bound follows from the fact that the CG polynomial is optimal for minimizing the A-norm of the error and hence is at least as good as the product of the (k-d)th-degree CG polynomial with the dth-degree Chebyshev polynomial. (See Theorem 3.1.1.)
One could again estimate κ using the ratio of largest to smallest eigenvalue of T_k and stop when the quantity in (7.9) is less than a given tolerance for some value of d.
Since (7.9) still involves an upper bound on the error which, for some problems, is not a very good estimate, different approaches have been considered. One of the more interesting ones involves looking at the quadratic form r_k^H A^{-1} r_k as an integral and using different quadrature formulas to obtain upper and lower bounds for this integral [57]. The bounds obtained in this way appear to give very good estimates of the actual A-norm of the error. The subject of effective stopping criteria for iterative methods remains a topic of current research.
Here p_{k-1} is some direction vector and a_{k-1} is some coefficient. It is assumed that the initial residual is computed directly as r_0 = b - Ax_0.
It can be shown that when formulas (7.10) are implemented in finite precision arithmetic, the difference between the true residual b - Ax_k and the updated vector r_k satisfies
It is often observed numerically (but in most cases has not been proved) that the vectors r_k converge to zero as k → ∞ or, at least, that their norms become many orders of magnitude smaller than the machine precision. In such cases, the right-hand side of (7.11) (without the O(k) factor, which is an overestimate) gives a reasonable estimate of the best attainable actual residual:
on the unit square with Dirichlet boundary conditions, using centered differences on a 32-by-32 mesh. The solution was taken to be u(x, y) = x(x - 1)²y²(y - 1)², and the initial guess was set to zero. The initial vector r̂_0 was set equal to r_0.
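The phenomenon is easy to reproduce with any recursion of the form (7.10). The sketch below uses plain CG on an ill-conditioned SPD diagonal matrix (an illustrative substitute for the BiCG and CGS runs of Figure 7.1) and compares the recursively updated residual with the true residual b - Ax_k and with the level ε ||A|| ||x||.

```python
import numpy as np

n = 200
lam = np.logspace(0, 3, n)
A = np.diag(lam)                        # ill-conditioned SPD test matrix
rng = np.random.default_rng(6)
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
r = b.copy()                            # recursively updated residual
p = r.copy()
for _ in range(4000):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p                   # x_k = x_{k-1} + a_{k-1} p_{k-1}
    r_new = r - alpha * Ap              # r_k = r_{k-1} - a_{k-1} A p_{k-1}
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    if np.linalg.norm(r) < 1e-30:
        break

print("updated residual:", np.linalg.norm(r))
print("true residual:   ", np.linalg.norm(b - A @ x))
print("eps*||A||*||x||: ", np.finfo(float).eps * lam.max() * np.linalg.norm(x))
# the updated residual decreases far below machine precision, while the true residual
# stagnates near eps*||A||*||x||
```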
FIG. 7.1. Actual residual norm (solid) and updated residual norm (dashed). Top curves are for CGS, bottom ones for BiCG. The asterisk shows the maximum ratio ||x_k||/||x||.
The lower solid line in Figure 7.1 represents the true BiCG residual norm ||b - Ax_k||/(||A|| ||x||), while the lower dashed line shows the updated residual norm ||r_k||/(||A|| ||x||). The lower asterisk in the figure shows the maximum ratio ||x_k||/||x|| at the step at which it occurred. The experiment was run on a machine with unit roundoff ε ≈ 1.1e-16, and the maximum ratio ||x_k||/||x|| was approximately 10³. As a result, instead of achieving a final residual norm of about ε, the final residual norm is about 1.e-13 ≈ 10³ε.
Also shown in Figure 7.1 (upper solid and dashed lines) are the results of running the CGS algorithm given in section 5.5 for this same problem. Again, there are no a priori bounds on the size of intermediate iterates, and in this case we had max_k ||x_k||/||x|| ≈ 4·10¹⁰. As a result, the final actual residual norm reaches the level 4.e-6, which is roughly 4·10¹⁰ ε.
Since the CGNE and CGNR algorithms are also of the form (7.10), the estimate (7.12) is applicable to them as well. Since CGNE minimizes the 2-norm of the error, it follows, as for CG, that ||x_k|| ≤ 2||x|| + ||x_0||, so the backward error in the final approximation will be a moderate multiple of the machine precision. The CGNR method minimizes the 2-norm of the residual, but since it is equivalent to CG for the linear system A^H Ax = A^H b, it follows that the 2-norm of the error also decreases monotonically. Hence we again expect a final backward error of order ε.
An example for the CGNE method is shown in Figure 7.2. The matrix A was taken to be of the form A = UΣV^T, where U and V are random orthogonal matrices and Σ = diag(σ_1, ..., σ_n), with
FIG. 7.2. Actual residual norm (solid) and updated residual norm (dashed) for CGNE. The asterisk shows the maximum ratio ||x_k||/||x||.
For a problem of size n = 40, a random solution was set and a zero initial guess was used. The solid line in Figure 7.2 shows the actual residual norm ||b - Ax_k||/(||A|| ||x||), while the dashed line represents the updated residual norm ||r_k||/(||A|| ||x||). The maximum ratio ||x_k||/||x|| is approximately 1, as indicated by the asterisk in Figure 7.2. Note that while rounding errors greatly affect the convergence rate of the method (in exact arithmetic, the exact solution would be obtained after 40 steps), the ultimately attainable accuracy is as great as one could reasonably expect: a backward error of size approximately 4ε. There is no loss of final accuracy due to the fact that we are (implicitly) solving the normal equations.
When a preconditioner is used with the above algorithms, the 2-norm of the
error may not decrease monotonically, and then one must use other properties
to establish bounds on the norms of the iterates.
It is sometimes asked whether one can accurately solve a very ill-conditioned linear system if a very good preconditioner is available. That is, suppose κ(A) is very large but κ(M^{-1}A) or κ(M^{-1/2}AM^{-1/2}), where M is a known preconditioning matrix, is not. We will see in Part II that the preconditioned CG algorithm, for instance, still uses formulas of the form (7.10). There is simply an additional formula, Mz_k = r_k, to determine a preconditioned residual z_k. The final residual norm is still given approximately by (7.12), and, unless the final residual vector is deficient in certain eigencomponents of A, this suggests an error satisfying
from direct Gaussian elimination) does not appear to improve this error bound
for algorithms of the form (7.10).
For a discussion of the effect of rounding errors on the attainable accuracy
with some different implementations, see, for example, [33, 70, 122].
where X is the n-by-s matrix of solution vectors and B is the n-by-s matrix
of right-hand sides.
Let A be Hermitian positive definite and consider the block CG algorithm. Instead of minimizing the A-norm of the error for each linear system over a single Krylov space, one can minimize
That is, the approximation x_k^(i) for the ith equation is equal to x_0^(i) plus a linear combination of vectors from all of the Krylov spaces
Compute
Set where
Compute
Set where
for different machines. Some examples are described in [118, 82]. More
information about the parallelization of iterative methods can be found in
[117].
Exercises.
7.1. Let A be a matrix of the form I - F, where F = -F^H. Suppose the eigenvalues of A are contained in the line segment [1 - iγ, 1 + iγ]. It was shown by Freund and Ruscheweyh [55] that if a MINRES algorithm is applied to this matrix, then the residual at step k satisfies
Preconditioners
Chapter 8
All of the iterative methods discussed in Part I of this book converge very
rapidly if the coefficient matrix A is close to the identity. Unfortunately,
in most applications, A is not close to the identity, but one might consider
replacing the original linear system Ax = b by the modified system
Defining
Compute Ap_{k-1}.
Set i where
Compute
Solve
Set where
Solve
Set where
Compute the kth rotation, c_k and s_k, to annihilate the (k+1, k) entry of T.¹
Compute
where undefined terms are zero for k < 2.
Set where
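As a simple illustration of the preconditioned algorithms discussed in this chapter, here is a minimal preconditioned CG sketch with a generic preconditioner solve Mz = r. The Jacobi (diagonal) preconditioner in the example is an arbitrary choice, and the formulas follow the usual textbook pairing rather than being a transcription of the algorithm above.

```python
import numpy as np

def pcg(A, b, solve_M, tol=1e-10, maxit=2000):
    """Minimal preconditioned CG sketch: ordinary CG recurrences plus one
    preconditioner solve M z = r per iteration (solve_M applies M^{-1})."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = solve_M(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
        z = solve_M(r)              # the extra step: solve M z_k = r_k
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# example: Jacobi (diagonal) preconditioning of a badly scaled SPD matrix
rng = np.random.default_rng(7)
n = 300
B = rng.standard_normal((n, n))
A = np.diag(np.logspace(0, 3, n)) + (B @ B.T) / n      # SPD with a badly scaled diagonal
b = rng.standard_normal(n)

d = np.diag(A).copy()
x1, it_jacobi = pcg(A, b, solve_M=lambda r: r / d)     # M = diag(A)
x2, it_plain  = pcg(A, b, solve_M=lambda r: r)         # M = I recovers unpreconditioned CG
print(it_jacobi, it_plain)   # diagonal scaling typically cuts the iteration count sharply here
```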
For example, suppose the region Ω is the unit square [0,1] × [0,1]. Introduce a uniform grid {(x_i, y_j) : i = 0, 1, ..., n_x + 1, j = 0, 1, ..., n_y + 1} with spacing h_x = 1/(n_x + 1) in the x-direction and h_y = 1/(n_y + 1) in the y-direction, as shown in Figure 9.1 for n_x = 3, n_y = 5. A standard centered difference approximation to the partial derivative in the x-direction in (9.1) is
For the time-dependent problem (9.4), the diagonal entries of A are increased by 1/Δt, and the terms u_{i,j}/Δt are added to the right-hand side vector.
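A sketch of the resulting matrix assembly (natural, rowwise ordering of the n_x n_y interior unknowns) is given below. The diffusion coefficient is evaluated at the midpoints of the grid edges, one common convention; this is an assumption and not necessarily identical to the formulas (9.5-9.8).

```python
import numpy as np

def diffusion_matrix(nx, ny, a=lambda x, y: 1.0):
    """5-point centered-difference matrix for -div(a grad u) = f on the unit square with
    Dirichlet boundary conditions, natural (rowwise) ordering of the nx*ny interior nodes.
    The coefficient is sampled at the midpoints of grid edges (an assumed convention)."""
    hx, hy = 1.0 / (nx + 1), 1.0 / (ny + 1)
    N = nx * ny
    A = np.zeros((N, N))
    idx = lambda i, j: (j - 1) * nx + (i - 1)          # i = 1..nx, j = 1..ny
    for j in range(1, ny + 1):
        for i in range(1, nx + 1):
            x, y = i * hx, j * hy
            aw = a(x - hx / 2, y) / hx**2              # west / east edge coefficients
            ae = a(x + hx / 2, y) / hx**2
            as_ = a(x, y - hy / 2) / hy**2             # south / north edge coefficients
            an = a(x, y + hy / 2) / hy**2
            k = idx(i, j)
            A[k, k] = aw + ae + as_ + an
            if i > 1:  A[k, idx(i - 1, j)] = -aw
            if i < nx: A[k, idx(i + 1, j)] = -ae
            if j > 1:  A[k, idx(i, j - 1)] = -as_
            if j < ny: A[k, idx(i, j + 1)] = -an
    return A

A = diffusion_matrix(3, 5)                             # the nx = 3, ny = 5 grid of Figure 9.1
print(A.shape, np.allclose(A, A.T), np.linalg.eigvalsh(A).min() > 0)   # symmetric positive definite
```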
THEOREM 9.1.1. Assume that a(x,y) > a > 0 m (0,1) x (0,1). Then the
coefficient matrix A defined in (9.5-9.8) is symmetric and positive definite.
Proof. Symmetry is obvious. The matrix is weakly diagonally dominant,
so by Gerschgorin's theorem (Theorem 1.3.11) its eigenvalues are all greater
than or equal to zero. Suppose there is a nonzero vector v such that Av = 0,
and suppose that the component of v with the largest absolute value is the
one corresponding to the (i, j) grid point. We can choose the sign of v so that
this component is positive. From the definition of A and the assumption that
a(x, y) > 0, it follows that v_{i,j} can be written as a weighted average of the
surrounding values of v:
of the weights is less than 1. It follows that the value of v at one of these other
interior points must be greater than v_{i,j} if v_{i,j} > 0, which is a contradiction.
Therefore the only vector v for which Av = 0 is the zero vector, and A is
positive definite.
It is clear that the coefficient matrix for the time dependent problem (9.4)
is also positive definite, since it is strictly diagonally dominant.
The argument used in Theorem 9.1.1 is a type of discrete maximum
principle. Note that it did not make use of the specific values of the entries
of A—only that A has positive diagonal entries and nonpositive off-diagonal
entries (so that the weights in the weighted average are positive); that A is
rowwise weakly diagonally dominant, with strong diagonal dominance in at
least one row; and that starting from any point (i, j) in the grid, one can
reach any other point through a path connecting nearest neighbors. This last
property will be associated with an irreducible matrix to be defined in section
10.2.
Other orderings of the equations and unknowns are also possible. These
change the appearance of the matrix, but provided that the equations and
unknowns are ordered in the same way (that is, provided that the rows
and columns of A are permuted symmetrically to form a matrix P^T A P),
the eigenvalues remain the same. For example, if the nodes of the grid in
Figure 9.1 are colored in a checkerboard fashion, with red nodes coupling only
to black nodes and vice versa, and if the red nodes are ordered first and the
black nodes second, then the matrix A takes the form
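The block structure referred to here is easy to verify numerically. The following sketch (which assumes the 1/h^2-scaled 5-point matrix of this section; the exact normalization of (9.5-9.8) was not recovered above) builds the matrix on a small grid, permutes it symmetrically into red-black order, and checks that each diagonal block is itself diagonal, since like-colored nodes do not couple.

    import numpy as np

    def laplacian_5pt(n):
        # 5-point approximation on an n-by-n interior grid of the unit square, scaled by 1/h^2
        h = 1.0 / (n + 1)
        T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
        return (np.kron(np.eye(n), T) + np.kron(T, np.eye(n))) / h**2

    n = 5
    A = laplacian_5pt(n)
    colors = np.array([(i + j) % 2 for j in range(n) for i in range(n)])   # checkerboard coloring
    perm = np.concatenate([np.where(colors == 0)[0], np.where(colors == 1)[0]])
    B = A[np.ix_(perm, perm)]          # symmetric permutation P^T A P: red nodes first
    nr = int(np.sum(colors == 0))
    print(np.allclose(B[:nr, :nr], np.diag(np.diag(B[:nr, :nr]))))   # True: red-red block is diagonal
    print(np.allclose(B[nr:, nr:], np.diag(np.diag(B[nr:, nr:]))))   # True: black-black block is diagonal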
9.1.1. Poisson's Equation. In the special case when the diffusion coefficient
a(x, y) is constant, say, a(x, y) = 1, the coefficient matrix (with the
natural ordering of nodes) for the steady-state problem (now known as Poisson's equation) is
If the roots of this polynomial are denoted z_+ and z_-, then the general solution
of the difference equation (9.14) can be seen to be
Rearranging, we find
and squaring both sides and solving the quadratic equation for λ gives
Taking the plus sign we obtain (9.12), while the minus sign repeats these same
values and can be discarded.
Substituting (9.12) for λ in (9.15), we find
and therefore
Proof. According to (9.13), all such matrices have the same orthonormal
eigenvectors. If G_1 = QΛ_1Q^T and G_2 = QΛ_2Q^T, then G_1 G_2 = QΛ_1Λ_2Q^T =
QΛ_2Λ_1Q^T = G_2 G_1.
THEOREM 9.1.2. The eigenvalues of the matrix A defined in (9.10-9.11)
are
where u_{m,ℓ}^{(j,k)} denotes the component corresponding to grid point (m, ℓ) in the
eigenvector associated with λ_{j,k}.
Proof. Let λ be an eigenvalue of A with corresponding eigenvector u, which
can be partitioned in the form
Since the matrices here are diagonal, equations along different vertical lines in
the grid decouple:
If, for a fixed value of j, the vector (y_{j,1}, ..., y_{j,n_y})^T is an eigenvector of the
TST matrix
Since the ℓth block of u^{(j,k)} is equal to Q times the ℓth block of y and since
only the jth entry of the ℓth block of y is nonzero, we have
Deriving the eigenvalues λ_{j,k} and corresponding eigenvectors u^{(j,k)} for each j =
1, ..., n_x, we obtain all n_x n_y eigenpairs of A. □
COROLLARY 9.1.2. Assume that h_x = h_y = h. Then the smallest and
largest eigenvalues of A in (9.10-9.11) behave like
Expanding sin(x) and sin(π/2 − x) in a Taylor series gives the desired result
(9.21), and dividing λ_max by λ_min gives the condition number estimate.
The proof of Theorem 9.1.4 provides the basis for a direct solution
technique for Poisson's equation known as a fast Poisson solver. The idea
is to separate the problem into individual tridiagonal systems that can be
solved independently. The only difficult part is then applying the eigenvector
matrix Q to the vectors y obtained from the tridiagonal systems, and this is
accomplished using the fast Fourier transform. We will not discuss fast Poisson
solvers here but refer the reader to [83] for a discussion of this subject.
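The following sketch illustrates the separation idea for the model problem: diagonalize the one-dimensional operator with its sine eigenvectors, divide by the sums of eigenvalues, and transform back. For clarity the eigenvector matrix Q is formed and applied densely; in an actual fast Poisson solver these products are carried out with the fast Fourier transform, as described above. The 1/h^2 scaling of the matrix is an assumption about how (9.10-9.11) is normalized.

    import numpy as np

    def poisson_solve(F):
        # F: n-by-n array of right-hand side values at the interior grid points
        n = F.shape[0]
        h = 1.0 / (n + 1)
        j = np.arange(1, n + 1)
        lam = 4.0 * np.sin(j * np.pi * h / 2.0)**2 / h**2        # eigenvalues of the 1D operator
        Q = np.sqrt(2 * h) * np.sin(np.outer(j, j) * np.pi * h)  # orthonormal sine eigenvectors
        Fhat = Q.T @ F @ Q                                       # transform the right-hand side
        Uhat = Fhat / (lam[:, None] + lam[None, :])              # divide by sums of 1D eigenvalues
        return Q @ Uhat @ Q.T                                    # transform back

    # check against a dense application of the 2D 5-point matrix
    n = 16
    h = 1.0 / (n + 1)
    T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A = (np.kron(np.eye(n), T) + np.kron(T, np.eye(n))) / h**2
    F = np.random.rand(n, n)
    print(np.allclose(A @ poisson_solve(F).ravel(), F.ravel()))  # True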
Because the eigenvalues and eigenvectors of the 5-point finite difference
matrix for Poisson's equation on a square are known, preconditioners are often
analyzed and even tested numerically on this particular problem, known as
the model problem. It should be noted, however, that except for multigrid
methods, none of the preconditioned iterative methods discussed in this book is
competitive with a fast Poisson solver for the model problem. The advantage of
iterative methods is that they can be applied to more general problems, such
as the diffusion equation with a nonconstant diffusion coefficient, Poisson's
equation on an irregular region, or Poisson's equation with a nonuniform grid.
Fast Poisson solvers apply only to block-TST matrices. They are sometimes
used as preconditioners in iterative methods for solving more general problems.
Analysis of a preconditioner for the model problem is useful only to the extent
that it can be expected to carry over to more general situations.
Here ψ_{i,j} is the approximation to ψ(x_i, μ_j), and the quadrature points μ_j and
weights w_j are such that for any polynomial p(μ) of degree 2n_μ − 1 or less,
We assume an even number of quadrature points n_μ, so that the points μ_j are
nonzero and symmetric about the origin: μ_{n_μ−j+1} = −μ_j.
Equation (9.27) can be approximated further by a method known as
diamond differencing: replacing derivatives in x by centered differences and
approximating function values at zone centers by the average of their values
at the surrounding nodes. Let the domain in x be discretized by
and define (Δx)_{i+1/2} = x_{i+1} − x_i and x_{i+1/2} = (x_{i+1} + x_i)/2. Equation (9.27)
is replaced by
To solve this equation, one does not actually form the Schur complement matrix
A_0 = I − Σ_{j=1}^{n_μ} w_j S H_j^{-1} S_{sj}, which is a dense n_x-by-n_x matrix. To apply this
matrix to a given vector v, one steps through each value of j, multiplying v by
S_{sj}, solving a triangular system with coefficient matrix H_j, multiplying the
result by S, and subtracting the weighted outcome from the final vector, which
has been initialized to v. In this way, only three vectors of length n_x need be
stored simultaneously.
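A sketch of this matrix-vector product is given below. The routine and array names are hypothetical stand-ins (the actual matrices S, S_{sj}, and H_j come from the discretization above and are not reproduced here); only the loop structure is the point, with the result initialized to v and the weighted contributions subtracted one angle at a time.

    import numpy as np
    from scipy.linalg import solve_triangular

    def apply_A0(v, w, S, S_s, H_lower):
        # v: vector of length n_x; w: quadrature weights; S, S_s[j]: n_x-by-n_x arrays
        # standing in for S and S_{sj}; H_lower[j]: lower triangular array standing in for H_j
        result = v.copy()                                    # the identity term
        for j, wj in enumerate(w):
            t = S_s[j] @ v                                   # multiply by S_{sj}
            t = solve_triangular(H_lower[j], t, lower=True)  # solve the triangular system with H_j
            result -= wj * (S @ t)                           # subtract the weighted outcome
        return result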
One popular method for solving equation (9.32) is to use the simple iteration
defined in section 2.1 without a preconditioner; that is, the preconditioner
is M = I. In the neutron transport literature, this is known as source iteration.
Given an initial guess φ^{(0)}, for k = 0, 1, ..., set
Note that this unpreconditioned iteration for the Schur complement system
(9.32) is equivalent to a preconditioned iteration for the original linear system
(9.31), where the preconditioner is the block lower triangle of the matrix. That
is, suppose (ψ_1^{(0)}, ..., ψ_{n_μ}^{(0)})^T is an arbitrary initial guess for the angular flux
and φ^{(0)} = Σ_{j=1}^{n_μ} w_j S ψ_j^{(0)}. For k = 0, 1, ..., choose the (k + 1)st iterate to
satisfy
which means physically that the mesh width is no more than two mean free
paths of the particles being simulated. It is often desirable, however, to use a
coarser mesh.
Even in the more general case when (9.35) is not satisfied, it turns out that
the iteration (9.33) converges. Before proving this, however, let us return to
the differential equation (9.24) and use a Fourier analysis argument to derive
an estimate of the rate of convergence that might be expected from the linear
system solver. Assume that σ_s and σ_t are constant and that the problem is
defined on an infinite domain. If the iterative method (9.33) is applied directly
to the steady-state version of the differential equation (9.24), then we can write
Define Ψ^{(k+1)} = ψ − ψ^{(k+1)} and Φ^{(k+1)} = φ − φ^{(k+1)}, where ψ, φ are the true
solution to the steady-state version of (9.24). Then equations (9.36-9.37) give
Thus the functions exp(iλx) are eigenfunctions of this iteration, with corresponding
eigenvalues
where
It can be seen that for each j, Gj = 2(1 — Sj). Dropping the subscript j for
convenience, we can write
The matrix norm on the right-hand side in (9.41) is equal to the inverse of the
square root of the smallest eigenvalue of
is nonsingular and |μ_j| > 0. It will follow that the smallest eigenvalue of the
matrix in (9.42) is strictly greater than 1 if the second term,
and taking norms on both sides and recalling that the weights w_j are
nonnegative and sum to 1, we find
Since γ < 1 and since the inequality in (9.44) is strict, with the amount by
which the actual reduction factor differs from γ being independent of k, it
follows that the iteration (9.33) converges to the solution of (9.32).
For γ ≪ 1, Corollary 9.2.1 shows that the simple source iteration (9.33)
converges rapidly, but for γ ≈ 1, convergence may be slow. In Part I of this
book, we discussed many ways to accelerate the simple iteration method, such
as Orthomin(1), QMR, BiCGSTAB, or full GMRES. Figure 9.2 shows the
convergence of simple iteration, Orthomin(1), and full GMRES applied to two
test problems. QMR and BiCGSTAB were also tested on these problems, and
each required only slightly more iterations than full GMRES, but at twice the
cost in terms of matrix-vector multiplications. The vertical axis is the ∞-norm
of the error in the approximate solution. The exact solution to the linear
system was computed directly for comparison. Here we used a uniform mesh
spacing Δx = .25 (n_x = 120) and eight angles, but the convergence rate was
not very sensitive to these mesh parameters.
The first problem, taken from [92], is a model shielding problem, with cross
sections corresponding to water and iron in different regions, as illustrated
below.
FIG. 9.2. Error curves for (a) γ = .994 and (b) γ = .497. Simple iteration
(solid), Orthomin(1) (dashed), full GMRES (dotted), and DSA-preconditioned simple
iteration (dash-dot).
The slab thicknesses are in cm and the cross sections are in cm^{-1}. There is
a vacuum boundary condition at the right end (ψ_{n_x,j} = 0, j < n_μ/2) and
a reflecting boundary condition at the left (ψ_{0,n_μ−j+1} = ψ_{0,j}, j < n_μ/2). A
uniform source f = 1 is placed in the first (leftmost) region. In the second test
problem, we simply replaced σ_s in each region by half its value: σ_s = 1.6568
in the first, second, and fourth regions; σ_s = .55385 in the third.
Also shown in Figure 9.2 is the convergence of the simple iteration method
with a preconditioner designed specifically for the transport equation known as
diffusion synthetic acceleration (DSA). In the first problem, where γ = .994, it
is clear that the unpreconditioned simple iteration (9.33) is unacceptably slow
to converge. The convergence rate is improved significantly by Orthomin(1),
with little extra work and storage per iteration, and it is improved even more
by full GMRES but at the cost of extra work and storage. The most effective
method for solving this problem, however, is the DSA-preconditioned simple
iteration.
For the second problem, the reduction in the number of iterations is less
dramatic. (Note the different horizontal scales in the two graphs.) Unpre-
conditioned simple iteration converges fairly rapidly, Orthomin(1) reduces the
The matrix equation for the flux ψ^{ℓ+1} in terms of f and ψ^ℓ is like that in
(9.31), except that the entries d_{i+1/2,j} and e_{i+1/2,j} of H_j are each increased
by 1/(vΔt). One would obtain the same coefficient matrix for the steady-state
problem if σ_t were replaced by σ_t + 2/(vΔt). Thus, for time-dependent
problems the convergence rate of iteration (9.33) at time step ℓ + 1 is governed
by the quantity
In many cases, this quantity is bounded well away from 1, even if σ_s/σ_t is not.
For steady-state problems with γ ≈ 1, it is clear from Figure 9.2a that the
DSA preconditioner is extremely effective in terms of reducing the number of
iterations. At each iteration a linear system corresponding to the steady-state
diffusion equation must be solved. Since this is a book on iterative methods
and not specifically on the transport equation, we will not give a complete
account of diffusion synthetic acceleration. For a discussion and analysis, see
[91, 3]. The basic idea, however, is that when σ_s/σ_t ≈ 1, the scalar flux φ
approximately satisfies a diffusion equation and therefore the diffusion operator
is an effective preconditioner for the linear system. In one dimension, the
diffusion operator is represented by a tridiagonal matrix which is easy to solve,
but in higher dimensions, the diffusion equation itself may require an iterative
solution technique. The advantage of solving the diffusion equation is that it
is independent of angle. An iteration for the diffusion equation requires about
1/n_μ times as much work as an iteration for equation (9.32), so a number of
inner iterations on the preconditioner may be acceptable in order to reduce
the number of outer iterations. Of course, the diffusion operator could be used
as a preconditioner for other iterative methods as well; that is, DSA could
be further accelerated by replacing the simple iteration strategy with, say,
GMRES.
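For reference, the preconditioned simple iteration of section 2.1 that underlies this whole discussion has the following generic form, sketched in Python; apply_A and solve_M are placeholders for the transport operator and the chosen preconditioner, with solve_M the identity for source iteration and a diffusion solve for DSA.

    import numpy as np

    def simple_iteration(apply_A, solve_M, b, x0, tol=1e-8, maxit=500):
        # x_{k+1} = x_k + M^{-1} (b - A x_k)
        x = x0.copy()
        for k in range(maxit):
            r = b - apply_A(x)
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            x = x + solve_M(r)
        return x, k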
Exercises.
9.1. Use the Taylor series to show that the approximation
where (p — O"1/2^, then it will converge to the solution for any initial
vector, and, at each step, the 2-norm of the residual (in the scaled
equation) will be reduced by at least the factor γ in (9.41).
9.5. A physicist has a code that solves the transport equation using source
iteration (9.33). She decides to improve the approximation by replacing
φ^{(k+1)} at each step with the linear combination α_{k+1} φ^{(k+1)} + (1 −
α_{k+1}) φ^{(k)}, where α_{k+1} is chosen to make the 2-norm of the residual as
small as possible. Which of the methods described in this book is she
using?
Chapter 10
Comparison of Preconditioners
The simple iteration algorithm was traditionally described by (10.2), and the
decomposition A = M — N was referred to as a matrix splitting. The terms
"matrix splitting" and "preconditioner," when referring to the matrix M, are
synonymous.
If M is taken to be the diagonal of A, then the simple iteration procedure
with this matrix splitting is called Jacobi's method. We assume here that the
diagonal entries of A are nonzero, so M^{-1} is defined. It is sometimes useful
to write the matrix equation (10.2) in element form to see exactly how the
update to the approximate solution vector is accomplished. Using parentheses
Note that the new vector x_k cannot overwrite x_{k-1} in Jacobi's method until
all of its entries have been computed.
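In code, one Jacobi sweep might look as follows (a minimal sketch with dense storage; a practical implementation would exploit the sparsity of A):

    import numpy as np

    def jacobi_sweep(A, b, x_old):
        # one sweep of Jacobi's method in element form
        n = len(b)
        x_new = np.empty(n)
        for i in range(n):
            s = b[i] - A[i, :] @ x_old + A[i, i] * x_old[i]   # b_i - sum_{j != i} a_ij x_old_j
            x_new[i] = s / A[i, i]
        return x_new   # x_old is needed until the sweep is complete, so it is not overwritten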
If M is taken to be the lower triangle of A, then the simple iteration
procedure is called the Gauss-Seidel method. Equations (10.2) become
—then the simple iteration procedure with this matrix splitting is called the
block Jacobi method. Similarly, for M equal to the block lower triangle of A,
we obtain the block Gauss-Seidel method; for M of the form ω^{-1}D − L, where
D is the block diagonal and L is the strictly block lower triangular part of A,
we obtain the block SOR method.
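The point versions of Gauss-Seidel and SOR can be sketched in the same style as the Jacobi sweep above (the block versions simply replace the scalar divisions by subblock solves). Here the updated entries are used as soon as they are available, so the sweep may operate in place; omega = 1 gives Gauss-Seidel.

    import numpy as np

    def sor_sweep(A, b, x, omega=1.0):
        # one forward sweep of (point) SOR, performed in place
        n = len(b)
        for i in range(n):
            s = b[i] - A[i, :] @ x + A[i, i] * x[i]   # uses new values for j < i, old for j > i
            x[i] = (1 - omega) * x[i] + omega * s / A[i, i]
        return x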
When A is real symmetric or complex Hermitian, then symmetric or
Hermitian versions of the Gauss-Seidel and SOR preconditioners can be
defined. If one defines M_1 = ω^{-1}D − L, as in the SOR method, and
M_2 = ω^{-1}D − U and sets
then the resulting iteration is known as the symmetric SOR or SSOR method.
It is left as an exercise to show that the preconditioner M in this case is
that is, if we eliminate x_{k-1/2}, then x_k satisfies M x_k = N x_{k-1} + b, where
A = M − N. The SSOR preconditioner is sometimes used with the CG
algorithm for Hermitian positive definite problems.
We first note that for the SOR method, we need only consider values of ω
in the open interval (0, 2).
THEOREM 10.1.1. For any ω ∈ C, we have
Since the matrices here are triangular, their determinants are equal to the
product of their diagonal entries, so we have det(G_ω) = (1 − ω)^n. The
determinant of G_ω is also equal to the product of its eigenvalues, and it follows
that at least one of the n eigenvalues must have absolute value greater than or
equal to |1 − ω|. □
Theorem 10.1.1 holds for any matrix A (with nonzero diagonal entries).
By making additional assumptions about the matrix A, one can prove more
about the relation between the convergence rates of the Jacobi, Gauss-Seidel,
and SOR iterations. In the following theorems, we make what seems to be a
rather unusual assumption (10.9). We subsequently note that this assumption
can sometimes be verified just by considering the sparsity pattern of A.
Since the eigenvalues of G_J are the numbers μ for which det(G_J − μI) = 0
and their multiplicities are also determined by this characteristic polynomial,
result (i) follows.
Since the matrix I − ωL is lower triangular with ones on the diagonal, its
determinant is 1; for any number λ we have
If all eigenvalues λ of G_1 are 0, then part (iv) of Theorem 10.1.2 implies that
all eigenvalues of G_J are 0 as well. If there is a nonzero eigenvalue λ of G_1,
then part (iii) of Theorem 10.1.2 implies that there is an eigenvalue μ of G_J
such that μ = λ^{1/2}. Hence ρ(G_J)^2 ≥ ρ(G_1). Part (iv) of Theorem 10.1.2
implies that there is no eigenvalue μ of G_J such that |μ|^2 > ρ(G_1); if there
were such an eigenvalue μ, then λ = μ^2 would be an eigenvalue of G_1, which
is a contradiction. Hence ρ(G_J)^2 = ρ(G_1).
In some cases (for example, when A is Hermitian) the Jacobi iteration
matrix G_J has only real eigenvalues. The SOR iteration matrix is non-Hermitian
and may well have complex eigenvalues, but one can prove the
following theorem about the optimal value of ω for the SOR iteration and the
corresponding optimal convergence rate.
THEOREM 10.1.3. Suppose that A satisfies (10.9), that G_J has only real
eigenvalues, and that β = ρ(G_J) < 1. Then the SOR iteration converges for
every ω ∈ (0, 2), and the spectral radius of the SOR matrix is
It follows from Theorem 10.1.2 that if μ is an eigenvalue of G_J, then both roots
λ are eigenvalues of G_ω.
Since μ is real, the term inside the square root in (10.16) is negative if
In the remaining part of the range of ω, both roots λ are positive and the
larger one is
Also, this value is greater than or equal to ω − 1 for ω ∈ (0, ω_opt] since in this
range we have
FIG. 10.1. Spectral radius of the SOR matrix for different values of ω and
β = ρ(G_J).
where the last equality comes from setting i = k = 1 or i = k = m to obtain
the maximum absolute value and then using a Taylor expansion for sin(x).
Knowing the value of ρ(G_J), Theorem 10.1.3 tells us the optimal value of
ω as well as the convergence rate of the SOR iteration for this and other values
of ω. It follows from Theorem 10.1.3 that
and, therefore,
In contrast, for ω = 1, Theorem 10.1.3 shows that the spectral radius of the
Gauss-Seidel iteration matrix is
FIG. 10.2. Convergence of iterative methods for the model problem, h = 1/51.
Jacobi (dotted), Gauss-Seidel (dashed), SOR with optimal ω (solid), unpreconditioned
CG (dash-dot).
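A quick computation of these quantities for the model problem with h = 1/51 is sketched below. It assumes ρ(G_J) = cos(πh) and, from Theorem 10.1.3, ρ(G_1) = β^2 and spectral radius ω_opt − 1 at the optimal ω_opt = 2/(1 + (1 − β^2)^{1/2}); these formulas are restated here as assumptions, since the displayed equations above were not recovered.

    import numpy as np

    h = 1.0 / 51
    beta = np.cos(np.pi * h)                          # rho(G_J) for the model problem
    rho_gs = beta**2                                  # Gauss-Seidel: omega = 1
    omega_opt = 2.0 / (1.0 + np.sqrt(1.0 - beta**2))  # optimal omega
    rho_sor = omega_opt - 1.0                         # spectral radius of SOR at omega_opt
    print(beta, rho_gs, omega_opt, rho_sor)
    # approximately 0.998, 0.996, 1.88, 0.88: optimal SOR reduces the error per iteration
    # by a factor of about 0.88, versus about 0.996 for Gauss-Seidel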
10.2d. If A > 0 and v ≥ 0 and v is not the 0 vector, then Av > 0.
10.2e. If A ≥ 0 and v ≥ 0 and Av ≥ αv for some α > 0, then A^k v ≥ α^k v for all
k = 1, 2, ....
so y = A|v| − ρ(A)|v| ≥ 0. If y is the 0 vector, then this implies that ρ(A)
is an eigenvalue of A with the nonnegative eigenvector |v|. If |v| had a zero
component, then that component of A|v| would have to be zero, and since each
entry of A is positive, this would imply that v is the 0 vector (Exercise 10.2d),
which is a contradiction. Thus, if y is the 0 vector, Theorem 10.2.2 is proved.
If y is not the 0 vector, then Ay > 0 (Exercise 10.2d); setting z = A|v| > 0,
we have 0 < Ay = Az − ρ(A)z, or Az > ρ(A)z. It follows that there is some
number α > ρ(A) such that Az > αz. From Exercise 10.2e, it follows that for
every k ≥ 1, A^k z > α^k z. From this we conclude that ||A^k||^{1/k} ≥ α > ρ(A)
for all k. But since lim_{k→∞} ||A^k||^{1/k} = ρ(A), this leads to the contradiction
ρ(A) ≥ α > ρ(A). □
Theorem 10.2.2 is part of the Perron theorem, which also states that there
is a unique eigenvalue A with modulus equal to p(A) and that this eigenvalue
is simple.
THEOREM 10.2.3 (Perron). If A is an n-by-n matrix and A > 0, then
and the fact that v is not the zero vector, it follows that p is an eigenvalue of
A and so p < p(A). Hence it must be that p = p(A). D
The parts of Theorem 10.2.3 that are not contained in Theorem 10.2.4 do
not carry over to all nonnegative matrices. They can, however, be extended
to irreducible nonnegative matrices, and this extension was carried out by
Frobenius.
DEFINITION 10.2.1. Let A be an n-by-n matrix. The graph of A is
The set G(A) can be visualized as follows. For each integer i = 1, ..., n,
draw a vertex, and for each pair (i, j) ∈ G(A), draw a directed edge from
vertex i to vertex j. This is illustrated in Figure 10.3.
where A_{11} and A_{22} are square blocks of dimension greater than or equal to
1. To see this, first suppose that A is of the form (10.25) for some ordering
of the indices. Let I_1 be the set of row numbers of the entries of A_{11} and
let I_2 be the set of row numbers of the entries of A_{22}. If j ∈ I_2 is connected
to i ∈ I_1, then somewhere in the path from j to i there must be an edge
connecting an element of I_2 to an element of I_1, but this would correspond
to a nonzero entry in the (2,1) block of (10.25). Conversely, if A is reducible,
then there must be indices j and i such that j is not connected to i. Let
I_1 = {k : k is connected to i} and let I_2 consist of the remaining indices. The
sets I_1 and I_2 are nonempty, since i ∈ I_1 and j ∈ I_2. Enumerate first I_1, then
I_2. If an entry in I_2 were connected to any entry in I_1, it would be connected
to i, which is a contradiction. Therefore, the (2,1) block in the representation
of A using this ordering would have to be 0, as in (10.25). The matrix on the
left in Figure 10.3 is irreducible, while that on the right is reducible.
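Reducibility, as just described, is a purely structural property and can be checked from the graph alone. The following sketch tests whether every vertex of G(A) is connected to every other vertex by a directed path; the function name and the breadth-first search are illustrative choices.

    import numpy as np
    from collections import deque

    def is_irreducible(A):
        # A is irreducible iff the directed graph G(A) is strongly connected
        n = A.shape[0]
        adj = [np.nonzero(A[i, :])[0] for i in range(n)]
        for start in range(n):
            seen = {start}
            queue = deque([start])
            while queue:
                i = queue.popleft()
                for j in adj[i]:
                    if j not in seen:
                        seen.add(j)
                        queue.append(j)
            if len(seen) < n:
                return False      # some vertex cannot be reached from `start`
        return True

    T = 2 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
    print(is_irreducible(T))                             # True: the 1D model matrix
    print(is_irreducible(np.triu(np.ones((3, 3)))))      # False: form (10.25) with zero (2,1) block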
THEOREM 10.2.5 (Perron-Frobenius). Let A be an n-by-n real matrix and
suppose that A is irreducible and nonnegative. Then
(a) ρ(A) > 0;
(b) ρ(A) is a simple eigenvalue of A;
(c) if A has exactly k eigenvalues of maximum modulus ρ(A), then these
eigenvalues are the kth roots of unity times ρ(A): λ_j = e^{2πij/k} ρ(A); and
So
If ρ(M^{-1}N) ≥ 1, then this would imply that A^{-1}Nv has negative components,
which is impossible since A^{-1} ≥ 0, N ≥ 0, and v ≥ 0. This proves that
ρ(M^{-1}N) < 1. It also follows from (10.26) that ρ(M^{-1}N)/(1 − ρ(M^{-1}N)) is
an eigenvalue of A^{-1}N, so we have
or, equivalently, since
Now, we also have A^{-1}N ≥ 0, from which it follows by Theorem 10.2.4 that
there is a vector w ≥ 0 such that A^{-1}Nw = ρ(A^{-1}N)w. Using the relation
we can write
With the slightly stronger assumption that A^{-1} > 0, the inequalities in
Corollary 10.3.1 can be replaced by strict inequalities.
COROLLARY 10.3.2. Let A = M_1 − N_1 = M_2 − N_2 be two regular splittings
of A, where A^{-1} > 0. If N_1 ≤ N_2 and neither N_1 nor N_2 − N_1 is the null
matrix, then
1. A is an M-matrix.
2. A is nonsingular and A^{-1} ≥ 0. (Note that condition (i) in the definition
of an M-matrix is not necessary. It is implied by the other two
conditions.)
It was noted in section 9.2 that under assumption (9.35) the coefficient
matrix (9.31) arising from the transport equation has positive diagonal entries
and nonpositive off-diagonal entries. It was also noted that the matrix is weakly
row diagonally dominant. It is only weakly diagonally dominant because
off-diagonal entries of the rows in the last block sum to 1. If we assume,
however, that γ = sup_x σ_s(x)/σ_t(x) < 1, then the other rows are strongly
diagonally dominant. If the last block column is multiplied by a number
greater than 1 but less than γ^{-1}, then the resulting matrix will be strictly row
diagonally dominant. Thus this matrix satisfies criterion (7) of Theorem 10.3.2,
and therefore it is an M-matrix. The block Gauss-Seidel splitting described
in section 9.2 is a regular splitting, so by Theorem 10.3.1, iteration (9.34)
converges. Additionally, if the initial error has all components of one sign, then
the same holds for the error at each successive step, since the iteration matrix
/ — M~1A = M~1N has nonnegative entries. This property is often important
when subsequent computations with the approximate solution vector expect a
nonnegative vector because the physical flux is nonnegative.
In the case of real symmetric matrices, criterion (3) (or (4)) of Theorem
10.3.2 implies that a positive definite matrix with nonpositive off-diagonal
entries is an M-matrix. We provide a proof of this part.
DEFINITION 10.3.3. A real matrix A is a Stieltjes matrix if A is symmetric
positive definite and the off-diagonal entries of A are nonpositive.
THEOREM 10.3.3. Any Stieltjes matrix is an M-matrix.
Proof. Let A be a Stieltjes matrix. The diagonal elements of A are positive
because A is positive definite, so we need only verify that A~l > 0. Write
A — D — C, where D — diag(A) is positive and C is nonnegative. Since A is
positive definite, it is nonsingular, and A~l — [D(I — .B)]"1 = (I — B)~lD~l,
where B = D~1C. If p(B] < 1, then the inverse of / — B is given by the
Neumann series
and since B > 0 it would follow that (/ - B)~l > 0 and, hence, A~l > 0.
Thus, we need only show that p(B) < 1.
Suppose ρ(B) ≥ 1. Since B ≥ 0, it follows from Theorem 10.2.4 that
ρ(B) is an eigenvalue of B. But then D^{-1}A = I − B must have a nonpositive
eigenvalue, 1 − ρ(B). This matrix is similar to the symmetric positive definite
matrix D^{-1/2}AD^{-1/2}, so we have a contradiction. Thus ρ(B) < 1. □
The matrix arising from the diffusion equation defined in (9.6-9.8) is a
Stieltjes matrix and, hence, an M-matrix.
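A quick numerical illustration of Theorem 10.3.3, using the one-dimensional analogue of the diffusion matrix as a small Stieltjes example: it is symmetric positive definite with nonpositive off-diagonal entries, and its inverse is entrywise positive, as the M-matrix property requires.

    import numpy as np

    T = 2 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)    # a small Stieltjes matrix
    print(np.all(np.linalg.eigvalsh(T) > 0))                # True: positive definite
    print(np.all(T - np.diag(np.diag(T)) <= 0))             # True: off-diagonal entries nonpositive
    print(np.all(np.linalg.inv(T) > 0))                     # True: the inverse is entrywise positive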
matrices satisfying the hypotheses of Corollary 10.3.1, and suppose that the
largest eigenvalue of M_2^{-1}A is greater than or equal to 1. Then the ratios of
largest to smallest eigenvalues of M_1^{-1}A and M_2^{-1}A satisfy
or, equivalently,
Since, by assumption, λ_max(M_2^{-1}A) ≥ 1 and since ρ(M_2^{-1}N_2) < 1 implies that
λ_min(M_2^{-1}A) > 0, the second factor on the right-hand side is less than 2, and
the theorem is proved.
THEOREM 10.4.2. The assumption in Theorem 10.4.1 that the largest
eigenvalue of M_2^{-1}A is greater than or equal to 1 is satisfied if A and M_2
have at least one diagonal element in common.
Proof. If A and M_2 have a diagonal element in common, then the symmetric
matrix N_2 has a zero diagonal element. This implies that M_2^{-1}N_2 has a
nonpositive eigenvalue, since the smallest eigenvalue of this matrix satisfies
if ξ_j is the vector with a 1 in the position of this zero diagonal element and 0's
elsewhere. Therefore, M_2^{-1}A = I − M_2^{-1}N_2 has an eigenvalue greater than or
equal to 1. □
Theorems 10.4.1 and 10.4.2 show that once a pair of regular splittings
have been scaled properly for comparison (that is, M_1 has been multiplied
by a constant, if necessary, so that A and M_1 have at least one diagonal
element in common), the one that is closer to A elementwise gives a smaller
condition number for the PCG or PMINRES iteration matrix (except possibly
for a factor of 2). This means that the Chebyshev bound (3.8) on the error
at each step will be smaller (or, at worst, only slightly larger) for the closer
preconditioner. Other properties, however, such as tight clustering of most
of the eigenvalues, also affect the convergence rate of PCG and PMINRES.
Unfortunately, it would be difficult to provide general comparison theorems
based on all of these factors, so the condition number is generally used for this
purpose.
Forsythe and Strauss [52] showed that for a Hermitian positive definite
matrix A in 2-cyclic form, the optimal diagonal preconditioner is M = diag(A).
Eisenstat, Lewis, and Schultz [41] later generalized this to cover matrices in
block 2-cyclic form with block diagonal preconditioners. They showed that
if each block D_{ii} is the identity, then A is optimally scaled with respect to
all block diagonal matrices with blocks of order n_{ii}. The following slightly
stronger result is due to Elsner [42].
THEOREM 10.5.1 (Elsner). If a Hermitian positive definite matrix A has
the form
Thus we have
Now so we have
(Note that we have not yet made any assumption about the matrix D. The
result holds for any nonsingular matrix D such that ||UD|| > ||D||.)
Now assume that D is a positive definite diagonal matrix with largest entry
d_{jj}. Let ξ_j be the jth unit vector. Then
Exercises.
10.1. Show that the SSOR preconditioner is of the form (10.6).
where ψ_g(r, Ω) is the unknown flux associated with energy group g and
σ_g(r, Ω), σ_{g,g'}(r, Ω · Ω'), and f_g(r, Ω) are known cross section and source
terms. (Appropriate boundary conditions are also given.) A standard
method for solving this set of equations is to move the terms of the sum
corresponding to different energy groups to the right-hand side and solve
the resulting set of equations for ψ_1, ..., ψ_G in increasing order of index,
using the most recently updated quantities on the right-hand side; that
is,
Chapter 11
Incomplete Decompositions
the lower triangular system Ly = r and then solves the upper triangular system
L^H z = y.
The same idea can also be applied to non-Hermitian matrices to obtain
an approximate LU factorization. The product M = LU of the incomplete
LU factors then can be used as a preconditioner in a non-Hermitian matrix
iteration such as GMRES, QMR, or BiCGSTAB. The idea of generating such
approximate factorizations has been discussed by a number of people, the
first of whom was Varga [136]. The idea became popular when it was used
by Meijerink and van der Vorst [99] to generate preconditioners for the CG
method and related iterations. It has proved a very successful technique in a
range of applications and is now widely used in large physics codes. The main
results of this section are from [99].
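A minimal dense-storage sketch of the zero-fill incomplete factorization is given below. This is one common formulation, not necessarily the exact variant analyzed in this section: Gaussian elimination is carried out, but updates are applied only in positions where A itself is nonzero, so L and U inherit the sparsity pattern of A. The product M = LU is then applied as a preconditioner by one forward and one backward triangular solve, as described above.

    import numpy as np

    def ilu0(A):
        # incomplete LU with zero fill; assumes the pivots encountered are nonzero
        # (guaranteed, for example, when A is an M-matrix)
        n = A.shape[0]
        pattern = (A != 0)
        LU = A.astype(float).copy()
        for i in range(1, n):
            for k in range(i):
                if not pattern[i, k]:
                    continue                  # fill-in in the strictly lower part is dropped
                LU[i, k] /= LU[k, k]
                for j in range(k + 1, n):
                    if pattern[i, j]:
                        LU[i, j] -= LU[i, k] * LU[k, j]
                    # updates outside the pattern of A are discarded
        L = np.tril(LU, -1) + np.eye(n)       # unit lower triangular factor
        U = np.triu(LU)
        return L, U

For a Hermitian positive definite A, the same idea gives an incomplete Cholesky factor, and M = LL^H is used with the CG algorithm.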
We will show that the incomplete LU decomposition exists if the coefficient
matrix A is an M-matrix. This result was generalized by Manteuffel [95] to
cover H-matrices with positive diagonal elements. The matrix A = [a_{ij}] is
an H-matrix if its comparison matrix (the matrix with diagonal entries |a_{ii}|,
i = 1, ..., n, and off-diagonal entries −|a_{ij}|, i, j = 1, ..., n, j ≠ i) is an M-matrix.
Any diagonally dominant matrix is an H-matrix, regardless of the
signs of its entries.
In fact, this decomposition often exists even when A is not an H-matrix. It
is frequently applied to problems in which the coefficient matrix is not an H-matrix,
and entries are modified, when necessary, to make the decomposition
stable [87, 95].
The proof will use two results about M-matrices, one due to Fan [47] and
one due to Varga [135].
LEMMA 11.1.1 (Fan). If A = [a_{ij}] is an M-matrix, then A^{(1)} = [a_{ij}^{(1)}] is an
M-matrix, where A^{(1)} is the matrix that arises by eliminating the first column
of A using the first row.
LEMMA 11.1.2 (Varga). If A = [a_{ij}] is an M-matrix and the elements of
B = [b_{ij}] satisfy
by the relations
From this it is easily seen that Ã^{(k)} is the matrix that arises from A^{(k)} by
eliminating elements in the kth column using row k, while A^{(k)} is obtained
from Ã^{(k-1)} by replacing entries in row or column k whose indices are in P by
0.
Now, A^{(0)} = A is an M-matrix, so R^{(1)} ≥ 0. From Lemma 11.1.2 it follows
that A^{(1)} is an M-matrix and, therefore, L^{(1)} ≥ 0. From Lemma 11.1.1 it
follows that Ã^{(1)} is an M-matrix. Continuing the argument in this fashion, we
can prove that A^{(k)} and Ã^{(k)} are M-matrices and L^{(k)} ≥ 0 and R^{(k)} ≥ 0 for
k = 1, ..., n − 1. From the definitions it follows immediately that
One can solve a linear system with coefficient matrix L by first determining
the red components of the solution in parallel, then applying the matrix B^T
to these components, and then solving for the black components in parallel.
Unfortunately, however, the incomplete Cholesky preconditioner obtained with
this ordering is significantly less effective in reducing the number of CG
iterations required than that obtained with the natural ordering.
Because of the zero row sum property of A, we have Σ_j a_{ij} v_i^2 = 0 unless node
i is coupled to the boundary of Ω, and this happens only if the distance from
node i to the boundary is O(h). Since v(x, y) ∈ C_0(Ω), it follows that at such
points |v_i| is bounded by O(h). Consequently, since the nonzero entries of A
are of order O(1), the second sum in (11.4) is bounded in magnitude by the
number of nodes i that couple to ∂Ω times O(h^2). In most cases this will be
O(h).
Because of the local property of A, it follows that for nodes i and j such
that a_{ij} is nonzero, the distance between nodes i and j is O(h) and, therefore,
|v_i − v_j| is bounded by O(h). The first sum in (11.4) therefore satisfies
Suppose that the remainder matrix also has the property that nonzero entries
r_{ij} correspond only to nodes i and j that are separated by no more than
O(h), and suppose also that the nonzero entries of R are of size O(1) (but
are perhaps smaller than the nonzero entries of A). This is the case for the
incomplete Cholesky decomposition where, for the 5-point Laplacian, r_{ij} is
nonzero only if j = i + m − 1 or j = i − m + 1. These positions correspond to
the nodes pictured below, whose distance from node i is √2 h.
Then, by the same argument as used for A, the first sum in (11.5) is bounded
in absolute value by O(1).
The bound on the second term in (11.4), however, depended on the zero
row sum property of A. If this property is not shared by R (and it is not for
the incomplete Cholesky decomposition or for any regular splitting, since the
entries of R are all nonnegative), then this second sum could be much larger.
It is bounded by the number of nonzero entries of R in rows corresponding to
nodes away from the boundary, which is typically O(h^{-2}), times the nonzero
values of r_{ij}, which are of size O(1), times the value of the function v(x, y)
away from the boundary, which is O(1). Hence the second sum in (11.5) may
be as large as O(h^{-2}). For vectors v representing a C_0 function, the ratio
(Rv, v)/(Av, v) in (11.3) is then of size O(h^{-2}), so if (Rv, v) is positive (as it
is for a regular splitting if v ≥ 0), then the ratio (Av, v)/(Mv, v) in (11.3) is
of size O(h^2). In contrast, if we consider the first unit vector ξ_1, for example,
and where R is negative semidefinite (that is, (Rv, v) ≤ 0 for all v), Σ_j r_{ij} = 0 for all i,
and E is a positive definite diagonal matrix. Assume also that R has nonzero
entries only in positions (i, j) corresponding to nodes i and j that are within
O(h) of each other. Our choice of the matrix E depends on the boundary
conditions. For Dirichlet problems, which will be dealt with here, we choose
E = ηh^2 diag(A), where η > 0 is a parameter. For Neumann and mixed
problems, similar results can be proved if some elements of E, corresponding
to points on the part of the boundary with Neumann conditions, are taken to
be of order O(h).
From (11.5), it can be seen that R in (11.6) satisfies
when v(x, y) ∈ C_0(Ω), since the first sum in (11.5) is of size O(1). Since the
row sums of R are all zero and the nonzero entries of E are of size O(h^2), we
have
For the model problem, the recurrence equations (11.8-11.9) can be written in
the form
(In fact, for n sufficiently large, these elements approach a constant value γ
satisfying
satisfying
The value is
Since we obtain
for any vector v. An analogous expression for (Rv,v) shows, since the row
sums of R are all zero,
Using the inequality (1/2)(a − b)^2 ≤ (a − c)^2 + (c − b)^2, which holds for any
real numbers a, b, and c, inequality (11.11) can be written in the form
FIG. 11.1. Convergence of iterative methods for the model problem, h = 1/51.
Unpreconditioned CG (dash-dot), ICCG (dashed), MICCG (solid).
where the right-hand side can also be expressed as (1 + ch)^{-1} Σ_{i: r_i ≠ 0} [(v_{i+1} −
v_i)^2 + (v_i − v_{i+m})^2]. Since r_i is nonzero only when b_i and c_i are nonzero, we
combine this with inequality (11.10) and obtain
for by a smaller sharp error bound (3.6) for ICCG, but the reason appears to be
that rounding errors have a greater effect on the convergence rate of MICCG,
because of more large, well-separated eigenvalues. For a discussion, see [133].
Thus, for multigrid methods, the matrix I − M^{-1}A in (12.2) is of the special
form (I − GA)^{ν}(I − CA) for certain matrices C and G.
The quantity ||(I − GA)^{ν}(I − CA)||_A is called the contraction number of the
method and will be denoted by σ. In terms of the 2-norm, σ is given by
The following theorem shows that when C is defined in this way, the matrix
A^{1/2}CA^{1/2} is just the orthogonal projector from C^n onto the
subspace A^{1/2} · range(I_c).
THEOREM 12.1.1. If C is defined by (12.9-12.11), then
where R(·) denotes the range and N(·) the null space of an operator. The
matrix A^{1/2}CA^{1/2} is an orthogonal projector onto this
subspace and hence
and since the null space of I_c^T is the orthogonal complement of the range of I_c,
this can be written as
Suppose the matrix G is given. Let d_1^2 ≥ ··· ≥ d_n^2 denote the eigenvalues of
((I − A^{1/2}GA^{1/2})^{ν})^H (I − A^{1/2}GA^{1/2})^{ν}, and let v_1, ..., v_n denote the corresponding
orthonormal eigenvectors. For any vector y we can write y = Σ_{i=1}^{n} (y, v_i) v_i and
This is the usual bound on the convergence rate for the method of steepest
descent.
On the other hand, suppose some of the eigenvectors of A, say, those
corresponding to the s smallest eigenvalues of A, lie in the desired space
A^{1/2} · R(I_c). Then an improved bound like (12.17) holds, and this bound
becomes
as shown in Theorem 9.1.2. The eigenvalues are all positive and the smallest
and largest eigenvalues are given in Corollary 9.1.2:
THEOREM 12.1.2. Let A be the 5-point Laplacian matrix so that λ_{i,j} and
v^{(i,j)} satisfy (12.21-12.22), and let I_c be defined by (12.25). Let v^{(1)}, ..., v^{(s)}
denote the eigenvectors corresponding to the s smallest eigenvalues, λ_1 ≤ ··· ≤
λ_s. If v is any vector in span[v^{(1)}, ..., v^{(s)}], with ||v|| = 1, then v can be written
in the form
for some
Proof. First suppose that v = v^{(i,j)} is an eigenvector. Then for any vector w
defined on the coarse grid, we have
Let w^{(i,j)} match v^{(i,j)} at the nodes of the coarse grid so that
Note that if z_{p,q} is defined by (12.28) for all points p and q, then the vectors
z^{(i,j)} are orthonormal (Exercise 12.1), as are the vectors v^{(i,j)} (Exercise 9.3). If
z^{(i,j)} is defined to be 0 at the other grid points, then it can be checked (Exercise
12.1) that
Summing over p and q and using the formula 1 − cos x = 2 sin^2(x/2), we
have
Making these substitutions and using the fact that ||v^{(i,j)}||^2 = 1 and ||z^{(i,j)}||^2 ≤
1/4, we can write
or
Now let V_s be the matrix whose columns are the eigenvectors v^{(i,j)}
corresponding to the s smallest eigenvalues, and let v = V_s ξ be an arbitrary
vector in the span of the first s eigenvectors, with ||v|| = ||ξ|| = 1. Consider
approximating v by the vector A^{1/2}I_c(W_s Λ_s^{-1/2} ξ), where W_s has columns w^{(i,j)}
corresponding to the s smallest eigenvalues and Λ_s is the diagonal matrix of
these eigenvalues. The difference δ = v − A^{1/2}I_c(W_s Λ_s^{-1/2} ξ) is given by
Each of the matrices in this expression has norm less than or equal
to 1, because each is part of an orthogonal matrix. Since ||ξ|| = 1, it follows that
The constant 2 + √6 in (12.26) is not the best possible estimate, because the
piecewise linear interpolant used in the theorem is not the best possible
coarse grid approximation to v^{(i,j)}.
COROLLARY 12.1.1.
Proof. The left-hand side of inequality (12.34) is the square of the norm of
the vector v = Σ_{i=1}^{s} (y, v^{(i)}) v^{(i)}, and according to Theorem 12.1.2 this vector
satisfies
We now use Theorem 12.1.2 and Corollary 12.1.1 to bound the quantity
on the right-hand side of (12.14), again assuming that G = γI. In this case,
inequality (12.14) can be written in the form
where κ' = λ_n/λ_{s+1}. Applying Corollary 12.1.1 (and using the fact that a
function of the form x + ((κ' − 1)/(κ' + 1))^{2ν}(1 − x) is an increasing function
of x for 0 ≤ x ≤ 1), this becomes
We thus obtain a bound on σ^2 that is strictly less than one and independent
of h. For example, choosing θ = 1/(2c^2) gives
For the model problem, this establishes κ' < 16(2 + √6)^2 and, for ν = 1,
σ < .997. This is a large overestimate of the actual contraction number for the
two-grid method, but it does establish convergence at a rate that is independent
of h. To obtain a better estimate of σ, it would be necessary to derive a sharper
bound on the constant c in Theorem 12.1.2.
For j = 1, ..., J,
    Project r_{k-1}^{(j-1)} onto grid level j; that is, set r_{k-1}^{(j)} = I_{j-1}^{j} r_{k-1}^{(j-1)},
    and compute an approximate solution d_{k-1}^{(j)} of the residual equation on
    level j (by relaxation, or by a direct solve on the coarsest level J).
endfor
For j = J − 1, ..., 1,
    Interpolate d_{k-1}^{(j+1)} to grid level j and add it to d_{k-1}^{(j)}; that
    is, replace d_{k-1}^{(j)} ← d_{k-1}^{(j)} + I_{j+1}^{j} d_{k-1}^{(j+1)}.
endfor
Interpolate d_{k-1}^{(1)} to grid level 0 and replace u_{k-1} ← u_{k-1} + I_1^0 d_{k-1}^{(1)}.
Perform a relaxation sweep with initial guess u_{k-1} on grid level 0;
that is, set u_k = u_{k-1} + G^{-1}(f − A u_{k-1}). Compute the new residual
r_k = f − A u_k.
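The following is a hedged sketch, in Python, of a V-cycle of this general form for the one-dimensional model problem −u'' = f, with damped Jacobi smoothing, linear interpolation, and full-weighting restriction. The particular smoother, transfer operators, and grid sizes are illustrative choices, not the ones used for the experiments reported below.

    import numpy as np

    def laplace1d(n):
        h = 1.0 / (n + 1)
        return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    def smooth(A, f, u, sweeps=1, omega=2.0/3.0):
        d = np.diag(A)
        for _ in range(sweeps):
            u = u + omega * (f - A @ u) / d          # damped Jacobi relaxation
        return u

    def restrict(r):                                  # full weighting: n fine points -> (n-1)/2
        return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

    def interpolate(e, n_fine):                       # linear interpolation: coarse -> fine
        u = np.zeros(n_fine)
        u[1::2] = e                                   # coarse points coincide with odd fine points
        u[0:-1:2] += 0.5 * e
        u[2::2] += 0.5 * e
        return u

    def vcycle(f, u, level, coarsest=1):
        n = len(f)
        A = laplace1d(n)
        if n <= 3 or level == coarsest:
            return np.linalg.solve(A, f)              # direct solve on the coarsest grid
        u = smooth(A, f, u)                           # pre-smoothing
        rc = restrict(f - A @ u)                      # project the residual to the coarse grid
        ec = vcycle(rc, np.zeros(len(rc)), level - 1) # coarse-grid correction
        u = u + interpolate(ec, n)                    # interpolate and add the correction
        return smooth(A, f, u)                        # post-smoothing

    n = 2**7 - 1                                      # h = 1/128
    A = laplace1d(n); x = np.random.rand(n); f = A @ x
    u = np.zeros(n)
    for k in range(10):
        u = vcycle(f, u, level=6)
        print(k, np.linalg.norm(f - A @ u) / np.linalg.norm(f))   # residual drops each cycle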
The restriction and prolongation matrices I_j^{j+1} and I_{j+1}^{j}, as well as the
relaxation scheme with matrix G, can be tuned to the particular problem.
For the model problem, the linear interpolation matrix I_{j+1}^{j} is appropriate,
although it is not the only choice, and it is reasonable to define the restriction
matrix I_j^{j+1} to be (I_{j+1}^{j})^T, as in section 12.1.1. The damped Jacobi relaxation
scheme described in section 12.1.1 is convenient for analysis, but other
relaxation schemes may perform better in practice. The red-black Gauss-Seidel
relaxation method is often used. (That is, if nodes are ordered so that
the matrix A has the form (9.9), then G is taken to be the lower triangle of
A.)
Figure 12.1 shows the convergence of the multigrid V-cycle with red-black
Gauss-Seidel relaxation for the model problem for grid sizes h = 1/64 and
h = 1/128. The coarsest grid, on which the problem was solved directly,
was of size h = 1/4. Also shown in Figure 12.1 is the convergence curve for
MICCG(0). The work per iteration (or per cycle for multigrid) for these two
algorithms is similar. During a multigrid V-cycle, a (red-black) Gauss-Seidel
relaxation step is performed once on the fine grid and twice on each of the
coarser grids. Since the number of points on each coarser level grid is about
1/4 that of the finer grid, this is the equivalent of about 1 and 2/3 Gauss-
Seidel sweeps on the finest grid. After the fine grid relaxation is complete,
a new residual must be computed, requiring an additional matrix-vector
multiplication on the fine grid. In the MICCG(0) algorithm, backsolving with
the L and LT factors of the MIC decomposition is twice the work of backsolving
with a single lower triangular matrix in the Gauss-Seidel method, but only one
matrix-vector multiplication is performed at each step. The CG algorithm also
requires some inner products that are not present in the multigrid algorithm,
but the multigrid method requires prolongation and restriction operations that
roughly balance with the work for the inner products. The exact operation
count is implementation dependent, but for the implementation used here
(which was designed for a general 5-point matrix, not just the Laplacian),
the operation count per cycle/iteration was about 4 In for multigrid and about
30n for MICCG(O).
It is clear from Figure 12.1 that for the model problem, the multigrid
method is by far the most efficient of the iterative methods we have discussed.
Moreover, the multigrid method demonstrated here is not the best. The
number of cycles can be reduced even further (from 9 down to about 5 to
achieve an error of size 10^{-6}) by using the W-cycle or the full multigrid V-
cycle, with more accurate restriction and prolongation operators. The work
per cycle is somewhat greater, but the reduction in number of cycles more than
makes up for the slight increase in cycle time. (See Exercise 12.2.)
The multigrid method described here works well for a variety of problems,
including nonsymmetric differential equations, such as −Δu + c u_x = f, as
well as for the model problem. It should be noted, however, that while the
performance of ICCG and MICCG is not greatly changed if the model problem
FIG. 12.1. Convergence of the multigrid V-cycle (solid) and MICCG(0) (dashed)
for the model problem.
should even be defined on coarser level grids, and one cannot expect to gain
much information from a "solution" on such a grid.
represented on a coarser grid and those that cannot. With this interpretation,
multigrid methods fall under the heading of domain decomposition methods.
In this chapter, we describe some basic domain decomposition methods but
give little of the convergence theory. For further discussion, see [123] or [77].
The domain Ω is an open set in the plane or in 3-space, and ∂Ω denotes the
boundary of Ω. We denote the closure of Ω by Ω̄ = Ω ∪ ∂Ω. We have chosen
Dirichlet boundary conditions (u = g on ∂Ω), but Neumann or Robin boundary
conditions could be specified as well.
The domain Ω might be divided into two overlapping pieces, Ω_1 and Ω_2,
such that Ω = Ω_1 ∪ Ω_2, as pictured in Figure 12.2. Let Γ_1 and Γ_2 denote the
parts of the boundaries of Ω_1 and Ω_2, respectively, that are not part of the
boundary of Ω. To solve this problem, one might guess the solution on Γ_1 and
solve the problem
where g_1 is the initial guess for the solution on Γ_1. Letting g_2 be the value of
u_1 on Γ_2, one then solves
If the computed solutions u_1 and u_2 are the same in the region where they
overlap, then the solution to problem (12.36) is
If the values of u_1 and u_2 differ in the overlap region, then the process can
be repeated, replacing g_1 by the value of u_2 on Γ_1, and re-solving problem
(12.37), etc. This idea was introduced by Schwarz in 1870 [120], not as a
computational technique, but to establish the existence of solutions to elliptic
problems on regions where analytic solutions were not known. When used as
a computational technique it is called the alternating Schwarz method.
A slight variation of the alternating Schwarz method, known as the
multiplicative Schwarz method, is more often used in computations. Let the
problem (12.36) be discretized using a standard finite difference or finite
element method, and assume that the overlap region is sufficiently wide so
that nodes in Ω_1\Ω_2 do not couple to nodes in Ω_2\Ω_1, and vice versa. Assume
also that the boundaries Γ_1 and Γ_2 are grid lines. If nodes in Ω_1\Ω_2 are
numbered first, followed by nodes in Ω_1 ∩ Ω_2, and then followed by nodes in
Ω_2\Ω_1, then the discretized problem can be written in the form
where the right-hand side vector f includes contributions from the boundary
term u = g on ∂Ω.
Starting with an initial guess u^{(0)} (which actually need only be defined on
Γ_1 for a standard 5-point discretization or, more generally, on points in Ω_2\Ω_1
that couple to points in Ω_1 ∩ Ω_2), the multiplicative Schwarz method for the
discretized system generates approximations u^{(k)}, k = 1, 2, ..., satisfying
The first equation corresponds to solving the problem on Ω_1, using boundary
data obtained from u^{(k)}. The second equation corresponds to solving the
problem on Ω_2, using boundary data obtained from u^{(k+1)}.
Note that this is somewhat like a block Gauss-Seidel method for (12.39),
since one solves the first block equation using old data on the right-hand side
and the second block equation using updated data on the right-hand side, but
in this case the blocks overlap:
Let E_i^T, i = 1, 2, be the rectangular matrix that takes a vector defined on
all of Ω and restricts it to Ω_i:
The matrix E_i takes a vector defined on Ω_i and extends it with zeros to the
rest of Ω. The matrices on the left in (12.40) are of the form E_i^T A E_i, where A
is the coefficient matrix in (12.39). Using this notation, iteration (12.40) can
be written in the equivalent form
Writing this as two half-steps and extending the equations to the entire
domain, the iteration becomes
This is the simple iteration method described in section 2.1 with preconditioner
M^{-1} = B_1 + B_2 − B_2 A B_1.
One could also consider solving (12.39) using an overlapping block Jacobi-type
method; that is, using data from the previous iterate in the right-hand
sides of both equations in (12.40). This leads to the set of equations
where u_i^{(k)} = E_i^T u^{(k)}. The value of u^{(k+1)} in the overlap region has been set in
two different ways by these equations. For the multiplicative Schwarz method,
we used the second equation to define the value of u^{(k+1)} in the overlap region;
for this variant it is customary to take u^{(k+1)} in Ω_1 ∩ Ω_2 to be the sum of the two values
defined by these equations. This leads to the additive Schwarz method:
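The additive Schwarz preconditioner is commonly written as M^{-1} = Σ_i E_i (E_i^T A E_i)^{-1} E_i^T; a sketch of its application, for any number of overlapping subdomains, simply sums the subdomain contributions:

    import numpy as np

    def additive_schwarz_apply(A, index_sets, r):
        # z = M^{-1} r = sum_i E_i (E_i^T A E_i)^{-1} E_i^T r
        z = np.zeros_like(r)
        for idx in index_sets:
            z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
        return z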
(For a Hermitian positive definite problem we could equally well consider the
Krylov space generated by L^{-H} f and L^{-1}AL^{-H}, where M = LL^H.) Suppose
f has nonzero components in only one of the subdomains, say, Ω_1. Since
a standard finite difference or finite element matrix A contains only local
couplings, the vector Af will be nonzero only in subdomains that overlap with
Ω_1 (or, perhaps, subdomains that are separated from Ω_1 by just a few mesh
widths). It is only for these subdomains that the right-hand side E_i^T f of the
subdomain problem will be nonzero and hence that M^{-1}Af will be nonzero.
If this set of subdomains is denoted S_1, then it is only for subdomains that
overlap with (or are separated by just a few mesh widths from) subdomains in
S_1 that the next Krylov vector (M^{-1}A)^2 f will be nonzero. And so on. The
number of Krylov space vectors will have to reach the length of the shortest
path from Ω_1 to the most distant subregion, say, Ω_J, before any of the Krylov
vectors will have nonzero components in that subregion. Yet the solution u(x)
of the differential equation and the vector u satisfying the discretized problem
Au = / may well have nonzero (and not particularly small) components in all
of the subdomains. Hence any Krylov space method for solving Au = / with a
zero initial guess and the additive Schwarz preconditioner will require at least
this shortest path length number of iterations to converge (that is, to satisfy
a reasonable error tolerance). As the number of subdomains increases, the
shortest path between the most distant subregions also increases, so the number
of iterations required by, say, the GMRES method with the additive Schwarz
preconditioner will also increase. The reason is that there is no mechanism for
global communication among the subdomains.
An interesting cure for this problem was proposed by Dryja and Widlund
[36]. In addition to the subdomain solves, solve the problem on a coarse grid
whose elements are the subregions of the original grid. If this problem is denoted
A_c u_c = f_c and if I_c denotes an appropriate type of interpolation (say,
linear interpolation) from the coarse to the fine grid, then the preconditioner
M^{-1} in (12.43) is replaced by
If the matrices A_{11} and A_{22} can be inverted easily, then the variables u_1 and
u_2 can be eliminated using Gaussian elimination and a much smaller Schur
complement problem solved on the interface Γ:
Exercises.
12.1. Show that the vectors z^{(i,j)}, i, j = 1, ..., m, with components
[1] R. Alcouffe, A. Brandt, J. Dendy, and J. Painter, The multigrid method for
diffusion equations with strongly discontinuous coefficients, SIAM J. Sci. Statist.
Comput., 2 (1981), pp. 430-454.
[2] M. Arioli and C. Fassino, Roundoff error analysis of algorithms based on Krylov
subspace methods, BIT, 36 (1996), pp. 189-205.
[3] S. F. Ashby, P. N. Brown, M. R. Dorr, and A. C. Hindmarsh, A linear algebraic
analysis of diffusion synthetic acceleration for the Boltzmann transport equation,
SIAM J. Numer. Anal., 32 (1995), pp. 179-214.
[4] S. F. Ashby, R. D. Falgout, S. G. Smith, and T. W. Fogwell, Multigrid
preconditioned conjugate gradients for the numerical simulation of groundwater flow
on the Cray T3D, American Nuclear Society Proceedings, Portland, OR, 1995.
[5] S. F. Ashby, T. A. Manteuffel, and P. E. Saylor, A taxonomy for conjugate
gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542-1568.
[6] O. Axelsson, A generalized SSOR method, BIT, 13 (1972), pp. 442-467.
[7] O. Axelsson, Bounds of eigenvalues of preconditioned matrices, SIAM J. Matrix
Anal. Appl., 13 (1992), pp. 847-862.
[8] O. Axelsson and H. Lu, On eigenvalue estimates for block incomplete factorization
methods, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1074-1085.
[9] N. S. Bakhvalov, On the convergence of a relaxation method with natural
constraints on the elliptic operator, U.S.S.R. Comput. Math. and Math. Phys., 6
(1966), pp. 101-135.
[10] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution
of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA,
1995.
[11] T. Barth and T. Manteuffel, Variable metric conjugate gradient methods, in
PCG '94: Matrix Analysis and Parallel Computing, M. Natori and T. Nodera, eds.,
Yokohama, 1994.
[12] T. Barth and T. Manteuffel, Conjugate gradient algorithms using multiple
recursions, in Linear and Nonlinear Conjugate Gradient-Related Methods, L. Adams
and J. L. Nazareth, eds., SIAM, Philadelphia, PA, 1996.
[13] R. Beauwens, Approximate factorizations with S/P consistently ordered M-
factors, BIT, 29 (1989), pp. 658-681.
[14] R. Beauwens, Modified incomplete factorization strategies, in Preconditioned
Conjugate Gradient Methods, O. Axelsson and L. Kolotilina, eds., Lecture Notes
in Mathematics 1457, Springer-Verlag, Berlin, New York, 1990, pp. 1-16.
[36] M. Dryja and O. B. Widlund, Some domain decomposition algorithms for elliptic
problems, in Iterative Methods for Large Linear Systems, L. Hayes and D. Kincaid,
eds., Academic Press, San Diego, CA, 1989, pp. 273-291.
[37] T. Dupont, R. P. Kendall, and H. H. Rachford, Jr., An approximate factorization
procedure for solving self-adjoint elliptic difference equations, SIAM J. Numer. Anal.,
5 (1968), pp. 559-573.
[38] M. Eiermann, Fields of values and iterative methods, Linear Algebra Appl., 180
(1993), pp. 167-197.
[39] M. Eiermann, Fields of values and iterative methods, talk presented at Oberwol-
fach meeting on Iterative Methods and Scientific Computing, Oberwolfach, Germany,
April, 1997, to appear.
[40] S. Eisenstat, H. Elman, and M. Schultz, Variational iterative methods for
nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983), pp. 345-
357.
[41] S. Eisenstat, J. Lewis, and M. Schultz, Optimal block diagonal scaling of block
2-cyclic matrices, Linear Algebra Appl., 44 (1982), pp. 181-186.
[42] L. Elsner, A note on optimal block scaling of matrices, Numer. Math., 44 (1984),
pp. 127-128.
[43] M. Engeli, T. Ginsburg, H. Rutishauser, and E. Stiefel, Refined Iterative Methods
for Computation of the Solution and the Eigenvalues of Self-adjoint Boundary Value
Problems, Birkhauser-Verlag, Basel, Switzerland, 1959.
[44] V. Faber, W. Joubert, M. Knill, and T. Manteuffel, Minimal residual method
stronger than polynomial preconditioning, SIAM J. Matrix Anal. Appl., 17 (1996),
pp. 707-729.
[45] V. Faber and T. Manteuffel, Necessary and sufficient conditions for the existence
of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 352-362.
[46] V. Faber and T. Manteuffel, Orthogonal error methods, SIAM J. Numer. Anal.,
24 (1987), pp. 170-187.
[47] K. Fan, Note on M-matrices, Quart. J. Math. Oxford Ser., 11 (1960), pp. 43-49.
[48] J. Favard, Sur les polynomes de Tchebicheff, C. R. Acad. Sci. Paris, 200 (1935),
pp. 2052-2053.
[49] R. P. Fedorenko, The speed of convergence of one iterative process, U.S.S.R.
Comput. Math. and Math. Phys., 1 (1961), pp. 1092-1096.
[50] B. Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems,
Wiley-Teubner, Leipzig, 1996.
[51] R. Fletcher, Conjugate gradient methods for indefinite systems, in Proc. Dundee
Biennial Conference on Numerical Analysis, G. A. Watson, ed., Springer-Verlag,
Berlin, New York, 1975.
[52] G. E. Forsythe and E. G. Straus, On best conditioned matrices, Proc. Amer.
Math. Soc., 6 (1955), pp. 340-345.
[53] R. W. Freund, A transpose-free quasi-minimal residual algorithm for non-
Hermitian linear systems, SIAM J. Sci. Comput., 14 (1993), pp. 470-482.
[54] R. W. Freund and N. M. Nachtigal, QMR: A quasi-minimal residual method for
non-Hermitian linear systems, Numer. Math., 60 (1991), pp. 315-339.
[55] R. Freund and S. Ruscheweyh, On a class of Chebyshev approximation problems
which arise in connection with a conjugate gradient type method, Numer. Math., 48
(1986), pp. 525-542.
[56] E. Giladi, G. H. Golub, and J. B. Keller, Inner and outer iterations for the
Chebyshev algorithm, SCCM-95-10 (1995), Stanford University, Palo Alto. To appear
in SIAM J. Numer. Anal.
ditioned Conjugate Gradients, ACM Trans. Math. Software, 6 (1980), pp. 206-219.
[101] N. Nachtigal, A look-ahead variant of the Lanczos algorithm and its application
to the quasi-minimal residual method for non-Hermitian linear systems, Ph.D.
dissertation, Massachusetts Institute of Technology, Cambridge, MA, 1991.
[102] N. M. Nachtigal, S. Reddy, and L. N. Trefethen, How fast are nonsymmetric
matrix iterations?, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 778-795.
[103] N. M. Nachtigal, L. Reichel, and L. N. Trefethen, A hybrid GMRES algorithm for
nonsymmetric linear systems, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 796-825.
[104] R. A. Nicolaides, On the L2 convergence of an algorithm for solving finite element
equations, Math. Comp., 31 (1977), pp. 892-906.
[105] A. A. Nikishin and A. Yu. Yeremin, Variable block CG algorithms for solving large
sparse symmetric positive definite linear systems on parallel computers, I: General
iterative scheme, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1135-1153.
[106] Y. Notay, Upper eigenvalue bounds and related modified incomplete factorization
strategies, in Iterative Methods in Linear Algebra, R. Beauwens and P. de Groen,
eds., North-Holland, Amsterdam, 1991, pp. 551-562.
[107] D. P. O'Leary, The block conjugate gradient algorithm and related methods,
Linear Algebra Appl., 29 (1980), pp. 293-322.
[108] C. W. Oosterlee and T. Washio, An evaluation of parallel multigrid as a solver
and a preconditioner for singularly perturbed problems, Part I: The standard grid
sequence, SIAM J. Sci. Comput., to appear.
[109] C. C. Paige, Error analysis of the Lanczos algorithm for tridiagonalizing a
symmetric matrix, J. Inst. Math. Appl., 18 (1976), pp. 341-349.
[110] C. C. Paige, Accuracy and effectiveness of the Lanczos algorithm for the
symmetric eigenproblem, Linear Algebra Appl., 34 (1980), pp. 235-258.
[111] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear
equations, SIAM J. Numer. Anal., 11 (1974), pp. 197-209.
[112] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood
Cliffs, NJ, 1980.
[113] B. N. Parlett, D. R. Taylor, and Z. A. Liu, A look-ahead Lanczos algorithm for
unsymmetric matrices, Math. Comp., 44 (1985), pp. 105-124.
[114] C. Pearcy, An elementary proof of the power inequality for the numerical radius,
Michigan Math. J., 13 (1966), pp. 289-291.
[115] J. K. Reid, On the method of conjugate gradients for the solution of large sparse
linear systems, in Large Sparse Sets of Linear Equations, J. K. Reid, ed., Academic
Press, New York, 1971.
[116] Y. Saad, Preconditioning techniques for nonsymmetric and indefinite linear
systems, J. Comput. Appl. Math., 24 (1988), pp. 89-105.
[117] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Pub. Co., Boston,
MA, 1996.
[118] Y. Saad and A. Malevsky, PSPARSLIB: A portable library of distributed memory
sparse iterative solvers, in Proc. Parallel Computing Technologies (PaCT-95), 3rd
International Conference, V. E. Malyshkin et al., eds., St. Petersburg, 1995.
[119] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm
for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986),
pp. 856-869.
[120] H. A. Schwarz, Gesammelte Mathematische Abhandlungen, Vol. 2, Springer,
Berlin, 1890, pp. 133-143 (first published in Vierteljahrsschrift Naturforsch. Ges.
Zurich, 15 (1870), pp. 272-286).
[121] H. D. Simon, The Lanczos algorithm with partial reorthogonalization, Math.