ÅKE BJÖRCK
Department of Mathematics, Linköping University, S-581 83 Linköping, Sweden
Abstract.
An iterative method based on Lanczos bidiagonalization is developed for computing regularized
solutions of large and sparse linear systems, which arise from discretizations of ill-posed problems
in partial differential or integral equations. Determination of the regularization parameter and
termination criteria are discussed. Comments are given on the computational implementation of
the algorithm.
1. Introduction.
In this paper we develop a Lanczos type method for solving systems of linear
equations resulting from discretizations of ill-posed problems in partial differ-
ential and integral equations. Such problems arise in many applications e.g. in
atmospheric studies, geophysics, reconstruction problems and profile inversions.
We assume here that the problem has been discretized into a linear least
squares problem
(1.1) $\min_x \|b - Ax\|_2$, $\quad A : m \times n$, $\ m \ge n$,
where the matrix A is so large that direct methods of solution are not feasible.
The method we are going to develop only requires that matrix by vector
products $Av$ and $A^T u$ can be computed efficiently. This will be the case when,
e.g., the matrix A is sparse or has Toeplitz form.
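As a concrete illustration of this requirement (a minimal sketch, not part of the original paper), SciPy's LinearOperator lets one supply exactly the two products $Av$ and $A^T u$ as black boxes; the dense random matrix here is only a stand-in for a sparse or Toeplitz product routine.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

m, n = 120, 100
rng = np.random.default_rng(0)
A_dense = rng.standard_normal((m, n))  # stand-in for a sparse/Toeplitz operator

# The algorithm only ever calls these two products.
A = LinearOperator(
    (m, n),
    matvec=lambda v: A_dense @ v,      # computes A v
    rmatvec=lambda u: A_dense.T @ u,   # computes A^T u
)
```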
For a survey of methods from numerical linear algebra for solving ill-posed
problems see Björck and Eldén [1]. It is well known that any attempt to solve
(1.1) without modification will give useless results. Data and round-off errors
will blow up and cause the solution to be completely inaccurate unless some
regularization is used. If we let the singular value decomposition of A be
(1.2) $A = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T$, $\quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$,

then the regularized solutions denoted (1.3) and (1.4) in the following are
obtained by filtering out the components of the solution corresponding to small
singular values.
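The explicit formulas (1.3) and (1.4) are referred to throughout but are not reproduced above; the LaTeX block below restates the standard truncated SVD and damped (Tikhonov) least squares solutions under the assumption that the usual definitions are meant, consistent with the truncation $\omega_i > \delta$ in (3.3) and the filter $\omega_i/(\omega_i^2 + \mu^2)$ in (3.7) later in the paper. It is a reconstruction, not a verbatim quotation.

```latex
% Truncated SVD solution, here assumed to be what (1.3) denotes:
% keep only components with singular values above the parameter \delta.
x(\delta) \;=\; \sum_{\sigma_i > \delta} \frac{u_i^T b}{\sigma_i}\, v_i .

% Damped least squares solution, here assumed to be what (1.4) denotes:
x(\mu) \;=\; \arg\min_x \bigl\{ \|b - A x\|_2^2 + \mu^2 \|x\|_2^2 \bigr\}
        \;=\; \sum_{i=1}^{n} \frac{\sigma_i}{\sigma_i^2 + \mu^2}\,(u_i^T b)\, v_i .
```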
For a fine mesh size $h$, the matrix $A_h$ will have a cluster of singular values
close to zero, corresponding to rapidly oscillating eigensolutions. The corre-
sponding components have to be damped when (1.5) is used to compute $u_h(0)$.
An analysis of the rate of convergence of the conjugate gradient method for
spectra of the type arising for some ill-posed problems can be found in
Winther [18].
In [9] Johnson obtained good solutions to (1.5) by using the conjugate
gradient method and terminating after only a few steps. Such a good per-
formance can only be expected if convergence to the regular part of the
solution takes place before the effects of ill-conditioning show up. It is not clear
that this can always be guaranteed to happen, and therefore it seems to be of
interest to construct methods which allow greater control of the regularization
than the conjugate gradient method. We develop in sections 2 and 3 such a
method based on a variant of the bidiagonalization of Golub and Kahan [6].
In section 4 we discuss methods for choosing the regularization parameter. Some
computational details are considered in section 5 and in section 6 the method
is extended to a generalized form of regularization.
2. The bidiagonalization algorithm.
Starting from the right hand side b, the bidiagonalization of Golub and Kahan [6]
generates vectors $u_i$ and $v_i$ by the recurrence

(2.1) $\beta_1 u_1 = b$, $\quad \alpha_1 v_1 = A^T u_1$;

(2.2) $\beta_{i+1} u_{i+1} = A v_i - \alpha_i u_i$, $\quad \alpha_{i+1} v_{i+1} = A^T u_{i+1} - \beta_{i+1} v_i$, $\quad i = 1, 2, \ldots$,

where $\alpha_i \ge 0$ and $\beta_i \ge 0$, $i = 1, 2, \ldots$, are chosen so that $\|u_i\|_2 = \|v_i\|_2 = 1$.
With the definitions

(2.3) $U_k = (u_1, u_2, \ldots, u_k)$, $\quad V_k = (v_1, v_2, \ldots, v_k)$,

$B_k = \begin{pmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_k \\ & & & \beta_{k+1} \end{pmatrix} \in \mathbb{R}^{(k+1) \times k}$,

the recurrence can be summarized as

(2.4) $U_{k+1}(\beta_1 e_1) = b$,

(2.5) $A V_k = U_{k+1} B_k$, $\quad A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T$.
We now seek an approximate solution $x_k$ to (1.1) such that $x_k \in \mathrm{span}(V_k)$, and write

(2.6) $x_k = V_k f_k$.
It can be shown that in exact arithmetic we have $U_{k+1}^T U_{k+1} = I$ and $V_k^T V_k = I$.
Then it follows that

(2.7) $b - A x_k = U_{k+1}(\beta_1 e_1 - B_k f_k)$,

and $\|b - A x_k\|_2$ is minimized over all $x_k$ in $\mathrm{span}(V_k)$ by taking $f_k$ to be the
solution to the least squares problem

(2.8) $\min_{f_k} \|\beta_1 e_1 - B_k f_k\|_2$.

With the residual

(2.9) $d_k = \beta_1 e_1 - B_k f_k$,

we have $B_k^T d_k = 0$ at the minimum, and hence

(2.10) $A^T(b - A x_k) = \alpha_{k+1} v_{k+1} e_{k+1}^T d_k$.
Assume now that $\alpha_i \ne 0$, $\beta_i \ne 0$, $i = 1, \ldots, k$. If in the next step $\beta_{k+1} = 0$,
then $B_k$ has rank $k$ and we can find an $f_k$ such that $d_k = 0$ and $A x_k = b$. If
$\beta_{k+1} \ne 0$ but $\alpha_{k+1} = 0$, then by (2.10) $x_k$ is a least squares solution of $Ax = b$.
Thus, the recurrence (2.2) cannot break down before a solution to (1.1) is
obtained.
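A minimal dense sketch of the recurrence (2.1)-(2.2) in Python/NumPy may make the indexing concrete. The function name and the dense storage of U and V are choices made here for clarity, and breakdown (a zero $\alpha_i$ or $\beta_i$, which by the argument above signals that a solution has been found) is not handled.

```python
import numpy as np

def bidiagonalize(A, b, k):
    """Golub-Kahan bidiagonalization (2.1)-(2.2), started from b.

    A only needs to support A @ v and A.T @ u.  Returns U (m x (k+1)),
    V (n x k), the (k+1) x k lower bidiagonal matrix B_k of (2.3), and
    beta_1 = ||b||_2, so that in exact arithmetic b = beta_1 * U[:, 0]
    and A @ V = U @ Bk.  No reorthogonalization is performed here.
    """
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)

    beta[0] = np.linalg.norm(b)          # beta_1
    U[:, 0] = b / beta[0]
    for j in range(k):
        w = A.T @ U[:, j]
        if j > 0:
            # alpha_{j+1} v_{j+1} = A^T u_{j+1} - beta_{j+1} v_j
            w = w - beta[j] * V[:, j - 1]
        alpha[j] = np.linalg.norm(w)
        V[:, j] = w / alpha[j]
        # beta_{j+2} u_{j+2} = A v_{j+1} - alpha_{j+1} u_{j+1}
        w = A @ V[:, j] - alpha[j] * U[:, j]
        beta[j + 1] = np.linalg.norm(w)
        U[:, j + 1] = w / beta[j + 1]

    Bk = np.zeros((k + 1, k))
    for j in range(k):
        Bk[j, j] = alpha[j]
        Bk[j + 1, j] = beta[j + 1]
    return U, V, Bk, beta[0]
```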
The relations above are basic for the algorithm LSQR of Paige and Saunders
[11], who solve (2.8) by computing the QR decomposition of $B_k$ and derive simple
recursions for updating the successive approximations $x_k$. Note that these $x_k$
are in exact computation identical to the sequence of approximations generated
by the conjugate gradient method.
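For reference, an implementation of LSQR is available in SciPy; a brief usage sketch, reusing the operator A from the introduction with a hypothetical right-hand side:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# b is a hypothetical right-hand side.  Setting damp=mu would instead
# solve the damped problem min ||b - A x||^2 + mu^2 ||x||^2 of (1.4).
b = np.ones(A.shape[0])
x, istop, itn = lsqr(A, b, damp=0.0, iter_lim=50)[:3]
```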
The bidiagonalization algorithm described above is closely related to the
Lanczos process applied to the symmetric matrix $A^T A$. If the same starting
vector $v_1$ is used as in (2.1), then from (2.5)

$A^T A\, V_k = V_k (B_k^T B_k) + \alpha_{k+1} \beta_{k+1}\, v_{k+1} e_k^T.$

It follows that the matrix $V_k$ is the same and that the tridiagonal matrix
generated by the Lanczos process is $T_k = B_k^T B_k$. Therefore the singular values
of $B_k$ are related to the singular values of $A$ exactly as the eigenvalues of $T_k$
are to the eigenvalues of $A^T A$.
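The relation $T_k = B_k^T B_k$ is easy to verify numerically; a small check reusing A_dense and bidiagonalize from the sketches above (b here is an arbitrary test vector):

```python
b = np.ones(A_dense.shape[0])
U, V, Bk, beta1 = bidiagonalize(A_dense, b, k=10)
Tk = Bk.T @ Bk
# T_k is tridiagonal, and its eigenvalues are the squared singular
# values of B_k, just as the eigenvalues of A^T A are the squared
# singular values of A.
assert np.allclose(Tk, np.triu(np.tril(Tk, 1), -1))
```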
3. Regularization.
Let the singular value decomposition of $B_k$ be

(3.1) $B_k = P_k \begin{pmatrix} \Omega_k \\ 0 \end{pmatrix} Q_k^T = \sum_{i=1}^{k} \omega_i\, p_i q_i^T$, $\quad \Omega_k = \mathrm{diag}(\omega_1, \ldots, \omega_k)$,

with $P_k$ and $Q_k$ orthogonal. Then the solution to the least squares problem (2.8)
can be written

(3.2) $f_k = \beta_1 Q_k (\Omega_k^{-1} \;\; 0)\, P_k^T e_1 = \beta_1 \sum_{i=1}^{k} \omega_i^{-1} \pi_{1i}\, q_i$, $\quad \pi_{ji} = (P_k)_{ji}$,

and for the corresponding residual

$\|b - A x_k\|_2 = \|d_k\|_2 = \beta_1 |\pi_{1,k+1}|.$

We note the important role played by the elements in the first and last row
of $P_k$.
We can now compute a sequence of approximations $x_k(\delta)$, $k = 1, 2, \ldots$, to the
regularized solution $x(\delta)$ defined by (1.3), using the singular value decomposition
(3.1). We take

(3.3) $x_k(\delta) = V_k f_k(\delta)$, $\quad f_k(\delta) = \beta_1 \sum_{\omega_i > \delta} \omega_i^{-1} \pi_{1i}\, q_i$,

i.e. in the expansion (3.2) for $f_k$ we only keep terms for which $\omega_i > \delta$. For the
norm of $x_k(\delta)$ we obtain, using the orthogonality of $V_k$ and $Q_k$,

(3.4) $\|x_k(\delta)\|_2^2 = \|f_k(\delta)\|_2^2 = \beta_1^2 \sum_{\omega_i > \delta} (\pi_{1i}/\omega_i)^2$.

Since $r_k(\delta) = b - A x_k(\delta) = U_{k+1} d_k(\delta)$ it follows that

(3.5) $\|r_k(\delta)\|_2^2 = \|d_k(\delta)\|_2^2 = \beta_1^2 {\sum}' \pi_{1i}^2$,

where the prime on the summation indicates that $i$ should vary from $\nu+1$ to $k+1$,
$\nu = \nu(\delta)$ being the number of singular values $\omega_i > \delta$. Further, we have

(3.6) $\|A^T r_k(\delta)\|_2^2 = \beta_1^2 \sum_{i=\nu+1}^{k} (\omega_i \pi_{1i})^2 + \alpha_{k+1}^2\, \beta_1^2 \Bigl({\sum}' \pi_{1i}\, \pi_{k+1,i}\Bigr)^2$.
We have shown that the norms of $x_k(\delta)$, $r_k(\delta)$ and $A^T r_k(\delta)$ can be computed
without explicitly forming $x_k(\delta)$ or $f_k(\delta)$. We only need the singular values $\Omega_k$
of $B_k$ and the first and last rows of the matrix of singular vectors $P_k$, which
quantities can be computed in only $O(k^2)$ operations. The key idea in our
algorithm is to base the stopping criterion for the bidiagonalization and the
choice of the regularization parameter $\delta$ on these cheaply computed quantities.
When $k$ and $\delta$ have been determined, $x_k(\delta)$ is finally computed from (3.3).
To be able to do this it is necessary to save the vectors $v_i$, $i = 1, 2, \ldots$, possibly
on secondary storage. We remark that the idea of saving the Lanczos vectors
has been suggested by Parlett [12], who exploits it for solving symmetric
indefinite systems and for the efficient treatment of multiple right hand sides.
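The following sketch mimics this computation from $B_k$ alone, though in a dense $O(k^2)$ fashion from the full SVD rather than through the cheap formulas above; the function name and the exact-orthogonality assumption are ours.

```python
import numpy as np

def tsvd_norms(Bk, beta1, alpha_k1, delta):
    """Norms of x_k(delta), r_k(delta) and A^T r_k(delta) from B_k alone.

    A dense sketch assuming exact orthogonality of U_{k+1} and V_k; the
    paper obtains the same numbers from the singular values of B_k and
    the first and last rows of P_k.  alpha_k1 is alpha_{k+1}.
    """
    P, omega, QT = np.linalg.svd(Bk)     # B_k = P (Omega; 0) Q^T, cf. (3.1)
    pi1 = P[0, :len(omega)]              # first row of P_k (first k entries)
    keep = omega > delta                 # truncated SVD filter of (3.3)

    # f_k(delta) = beta1 * sum_{omega_i > delta} (pi_{1i}/omega_i) q_i
    f = beta1 * (QT[keep].T @ (pi1[keep] / omega[keep]))
    d = -Bk @ f                          # d_k(delta) = beta1*e1 - B_k f_k(delta)
    d[0] += beta1

    x_norm = np.linalg.norm(f)           # ||x_k(delta)||_2, cf. (3.4)
    r_norm = np.linalg.norm(d)           # ||r_k(delta)||_2, cf. (3.5)
    # A^T r_k(delta) = V_k (B_k^T d) + alpha_{k+1} v_{k+1} (e_{k+1}^T d), by (2.5)
    atr_norm = np.hypot(np.linalg.norm(Bk.T @ d), alpha_k1 * abs(d[-1]))
    return x_norm, r_norm, atr_norm
```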
So far we have only considered regularization by truncating the singular value
expansion. However, a similar algorithm can also be developed for the damped
least squares solution (1.4). We then need to compute, instead of $f_k(\delta)$ in (3.3),

(3.7) $f_k(\mu) = \beta_1 \sum_{i=1}^{k} \bigl(\omega_i/(\omega_i^2 + \mu^2)\bigr)\, \pi_{1i}\, q_i$.
Expressions analogous to (3.4)-(3.6) for the norms of $x_k(\mu)$, $r_k(\mu)$ and $A^T r_k(\mu)$
are straightforward to derive. When $k$ is moderately large,
the most efficient way to compute $f_k(\mu)$ is to note that it is the solution to the
least squares problem

$\min_{f_k} \left\| \begin{pmatrix} B_k \\ \mu I \end{pmatrix} f_k - \begin{pmatrix} \beta_1 e_1 \\ 0 \end{pmatrix} \right\|_2$,

which can be solved for fixed $\mu$ in $O(k)$ operations as shown by Eldén [4].
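A generic dense sketch of this stacked formulation (Eldén's $O(k)$ elimination for the bidiagonal structure is not reproduced here, and the function name is ours):

```python
import numpy as np

def damped_fk(Bk, beta1, mu):
    """Solve min || [Bk; mu*I] f - [beta1*e1; 0] ||_2, i.e. f_k(mu) of (3.7).

    A dense O(k^3) sketch; Elden's algorithm [4] solves the same
    bidiagonal problem in O(k) operations for each fixed mu.
    """
    k = Bk.shape[1]
    M = np.vstack([Bk, mu * np.eye(k)])
    rhs = np.zeros(Bk.shape[0] + k)
    rhs[0] = beta1
    f, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return f
```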
4. Choosing the regularization parameter.

One possibility is generalized cross-validation, proposed by Golub, Heath and
Wahba [5]. Using (3.5) it is easy to show that when the truncated singular value
expansion is used, we should choose the number of terms $\nu = \nu(\delta) \le k$ to include
as the minimizer of

(4.3) $V_k(\delta) = \beta_1^2\, \dfrac{\sum_{i=\nu+1}^{k+1} \pi_{1i}^2}{(k+1-\nu)^2}$.
Obviously it only takes O(k) operations to find this minimum. Note that all the
information needed to compute the cross-validation function V(6) is contained
in the first row of the matrix $P_k$. When the conjugate gradient method is used
to solve (1.1), then regularization is achieved by limiting the number of steps, i.e.
the dimension of the subspace $\mathrm{span}(V_k)$. Wold et al. [19] use the bidiagonal-
ization algorithm with $\delta = 0$ and use cross-validation to determine the number
of steps to take. It is possible to use cross-validation also for determining when
to terminate the bidiagonalization algorithm with regularization. If we put
$c_k = \min_{\nu} V_k(\nu)$,
or
is satisfied, then $x_k(\delta)$ is the exact solution to a perturbed least squares problem
$(A + \delta A)x = b + \delta b$, where
The quantities needed for the tests (4.4) and (4.5) are readily available from
(3.5) and (3.6) in terms of $\Omega_k$ and the first and last rows of $P_k$. However, in
practice it might be difficult to choose adequate values for $\varepsilon_A$ and $\varepsilon_b$. Further,
conditions (4.4) and (4.5) are sufficient but not necessary and so might lead to
too large a value of $k$.
5. Computational details.
The storage requirement of the algorithm is one $m$-vector ($u_i$), one $n$-vector ($v_i$)
and four $k$-vectors ($\alpha_i$, $\beta_i$, $\pi_{1i}$ and $\pi_{k+1,i}$). This assumes that the bidiagonalization
operations $Av - \alpha u$ and $A^T u - \beta v$ are implemented to overwrite $u$ and $v$ respec-
tively. Apart from this we need to store the $k$ $n$-vectors $v_1, \ldots, v_k$ on secondary
storage.
We now consider the computation of the singular value decomposition of $B_k$.
Since $B_k$ is $(k+1) \times k$ it cannot directly be input to a standard subroutine for the
singular value decomposition of bidiagonal matrices. One efficient solution would
be to make a slight change in the chasing algorithm, cf. Dongarra et al.
[3, pp. 11-12]. However, a simpler solution is to adjoin a zero column to $B_k$
to make it lower bidiagonal and compute the singular value decomposition

(5.1) $\bar{B}_k = (B_k \;\; 0) = \begin{pmatrix} \alpha_1 & & & \\ \beta_2 & \ddots & & \\ & \ddots & \alpha_k & \\ & & \beta_{k+1} & 0 \end{pmatrix} = \bar{P}_k\, \mathrm{diag}(\bar{\omega}_1, \ldots, \bar{\omega}_{k+1})\, \bar{Q}_k^T.$

Since $\bar{B}_k$ is singular by construction we have $\bar{\omega}_{k+1} = 0$. Dropping the last column
of $\bar{B}_k$ and $\bar{Q}_k$ in (5.1), it is readily seen that $Q_k$ in (3.1) is the $k \times k$ principal
submatrix of $\bar{Q}_k$ and $\omega_i = \bar{\omega}_i$, $i = 1, \ldots, k$. Note that although $\bar{B}_k$ differs from
$\bar{B}_{k-1}$ only in its last column, updating techniques are not efficient.
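A sketch of this device, with a general dense SVD standing in for a bidiagonal-specific routine and assuming $B_k$ has full rank $k$ (the function name is ours):

```python
import numpy as np

def svd_of_bidiagonal(Bk):
    """SVD of the (k+1) x k matrix B_k via the square matrix [Bk | 0].

    Adjoining a zero column gives a (k+1) x (k+1) lower bidiagonal
    matrix whose smallest singular value is exactly zero, cf. (5.1);
    dropping the zero singular value and the last column of Q recovers
    the factorization (3.1).
    """
    k1 = Bk.shape[0]                      # k + 1
    B_bar = np.hstack([Bk, np.zeros((k1, 1))])
    P, w, QT = np.linalg.svd(B_bar)       # w[-1] == 0 by construction
    omega = w[:-1]                        # omega_i = w_i, i = 1, ..., k
    Qk = QT.T[:k1 - 1, :k1 - 1]           # k x k principal submatrix of Q-bar
    return P, omega, Qk
```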
The amount of work required for the computation of $\Omega_k$ and $e_1^T P_k$ is only
$O(k^2)$ multiplications per bidiagonalization step. This is often small compared to
the work in the bidiagonalization algorithm, which is $3(m+n)$ multiplications
plus the two matrix-vector multiplications $Av$ and $A^T u$. For the final computation
of $x_k(\delta)$ we also need the orthogonal factor $Q_k$, at the expense of $O(k^3)$
multiplications, and finally to form $V_k f_k(\delta)$, which requires $kn$ operations and a
pass through all the vectors $v_1, \ldots, v_k$.
In case k is not small the amount of work in the singular value decomposition
can become substantial. We could then save work by just performing the singular
value decomposition every sth step (s > 1) of the bidiagonalization. Further, the
explicit computation of $Q_k$ in the last step can be avoided and $f_k(\delta)$ computed
in only $O(k^2)$ operations as follows. We rewrite (3.3) as

$f_k(\delta) = \beta_1 Q_k z_k(\delta)$,

where

$z_k(\delta) = (\zeta_1, \ldots, \zeta_k)^T$, $\quad \zeta_i = \begin{cases} \pi_{1i}/\omega_i & \text{if } \omega_i > \delta, \\ 0 & \text{otherwise.} \end{cases}$
We now first form $z_k(\delta)$ and then compute the product $Q_k z_k(\delta)$ by applying the
rotations from the singular value decomposition to $z_k(\delta)$. Note, however, that the
rotations are to be applied in reverse order and not in the order they are
generated, and so we have to store these rotations. This technique has been
used by Cuppen in [2], and it reduces the operation count to $O(k^2)$.
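The ordering issue can be made concrete with a small sketch (an illustration of the idea, not Cuppen's code): each plane rotation is stored as it is generated, and the product $Q_k z_k(\delta)$ is then formed by traversing the stored rotations in reverse.

```python
# If the SVD process accumulates Q_k = G_1 G_2 ... G_t (in generation
# order), then Q_k @ z must apply G_t first and G_1 last, i.e. the
# stored rotations are traversed in reverse order.
def apply_Q(rotations, z):
    """rotations: list of (i, j, c, s), each a rotation in the (i, j) plane."""
    z = z.copy()
    for i, j, c, s in reversed(rotations):
        zi, zj = z[i], z[j]
        z[i] = c * zi - s * zj
        z[j] = s * zi + c * zj
    return z
```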
It is well known, see Paige [10], that when the first singular value of $B_k$ has
converged to a singular value of $A$ there is an inevitable loss of orthogonality
in $V_k$ and consequently also in $U_k$. The relation (2.4) will continue to hold with
good accuracy, and therefore the problem (2.8) and that of minimizing $\|b - Ax\|_2^2$
over $\mathrm{span}(V_k)$ are still almost equivalent. Hence the loss of orthogonality will
only cause a delay of convergence and there is no loss of stability.
More seriously, the expression (4.3) for the cross-validation function will no
longer hold when orthogonality is lost. For the use of cross-validation it seems
essential that either a complete reorthogonalization of the computed vectors
ui+ 1 and vi+ 1 is carried out or possibly the technique of selective orthogonal-
ization developed by Parlett and Scott [13] is used.
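A sketch of the full-reorthogonalization alternative (one pass of modified Gram-Schmidt against the stored vectors; selective orthogonalization [13] is more economical but longer to state, and the function name here is ours):

```python
import numpy as np

def reorthogonalize(w, Q):
    """One pass of modified Gram-Schmidt: orthogonalize w against the
    columns of Q (here the previously computed v_1, ..., v_i, assumed
    orthonormal).  Called on A^T u_{i+1} - beta_{i+1} v_i before it is
    normalized to give v_{i+1}, and analogously for the u's.
    """
    for j in range(Q.shape[1]):
        w = w - (Q[:, j] @ w) * Q[:, j]
    return w
```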
We finally mention the possibility of using a block bidiagonalization algorithm
in place of (2.1). Some advantages of such a method are discussed by O'Leary
[14]. A detailed discussion of implementation in the context of computing
singular values and singular vectors of the block bidiagonalization algorithm is
found in Golub, Luk and Overton [7].
6. Generalized regularization.
The damped least squares solution $x(\mu)$ in (1.4) solves the problem

(6.1) $\min_x \{\|b - Ax\|_2^2 + \mu^2 \|x\|_2^2\}$.

A more general form of regularization is obtained by replacing $\|x\|_2$ by $\|Bx\|_2$,

(6.2) $\min_x \{\|b - Ax\|_2^2 + \mu^2 \|Bx\|_2^2\}$,

for some $(n-p) \times n$ matrix $B$. For this problem the eigenvalues of the matrix
$A^T A + \mu^2 B^T B$ are relevant, and these do not depend in a simple way on $\mu$.
Therefore it is not possible to give a simple closed expression for $x(\mu)$ in this
general case. Under the assumption that the nullspaces of A and B intersect
only trivially, Eldén [4] has given a method to transform the problem (6.2)
back to standard form. We now show how to perform this transformation
implicitly for the iterative algorithm we have developed in the earlier sections.
In applications the matrix B is usually a discrete approximation of a
differential operator of low order, and consequently a banded matrix. We
assume in the following that $B$ is $(n-p) \times n$ and of rank $n-p$, where $p$ is small.
Further we assume that an orthogonal basis $W = (w_1, \ldots, w_p)$ for the nullspace
of $B$ is known. Following Eldén we express the solution as
Next we form the $m \times p$ matrix $AW$ and reduce it to upper triangular form
by an orthogonal transformation
We solve this standard problem for $\tilde{y}(\delta)$ using the algorithm based on bi-
diagonalization of $\tilde{A}$. We then have to generate the vectors from the transformed
recurrence (6.7). Here we use the factored form of $B^+$ and $Q_2$ given in (6.4) and
(6.6). Note that $B^+$ is not explicitly computed. If the bandwidth of $B$ is also of
order $p$, then
the extra work in (6.7) compared to (2.1) will only be O(p(n + m)) multiplications
per iteration step.
When we have computed the solution x(6) we determine z in (6.3) from
$\|b - A x(\delta)\|_2 = \|\tilde{b} - \tilde{A}\, \tilde{y}(\delta)\|_2$,
and cross-validation can be used in determining 6.
7. Concluding remarks.
The conjugate gradient method has been used successfully to solve several
kinds of ill-posed problems. In cases when this method works it is more efficient
than the method suggested here based on bidiagonalization. However, the
greater flexibility and robustness offered by the bidiagonalization method seems
to be valuable in many cases.
Particularly interesting is the possibility of using cross-validation for determining
the regularization parameter, which so far has only been used with direct methods
of solution.
Acknowledgement.
The author is indebted to David S. Scott for the insight that saving the
Lanczos vectors could lead to a more satisfactory method for solving large
regularization problems.
REFERENCES
1. Å. Björck and L. Eldén, Methods in numerical algebra for ill-posed problems, Proceedings of the
International Symposium on Ill-posed Problems: Theory and Practice, University of Delaware
(1979).
2. J. J. M. Cuppen, The singular value decomposition in product form, SIAM J. Sci. Stat. Comput.
4, 216-222 (1983).
3. J. J. Dongarra, C. B. Moler, J. R. Bunch and G. W. Stewart, LINPACK Users' Guide, SIAM
Publications, Philadelphia (1979).
4. L. Eldén, Algorithms for the regularization of ill-conditioned least squares problems, BIT 17,
134-145 (1977).
5. G. H. Golub, M. Heath and G. Wahba, Generalized cross-validation as a method for choosing a
good ridge parameter, Technometrics 21, 215-223 (1979).
6. G. H. Golub and W. Kahan, Calculating the singular values and pseudoinverse of a matrix, SIAM
J. Numer. Anal. 2, 205-224 (1965).
7. G. H. Golub, F. T. Luk and M. L. Overton, A block Lanczos method to compute the singular
values and the corresponding singular vectors of a matrix, ACM Trans. Math. Soft. 7, 149-169
(1981).
8. J. van Heijst, J. Jacobs and J. Scherders, Kleinste kwadraten problemen (Least squares problems),
Dept. of Math., Eindhoven University of Technology (1976).
9. C. Johnson, On finite element methods for optimal control problems. Part II. Ill-posed problems,
Report 79-04 R, Dept. of Computer Sciences, Univ. of Gothenburg (1979).
10. C. C. Paige, Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix,
J. Inst. Math. Applics. 18, 341-349 (1976).
11. C. C. Paige and M. A. Saunders, LSQR: an algorithm for sparse linear equations and sparse
least squares problems, ACM Trans. Math. Soft. 8, 43-71 (1982).
12. B. N. Parlett, A new look at the Lanczos algorithm for solving symmetric systems of linear
equations, Lin. Alg. Appl. 29, 323-346 (1980).
13. B. N. Parlett and D. S. Scott, The Lanczos algorithm with selective orthogonalization, Math.
Comp. 33, 217-238 (1979).
14. D. P. O'Leary, The block conjugate gradient algorithm and related methods, Lin. Alg. Appl. 29,
293-322 (1980).
15. D. P. O'Leary and J. A. Simmons, A bidiagonalization-regularization procedure for large scale
discretizations of ill-posed problems, SIAM J. Sci. Stat. Comput. 2, 474-489 (1981).
16. G. W. Stewart, Research, development and LINPACK, in J. R. Rice (Ed.), Mathematical Soft-
ware III, Academic Press, New York, 1-14 (1977).
17. J. M. Varah, A practical examination of some numerical methods for linear discrete ill-posed
problems, SIAM Review 21, 100-111 (1979).
18. R. Winther, Some superlinear convergence results for the conjugate gradient method, SIAM J. Num.
Anal. 17, 14-17 (1980).
19. S. Wold, H. Wold, W. J. Dunn and A. Ruhe, The collinearity problem in linear and non-linear
regression. The partial least squares (PLS) approach to generalized inverses, SIAM J. Sci. Stat.
Comput. 5, 735-743 (1984).