
Numerical Methods for Two–Point Boundary Value

Problems
Graeme Fairweather and Ian Gladwell

1 Finite Difference Methods


1.1 Introduction
Consider the second order linear two–point boundary value problem

(1.1) Lu(x) ≡ −u'' + p(x)u' + q(x)u = f(x), x ∈ I,

(1.2) u(0) = g0 , u(1) = g1 ,


where I = [0, 1]. We assume that the functions p, q and f are smooth on I,
q is positive, and p∗ and q∗ are positive constants such that

(1.3) |p(x)| ≤ p∗ , 0 < q∗ ≤ q(x), x ∈ I.

Let π = {xj}, j = 0, . . . , N + 1, denote a uniform partition of the interval I such that
xj = jh, j = 0, 1, . . . , N + 1, and (N + 1)h = 1. On this partition, the solution
u of (1.1)–(1.2) is approximated by the mesh function {uj}, j = 0, . . . , N + 1, defined by
the finite difference equations
(1.4) Lh uj ≡ −(uj+1 − 2uj + uj−1)/h² + pj (uj+1 − uj−1)/(2h) + qj uj = fj,  j = 1, . . . , N,

(1.5) u0 = g0,  uN+1 = g1,
where
pj = p(xj), qj = q(xj), fj = f(xj).
Equations (1.4) are obtained by replacing the derivatives in (1.1) by basic
centered difference quotients.
We now show that under certain conditions the difference problem (1.4)–(1.5)
has a unique solution {uj}, j = 0, . . . , N + 1, which is second–order accurate; that is,

|u(xj) − uj| = O(h²),  j = 1, . . . , N.

1.2 The Uniqueness of the Difference Approximation
From (1.4), we obtain
(1.6) h² Lh uj = −(1 + (h/2)pj) uj−1 + (2 + h² qj) uj − (1 − (h/2)pj) uj+1 = h² fj,  j = 1, . . . , N.
The totality of difference equations (1.6), subject to (1.5), may be written
in the form
(1.7) Au = b,
where

u = [u1, u2, . . . , uN]ᵀ,  b = h² [f1, f2, . . . , fN]ᵀ − [c1 g0, 0, . . . , 0, eN g1]ᵀ,

A = [ d1  e1                   ]
    [ c2  d2  e2               ]
    [      ·    ·    ·         ]
    [        cN−1  dN−1  eN−1  ]
    [               cN    dN   ],
and, for j = 1, . . . , N,

(1.8) cj = −(1 + (h/2) pj),  dj = 2 + h² qj,  ej = −(1 − (h/2) pj).
We prove that there is a unique {uj}, j = 1, . . . , N, by showing that the tridiagonal
matrix A is strictly diagonally dominant and hence nonsingular.
Theorem 1.1. If h < 2/p∗ , then the matrix A is strictly diagonally domi-
nant.
Proof — If h < 2/p∗ then

|cj| = 1 + (h/2) pj,  |ej| = 1 − (h/2) pj,

and

|cj| + |ej| = 2 < dj,  j = 2, . . . , N − 1.

Also,

|e1| < d1,  |cN| < dN,

which completes the proof.
Corollary. If p = 0, the matrix A is positive definite, with no restriction on the mesh spacing h.
Proof — If p = 0, then A is strictly diagonally dominant with no restriction
on the mesh spacing h. In addition, A is symmetric and has positive diagonal
entries and is therefore positive definite.

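As a concrete illustration, the following Python sketch (our own; the function name and the test problem are illustrative choices) assembles the tridiagonal system (1.7) with the entries (1.8) and solves it; a dense solver stands in here for the O(N) algorithms of section 2.

import numpy as np

def solve_bvp_fd(p, q, f, g0, g1, N):
    """Sketch of the second-order scheme (1.4)-(1.5) on a uniform mesh."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)     # x_0, ..., x_{N+1}
    xi = x[1:-1]                         # interior nodes x_1, ..., x_N
    c = -(1.0 + 0.5 * h * p(xi))         # subdiagonal entries (1.8)
    d = 2.0 + h**2 * q(xi)               # diagonal entries
    e = -(1.0 - 0.5 * h * p(xi))         # superdiagonal entries
    A = np.diag(d) + np.diag(c[1:], -1) + np.diag(e[:-1], 1)
    b = h**2 * f(xi)
    b[0] -= c[0] * g0                    # fold boundary values into b
    b[-1] -= e[-1] * g1
    u = np.empty(N + 2)
    u[0], u[-1] = g0, g1
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# Example: -u'' + u = (1 + pi^2) sin(pi x), u(0) = u(1) = 0,
# with exact solution u = sin(pi x).
x, u = solve_bvp_fd(lambda t: 0*t, lambda t: 1 + 0*t,
                    lambda t: (1 + np.pi**2) * np.sin(np.pi * t), 0.0, 0.0, 99)
print(np.max(np.abs(u - np.sin(np.pi * x))))   # error is O(h^2)

On this test problem the maximum error decreases by a factor of roughly 4 each time h is halved, consistent with second–order accuracy.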
1.3 Consistency, Stability and Convergence
To study the accuracy and the computability of the difference approximation
{uj}, we introduce the concepts of consistency, stability and convergence
of finite difference methods. The basic result proved in this section is that,
for a consistent method, stability implies convergence.

Definition 1.1 (Consistency). Let

τj,π [w] ≡ Lh w(xj ) − Lw(xj ), j = 1, . . . , N,

where w is a smooth function on I. Then the difference problem (1.4)–(1.5)


is consistent with the differential problem (1.1)–(1.2) if

|τj,π [w]| → 0 as h → 0.

The quantities τj,π [w], j = 1, . . . , N , are called the local truncation (or local
discretization) errors.

Definition 1.2. The difference problem (1.4)–(1.5) is locally pth–order accurate
if, for sufficiently smooth data, there exists a positive constant C,
independent of h, such that

max_{1≤j≤N} |τj,π[w]| ≤ Chᵖ.

The following lemma demonstrates that the difference problem (1.4)–


(1.5) is consistent with (1.1)–(1.2) and is locally second–order accurate.

Lemma 1.1. If w ∈ C⁴(I), then

τj,π[w] = −(h²/12)[w⁽⁴⁾(νj) − 2p(xj)w⁽³⁾(θj)],

where νj and θj lie in (xj−1, xj+1).

Proof — By definition,

(1.9) τj,π[w] = −[(w(xj+1) − 2w(xj) + w(xj−1))/h² − w''(xj)] + pj[(w(xj+1) − w(xj−1))/(2h) − w'(xj)],  j = 1, . . . , N.
It is easy to show using Taylor's theorem that

(1.10) (w(xj+1) − w(xj−1))/(2h) − w'(xj) = (h²/6) w⁽³⁾(θj),  θj ∈ (xj−1, xj+1).

Also,

(1.11) (w(xj+1) − 2w(xj) + w(xj−1))/h² − w''(xj) = (h²/12) w⁽⁴⁾(νj),  νj ∈ (xj−1, xj+1).
The desired result now follows on substituting (1.10) and (1.11) in (1.9).

Definition 1.3 (Stability). The linear difference operator Lh is stable if,
for sufficiently small h, there exists a constant K, independent of h, such
that

|vj| ≤ K{max(|v0|, |vN+1|) + max_{1≤i≤N} |Lh vi|},  j = 0, . . . , N + 1,

for any mesh function {vj}, j = 0, . . . , N + 1.

We now prove that, for h sufficiently small, the difference operator Lh


of (1.4) is stable.

Theorem 1.2. If the functions p and q satisfy (1.3), then the difference
operator Lh of (1.4) is stable for h < 2/p∗ , with K = max{1, 1/q∗ }.

Proof — If

|vj∗| = max_{0≤j≤N+1} |vj|,  1 ≤ j∗ ≤ N,

then, from (1.6), we obtain

dj∗ vj∗ = −ej∗ vj∗+1 − cj∗ vj∗−1 + h² Lh vj∗.

Thus,

dj∗ |vj∗| ≤ (|ej∗| + |cj∗|) |vj∗| + h² max_{1≤j≤N} |Lh vj|.

If h < 2/p∗, then

dj∗ = |ej∗| + |cj∗| + h² qj∗,

and it follows that

h² qj∗ |vj∗| ≤ h² max_{1≤j≤N} |Lh vj|,

or

|vj∗| ≤ (1/q∗) max_{1≤j≤N} |Lh vj|.

Thus, if max_{0≤j≤N+1} |vj| occurs for 1 ≤ j ≤ N, then

max_{0≤j≤N+1} |vj| ≤ (1/q∗) max_{1≤j≤N} |Lh vj|,

and clearly

(1.12) max_{0≤j≤N+1} |vj| ≤ K{max(|v0|, |vN+1|) + max_{1≤j≤N} |Lh vj|},

with K = max{1, 1/q∗}. If max_{0≤j≤N+1} |vj| = max{|v0|, |vN+1|}, then (1.12) follows immediately.
An immediate consequence of stability is the uniqueness (and hence existence, since the problem is linear) of the difference approximation {uj} (which was proved by other means earlier), for if there were two solutions, their difference {vj}, say, would satisfy

Lh vj = 0,  j = 1, . . . , N,
v0 = vN+1 = 0.

Stability then implies that vj = 0, j = 0, 1, . . . , N + 1.


Definition 1.4 (Convergence). Let u be the solution of the boundary value
problem (1.1)–(1.2) and {uj}, j = 0, . . . , N + 1, the difference approximation defined by
(1.4)–(1.5). The difference approximation converges to u if

max_{1≤j≤N} |uj − u(xj)| → 0

as h → 0. The difference uj − u(xj) is the global truncation (or discretization)
error at the point xj, j = 1, . . . , N.

Definition 1.5. The difference approximation {uj} is a pth–order approximation
to the solution u of (1.1)–(1.2) if, for h sufficiently small, there
exists a constant C, independent of h, such that

max_{0≤j≤N+1} |uj − u(xj)| ≤ Chᵖ.

The basic result connecting consistency, stability and convergence is


given in the following theorem.

Theorem 1.3. Suppose u ∈ C⁴(I) and h < 2/p∗. Then the difference
solution {uj} of (1.4)–(1.5) is convergent to the solution u of (1.1)–(1.2).
Moreover,

max_{0≤j≤N+1} |uj − u(xj)| ≤ Ch².

Proof — Under the given conditions, the difference problem (1.4)–(1.5) is


consistent with the boundary value problem (1.1)–(1.2) and the operator Lh
is stable.
Since

Lh [uj − u(xj )] = f (xj ) − Lh u(xj ) = Lu(xj ) − Lh u(xj ) = −τj,π [u],

and u0 − u(x0 ) = uN +1 − u(xN +1 ) = 0, the stability of Lh implies that


|uj − u(xj)| ≤ (1/q∗) max_{1≤j≤N} |τj,π[u]|.
The desired result follows from Lemma 1.1.
It follows from this theorem that {uj}, j = 0, . . . , N + 1, is a second–order
approximation to the solution u of (1.1).

1.4 Experimental Determination of the Asymptotic Rate of Convergence
If a norm of the global error is O(hᵖ), then an estimate of p can be determined
in the following way. For ease of exposition, we use a different notation in
this section and denote by {u_j^h} the difference approximation computed with
a mesh length of h. Also let

e_h ≡ ‖u − u^h‖ = max_j |u(xj) − u_j^h|.

If e_h = O(hᵖ), then, for h sufficiently small, h < h0 say, there exists a
constant C, independent of h, such that

e_h ≈ Chᵖ.

If we solve the difference problem with two different mesh lengths h1 and h2
such that h2 < h1 < h0, then

e_{h1} ≈ C h1ᵖ

and

e_{h2} ≈ C h2ᵖ,

from which it follows that

ln e_{h1} ≈ p ln h1 + ln C

and

ln e_{h2} ≈ p ln h2 + ln C.

Therefore an estimate of p can be calculated from

(1.13) p ≈ ln(e_{h1}/e_{h2}) / ln(h1/h2).

In practice, one usually solves the difference problem for a sequence of values
of h, h0 > h1 > h2 > h3 > . . ., and calculates the ratio on the right hand
side of (1.13) for successive pairs of values of h. These ratios converge to
the value of p as h → 0.
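As a sketch of this procedure (assuming the solve_bvp_fd function and the test problem of the earlier sketch), the following code computes the ratio (1.13) for successive mesh pairs:

import numpy as np

exact = lambda t: np.sin(np.pi * t)
errors, hs = [], []
for N in (9, 19, 39, 79, 159):
    x, u = solve_bvp_fd(lambda t: 0*t, lambda t: 1 + 0*t,
                        lambda t: (1 + np.pi**2) * np.sin(np.pi * t),
                        0.0, 0.0, N)
    hs.append(1.0 / (N + 1))
    errors.append(np.max(np.abs(u - exact(x))))
for k in range(1, len(hs)):
    # estimate (1.13) from the pair (h_{k-1}, h_k)
    p = np.log(errors[k-1] / errors[k]) / np.log(hs[k-1] / hs[k])
    print(f"h = {hs[k]:.5f}   estimated p = {p:.3f}")   # tends to 2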

1.5 Higher–Order Finite Difference Approximations


1.5.1 Richardson extrapolation and deferred corrections
Commonly used methods for improving the accuracy of approximate so-
lutions to differential equations are Richardson extrapolation and deferred
corrections. These techniques are applicable if there exists an expansion of
the local truncation error of the form

(1.14) τj,π[u] = h² τ[u(xj)] + O(h⁴).

As is shown in the next theorem, if (1.14) holds, then the stability of the
difference operator Lh ensures that there exists a function e(x) such that

(1.15) u(xj) = uj + h² e(xj) + O(h⁴),  j = 0, 1, . . . , N + 1.

In Richardson extrapolation, the difference problem (1.4)–(1.5) is solved
twice, with mesh spacings h and h/2, to yield difference solutions {u_j^h},
j = 0, . . . , N + 1, and {u_j^{h/2}}, j = 0, . . . , 2(N + 1). Then at a point
x̂ = jh = (2j)(h/2) common to both meshes we have, from (1.15),

u(x̂) = u_j^h + h² e(x̂) + O(h⁴)

and

u(x̂) = u_{2j}^{h/2} + (h²/4) e(x̂) + O(h⁴),

from which it follows that

u(x̂) = u_{2j}^{h/2} + (1/3)(u_{2j}^{h/2} − u_j^h) + O(h⁴).

Thus

u_j^{(1)} ≡ u_{2j}^{h/2} + (1/3)(u_{2j}^{h/2} − u_j^h)

is a fourth–order approximation to u(xj), j = 0, . . . , N + 1.
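A minimal sketch of this extrapolation, again assuming the solve_bvp_fd function and test problem introduced earlier:

import numpy as np

N = 19
args = (lambda t: 0*t, lambda t: 1 + 0*t,
        lambda t: (1 + np.pi**2) * np.sin(np.pi * t), 0.0, 0.0)
x_h,  u_h  = solve_bvp_fd(*args, N)             # mesh length h
x_h2, u_h2 = solve_bvp_fd(*args, 2 * N + 1)     # mesh length h/2
# combine at the common (coarse) nodes: u_{2j}^{h/2} + (u_{2j}^{h/2} - u_j^h)/3
u_extrap = u_h2[::2] + (u_h2[::2] - u_h) / 3.0
print(np.max(np.abs(u_extrap - np.sin(np.pi * x_h))))  # error is O(h^4)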
In deferred corrections, the difference equations (1.4)–(1.5) are solved in
the usual way to obtain {uj}. Then a fourth–order difference approximation
{ûj} to u(x) is computed by solving difference equations which are a
perturbation of (1.4)–(1.5) expressed in terms of {uj}. Suitable definitions
of {ûj} are discussed once we derive (1.15).

Theorem 1.4. Suppose u ∈ C⁶(I), and h < 2/p∗. Then (1.14) and (1.15)
hold with e(x) defined as the solution of the boundary value problem

(1.16) Le(x) = τ[u(x)],  x ∈ I,
       e(0) = e(1) = 0,

where

(1.17) τ[u(x)] = −(1/12)[u⁽⁴⁾(x) − 2p(x)u⁽³⁾(x)].
Proof — Since u ∈ C⁶(I), it is easy to show by extending the argument
used in Lemma 1.1 that (1.14) holds with τ[u(x)] defined by (1.17).
As in the proof of Theorem 1.3, we have

(1.18) Lh[u(xj) − uj] = τj,π[u],  j = 1, . . . , N.

With (1.14) in (1.18) and using (1.16), we obtain

(1.19) Lh[u(xj) − uj] = h² τ[u(xj)] + O(h⁴)
                      = h² Le(xj) + O(h⁴)
                      = h² [Le(xj) − Lh e(xj)] + h² Lh e(xj) + O(h⁴)
                      = −h² τj,π[e] + h² Lh e(xj) + O(h⁴).

From the smoothness properties of the functions p, q, and τ, it follows that
the solution e(x) of (1.16) is unique. Moreover, since τ ∈ C²(I), e(x) ∈
C⁴(I) and τj,π[e] = O(h²). Using this result in (1.19) and rearranging, we have

Lh[u(xj) − {uj + h² e(xj)}] = O(h⁴).

The desired result, (1.15), follows from the stability of Lh .
Deferred corrections is defined in the following way. Suppose τ̂j,π[·] is a
difference operator such that

(1.20) |τ[u(xj)] − τ̂j,π[u^h]| = O(h²),

where u^h denotes the solution {uj} of (1.4)–(1.5). We define the mesh
function {ûj} by

Lh ûj = fj + h² τ̂j,π[u^h],  j = 1, . . . , N,
û0 = g0,  ûN+1 = g1.

Then it is easy to show that {ûj} is a fourth-order approximation to u(x), since

Lh[ûj − u(xj)] = fj + h² τ̂j,π[u^h] − Lh u(xj)
              = [Lu(xj) − Lh u(xj)] + h² τ̂j,π[u^h]
              = −τj,π[u] + h² τ̂j,π[u^h]
              = −h²{τ[u(xj)] − τ̂j,π[u^h]} + O(h⁴).

Using (1.20) and the stability of Lh , it follows that

|ûj − u(xj)| = O(h⁴).

The main problem in deferred corrections is the construction of second-order
difference approximations τ̂j,π[u^h] to the truncation error term τ[u(xj)].
One way is to replace the derivatives appearing in τ[u(x)] by standard difference
approximations using the finite difference solution {uj} in place of
the exact solution u(x). One problem with this approach is that it requires
some modification near the end–points of the interval, or it is necessary to
compute the numerical solution outside the interval I. A second approach
is to use the differential equation to express τ [u(x)] in the form

(1.21) τ[u(x)] = C2(x)u'(x) + C1(x)u(x) + C0(x),

where the functions C0, C1, C2 are expressible in terms of the functions p, q, f
and their derivatives. Then choose

τ̂j,π[u^h] = C2(xj) (uj+1 − uj−1)/(2h) + C1(xj) uj + C0(xj).
Since

u'' = pu' + qu − f,

we have

(1.22) u⁽³⁾ = pu'' + p'u' + qu' + q'u − f'
            = p(pu' + qu − f) + (p' + q)u' + q'u − f'
            = (p² + p' + q)u' + (pq + q')u − (pf + f')
            ≡ Pu' + Qu − F.

Similarly,

(1.23) u⁽⁴⁾ = (Pp + P' + Q)u' + (Pq + Q')u − (Pf + F').

When (1.22) and (1.23) are substituted into (1.17), we obtain the desired
form (1.21).
In deferred corrections, the linear algebraic systems defining the basic
difference approximation and the fourth-order approximation have the same
coefficient matrix, which simplifies the algebraic problem.
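As an illustration in the special case p = 0 (an assumption made only to keep the sketch short), u'' = qu − f gives u⁽⁴⁾ = q''u + 2q'u' + q(qu − f) − f'', so (1.17) takes the form (1.21) with C2 = −q'/6, C1 = −(q'' + q²)/12, C0 = (qf + f'')/12. The following sketch solves the basic scheme and then the corrected system with the same coefficient matrix; the function name is our own, and the derivatives of q and f are supplied by the user.

import numpy as np

def deferred_correction(q, dq, d2q, f, d2f, g0, g1, N):
    """Sketch of deferred corrections for -u'' + q u = f (p = 0)."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2); xi = x[1:-1]
    A = np.diag(2.0 + h**2 * q(xi)) - np.eye(N, k=1) - np.eye(N, k=-1)
    b = h**2 * f(xi); b[0] += g0; b[-1] += g1
    u = np.linalg.solve(A, b)                      # basic O(h^2) solution
    # centered difference for u' in the C2 term (second-order, as (1.20) needs)
    up = (np.concatenate((u[1:], [g1])) - np.concatenate(([g0], u[:-1]))) / (2*h)
    tau = (-dq(xi)/6.0)*up - (d2q(xi) + q(xi)**2)/12.0*u \
          + (q(xi)*f(xi) + d2f(xi))/12.0
    bhat = h**2 * (f(xi) + h**2 * tau); bhat[0] += g0; bhat[-1] += g1
    uhat = np.linalg.solve(A, bhat)                # corrected O(h^4) solution
    return x, np.concatenate(([g0], uhat, [g1]))

x, uh = deferred_correction(lambda t: 1 + 0*t, lambda t: 0*t, lambda t: 0*t,
                            lambda t: (1 + np.pi**2)*np.sin(np.pi*t),
                            lambda t: -np.pi**2*(1 + np.pi**2)*np.sin(np.pi*t),
                            0.0, 0.0, 19)
print(np.max(np.abs(uh - np.sin(np.pi*x))))        # errors now O(h^4)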
If the solution u of the boundary value problem (1.1)–(1.2) is sufficiently
smooth, then it can be shown that the local truncation error τj,π [u] has an
asymptotic expansion of the form
τj,π[u] = Σ_{ν=1}^{m} h^{2ν} τν[u(xj)] + O(h^{2m+2}),

and, if m > 1, deferred corrections can be extended to compute approximations of higher order than 4.

1.5.2 Numerov’s method


A second higher-order finite difference method is easily constructed in the
case p = 0, when (1.1)–(1.2) becomes

Lu(x) ≡ −u''(x) + q(x)u(x) = f(x),  x ∈ I,
u(0) = g0,  u(1) = g1.

Now

(1.24) Δh u(xj) ≡ [u(xj+1) − 2u(xj) + u(xj−1)]/h²
               = u''(xj) + (h²/12) u⁽⁴⁾(xj) + O(h⁴).
From the differential equation, we have

u⁽⁴⁾(x) = [q(x)u(x) − f(x)]''

and therefore

(1.25) u⁽⁴⁾(xj) = Δh[q(xj)u(xj) − f(xj)] + O(h²).

Thus, from (1.24) and (1.25),

u''(xj) = Δh u(xj) − (h²/12) Δh[q(xj)u(xj) − f(xj)] + O(h⁴).
We define {ũj}, j = 0, . . . , N + 1, by

(1.26) Lh ũj ≡ −Δh ũj + [1 + (h²/12)Δh](qj ũj) = [1 + (h²/12)Δh] f(xj),  j = 1, . . . , N,
       ũ0 = g0,  ũN+1 = g1,
which is commonly known as Numerov's method. Equations (1.26) are symmetric
tridiagonal and may be written in the form

c̃j ũj−1 + d̃j ũj + ẽj ũj+1 = (h²/12)[fj+1 + 10fj + fj−1],  j = 1, . . . , N,

where

c̃j = −(1 − (h²/12) qj−1),  d̃j = 2 + (5h²/6) qj,  ẽj = −(1 − (h²/12) qj+1).
It is easy to show that, for h sufficiently small, the coefficient matrix of this
system is strictly diagonally dominant and hence {ũj} is unique. From
a similar analysis, it follows that the difference operator Lh is stable. Also,
since

Lh[ũj − u(xj)] = [1 + (h²/12)Δh] f(xj) − Lh u(xj) = O(h⁴),

the stability of Lh implies that

|ũj − u(xj)| = O(h⁴).
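A minimal Python sketch of Numerov's method in the tridiagonal form above (illustrative names; a dense solve again stands in for the algorithms of section 2):

import numpy as np

def numerov(q, f, g0, g1, N):
    """Sketch of Numerov's method (1.26) on a uniform mesh (p = 0)."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)
    qx, fx = q(x), f(x)
    j = np.arange(1, N + 1)                     # interior indices
    c = -(1.0 - h**2 / 12.0 * qx[j - 1])        # c~_j
    d = 2.0 + 5.0 * h**2 / 6.0 * qx[j]          # d~_j
    e = -(1.0 - h**2 / 12.0 * qx[j + 1])        # e~_j
    A = np.diag(d) + np.diag(c[1:], -1) + np.diag(e[:-1], 1)
    b = h**2 / 12.0 * (fx[j + 1] + 10.0 * fx[j] + fx[j - 1])
    b[0] -= c[0] * g0
    b[-1] -= e[-1] * g1
    u = np.empty(N + 2)
    u[0], u[-1] = g0, g1
    u[1:-1] = np.linalg.solve(A, b)
    return x, u

# Same test problem as before: errors now decrease like O(h^4).
x, u = numerov(lambda t: 1 + 0*t,
               lambda t: (1 + np.pi**2) * np.sin(np.pi * t), 0.0, 0.0, 19)
print(np.max(np.abs(u - np.sin(np.pi * x))))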

1.6 Second–Order Nonlinear Equations


In this section, we discuss finite difference methods for the solution of the
second–order nonlinear two-point boundary value problem

(1.27) Lu(x) ≡ −u'' + f(x, u) = 0,  x ∈ I,
       u(0) = g0,  u(1) = g1.

The basic second–order finite difference approximation to (1.27) takes the
form

(1.28) Lh uj ≡ −Δh uj + f(xj, uj) = 0,  j = 1, . . . , N,
       u0 = g0,  uN+1 = g1.

If

τj,π[u] ≡ Lh u(xj) − Lu(xj),

then clearly

τj,π[u] = −(h²/12) u⁽⁴⁾(ξj),  ξj ∈ (xj−1, xj+1),

if u ∈ C⁴(I). Stability of the nonlinear difference problem is defined in the
following way.
Definition 1.6. A difference problem defined by the nonlinear difference
operator Lh is stable if, for sufficiently small h, there exists a positive
constant K, independent of h, such that, for all mesh functions {vj} and {wj},

|vj − wj| ≤ K{max(|v0 − w0|, |vN+1 − wN+1|) + max_{1≤i≤N} |Lh vi − Lh wi|}.

Notice that when Lh is linear, this definition reduces to the definition of
Section 1.3 applied to the mesh function {vj − wj}.

Theorem 1.5. If fu ≡ ∂f/∂u is continuous on I × (−∞, ∞) and such that

0 < q∗ ≤ fu,

then the difference problem (1.28) is stable, with K = max(1, 1/q∗).

Proof — For mesh functions {vj} and {wj}, we have

Lh vj − Lh wj = −Δh(vj − wj) + f(xj, vj) − f(xj, wj)
             = −Δh(vj − wj) + fu(xj, v̂j)(vj − wj),

on using the mean value theorem, where v̂j lies between vj and wj. Then

h²[Lh vj − Lh wj] = cj[vj−1 − wj−1] + dj[vj − wj] + ej[vj+1 − wj+1],

where cj = ej = −1 and

dj = 2 + h² fu(xj, v̂j).

Clearly

|cj| + |ej| = 2 < |dj|.
The remainder of the proof is similar to that of Theorem 1.2.
An immediate consequence of the stability of Lh is the following:

Corollary 1.1. If the difference approximation {uj} defined by (1.28) exists, then it is unique.

For a proof of the existence of the solution, see Keller (1968).

Corollary 1.2. If u is the solution of (1.27) and {uj} the solution of
(1.28) then, under the assumptions of Theorem 1.5,

|uj − u(xj)| ≤ K (h²/12) max_{x∈I} |u⁽⁴⁾(x)|,  j = 0, . . . , N + 1,

provided u ∈ C⁴(I).

In order to determine the difference approximation {uj}, we must
solve the system of nonlinear equations (1.28), which may be written as

(1.29) Ju + h² f(u) = g,

where

(1.30) J = [  2  −1              ]
           [ −1   2  −1          ]
           [      ·    ·    ·    ]
           [         −1   2  −1  ]
           [             −1   2  ],

u = [u1, u2, . . . , uN]ᵀ,  f(u) = [f(x1, u1), f(x2, u2), . . . , f(xN, uN)]ᵀ,  g = [g0, 0, . . . , 0, g1]ᵀ.

This system is usually solved using Newton's method (or a variant of it),
which is described in Section 1.8.
1.7 Numerov's Method for Second Order Nonlinear Equations
Numerov's method for the solution of (1.27) may be derived in the following
way. If u is the solution of (1.27) and f is sufficiently smooth, then

u⁽⁴⁾(xj) = (d²/dx²) f(x, u)|_{x=xj} = Δh f(xj, u(xj)) + O(h²).

Since

Δh u(xj) = u''(xj) + (h²/12) u⁽⁴⁾(xj) + O(h⁴),

we have

Δh u(xj) = f(xj, u(xj)) + (h²/12) Δh f(xj, u(xj)) + O(h⁴).
Based on this equation, Numerov's method is defined by the nonlinear equations

(1.31) −Δh uj + [1 + (h²/12)Δh] f(xj, uj) = 0,  j = 1, . . . , N,
       u0 = g0,  uN+1 = g1,

which may be written in the form

(1.32) Ju + h² Bf(u) = g,

where J and u are as in (1.30), and

(1.33) B = (1/12) · [ 10   1              ]
                    [  1  10   1          ]
                    [      ·    ·    ·    ]
                    [          1  10   1  ]
                    [              1  10  ],

g = [g0 − (h²/12) f(0, g0), 0, . . . , 0, g1 − (h²/12) f(1, g1)]ᵀ.

Again Newton's method is used to solve the nonlinear system (1.32).

Stepleman (1976) formulated and analyzed a fourth-order difference method
for the solution of the equation

−u'' + f(x, u, u') = 0,  x ∈ I,

subject to general linear boundary conditions. When applied to the boundary
value problem (1.27), this method reduces to Numerov's method. Higher-order
finite difference approximations to nonlinear equations have also been
derived by Doedel (1979) using techniques different from those employed by
Stepleman.

1.8 Newton’s Method for Systems of Nonlinear Equations


In the case of a single equation, Newton's method consists in linearizing the
given equation

φ(u) = 0

by approximating φ(u) by

φ(u⁽⁰⁾) + φ'(u⁽⁰⁾)(u − u⁽⁰⁾),

where u⁽⁰⁾ is an approximation to the actual solution, and solving the linearized
equation

φ(u⁽⁰⁾) + φ'(u⁽⁰⁾)Δu = 0.

The value u⁽¹⁾ = u⁽⁰⁾ + Δu is then accepted as a better approximation and
the process is continued if necessary.
Consider now the N equations

(1.34) φi (u1 , u2 , . . . , uN ) = 0, i = 1, . . . , N,

for the unknowns u1 , u2 , . . . , uN , which we may write in vector form as

Φ(u) = 0.

If we linearize the ith equation, we obtain

(1.35) φi(u1⁽⁰⁾, u2⁽⁰⁾, . . . , uN⁽⁰⁾) + Σ_{j=1}^{N} (∂φi/∂uj)(u1⁽⁰⁾, . . . , uN⁽⁰⁾) Δuj = 0,  i = 1, . . . , N.

If J(u) denotes the matrix with (i, j) element (∂φi/∂uj)(u), then (1.35) can be
written in the form

J(u⁽⁰⁾) Δu = −Φ(u⁽⁰⁾).

If J(u⁽⁰⁾) is nonsingular, then

Δu = −[J(u⁽⁰⁾)]⁻¹ Φ(u⁽⁰⁾),

and

u⁽¹⁾ = u⁽⁰⁾ + Δu

is taken as the new approximation. If the matrices J(u⁽ν⁾), ν = 1, 2, . . . ,
are nonsingular, one hopes to determine a sequence of successively better
approximations u⁽ν⁾, ν = 1, 2, . . . , from the algorithm

u⁽ν⁺¹⁾ = u⁽ν⁾ + Δu⁽ν⁾,  ν = 0, 1, 2, . . . ,

where Δu⁽ν⁾ is obtained by solving the system of linear equations

J(u⁽ν⁾) Δu⁽ν⁾ = −Φ(u⁽ν⁾).

This procedure is known as Newton's method for the solution of the system
of nonlinear equations (1.34). It can be shown that, as in the scalar case,
this procedure converges quadratically if u⁽⁰⁾ is chosen sufficiently close to
u, the solution of (1.34).
Now consider the system of equations

φi(u1, . . . , uN) = −ui−1 + 2ui − ui+1 + (h²/12)[f(xi−1, ui−1) + 10f(xi, ui) + f(xi+1, ui+1)],  i = 1, . . . , N,

arising in Numerov's method (1.31). In this case, the (i, j) element of the
Jacobian J is given by

∂φi/∂uj = 2 + (5/6)h² fu(xi, ui),     if j = i,
          −1 + (1/12)h² fu(xj, uj),   if |i − j| = 1,
          0,                          if |i − j| > 1.

Thus

J(u) = J + h² B F(u),

where

F(u) = diag(fu(xi, ui)).

In this case, Newton's method becomes

(1.36) [J + h² B F(u⁽ν⁾)] Δu⁽ν⁾ = −[J u⁽ν⁾ + h² B f(u⁽ν⁾) − g],
       u⁽ν⁺¹⁾ = u⁽ν⁾ + Δu⁽ν⁾,  ν = 0, 1, 2, . . . .
For the system (1.28),

φi(u1, . . . , uN) ≡ −ui−1 + 2ui − ui+1 + h² f(xi, ui) = 0,  i = 1, . . . , N,

and Newton's method is given by (1.36) with B = I and g = (g0, 0, . . . , 0, g1)ᵀ.


Note that in each case the Jacobian has the same structure and properties
as the matrix arising in the linear case.
It is customary to stop the iteration (1.36) when

max_{1≤i≤N} |Δui⁽ν⁾| ≤ XTOL (1 + max_{1≤i≤N} |ui⁽ν⁾|)

and

max_{1≤i≤N} |φi(u⁽ν⁺¹⁾)| ≤ FTOL,

where XTOL and FTOL are prescribed tolerances.
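The following sketch implements Newton's method (1.36) for the basic scheme (1.28), i.e. with B = I, including the stopping tests above; the function names, initial guess and test problem are illustrative choices, not prescribed by the method.

import numpy as np

def newton_fd(f, fu, g0, g1, N, xtol=1e-10, ftol=1e-10, maxit=20):
    """Sketch of Newton's method for (1.28): J u + h^2 f(u) = g."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)
    xi = x[1:-1]
    J = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    g = np.zeros(N); g[0], g[-1] = g0, g1
    u = np.zeros(N)                            # initial guess u^(0)
    for _ in range(maxit):
        Phi = J @ u + h**2 * f(xi, u) - g
        Jac = J + h**2 * np.diag(fu(xi, u))    # Jacobian J(u)
        du = np.linalg.solve(Jac, -Phi)
        u += du
        if (np.max(np.abs(du)) <= xtol * (1 + np.max(np.abs(u)))
                and np.max(np.abs(J @ u + h**2 * f(xi, u) - g)) <= ftol):
            break
    return x, np.concatenate(([g0], u, [g1]))

# Example: -u'' + f(x, u) = 0 with f(x, u) = u^3 + c(x) chosen so that
# the exact solution is u = x(1 - x).
exact = lambda t: t * (1 - t)
x, u = newton_fd(lambda t, v: v**3 - 2.0 - exact(t)**3,
                 lambda t, v: 3.0 * v**2, 0.0, 0.0, 99)
print(np.max(np.abs(u - exact(x))))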

2 Algorithms for Solving Tridiagonal Linear Systems
2.1 General Tridiagonal Systems
The algorithm used to solve a general tridiagonal system of the form

(2.1) Au = b,

where

(2.2) A = [ d1  e1                   ]
          [ c2  d2  e2               ]
          [      ·    ·    ·         ]
          [        cN−1  dN−1  eN−1  ]
          [               cN    dN   ],
is a version of Gaussian elimination with partial pivoting. In the decompo-
sition step of the algorithm, the matrix A is factored in the form

(2.3) A = P LR,

where P is a permutation matrix recording the pivotal strategy, and L is unit


lower bidiagonal and contains on its subdiagonal the multipliers used in the
elimination process. If no row interchanges are performed, then R is upper
bidiagonal, but, in general, this matrix may also have nonzero elements on
its second superdiagonal. Thus fill–in may occur in this process; that is, the
factored form of A may require more storage than A, in addition to a vector
to record the pivotal strategy.
Using (2.3), the system (2.1) is equivalent to the systems

P L v = b,   R u = v,

which can be easily solved for v and the desired solution u. Software, sgtsv,
based on this algorithm, is available in LAPACK [4].
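For illustration, the following sketch solves a small tridiagonal system through SciPy's solve_banded, which wraps LAPACK's banded solvers; the matrix here is an arbitrary example, and the band storage layout is the one solve_banded documents.

import numpy as np
from scipy.linalg import solve_banded

N = 5
c = -np.ones(N); d = 4.0 * np.ones(N); e = -np.ones(N)
ab = np.zeros((3, N))
ab[0, 1:] = e[:-1]      # superdiagonal e_1, ..., e_{N-1}
ab[1, :]  = d           # main diagonal d_1, ..., d_N
ab[2, :-1] = c[1:]      # subdiagonal c_2, ..., c_N
b = np.ones(N)
u = solve_banded((1, 1), ab, b)
A = np.diag(d) + np.diag(c[1:], -1) + np.diag(e[:-1], 1)
print(np.max(np.abs(A @ u - b)))   # residual ~ machine precision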

2.2 Diagonally Dominant Systems


Finite difference methods discussed in section 1 give rise to tridiagonal linear
systems with certain properties which can be exploited to develop stable

algorithms in which no pivoting is required. In this section, we consider the
case in which the elements of the coefficient matrix A satisfy
(2.4) (i) |d1| > |e1|,
      (ii) |di| ≥ |ci| + |ei|,  i = 2, . . . , N − 1,
      (iii) |dN| > |cN|,

and

(2.5) ci ei−1 ≠ 0,  i = 2, . . . , N;
that is, A is irreducibly diagonally dominant and hence nonsingular. We
show that, without pivoting, such a matrix can be factored in a stable fashion
in the form
(2.6) A = L̂R̂,
where L̂ is lower bidiagonal and R̂ is unit upper bidiagonal. The matrices
L̂ and R̂ are given by

(2.7) L̂ = [ α1             ]      R̂ = [ 1  γ1              ]
           [ c2  α2         ]          [    1   γ2          ]
           [      ·    ·    ]          [         ·    ·     ]
           [        cN  αN  ],         [           1  γN−1  ]
                                       [                1   ],

where

(2.8) (i) α1 = d1,
      (ii) γi = ei/αi,  i = 1, . . . , N − 1,
      (iii) αi+1 = di+1 − ci+1 γi,  i = 1, . . . , N − 1.
Since

det A = det L̂ · det R̂ = Π_{i=1}^{N} αi

and A is nonsingular, it follows that none of the αi vanishes; hence the
recursion in (2.8) is well–defined.

We now show that conditions (2.4) ensure that the quantities αi and γi
are bounded and that the bounds are independent of the order of the matrix A.

Theorem 2.1. If the elements of A satisfy conditions (2.4), then

(2.9) |γi| < 1,  i = 1, . . . , N − 1,

and

(2.10) 0 < |di| − |ci| < |αi| < |di| + |ci|,  i = 2, . . . , N.

Proof — Since γ1 = e1/α1 = e1/d1,

|γ1| = |e1|/|d1| < 1,

from (2.4(i)). Now suppose that

|γj| < 1,  j = 1, . . . , i − 1.

Then, with (2.8(iii)) in (2.8(ii)), we have

γi = ei/(di − ci γi−1)

and

|γi| ≤ |ei|/(|di| − |ci||γi−1|) < |ei|/(|di| − |ci|)

from the induction hypothesis. Thus, using (2.4(ii)), it follows that |γi| < 1, and (2.9) follows
by induction.

From (2.8(iii)), (2.9) and (2.4(ii)), it is easy to show that (2.10) holds.

The system (2.1) with A given by (2.6) is equivalent to

(2.11) L̂ v = b,   R̂ u = v,

from which the desired solution u is obtained in the following fashion:

(2.12)
v1 = b1/α1
For i = 2 to N do:
    vi = (bi − ci vi−1)/αi
end
uN = vN
For i = 1 to N − 1 do:
    uN−i = vN−i − γN−i uN−i+1
end

The nonzero elements of L̂ and R̂ overwrite the corresponding elements of
A, and no additional storage is required.
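A direct Python transcription of the factorization (2.8) and the solve (2.12) (illustrative names; arrays rather than in-place overwriting, for clarity):

import numpy as np

def tridiag_solve(c, d, e, b):
    """No-pivoting factor/solve (2.8), (2.12) for a diagonally dominant
    tridiagonal system; c[0] and e[N-1] are unused."""
    N = len(d)
    alpha = np.empty(N); gamma = np.empty(N - 1)
    v = np.empty(N); u = np.empty(N)
    alpha[0] = d[0]
    for i in range(N - 1):                     # factorization (2.8)
        gamma[i] = e[i] / alpha[i]
        alpha[i + 1] = d[i + 1] - c[i + 1] * gamma[i]
    v[0] = b[0] / alpha[0]                     # forward solve  L^ v = b
    for i in range(1, N):
        v[i] = (b[i] - c[i] * v[i - 1]) / alpha[i]
    u[N - 1] = v[N - 1]                        # back substitution  R^ u = v
    for i in range(N - 2, -1, -1):
        u[i] = v[i] - gamma[i] * u[i + 1]
    return u

# Example: the matrix (1.8) with p = 0, q = 1, h = 0.1.
h, N = 0.1, 9
c = -np.ones(N); e = -np.ones(N); d = (2 + h**2) * np.ones(N)
b = h**2 * np.ones(N)
u = tridiag_solve(c, d, e, b)
A = np.diag(d) + np.diag(c[1:], -1) + np.diag(e[:-1], 1)
print(np.max(np.abs(A @ u - b)))               # ~ machine precision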

2.3 Positive Definite Systems

In section 1, we saw that if in equation (1.1) p = 0, then in addition to
possessing properties (2.4)–(2.5), the matrix A is also symmetric (so that
ci = ei−1, i = 2, . . . , N) and its diagonal elements, di, i = 1, . . . , N, are
positive. In this case, the matrix A is positive definite.
It is well known that when A is positive definite a stable decomposition
is obtained using Gaussian elimination without pivoting. The LDLᵀ
decomposition of A can easily be obtained from the recursion (2.8). If

L = [ 1                 ]
    [ l1   1            ]
    [       ·    ·      ]
    [        lN−1   1   ]

and D = diag(δ1, . . . , δN), then, by identifying li with γi and δi with αi, and
setting ci = ei−1, we obtain from (2.8)

(2.13)
δ1 = d1
For i = 1 to N − 1 do:
    li = ei/δi
    δi+1 = di+1 − li ei
end

Since A is positive definite, D is also positive definite¹, and hence the diagonal
elements of D are positive. Thus the recursion (2.13) is well defined.

Using the LDLᵀ decomposition of A, the solution of the system (2.1) is
determined by first solving

Lv = b,

which gives

v1 = b1,
vi+1 = bi+1 − li vi,  i = 1, . . . , N − 1;

then

DLᵀu = v

¹ For u = Lᵀv ≠ 0, uᵀDu = vᵀLDLᵀv = vᵀAv > 0 for v ≠ 0, since L is nonsingular.

yields

uN = vN/δN

and, for i = 1, . . . , N − 1,

uN−i = vN−i/δN−i − lN−i uN−i+1
     = (vN−i − lN−i δN−i uN−i+1)/δN−i
     = (vN−i − eN−i uN−i+1)/δN−i,

where in the last step we have used from (2.13) the fact that lN−i δN−i = eN−i. Thus,

(2.14)
v1 = b1
For i = 1 to N − 1 do:
    vi+1 = bi+1 − li vi
end
uN = vN/δN
For i = 1 to N − 1 do:
    uN−i = (vN−i − eN−i uN−i+1)/δN−i
end

The code sptsv in LAPACK is based on (2.13)–(2.14).
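A Python transcription of (2.13)–(2.14) (illustrative names; d holds the diagonal, e the off-diagonal, with e[i] coupling u_i and u_{i+1}):

import numpy as np

def spd_tridiag_solve(d, e, b):
    """LDL^T factor/solve (2.13)-(2.14) for a symmetric positive definite
    tridiagonal system."""
    N = len(d)
    delta = np.empty(N); l = np.empty(N - 1)
    delta[0] = d[0]
    for i in range(N - 1):                     # factorization (2.13)
        l[i] = e[i] / delta[i]
        delta[i + 1] = d[i + 1] - l[i] * e[i]
    v = np.empty(N)
    v[0] = b[0]                                # solve  L v = b
    for i in range(1, N):
        v[i] = b[i] - l[i - 1] * v[i - 1]
    u = np.empty(N)
    u[N - 1] = v[N - 1] / delta[N - 1]         # solve  D L^T u = v
    for i in range(N - 2, -1, -1):
        u[i] = (v[i] - e[i] * u[i + 1]) / delta[i]
    return u

h, N = 0.1, 9
d = (2 + h**2) * np.ones(N); e = -np.ones(N - 1)
b = h**2 * np.ones(N)
u = spd_tridiag_solve(d, e, b)
A = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
print(np.max(np.abs(A @ u - b)))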

3 The Finite Element Galerkin Method
3.1 Introduction
Consider the two–point boundary value problem

(3.1) Lu ≡ −u'' + p(x)u' + q(x)u = f(x),  x ∈ I,
      u(0) = u(1) = 0,

where the functions p, q and f are smooth on I, and (1.3) holds. Let H¹₀(I)
denote the space of all piecewise continuously differentiable functions on I
which vanish at 0 and 1. If v ∈ H¹₀(I), then

−u''v + pu'v + quv = fv,

and

∫₀¹ [−u''v + pu'v + quv] dx = ∫₀¹ fv dx.

On integrating by parts, we obtain

∫₀¹ u'v' dx − [u'v]₀¹ + ∫₀¹ pu'v dx + ∫₀¹ quv dx = ∫₀¹ fv dx.

Since v(0) = v(1) = 0, we have

(3.2) (u', v') + (pu', v) + (qu, v) = (f, v),  v ∈ H¹₀(I),

where

(φ, ψ) = ∫₀¹ φ(x)ψ(x) dx.

Equation (3.2) is called the weak form of the boundary value problem (3.1),
and is written in the form

(3.3) a(u, v) = (f, v),  v ∈ H¹₀(I),

where

a(φ, ψ) = (φ', ψ') + (pφ', ψ) + (qφ, ψ),  φ, ψ ∈ H¹₀(I).

Suppose that Sh is a finite-dimensional subspace of H¹₀(I). The finite element
Galerkin method consists in finding the element uh ∈ Sh, the Galerkin
approximation to the solution u of (3.1), satisfying

(3.4) a(uh, v) = (f, v),  v ∈ Sh.
Suppose {w1, . . . , ws} is a basis for Sh, and let

(3.5) uh(x) = Σ_{j=1}^{s} αj wj(x).

Then, on substituting (3.5) in (3.4) with v = wi, we have

(Σ_j αj wj', wi') + (p Σ_j αj wj', wi) + (q Σ_j αj wj, wi) = (f, wi),  i = 1, . . . , s,

from which it follows that

Σ_{j=1}^{s} [(wj', wi') + (pwj', wi) + (qwj, wi)] αj = (f, wi),  i = 1, . . . , s;

that is,

(3.6) Aα = f,

where the (i, j) element of the matrix A is a(wj, wi), and the vectors α and
f are given by

α = [α1, . . . , αs]ᵀ,  f = [(f, w1), . . . , (f, ws)]ᵀ.

The system of equations (3.6) is often referred to as the Galerkin equations.

3.2 Spaces of Piecewise Polynomial Functions


The choice of the subspace Sh is a key element in the success of the Galerkin
method. It is essential that Sh be chosen so that the Galerkin approximation
can be computed efficiently. Secondly, Sh should possess good approxima-
tion properties as the accuracy of the Galerkin approximation uh depends
on how well u can be approximated by elements of Sh . The subspace Sh is
usually chosen to be a space of piecewise polynomial functions. To define
such spaces, let Pr denote the set of polynomials of degree ≤ r, let

(3.7) π : 0 = x0 < x1 < . . . < xN < xN +1 = 1

denote a partition of I, and set

Ij = [xj−1 , xj ], j = 1, . . . , N + 1,

hj = xj − xj−1, and h = max_j hj. We define

M^r_k(π) = {v ∈ C^k(I) : v|_{Ij} ∈ P_r, j = 1, . . . , N + 1},

where C^k(I) denotes the space of functions which are k times continuously
differentiable on I, 0 ≤ k < r, and v|_{Ij} denotes the restriction of the function
v to the interval Ij. We denote by M^{r,0}_k(π) the space

M^r_k(π) ∩ {v : v(0) = v(1) = 0}.

It is easy to see that M^r_k(π) and M^{r,0}_k(π) are linear spaces of dimensions
N(r − k) + r + 1 and N(r − k) + r − 1, respectively. These spaces have the
following approximation properties.
Theorem 3.1. For any u ∈ W^j_p(I), there exists a ū ∈ M^r_k(π) and a constant
C independent of h and u such that

(3.8) ‖(u − ū)^(ℓ)‖_{Lp(I)} ≤ C h^{j−ℓ} ‖u^(j)‖_{Lp(I)},

for all integers ℓ and j such that 0 ≤ ℓ ≤ k + 1, ℓ ≤ j ≤ r + 1. If
u ∈ W^j_p(I) ∩ {v : v(0) = v(1) = 0}, then there exists a ū ∈ M^{r,0}_k(π) such that
(3.8) holds.
Commonly used spaces are the spaces of piecewise Hermite polynomials,
M^{2m−1}_{m−1}(π), m ≥ 1. A convenient basis for M^{2m−1}_{m−1}(π) is

(3.9) {S_{i,k}(x; m; π)},  i = 0, . . . , N + 1,  k = 0, . . . , m − 1,

where

D^l S_{i,k}(xj; m; π) = δij δlk,  0 ≤ l ≤ m − 1,  0 ≤ j ≤ N + 1,

and δmn denotes the Kronecker delta. It is easy to see that S_{i,k}(x; m; π) is
zero outside [xi−1, xi+1]. A basis for M^{2m−1,0}_{m−1}(π) is obtained by omitting
from (3.9) the functions S_{i,0}(x; m; π), i = 0, N + 1, which are nonzero at
x = 0 and x = 1.

Example 1. M^1_0(π): the space of piecewise linear functions. The
dimension of this space is N + 2, and, if ℓi(x) = (x − xi)/hi+1, then the basis
functions S_{i,0}(x; 1; π), i = 0, . . . , N + 1, are the piecewise linear functions
defined by

S_{0,0}(x; 1; π) = { 1 − ℓ0(x),  x ∈ I1;  0, otherwise },

S_{i,0}(x; 1; π) = { ℓi−1(x),  x ∈ Ii;  1 − ℓi(x),  x ∈ Ii+1;  0, otherwise },  1 ≤ i ≤ N,

S_{N+1,0}(x; 1; π) = { ℓN(x),  x ∈ IN+1;  0, otherwise }.

Thus there is one basis function associated with each node of the partition π. For convenience, we set

(3.10) wi(x) = S_{i,0}(x; 1; π),  i = 0, . . . , N + 1.

Note that if uh(x) = Σ_{j=0}^{N+1} αj wj(x) then αj = uh(xj), j = 0, . . . , N + 1,
since wj(xi) = δij. A basis for M^{1,0}_0(π) is obtained by omitting from (3.10)
the functions w0(x) and wN+1(x).

Example 2. M^3_1(π): the space of piecewise Hermite cubic functions.
This space has dimension 2N + 4. If

g1(x) = −2x³ + 3x²  and  g2(x) = x³ − x²,

the basis functions S_{i,0}(x; 2; π), i = 0, . . . , N + 1, are the piecewise cubic
functions defined by

S_{0,0}(x; 2; π) = { g1(1 − ℓ0(x)),  x ∈ I1;  0, otherwise },

S_{i,0}(x; 2; π) = { g1(ℓi−1(x)),  x ∈ Ii;  g1(1 − ℓi(x)),  x ∈ Ii+1;  0, otherwise },  1 ≤ i ≤ N,

S_{N+1,0}(x; 2; π) = { g1(ℓN(x)),  x ∈ IN+1;  0, otherwise },

and the functions S_{i,1}(x; 2; π), i = 0, . . . , N + 1, are the piecewise cubic
functions

S_{0,1}(x; 2; π) = { −h1 g2(1 − ℓ0(x)),  x ∈ I1;  0, otherwise },

S_{i,1}(x; 2; π) = { hi g2(ℓi−1(x)),  x ∈ Ii;  −hi+1 g2(1 − ℓi(x)),  x ∈ Ii+1;  0, otherwise },  1 ≤ i ≤ N,

S_{N+1,1}(x; 2; π) = { hN+1 g2(ℓN(x)),  x ∈ IN+1;  0, otherwise }.

For notational convenience, we write

(3.11) vi(x) = S_{i,0}(x; 2; π),  si(x) = S_{i,1}(x; 2; π),  i = 0, . . . , N + 1.

The functions vi and si are known as the value function and the slope function,
respectively, associated with the point xi ∈ π. Note that if

uh(x) = Σ_{j=0}^{N+1} {αj vj(x) + βj sj(x)},

then

αj = uh(xj),  βj = uh'(xj),  j = 0, . . . , N + 1,

since

vj(xi) = δij,  vj'(xi) = 0,  sj(xi) = 0,  sj'(xi) = δij,  i, j = 0, . . . , N + 1.

A basis for M^{3,0}_1(π) is obtained by omitting from (3.11) the functions v0(x)
and vN+1(x).

3.3 The Algebraic Problem


First we examine the structure of the coefficient matrix A of the Galerkin
equations (3.6) in the case in which Sh = M^{1,0}_0(π). With the basis functions
given by (3.10), it is clear that

(wi', wj') + (pwi, wj') + (qwi, wj) = 0  if |i − j| > 1,

since wi(x) = 0 for x ∉ (xi−1, xi+1). Thus A is tridiagonal. Since the tridiagonal
matrices

(3.12) A = ((wi', wj')),  B = ((wi, wj)),

corresponding to p = 0, q = 1, occur quite often in practice, their elements
are given in Appendix A. If the partition π is uniform and

(3.13) J = h⁻² · [  2  −1              ]
                 [ −1   2  −1          ]
                 [      ·    ·    ·    ]
                 [         −1   2  −1  ]
                 [             −1   2  ],

then it is easy to see from Appendix A that

(3.14) A = hJ,   B = h(I − (h²/6)J) = (h/6) tridiag(1, 4, 1).
Now consider the case in which Sh = M^3_1(π). It is convenient to consider
the basis {wi}, i = 0, . . . , 2N + 3, where

(3.15) w2i = vi,  w2i+1 = si,  i = 0, . . . , N + 1.

For this ordering of the basis functions (3.11), it follows that, since vi and
si vanish outside [xi−1, xi+1], 1 ≤ i ≤ N, the quantities (wk', wl'), (pwk, wl')
and (qwk, wl) are nonzero only if wk and wl are basis functions associated
with the same node or adjacent nodes of the partition π. Consequently

a(wk, wl) = 0  if |k − l| > 3.

Thus the matrix A whose (i, j) element is a(wi, wj) is a band matrix of
bandwidth seven. More precisely, in the 2ith and (2i + 1)th rows of A, only
the elements in columns 2i + j − 2, j = 0, 1, . . . , 5, can be nonzero. Thus we
can partition the (2N + 4) × (2N + 4) matrix A in the form
can partition the (2N + 4) × (2N + 4) matrix A in the form
 
A00 A01
 
 A A11 A12 
 10 
 

 ··· ··· ··· 

(3.16) A= ;
··· ··· ···
 
 
 
··· ··· AN,N +1
 
 
 
AN +1,N AN +1,N +1
that is, A is block tridiagonal, the submatrices Aii and Ai,i+1 being 2 × 2.
The matrices A = ((wi0 , wj0 )), B = ((wi , wj )) are given in Appendix B for the
case in which the partition π is uniform. If Sh = M3,0 1 (π), the corresponding
(2N +2)×(2N +2) matrix A is obtained from (3.16) by omitting the first and
the (2N + 3)th rows and columns. It is easy to see that if Sh = M2m−1 m−1 (π)
and the basis given by (3.9) is chosen, then A is block tridiagonal with m×m
diagonal blocks.
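As a concrete illustration with piecewise linears on a uniform mesh, the following sketch solves −u'' + u = f with Sh = M^{1,0}_0(π), so that the stiffness and mass matrices are exactly those in (3.14); the load (f, wi) is approximated here by the O(h²) quadrature h f(xi), and all names are illustrative.

import numpy as np

def galerkin_linear(f, N):
    """Sketch of the Galerkin equations (3.6) for -u'' + u = f,
    u(0) = u(1) = 0, with piecewise linear basis functions (3.10)."""
    h = 1.0 / (N + 1)
    x = np.linspace(0.0, 1.0, N + 2)
    J = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
    A = h * J                                   # stiffness matrix, (3.14)
    B = (h / 6.0) * (4.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    F = h * f(x[1:-1])                          # quadrature for (f, w_i)
    alpha = np.linalg.solve(A + B, F)           # Galerkin equations (3.6)
    return x, np.concatenate(([0.0], alpha, [0.0]))

x, uh = galerkin_linear(lambda t: (1 + np.pi**2) * np.sin(np.pi * t), 99)
print(np.max(np.abs(uh - np.sin(np.pi * x))))   # O(h^2) in the max norm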

3.4 Treatment of Inhomogeneous Dirichlet Boundary Conditions and Flux Boundary Conditions
The Galerkin method can be easily modified to treat boundary conditions
other than the homogeneous Dirichlet conditions. Consider, for example,

the inhomogeneous Dirichlet boundary conditions

(3.17) u(0) = g0,  u(1) = g1,  g0 g1 ≠ 0,

and choose Sh = M^{3,0}_1(π). If we set

w = u − g,

where

g(x) = g0 v0(x) + g1 vN+1(x),

then

w(0) = w(1) = 0,

and, from (3.2),

(p(w + g)', v') + (q(w + g), v) = (f, v),  v ∈ H¹₀(I).

The Galerkin approximation U ∈ M^3_1(π) is defined uniquely by

(pU', v') + (qU, v) = (f, v),  v ∈ M^{3,0}_1(π),

and the condition that if

U(x) = Σ_{i=0}^{N+1} {αi vi(x) + βi si(x)},

then α0 = g0 and αN+1 = g1.
With the basis functions ordered as in (3.15), the coefficient matrix of the
Galerkin equations in this case is the matrix A of (3.16) with the elements
of rows 1 and 2N + 3 replaced by

a1j = δ1j,  j = 1, . . . , 2N + 4,

and

a_{2N+3,j} = δ_{2N+3,j},  j = 1, . . . , 2N + 4,

respectively, and the right hand side vector has as its first and (2N + 3)th
components g0 and g1, respectively.
In the case in which the general linear boundary conditions

(3.18) λ0 u(0) − p(0)u'(0) = ν0,  λ1 u(1) + p(1)u'(1) = ν1,

are prescribed, then, as in the derivation of (3.2), we have

(3.19) (pu', v') − [pu'v]₀¹ + (qu, v) = (f, v),
for v ∈ H¹(I). Using (3.18) in (3.19), we obtain

(pu', v') + λ1 u(1)v(1) + λ0 u(0)v(0) + (qu, v) = (f, v) + ν0 v(0) + ν1 v(1),  v ∈ H¹(I).

If Sh is now a subspace of H¹(I), then the Galerkin approximation U ∈ Sh
is defined by

(3.20) (pU', v') + λ1 U(1)v(1) + λ0 U(0)v(0) + (qU, v) = (f, v) + ν0 v(0) + ν1 v(1),  v ∈ Sh.

Thus, if Sh = M^3_1(π) and the basis functions are again ordered as in (3.15),
the matrix of coefficients of the Galerkin equations is obtained from (3.16)
by adding λ0 and λ1 to the (1, 1) element and the (2N + 3, 2N + 3) element,
respectively. The term ν0 v(0) + ν1 v(1) on the right hand side of (3.20) merely
adds ν0 to the first component and ν1 to the (2N + 3)th component of the
right hand side vector.

3.5 The Uniqueness of the Galerkin Approximation


We now show that there exists a unique solution uh ∈ Sh of (3.4) for h
sufficiently small. Since the problem under consideration is linear, it is
sufficient to show uniqueness.
Let U1 and U2 be two solutions of (3.4) and set Y = U1 − U2 . Then

(3.21) a(Y, v) = 0, v ∈ Sh .

Let ψ ∈ H¹₀(I) be the solution of

L*ψ = Y(x),  x ∈ I,
ψ(0) = ψ(1) = 0,

where L* denotes the formal adjoint of L,

L*u = −u'' − (pu)' + qu.

From the hypotheses on p and q, it follows that ψ ∈ H²(I) ∩ H¹₀(I) and

(3.22) ‖ψ‖_{H²} ≤ C‖Y‖_{L²}.

Thus, using integration by parts and (3.21), we have

‖Y‖²_{L²} = (L*ψ, Y) = a(Y, ψ) = a(Y, ψ − χ),

where χ ∈ Sh, and since

(3.23) a(φ, ψ) ≤ C‖φ‖_{H¹}‖ψ‖_{H¹}

for φ, ψ ∈ H¹(I), it follows that

‖Y‖²_{L²} ≤ C‖Y‖_{H¹}‖ψ − χ‖_{H¹}.

From Theorem 3.1, we can choose χ ∈ Sh such that

‖ψ − χ‖_{H¹} ≤ Ch‖ψ‖_{H²}.

Thus

‖Y‖²_{L²} ≤ Ch‖Y‖_{H¹}‖ψ‖_{H²} ≤ Ch‖Y‖_{H¹}‖Y‖_{L²},

where in the last inequality we have used (3.22), and we obtain

(3.24) ‖Y‖_{L²} ≤ Ch‖Y‖_{H¹}.

Since Y ∈ Sh,

a(Y, Y) = 0,

from which it follows that

‖Y'‖²_{L²} = −(pY', Y) − (qY, Y) ≤ C{‖Y'‖_{L²} + ‖Y‖_{L²}}‖Y‖_{L²}.

Using this inequality and (3.24), we obtain

‖Y‖²_{H¹} ≤ Ch‖Y‖²_{H¹}.

Thus, for h sufficiently small, ‖Y‖_{H¹} = 0, and hence, from Sobolev's inequality, Y = 0.
For the self-adjoint boundary value problem

Lu ≡ −(pu')' + q(x)u = f(x),  x ∈ I,
u(0) = u(1) = 0,

the uniqueness of the Galerkin approximation uh ∈ Sh satisfying

(puh', v') + (quh, v) = (f, v),  v ∈ Sh,

is much easier to prove. In this case, the coefficient matrix A of the Galerkin
equations is positive definite, from which it follows immediately that the
Galerkin approximation is unique. This result is proved in the following
theorem.

Theorem 3.2. The matrix A is positive definite.

Proof — Suppose β ∈ Rˢ and β ≠ 0. Then, if A = (aij) and w = Σ_{j=1}^{s} βj wj,

βᵀAβ = Σ_{i,j=1}^{s} aij βi βj
     = Σ_{i,j=1}^{s} {(pβi wi', βj wj') + (qβi wi, βj wj)}
     = (pw', w') + (qw, w)
     = ‖p^{1/2} w'‖²_{L²} + ‖q^{1/2} w‖²_{L²}
     ≥ p∗ ‖w'‖²_{L²}.

The proof is complete if ‖w'‖ > 0. Suppose ‖w'‖ = 0. Then w' = 0 and
w = C, where C is a constant. Since w ∈ Sh, w(0) = w(1) = 0, from
which it follows that C = 0. Therefore w = 0; that is, Σ_{j=1}^{s} βj wj = 0.
Since {w1, . . . , ws} is a basis for Sh and is therefore linearly independent,
this implies that βj = 0, j = 1, . . . , s, which is a contradiction. Therefore
‖w'‖ > 0 and A is positive definite.

3.6 The Accuracy of the Galerkin Approximation

3.6.1 Optimal H¹ and L² error estimates

In the following, an estimate of the error u − uh in an H^k–norm (or an
L^p–norm) will be called optimal in H^k (or L^p) if the estimate has the same
power of h as is possible by the approximation properties of the subspace
Sh with the same smoothness assumptions on u.

Throughout this and subsequent analyses, C denotes a generic constant
which is independent of u and h.

Theorem 3.3. Suppose u ∈ H^{r+1}(I) ∩ H¹₀(I). Then, for h sufficiently small,

(3.25) ‖u − uh‖_{L²} + h‖u − uh‖_{H¹} ≤ C h^{r+1} ‖u‖_{H^{r+1}}.

Proof — Let φ ∈ H¹₀(I) satisfy

L*φ = e(x),  x ∈ I,
φ(0) = φ(1) = 0,

where e = u − uh. Then, as in section 3.5 but with φ and e replacing ψ and
Y, respectively, we have

(3.26) ‖e‖_{L²} ≤ Ch‖e‖_{H¹},

since

a(e, v) = 0,  v ∈ Sh.

Since

a(e, e) = a(e, u − χ),  χ ∈ Sh,

it follows that

‖e'‖²_{L²} = −(pe', e) − (qe, e) + a(e, u − χ)
          ≤ C{[‖e'‖_{L²} + ‖e‖_{L²}]‖e‖_{L²} + ‖e‖_{H¹}‖u − χ‖_{H¹}},

on using Schwarz's inequality and (3.23). On using (3.26), we have, for h
sufficiently small,

(3.27) ‖e‖_{H¹} ≤ C‖u − χ‖_{H¹}.

Since u ∈ H^{r+1}(I) ∩ H¹₀(I), from Theorem 3.1, χ can be chosen so that

‖u − χ‖_{H¹} ≤ C h^r ‖u‖_{H^{r+1}},

and hence

(3.28) ‖e‖_{H¹} ≤ C h^r ‖u‖_{H^{r+1}}.

The use of this estimate in (3.26) completes the proof.

3.6.2 Optimal L∞ error estimate


In this section, we derive an optimal L∞ error estimate when Sh = M^{r,0}_k(π)
and the partition π of I is from a quasi–uniform collection of partitions,
which we now define.

Definition 3.1. Let

π : 0 = x0 < x1 < . . . < xN < xN+1 = 1

denote a partition of I and let Π(I) denote the collection of all such partitions
π of I. As before, set

hi = xi − xi−1,  i = 1, . . . , N + 1,

and h = max_i hi. A collection of partitions C ⊂ Π(I) is called quasi–uniform
if there exists a constant σ ≥ 1 such that, for all π ∈ C,

max_{1≤j≤N+1} h h_j^{−1} ≤ σ.

The following lemma, which is proved in [14], plays an important role in
the derivation of the L∞ estimate.

Lemma 3.1. Let Pu be the L² projection of u into M^r_k(π), that is,

(Pu − u, v) = 0,  v ∈ M^r_k(π),

where −1 ≤ k ≤ r − 1. Then, if u ∈ W^{r+1}_∞(I),

‖Pu − u‖_{L∞(I)} ≤ C h^{r+1} ‖u‖_{W^{r+1}_∞(I)},

provided π is chosen from a quasi–uniform collection of partitions of I.

We now prove the following theorem.

Theorem 3.4. Suppose Sh = M^{r,0}_k(π), with π chosen from a quasi–uniform
collection of partitions. If u ∈ W^{r+1}_∞(I) ∩ H¹₀(I), then

(3.29) ‖u − uh‖_{L∞} ≤ C h^{r+1} ‖u‖_{W^{r+1}_∞}.

Proof — Let W ∈ Sh satisfy

(W', v') = (u', v'),  v ∈ Sh.

Since

a(u − uh, v) = 0,  v ∈ Sh,

it follows that

((W − uh)', v') + (u − uh, qv − (pv)') = 0,

for all v ∈ Sh. If we set v = W − uh, then

‖(W − uh)'‖²_{L²} ≤ C‖u − uh‖_{L²}‖W − uh‖_{H¹}.

Therefore, since ‖·‖_{H¹} and ‖·‖_{H¹₀} are equivalent norms on H¹₀,

‖W − uh‖_{H¹} ≤ C‖u − uh‖_{L²},

and from Sobolev's inequality, we obtain

‖W − uh‖_{L∞} ≤ C‖u − uh‖_{L²}.

Then, from Theorem 3.3, it follows that

(3.30) ‖W − uh‖_{L∞} ≤ C h^{r+1} ‖u‖_{H^{r+1}}.
Since

(3.31) ‖u − uh‖_{L∞} ≤ ‖u − W‖_{L∞} + ‖W − uh‖_{L∞},

we need to estimate ‖u − W‖_{L∞} to complete the proof.

Note that since

((u − W)', 1) = 0,

it follows that W' is the L² projection of u' into M^{r−1}_{k−1}(π), and hence, from
Lemma 3.1, we obtain

(3.32) ‖(u − W)'‖_{L∞} ≤ C h^r ‖u'‖_{W^r_∞} ≤ C h^r ‖u‖_{W^{r+1}_∞}.

Now suppose g ∈ L¹ and define G by

G'' = −g(x),  x ∈ I,
G(0) = G(1) = 0.

Then

(3.33) ‖G‖_{W²₁} ≤ C‖g‖_{L¹},

and, for χ ∈ Sh,

(u − W, g) = −(u − W, G'') = ((u − W)', (G − χ)').

On using Hölder's inequality, we obtain

(u − W, g) ≤ ‖(u − W)'‖_{L∞}‖(G − χ)'‖_{L¹}.

From Theorem 3.1, we can choose χ so that

‖(G − χ)'‖_{L¹} ≤ Ch‖G‖_{W²₁}.

Hence, on using (3.33), it follows that

|(u − W, g)| ≤ Ch‖(u − W)'‖_{L∞}‖g‖_{L¹}.

On using (3.32) and duality, we have

(3.34) ‖u − W‖_{L∞} ≤ C h^{r+1} ‖u‖_{W^{r+1}_∞}.

The desired result now follows from (3.30), (3.31) and (3.34).

3.6.3 Superconvergence results
The error estimates of Theorems 3.3 and 3.4 are optimal and consequently
no better global rates of convergence are possible. However, there can be
identifiable points at which the approximate solution converges at rates that
exceed the optimal global rate. In the following theorem, we derive one such
superconvergence result.

Theorem 3.5. If Sh = M^{r,0}_0(π) and u ∈ H^{r+1}(I) ∩ H¹₀(I), then, for h
sufficiently small,

(3.35) |(u − uh)(xi)| ≤ C h^{2r} ‖u‖_{H^{r+1}},  i = 0, . . . , N + 1.

Proof — Let G(x, ξ) denote the Green's function for (3.1); that is,

u(x) = −(Lu, G(x, ·)) = a(u, G(x, ·))

for sufficiently smooth u. This representation is valid for u ∈ H¹₀(I) and
hence it can be applied to e = u − uh. Thus, for χ ∈ Sh,

e(xi) = a(e, G(xi, ·)) = a(e, G(xi, ·) − χ),

since

a(e, χ) = 0,  χ ∈ Sh.

Thus,

(3.36) |e(xi)| ≤ C‖e‖_{H¹}‖G(xi, ·) − χ‖_{H¹}.

From the smoothness assumptions on p and q, it follows that

G(xi, ·) ∈ H^{r+1}([0, xi]) ∩ H^{r+1}([xi, 1]),

and

‖G(xi, ·)‖_{H^{r+1}([0,xi])} + ‖G(xi, ·)‖_{H^{r+1}([xi,1])} ≤ C.

Hence there exists χ ∈ Sh (obtained, for example, by Lagrange interpolation
on each subinterval) such that

(3.37) ‖G(xi, ·) − χ‖_{H¹} ≤ C h^r.

From Theorem 3.3, we have

(3.38) ‖e‖_{H¹} ≤ C h^r ‖u‖_{H^{r+1}},

for h sufficiently small, and hence, combining (3.36)–(3.38), we obtain

|e(xi)| ≤ C h^{2r} ‖u‖_{H^{r+1}},

as desired.
A method which involves very simple auxiliary computations using the
Galerkin solution can be used to produce superconvergent approximations
to the derivative [24]. First we consider approximations to u'(0) and u'(1).
Motivated by the fact that

(f, 1 − x) = (−u'' + pu' + qu, 1 − x) = u'(0) + a(u, 1 − x),

we define an approximation Γ0 to u'(0) by

Γ0 = (f, 1 − x) − a(uh, 1 − x),

where uh is the solution to (3.4). Also, with 1 − x replaced by x in the above,
we find that

u'(1) = a(u, x) − (f, x),

and hence we define an approximation ΓN+1 to u'(1) by

ΓN+1 = a(uh, x) − (f, x).

It can be shown that if u ∈ H^{r+1}(I), then

(3.39) |Γj − u'(xj)| ≤ C h^{2r} ‖u‖_{H^{r+1}},

j = 0, N + 1, when, for example, Sh = M^{r,0}_k(π).


If Sh = M^{r,0}_0(π), a procedure can be defined which at the nodes xi, i =
1, . . . , N, gives superconvergence results similar to (3.39). Specifically, if Γj,
an approximation to u'(xj), j = 1, . . . , N, is defined by

Γj = [a(uh, x)_{I'j} − (f, x)_{I'j}] / xj,

where the subscript I'j denotes that the inner products are taken over I'j =
(0, xj), then (3.39) holds for j = 1, . . . , N. The approximation Γj is motivated
by the fact that

(Lu, x)_{I'j} = (f, x)_{I'j},

and, after integration by parts,

−u'(xj) xj + a(u, x)_{I'j} = (f, x)_{I'j}.

3.6.4 Quadrature Galerkin methods

In most cases, the integrals occurring in (3.6) cannot be evaluated exactly,
and one must resort to an approximation technique for their evaluation. In
this section, the effect of the use of certain quadrature rules in the Galerkin
method is discussed. To illustrate the concepts, we consider the problem

−(pu')' = f,  x ∈ I,
u(0) = u(1) = 0,

where we shall assume that there exists a constant p0 such that

0 < p0 ≤ p(x),  x ∈ I.

The Galerkin method in this case takes the form

(3.40) (puh', v') = (f, v),  v ∈ Sh,

where we shall take Sh to be M^{r,0}_0(π). Let

0 ≤ ρ1 < ρ2 < . . . < ρν ≤ 1,

and set

ρij = xi−1 + hi ρj,  i = 1, . . . , N + 1,  j = 1, . . . , ν.

Suppose ωj > 0, j = 1, . . . , ν, and denote by

⟨α, β⟩i = hi Σ_{j=1}^{ν} ωj (αβ)(ρij)

a quadrature rule on Ii which is exact for αβ ∈ Pt(Ii). Set

⟨α, β⟩ = Σ_{i=1}^{N+1} ⟨α, β⟩i.

If this quadrature formula is used in (3.40), the problem becomes that of
finding zh ∈ Sh such that

(3.41) ⟨pzh', v'⟩ = ⟨f, v⟩,  v ∈ Sh.

If t is sufficiently large, the existence and uniqueness of zh follow from the
next lemma.

Lemma 3.2. If t ≥ 2r − 2,

⟨pV', V'⟩ ≥ p0 ‖V'‖²_{L²},  V ∈ Sh.

Proof — Since the weights ωj, j = 1, . . . , ν, are positive, and (V')² ∈ P_{2r−2}(Ii),

⟨pV', V'⟩ ≥ p0 ⟨V', V'⟩ = p0 ‖V'‖²_{L²}.

From this lemma, it follows that, if t ≥ 2r − 2, there exists a unique
solution zh ∈ Sh of (3.41). In order to have t ≥ 2r − 2, it is necessary to
have at least r quadrature points; the use of fewer leads to non–existence,
as is shown by Douglas and Dupont (1974).

It can be shown (cf. Douglas and Dupont, 1974) that if t ≥ 2r − 1 then

‖u − zh‖_{L²} ≤ C h^{r+1} ‖f‖_{H^{r+1}}.

In addition, the superconvergence indicated in Theorem 3.5 is preserved.
More specifically,

|(u − zh)(xi)| ≤ C h^{2r} ‖f‖_{H^{t+1}}.

Notice that the use of an r–point Gaussian quadrature rule produces the
desired accuracy and satisfies the condition of Lemma 3.2.
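As a sketch of such a composite rule (illustrative names; an r-point Gauss rule on each subinterval is exact for integrands of degree 2r − 1 ≥ 2r − 2, as Lemma 3.2 requires):

import numpy as np

def discrete_inner(alpha, beta, x, r):
    """Composite quadrature <alpha, beta> built from the r-point
    Gauss-Legendre rule mapped to each subinterval of the partition x."""
    nodes, weights = np.polynomial.legendre.leggauss(r)   # rule on [-1, 1]
    total = 0.0
    for a, b in zip(x[:-1], x[1:]):
        h = b - a
        rho = a + 0.5 * h * (nodes + 1.0)                 # mapped nodes
        total += 0.5 * h * np.sum(weights * alpha(rho) * beta(rho))
    return total

x = np.linspace(0.0, 1.0, 11)
print(discrete_inner(np.sin, np.cos, x, 3))               # ~ sin(1)^2 / 2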

3.7 The Galerkin Method for Nonlinear Problems


Consider the boundary value problem

−u'' + f(x, u) = 0,  x ∈ I,
u(0) = u(1) = 0.

It is easy to show that the weak form of this boundary value problem is

(3.42) (u', v') + (f(u), v) = 0,  v ∈ H¹₀(I).

As before, let Sh be a finite-dimensional subspace of H¹₀(I) with basis
{w1, . . . , ws}. The Galerkin approximation to u is the element uh ∈ Sh
such that

(3.43) (uh', v') + (f(uh), v) = 0,  v ∈ Sh,

and if

uh(x) = Σ_{j=1}^{s} αj wj(x),

we obtain from (3.43), with v = wi, the nonlinear system

(3.44) Σ_{j=1}^{s} (wi', wj') αj + (f(Σ_{ν=1}^{s} αν wν), wi) = 0,  i = 1, . . . , s,

for α1, . . . , αs. Newton's method for the solution of (3.44) can be easily
derived by linearizing (3.43) to obtain

(uh⁽ⁿ⁾', v') + (f(uh⁽ⁿ⁻¹⁾), v) + (fu(uh⁽ⁿ⁻¹⁾)(uh⁽ⁿ⁾ − uh⁽ⁿ⁻¹⁾), v) = 0,  v ∈ Sh,

where uh⁽⁰⁾ is arbitrary. If

uh⁽ᵏ⁾ = Σ_{j=1}^{s} αj⁽ᵏ⁾ wj,

then

(3.45) (A + Bn)α⁽ⁿ⁾ = −Fn + Bn α⁽ⁿ⁻¹⁾,

where

A = ((wi', wj')),  α⁽ⁿ⁾ = (α1⁽ⁿ⁾, . . . , αs⁽ⁿ⁾)ᵀ,
Bn = ((fu(Σ_{ν=1}^{s} αν⁽ⁿ⁻¹⁾ wν) wi, wj)),
Fn = ((f(uh⁽ⁿ⁻¹⁾), wi)).

Note that (3.45) may be written in the form

(A + Bn)(α⁽ⁿ⁾ − α⁽ⁿ⁻¹⁾) = −(Aα⁽ⁿ⁻¹⁾ + Fn).

A comprehensive account of the Galerkin method for second order nonlinear
problems is given by Fairweather (1978).
4 The Orthogonal Spline Collocation Method
4.1 Introduction
Consider the linear second order two-point boundary value problem

(4.1) Lu ≡ −u'' + p(x)u' + q(x)u = f(x),  x ∈ I,

(4.2) μ0 u(0) + ν0 u'(0) = g0,  μ1 u(1) + ν1 u'(1) = g1,

where the functions p, q and f are smooth on I. Let

π : 0 = x0 < x1 < . . . < xN < xN+1 = 1

denote a partition of I, and set hi = xi − xi−1. In the orthogonal spline collocation
method for (4.1)–(4.2), the approximate solution uh ∈ M^r_1(π), r ≥ 3.
If {wj}, j = 1, . . . , s, is a basis for M^r_1(π), where s = (N + 1)(r − 1) + 2, we may write

(4.3) uh(x) = Σ_{j=1}^{s} uj wj(x).

Then the coefficients {uj} in (4.3) are determined by requiring that uh
satisfy (4.1) at the points {ξj}, j = 1, . . . , s − 2, and the boundary conditions (4.2):

μ0 uh(0) + ν0 uh'(0) = g0,
Luh(ξj) = f(ξj),  j = 1, 2, . . . , s − 2,
μ1 uh(1) + ν1 uh'(1) = g1,

where

(4.4) ξ_{(i−1)(r−1)+k} = xi−1 + hi σk,  i = 1, 2, . . . , N + 1,  k = 1, 2, . . . , r − 1,

and {σk}, k = 1, . . . , r − 1, are the nodes for the (r − 1)-point Gauss-Legendre
quadrature rule on the interval [0, 1].

As an example, consider the case when r = 3, for which the collocation
points are

(4.5) ξ2i−1 = xi−1 + ½(1 − 1/√3) hi,  ξ2i = xi−1 + ½(1 + 1/√3) hi,  i = 1, 2, . . . , N + 1.

With the usual basis (3.15) for M^3_1(π), we set

uh(x) = Σ_{j=0}^{N+1} {αj vj(x) + βj sj(x)}.

Since only four basis functions,

(4.6) vi−1, si−1, vi, si,

are nonzero on [xi−1, xi], the coefficient matrix of the collocation equations
is of the form

(4.7) [ D0                 ]
      [ S1   T1            ]
      [      S2   T2       ]
      [        ·     ·     ]
      [       SN+1  TN+1   ]
      [                D1  ],

with D0 = [μ0  ν0], D1 = [μ1  ν1], and, for i = 1, 2, . . . , N + 1,

Si = [ Lvi−1(ξ2i−1)  Lsi−1(ξ2i−1) ]    Ti = [ Lvi(ξ2i−1)  Lsi(ξ2i−1) ]
     [ Lvi−1(ξ2i)    Lsi−1(ξ2i)   ],        [ Lvi(ξ2i)    Lsi(ξ2i)   ].

For r > 3, with the commonly used B-spline basis [8], the coefficient
matrix has the form

(4.8) [ D0                            ]
      [ W11  W12  W13                 ]
      [      W21  W22  W23            ]
      [             ·                 ]
      [      WN+1,1  WN+1,2  WN+1,3   ]
      [                           D1  ],

with Wi1 ∈ R^{(r−1)×2}, Wi2 ∈ R^{(r−1)×(r−3)}, Wi3 ∈ R^{(r−1)×2}. The matrices
(4.7) and (4.8) are called almost block diagonal and the collocation equations
form an almost block diagonal linear system. There exist efficient algorithms
for solving such systems based on the idea of alternate row and column
elimination; see section 5.

For the case in which the partition π is uniform, the values at the collocation
points of the basis functions (4.6) and their first and second derivatives
are given in Appendix C.
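The following sketch implements the r = 3 case: the unknowns are (αi, βi) = (uh(xi), uh'(xi)), the rows are the two boundary conditions (4.2) and Luh(ξj) = f(ξj) at the Gauss points (4.5). The function names are our own, and dense storage stands in for the ABD solvers of section 5.

import numpy as np

def hermite_basis(t, h):
    """Values, first and second x-derivatives of v_{i-1}, s_{i-1}, v_i, s_i
    at local coordinate t in [0, 1] on an interval of length h."""
    g1  = lambda y: -2*y**3 + 3*y**2;  dg1 = lambda y: -6*y**2 + 6*y
    g2  = lambda y: y**3 - y**2;       dg2 = lambda y: 3*y**2 - 2*y
    d2g1 = lambda y: -12*y + 6;        d2g2 = lambda y: 6*y - 2
    val = [g1(1-t),        -h*g2(1-t),     g1(t),          h*g2(t)]
    d1  = [-dg1(1-t)/h,     dg2(1-t),      dg1(t)/h,       dg2(t)]
    d2  = [d2g1(1-t)/h**2, -d2g2(1-t)/h,   d2g1(t)/h**2,   d2g2(t)/h]
    return val, d1, d2

def collocate(p, q, f, mu, nu, g, N):
    x = np.linspace(0.0, 1.0, N + 2); h = x[1] - x[0]
    s = 2 * N + 4
    A = np.zeros((s, s)); b = np.zeros(s)
    A[0, 0], A[0, 1], b[0] = mu[0], nu[0], g[0]          # left BC
    sigma = (0.5 * (1 - 1/np.sqrt(3)), 0.5 * (1 + 1/np.sqrt(3)))
    row = 1
    for i in range(N + 1):                               # interval [x_i, x_{i+1}]
        for t in sigma:                                  # Gauss points (4.5)
            xi = x[i] + t * h
            val, d1, d2 = hermite_basis(t, h)
            for k in range(4):                           # L applied to (4.6)
                A[row, 2*i + k] = -d2[k] + p(xi)*d1[k] + q(xi)*val[k]
            b[row] = f(xi); row += 1
    A[-1, -2], A[-1, -1], b[-1] = mu[1], nu[1], g[1]     # right BC
    c = np.linalg.solve(A, b)
    return x, c[0::2]                                    # nodal values alpha_i

x, uh = collocate(lambda t: 0*t, lambda t: 1 + 0*t,
                  lambda t: (1 + np.pi**2) * np.sin(np.pi*t),
                  (1.0, 1.0), (0.0, 0.0), (0.0, 0.0), 19)
print(np.max(np.abs(uh - np.sin(np.pi*x))))   # superconvergent at the nodes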

4.2 The Existence and Uniqueness of the Collocation Approximation
To demonstrate analytical tools used in the proof of the existence and
uniqueness of the collocation approximation and in the derivation of error

estimates, we consider the linear two-point boundary value problem

(4.9) Lu ≡ −u'' + p(x)u' + q(x)u = f(x),  x ∈ I,
      u(0) = u(1) = 0.

Then, for v ∈ H²(I) ∩ H¹₀(I), there exists a constant C0 such that

(4.10) ‖v‖_{H²(I)} ≤ C0 ‖Lv‖_{L²(I)}.
We define the discrete inner product ⟨·, ·⟩ by

⟨φ, ψ⟩ = Σ_{i=1}^{N+1} ⟨φ, ψ⟩i,

where

⟨φ, ψ⟩i = hi Σ_{k=1}^{r−1} ωk φ(ξ_{(i−1)(r−1)+k}) ψ(ξ_{(i−1)(r−1)+k}),  i = 1, . . . , N + 1,

and {ξj}, j = 1, . . . , s − 2, are the collocation points given in (4.4) and
{ωk}, k = 1, . . . , r − 1, are the weights for the (r − 1)-point Gauss-Legendre
quadrature rule on the interval [0, 1].
In order to demonstrate the existence and uniqueness of the collocation
approximation uh ∈ Sh ≡ M^{r,0}_1(π) satisfying

Luh(ξj) = f(ξj),  j = 1, . . . , s − 2,

we use the following lemma [13].

Lemma 4.1. There exists a constant C such that

(4.11) |⟨LY, LY⟩ − ‖LY‖²_{L²(I)}| ≤ Ch‖Y‖²_{H²(I)},

for all Y ∈ Sh.

From (4.10) and this lemma, we find that, for Y ∈ Sh,

‖Y‖²_{H²(I)} ≤ C0 ‖LY‖²_{L²(I)} ≤ C0 (⟨LY, LY⟩ + Ch‖Y‖²_{H²(I)}).

Thus, for h sufficiently small,

(4.12) ‖Y‖²_{H²(I)} ≤ C⟨LY, LY⟩.

If

LY(ξj) = 0,  j = 1, . . . , s − 2,

then (4.12) implies that Y = 0, which proves the uniqueness and hence
existence of the collocation approximation uh.

4.3 Optimal H² Error Estimates

We now derive an optimal H² error estimate.

Theorem 4.1. Suppose u ∈ H^{r+1}(I) ∩ H¹₀(I). Then there exists a constant
C such that, for h sufficiently small,

(4.13) ‖u − uh‖_{H²(I)} ≤ C h^{r−1} ‖u‖_{H^{r+1}(I)}.

Proof — Let Tr,h be the interpolation operator from C¹(I) to Sh determined
by the conditions

(i) (Tr,h v)(xi) = v(xi),  (Tr,h v)'(xi) = v'(xi),  i = 0, . . . , N + 1,

(ii) (Tr,h v)(ηij) = v(ηij),  i = 1, . . . , N + 1,  j = 1, . . . , r − 3,

for v ∈ C¹(I), where the points ηij, j = 1, . . . , r − 3, are certain specific
points in Ii; see [13] for details. Clearly Tr,h is local in the sense that Tr,h v
is determined on Ii by v on Ii. (Note that when r = 3, Tr,h v is the Hermite
cubic interpolant of v.) It can be shown that, if W = Tr,h u and u ∈ H^{r+1}(I),

(4.14) Σ_{k=0}^{2} h^k ‖(u − W)^{(k)}‖_{L∞(Ii)} ≤ C hi^{r+1/2} ‖u^{(r+1)}‖_{L²(Ii)},  i = 1, . . . , N + 1,

and

(4.15) Σ_{k=0}^{2} h^k ‖(u − W)^{(k)}‖_{L²(I)} ≤ C h^{r+1} ‖u^{(r+1)}‖_{L²(I)}.

Then, using (4.14), it follows that

|L(uh − W)(ξj)| = |L(u − W)(ξj)| ≤ C Σ_{k=0}^{2} ‖(u − W)^{(k)}‖_{L∞(Ii)} ≤ C h^{r−3/2} ‖u^{(r+1)}‖_{L²(Ii)}.

Hence

⟨L(uh − W), L(uh − W)⟩ = Σ_{i=1}^{N+1} Σ_{k=1}^{r−1} hi ωk [L(uh − W)(ξ_{(i−1)(r−1)+k})]²
                       ≤ C Σ_{i=1}^{N+1} Σ_{k=1}^{r−1} hi ωk h^{2r−3} ‖u^{(r+1)}‖²_{L²(Ii)}
                       ≤ C h^{2r−2} ‖u^{(r+1)}‖²_{L²(I)},

and, from (4.12), it follows that

(4.16) ‖uh − W‖_{H²(I)} ≤ C h^{r−1} ‖u^{(r+1)}‖_{L²(I)}.

The required result is now obtained from the triangle inequality, (4.15) and
(4.16).

4.4 Superconvergence Results

Douglas and Dupont [13] construct a "quasi–interpolant" û ∈ Sh which
has the properties that, for u ∈ W^{2r−1}_∞(I),

(4.17) |(u − û)^{(j)}(xi)| ≤ C h^{2r−2} ‖u‖_{W^{2r−1}_∞(I)},  j = 0, 1,

and, if u ∈ W^{2r}_∞(I),

(4.18) L(uh − û)(ξj) = ε(ξj),  j = 1, . . . , s − 2,
       (uh − û)(0) = (uh − û)(1) = 0,

where

|ε(ξj)| ≤ C h^{2r−2} ‖u‖_{W^{2r}_∞(I)}.

From (4.12) and (4.18), we obtain

‖uh − û‖_{H²(I)} ≤ C⟨L(uh − û), L(uh − û)⟩^{1/2}
               = C⟨ε, ε⟩^{1/2}
               ≤ C h^{2r−2} ‖u‖_{W^{2r}_∞(I)},

and hence, using Sobolev's inequality,

(4.19) ‖uh − û‖_{W¹_∞(I)} ≤ C h^{2r−2} ‖u‖_{W^{2r}_∞(I)};

that is, the quasi–interpolant û differs from the collocation solution uh, along
with first derivatives, uniformly by O(h^{2r−2}). For r ≥ 3, this represents a
higher order in h than is possible for u − uh. However, using the triangle
inequality, (4.17) and (4.19), we see that a superconvergence phenomenon
occurs at the nodes, namely

(4.20) |(u − uh)^{(j)}(xi)| ≤ C h^{2r−2} ‖u‖_{W^{2r}_∞(I)},  j = 0, 1.
45
4.5 L2 and H 1 Error Estimates
From the inequality,
Z 1 1
Z 1
2 2
2 00
g(x) dx ≤ C{hi (|g(xi−1 | + |g(xi )|) +
2
h2i g (x) dx 2
},
Ii Ii

valid for any g ∈ C 2 (Ii ), the H 2 estimate (4.13) and the superconvergence
result (4.20), it follows that
 
ku − uh kL2 ≤ C h2r−1 kukW∞
2r (I) + h
r+1
kukH r+1 (I) .

From this inequality, (4.13) and the inequality [1],


Z  Z Z 
0 2
|g (x)| dx ≤ 54 h−2
i
2
|g(x)| dx + h2i 00
|g (x)| dx , 2
Ii Ii Ii

we obtain the estimate


 
ku − uh kH 1 (I) ≤ C h2r−2 kukW∞ r
2r (I) + h kukH r+1 (I) .

In these estimates, the exponent of h is optimal but the smoothness re-


quirements on u are not minimal. It is shown in [13] that, by modifying the
collocation procedure, the smoothness requirements on the solution can be
reduced to the minimal ones required for approximation when global esti-
mates are sought. To describe this modification, let fˆ denote the standard
r−2
L2 –projection of f into M−1 . The smoothed collocation method consists
of finding uh ∈ Sh such that

(L uh )(ξj ) = fˆ(ξj ), j = 1, . . . , s − 2.

Douglas and Dupont [13] prove that, for h sufficiently small,

ku − uh kH j (I) ≤ Cj hr+1−j kf kH r−1 (I) , j = 0, 1, 2,

and
|(u − uh )(xi )| ≤ Ch2r kf kH r−1 (I) , i = 1, . . . , N.

46
Figure 1: Structure of a general ABD matrix

5 Algorithms for Solving Almost Block Diagonal


Linear Systems
5.1 Introduction
In several numerical methods for solving two-point boundary value problems,
there arise systems of linear algebraic equations with coefficient matrices
which have a certain block structure, almost block diagonal (ABD), [8]; see,
for example, section 4.1.
The most general ABD matrix, shown in Figure 1, has the following
characteristics: the nonzero elements lie in blocks which may be of different
sizes; each diagonal entry lies in a block; any column of the matrix intersects
no more than two blocks (which are successive), and the overlap between
successive blocks (that is, the number of columns of the matrix common to
two successive blocks) need not be constant. In commonly used methods for
solving two-point boundary value problems, the most frequently occurring
ABD structure is shown in Figure 2, where the blocks W (i) , i = 1, 2, . . . , N ,
are all of equal size, and the overlap between successive blocks is constant
and equal to the sum of the number of rows in TOP and BOT.
In this section, we outline efficient methods for the solution of systems
with coefficient matrices having this structure. These algorithms are all
variants of Gaussian elimination with partial pivoting, and are more efficient
than Gaussian elimination primarily because they introduce no fill–in.
We describe the essential features of the algorithms using a simple ex-
ample with a coefficient matrix of the form in Figure 2, namely the matrix
of Figure ?? in which there are two 4 × 7 blocks W (1) , W (2) , and TOP and

47
Figure 2: Special ABD structure arising in BVODE solvers

Figure 3: Fill-in introduced by SOLVEBLOK

BOT are 2×3 and 1×3, respectively. The overlap between successive blocks
is thus 3.

5.2 Gaussian Elimination with Partial Pivoting


The procedure implemented in solveblok [9, 10] uses conventional Gaussian
elimination with partial pivoting. Fill-in may be introduced in the positions
indicated ∗ in Figure 3. The possible fill-in, and consequently the possible
additional storage and work, depends on the number of rows, NT , in the
block TOP.

48
Figure 4: Structure of the reduced matrix

5.3 Alternate Row and Column Elimination


This stable elimination procedure, based on the approach of Varah [23],
generates no fill-in for the matrix A of Figure ??. Suppose we choose a
pivot from the first row. If we interchange the first column and the column
containing the pivot, there is no fill-in. Moreover, if instead of perform-
ing row elimination as in conventional Gaussian elimination, we reduce the
(1, 2) and (1, 3) elements to zero by column elimination, the corresponding
multipliers are bounded in magnitude by unity. We repeat this process in
the second step, choosing a pivot from the elements in the (2, 2) and (2, 3)
positions, interchanging columns 2 and 3 if necessary and eliminating the
(2, 3) element. If this procedure were adopted in the third step, fill-in could
be introduced in the (i, 3) positions, i = 7, 8, 9, 10. To avoid this, we switch
to row elimination with partial pivoting to eliminate the (4, 3), (5, 3), (6, 3)
elements, which does not introduce fill-in. We continue using row elimina-
tion with partial pivoting until a step is reached when fill-in could occur, at
which point we switch back to the “column pivoting, column elimination”
scheme. This strategy leads to a decomposition

A = P LB̃U Q,

where P, Q are permutation matrices recording the row and column inter-
changes, respectively, the unit lower and unit upper triangular matrices L,
U contain the multipliers used in the row and column eliminations, respec-
tively, and the matrix B̃ has the structure shown in Figure 4, where · denotes
a zeroed element. Since there is only one row or column interchange at each
step, the pivotal information can be stored in a single vector of the order of

49
Figure 5: The coefficient matrix of the reordered system

the matrix, as in conventional Gaussian elimination. The nonzero elements


of L, U can be stored in the zeroed positions in A. The pattern of row and
column eliminations is determined by NT and the number of rows NW in
a general block W (i) (cf. Figure 2). In general, a sequence of NT column
eliminations is alternated with a sequence of NW − NT row eliminations.
To solve Ax = b, we solve

P Lz = b, B̃w = z, U Qx = w.

The second step requires particular attention. If the components of w are


ordered so that those associated with the column eliminations precede those
associated with the row eliminations, and the equations are ordered accord-
ingly, the system is reducible. In our example, if we use the ordering

ŵ = [w1 , w2 , w5 , w6 , w9 , w10 , w3 , w4 , w7 , w8 , w11 ]T ,

the coefficient matrix of the reordered equations has the structure in Figure
5. Thus, the components w1 , w2 , w5 , w6 , w9 , w10 , are determined by solving a
lower triangular system and the remaining components by solving an upper
triangular system.

5.4 Modified Alternate Row and Column Elimination


The alternate row and column elimination procedure can be made more
efficient by using the fact that, after each sequence of row or column elimi-
nations, a reducible matrix results. This leads to a reduction in the number
of arithmetic operations because some operations involving matrix-matrix

50
multiplications in the decomposition phase can be deferred to the solution
phase, where only matrix-vector multiplications are required. After the first
sequence of column eliminations, involving a permutation matrix Q1 and
multiplier matrix U1 , say, the resulting matrix is
 
C1 O
 M1
 
B1 = AQ1 U1 =  ,

 A1 
O

where C1 ∈ R2×2 is lower triangular, M1 ∈ R4×2 and A1 ∈ R9×9 . The


equation Ax = b becomes
!
x̂1
B1 x̂ = b, x̂ = U1−1 QT1 x = ,
x̂2
!
b1
and x̂1 ∈ R2 . Setting b = , where b1 ∈ R2 , we obtain
b2
!
M1 x̂1
(5.1) C1 x̂1 = b1 , A1 x̂2 = b2 − ≡ b̂2 .
0

The next sequence of eliminations, that is, the first sequence of row elimi-
nations, is applied only to A1 to produce another reducible matrix
!
R1 N 1 O
L1 P1 A1 = ,
O A2

where R1 ∈ R2×2 is upper triangular, N1 ∈ R2×3 and A2 ∈ R7×7 . If


! !
x̃1 b̃1
x̂2 = , L1 P1 b̂2 = ,
x̃2 b̃2

where x̃1 , b̃1 ∈ R2 , system (5.1) becomes

(5.2) A2 x̃2 = b̃2 , R1 x̃1 = b̃1 − [N1 O] x̃2 ,

and the next sequence of eliminations is applied to A2 , which has the struc-
ture of the original matrix with one W block removed. Since row operations
are not performed on M1 in the second elimination step, and column op-
erations are not performed on N1 in the third, etc., there are savings in

51
arithmetic operations [11]. The decomposition phase differs from that in al-
ternating row and column elimination in that if the rth elimination step is a
column (row) elimination, it leaves unaltered the first (r −1) rows (columns)
of the matrix. The matrix in the modified procedure has the same structure
and diagonal blocks as B̃, and the permutation and multiplier matrices are
identical.
In [11, 12], this modified alternate row and column elimination procedure
was developed and implemented in the package colrow for systems with
matrices of the form in Figure 2, and in the package arceco for ABD systems
in which the blocks are of varying dimensions, and the first and last blocks
protrude, as shown in Figure 1.
A comprehensive survey of the occurrence of, and solution techniques
for, ABD linear systems is given in [3].

6 First Order Systems


6.1 Introduction
Consider the two–point boundary value problem for the first order system
given by
(6.1) u0 = f (x, u), x ∈ I,
subject to the separated boundary conditions

(6.2) g0 (u(0)) = 0, g1 (u(1)) = 0,


where u and f are vector functions of order m, g0 and g1 are vector func-
tions of order m1 and m2 respectively, with m1 + m2 = m, and the prime
denotes differentiation with respect to x. In most practical applications, f
is nonlinear while g0 and g1 are both linear.
Equations of the form (6.1) may arise in practice or they may be derived
for one or more higher order equations of the form

u(k) = f (x, u, u0 , . . . , u(k−1) ),

subject to appropriate boundary conditions. By introducing

u1 = u, u2 = u0 , . . . , uk = u(k−1) ,

the first order system corresponding to such an equation is

u0i = ui+1 i = 1, 2, . . . , k − 1,
u0k = f (x, u1 , . . . , uk ),

52
subject to rewritten boundary conditions. In many cases, it is desirable to
reduce the higher–order equations in this way and to use a solution approach
that is suitable for first order systems. However, it should be noted that
there are situations where such a reduction is not necessary and approaches
applicable to the original higher–order equation are more appropriate.

6.2 The Trapezoidal Rule


If
π : 0 = x0 < x1 < . . . < xN +1 = 1
is a partition of I and hi = xi − xi−1 , then the trapezoidal rule for the
solution of (6.1)–(6.2) takes the form

φi (U) ≡ Ui − Ui−1 − 12 hi {f (xi , Ui ) + f (xi−1 , Ui−1 )} = 0, i = 1, . . . , N + 1,


φL (U) ≡ g0 (U0 ) = 0, φR (U) ≡ g1 (UN +1 ) = 0,

where Ui ≈ u(xi ), i = 0, 1, . . . , N + 1. The determination of the approxi-


mation
U ≡ [UT0 , UT1 , · · · , UTN +1 ]T
requires the solution of a system of m(N + 2) equations of the form
 
φL (U)
 

 φ0 (U) 

 
(6.3)

Φ(U) ≡  .. 
 = 0.
 . 
 
φN +1 (U) 
 

 
φR (U)

This system is usually solved by Newton’s method (or a variant of it), which
takes the form
(J (Uν−1 )∆Uν = −Φ(Uν−1 ),
(6.4)
Uν = Uν−1 + ∆Uν , ν = 1, 2, . . . ,
where U0 is an initial approximation to U and the (block) matrix
!
∂Φ(U) ∂φi
J (U) ≡ =
∂U ∂Uj

53
is the Jacobian of the nonlinear system (6.3). In this case, this matrix has
the ABD structure
 
A
 
 L1 R1 
 
 

 L2 R2 

 
(6.5) · ·
 
 
 
· ·
 
 
 
 

 LN +1 RN +1 

B

where
∂g0
A = is m1 × m,
∂U0
1 ∂f
 
Li = − I + hi (xi−1 , Ui−1 ) is m × m,
2 ∂Ui−1
1 ∂f
Ri = I − hi (xi , Ui ) is m × m,
2 ∂Ui
and
∂g1
B = is m2 × m.
∂UN +1
It can be shown that the approximation U determined from the trapezoidal
rule satisfies
u(xi ) − Ui = O(h2 ),
where h = maxj hj . Under sufficient smoothness conditions on u, higher–
order approximations to u on the mesh π can be generated using the method
of deferred conditions (cf. Section 1.5.1), which we shall describe briefly;
full details are given by Lentini and Pereyra (1977).
The local truncation error of the trapezoidal rule is defined as

τπ,i [u] = {φi (u) − [u0 − f (x, u)]|x=xi−1/2 }, i = 1, . . . , N + 1,

where xi−1/2 = xi−1 + 21 hi . By expanding in Taylor series about xi−1/2 , it


is easy to show that
L
X
(6.6) τπ,i [u] = Tν (xi−1/2 )h2ν
i + O(h
2L+2
), i = 1, . . . , N + 1,
ν=1

54
where
ν d2ν
Tν (xi−1/2 ) = − f (x, u)|x=xi−1/2 .
22ν−1 (2ν + 1)! dx2ν
If Sk (U(k−1) ) is a finite difference approximation of order h2k+2 to the first k
terms in the expansion (6.6) of the local truncation error, then the solution
U(k) of the system

(6.7) φ(v) = Sk (U (k−1) ), k = 1, 2, . . . , L,

where U(0) = U, the solution of (6.3), is an order h2k+2 approximation to u


on the mesh π. Details of the construction of the operators Sk by numerical
differentiation are given by Lentini and Pereyra (1974).
Note that the systems (6.3) and (6.7) are similar, the right hand side of
(6.7) being simply a known vector. Once the first system (6.3) is solved, the
remaining systems (6.7) are small perturbations of (6.3) and considerable
computational effort can be saved by keeping the Jacobian, and therefore
its LU decomposition, fixed during the iteration (6.4). Of course, if the
mesh π is changed, the Jacobian must be recomputed and refactored. In
principle, this modification degrades the quadratic convergence of Newton’s
method, but because of the accuracy of the approximate Jacobian and the
initial approximation, convergence is still rapid.

6.3 Orthogonal Spline Collocation


This method produces an approximation to u which is a piecewise poly-
nomial vector function U(x) = (U1 (x), . . . , Um (x)), where, for each i, i =
1, . . . , m, Ui ∈ Mr0 (π). Each piecewise polynomial Ui is represented in the
form sX
(6.8) Ui (x) = αli Bl (x),
l=1

where s = r(N + 1) + 1, and {Bl (x)}M l=1 is a convenient basis, commonly


known as the B–spline basis (de Boor, 1978), for the piecewise polynomial
space Mr0 (π). The coefficients {αli } are determined by requiring that the
approximate solution U satisfy the boundary conditions, and the differential
equation at r specific points, the collocation points, in each subinterval.
These points are the Gauss points defined by

ξ(i−1)(r−1)+k = xi−1 + hi ρk , i = 1, . . . , N + 1, k = 1, . . . , r,

where {ρk }rk=1 are the r zeros of the Legendre polynomial of degree r on
the interval [0, 1] (cf., Section 4.1). The collocation equations then take the

55
form
φ0 (U) ≡ g0 (U(0)) = 0,

(6.9) φl (U) ≡ U0 (ξj ) − f (ξj , U(ξj )) = 0, j = 1, 2, . . . , s − 2,

φs (U) ≡ g1 (U(1)) = 0.
Again a variant of Newton’s method is usually used to solve the nonlinear
system (6.9). Certain properties of the B–splines ensure that the Jacobian
of (6.9) is almost block diagonal but having a more general structure than
that of the matrix (6.5). These properties are (de Boor, 1978):
• On each subinterval [xi−1 , xi ] only r+1 B–splines are non–zero, namely
Br(i−2)+1 , . . . , Br(i−1)+1 .

• At x = 0, Bi (0) = δi,1 and at x = 1, Bi (1) = δi,s where δij denotes the


Kronecker delta.
If the unknown coefficients in (6.8) are ordered by “columns”, that is, first
with respect to i and then with respect to l, the Jacobian of (6.9) has the
form
A
 
 
 W11 W12 W13 
 
 

 W21 W22 W23 

 

 · 

 
·
 
 
 
·
 
 
 
 

 WN +1,1 WN +1,2 WN +1,3 

B
where the matrices A and B are m1 × m and m2 × m respectively, and
Wi1 , Wi2 and Wi3 are mr × m, mr × m(r − 1) and mr × m, respectively.
Under sufficient smoothness conditions, it can be shown (see, for example
Ascher et al., 1979) that the error in U for x ∈ [xi−1 , xi ] is given by
(r+1)
uj (x) − Uj (x) = c(x)uj (xi−1 )hr+1
i + O(hr+2 ), j = 1, . . . , m,
where c(x) is a known bounded function of x, and at each mesh point xi ,
uj (xi ) − Uj (xi ) = O(h2r ), j = 1, . . . , m, i = 0, 1, . . . , N + 1.

56
6.4 Multiple Shooting
This procedure involves finding an approximation Ui to u(xi ), i = 0, 1, . . . , N +
1, in the following way (cf. Stoer and Bulirsch, 1980). Let ui (x; Ui ) denote
the solution of the initial value problem

(6.10) u0 = f (x, u), u(xi ) = Ui .

Then the method consists in finding the vectors Ui , i = 0, 1, . . . , N + 1, such


that

a) the function defined by

u(x) = ui (x; Ui ) on [xi , xi+1 ], i = 0, 1, . . . , N,


u(1) = UN

is continuous on I, and is thus a solution of the differential equation


(6.1), and

b) the function u satisfies the boundary conditions (6.1).

These requirements lead to the equations

(6.11) g0 (U0 ) = 0,
(6.12) Ui+1 − ui (xi+1 ; Ui ) = 0, i = 0, . . . , N,
(6.13) g1 (UN +1 ) = 0,

where equations (6.12) arise from the continuity constraints, and (6.11) and
(6.13) from the boundary conditions. When Newton’s method is used to
solve this system, the Jacobian is again almost block diagonal, and, in this
case, takes the form
 
A
 
 L0 I 
 
 

 L1 I 

 
· ·
 
 
 
· ·
 
 
 
 

 LN I 

B

57
where
∂g0 ∂ui (xi+1 ; Ui ) ∂g1
A= is m1 ×m, Li = − is m×m, B= is m2 ×m,
∂U1 ∂Ui ∂UN

and I denotes the m × m unit matrix. In order to determine the m × m


matrices Li , i = 0, 1, . . . , N , one must solve the initial value problems

(6.14) L0i (x) = J(x, ui (x; Ui ))Li (x), Li (xi ) = I, i = 0, . . . , N,


∂f
where J is the Jacobian matrix J = . Then Li = Li (xi+1 ). Given
∂u
the function f and its Jacobian J, one can solve the initial value problems
(6.10) and (6.14) simultaneously using the values of U calculated from the
previous Newton iteration. In practice, these initial value problems are
solved numerically using an initial value routine, usually a Runge–Kutta
code.

6.5 Software Packages


**(THIS SECTION IS OUTDATED)** In recent years, several general–
purpose, portable software packages capable of handling nonlinear boundary
value problems involving mixed order systems of ordinary differential equa-
tions have been developed. The use of these packages has led to the solution
of problems which were difficult and/or costly to solve, if not intractable,
using standard solution procedures, and has removed much of the guesswork
from the problem of solving boundary value problems for ordinary differen-
tial equations, (see, for example, Davis and Fairweather (1981), Denison et
al. (1983), Muir et al. (1983), Fairweather and Vedha–Nayagam, (1987),
Bhattacharya et al. (1986), Akyurtlu et al. (1986), Vedha–Nayagam et al.
(1987)).
In this section, we shall discuss three packages which are widely available
and are now in common use. In addition to implementing one of the basic
numerical methods described in Sections 8.2-8.4, each package incorporates
a sophisticated nonlinear equations solver, which involves a linear equations
solver, and a complex, fully–automatic mesh selection and error estimation
algorithm upon which the success of the code largely depends.
The IMSL package, DVCPR, called BVPFD in the new IMSL Library,
implements a variable order finite difference method based on the trapezoidal
rule with deferred corrections. Like the package DD04AD in the Harwell
Subroutine Library (1978) and D02RAF in the NAG Library, this package

58
is a version of the code PASVA3 (Pereyra, 1979). The code PASVA3 has
fairly long history and a series of predecessors has been in use for several
years. The package COLSYS (Ascher et al., 1981), which implements spline
collocation at Gauss points, and IMSL’s multiple shooting code, DTPTB ,
called BVPMS in the new IMSL Library, are of a more recent vintage. All
three codes are documented elsewhere, and in this section we shall only
briefly mention some of their similarities and differences, and other note-
worthy features of the codes. It should be noted that Bader and Ascher
(1987) have developed a new version of COLSYS called COLNEW which
differs from COLSYS principally in its use of certain monomial basis func-
tions instead of B-splines. This new code, which is reputed to be somewhat
more robust that its predecessor, shares many of its features, and, in the
remainder of this section, any comments referring to COLSYS apply equally
well to COLNEW .
The codes DVCPR and DTPTB require that the boundary value prob-
lem be formulated as a first–order system, and they can handle nonseparated
boundary conditions, i.e. boundary conditions for (1.1), for example, of the
form
g(u(0), u(1)) = 0,
where g is a vector function of order m. On the other hand, COLSYS can
handle a mixed–order system of multipoint boundary value problems with-
out first reducing it to a first–order system, but requires that the boundary
conditions be separated. This restriction is not a serious one as a boundary
value problem with nonseparated boundary conditions can be reformulated
so that only separated boundary conditions occur, as we shall see in Section
10.2. However this reformulation does increase the size of the problem.
The algebraic problem arising in the codes are similar but there is no
uniformity in the manner in which they are solved. Each code uses some
variant of Newton’s method for the solution of the nonlinear systems, and a
Gauss elimination–based algorithm to solve the almost block diagonal linear
systems. Since the codes were written, the package COLROW , (Diaz et al.,
1983) has been developed specifically for the solution of such linear systems;
see Section 9.4. Its use in the existing codes has not been investigated, but
may improve their efficiency significantly.
Both DVCPR and COLSYS solve the boundary value problem on a se-
quence of meshes until user–specified error tolerances are satisfied. Detailed
descriptions and theoretical justification of the automatic mesh–selection
procedures used in DVCPR and COLSYS are presented by Lentini and
Pereyra (1974) and Ascher et al. (1979), respectively. It is worth noting

59
that DVCPR constructs meshes in such a way that each mesh generated
contains the initial mesh, which is not necessarily the case in COLSYS .
In DTPTB , the “shooting points” x1 , . . . , xN , are chosen by the user or,
for linear problems, by the code itself. Any adaptivity in the code appears
in the initial value solver, which in this case is IMSL’s Runge–Kutta code,
DVERK .
In DVCPR, one can specify only an absolute tolerance, , say, which is
imposed on all components of the solution. The code attempts to find an
approximate solution U such that the absolute error in each of its compo-
nents is bounded by . On the other hand, COLSYS allows the user to
specify a different tolerance for each component of the solution, and in fact
allows one to impose no tolerance at all on some or all of the components.
This code attempts to obtain an approximate solution U such that

max |ul (x) − Ul (x)| ≤ l + max |Ul (x)|l


x∈[xi ,xi+1 ] x∈[xi ,xi+1 ]

for lth component on which the tolerance l is imposed. This criterion has
a definite advantage in the case of components with different order of mag-
nitude or when components differ in magnitude over the interval of defini-
tion. Moreover, it is often more convenient to provide an array of tolerances
than to rescale the problem. On successful termination, both DVCPR and
COLSYS return estimates of the errors in the components of approximate
solution.
In DTPTB , the same absolute tolerance is imposed on all components
of the solution. In addition, a boundary condition tolerance is specified,
and, on successful termination, the solution returned will also satisfy the
boundary conditions to within this tolerance.
Each code offers the possibility of providing an initial approximation to
the solution and an initial subdivision of the interval of definition (the shoot-
ing points in the case of DTPTB ). With COLSYS , one may also specify the
number of collocation points per subinterval. In many instances, parame-
ter continuation is used to generate initial approximations and subdivisions.
That is, when solving a parameterized family of problems in increasing or-
der of difficulty, the subdivision and initial approximation for a particular
problem are derived from the final mesh and the approximate solution cal-
culated in the previous problem. This is an exceedingly useful technique
which improves the robustness of the packages. It should be noted that
COLSYS requires a continuous initial approximation, which is sometimes
an inconvenience.

60
It has been the authors’ experience that DTPTB is the least robust of
the three codes. However W. H. Enright and T. F. Fairgrieve (private com-
munication) have made some modifications to this code which have greatly
improved its performance, making it competitive with both DVCPR and
COLSYS . In general, there is little to choose between DVCPR and COL-
SYS , and it is recommended that both be used to solve a given boundary
value problem. Driver programs for these codes are straightforward to write.
Moreover, there are definite advantages to using both codes. For example,
each provides its own insights when solving a problem, simple programming
errors can be detected more quickly by comparing results, and considerable
confidence can be attached to a solution obtained by two different methods.
With the wide availability of these software packages, it is rarely advisable
or even necessary for the practitioner to develop ad hoc computer programs
for solving boundary value problems ordinary differential equations.

7 Reformulation of Boundary Value Problems into


Standard Form
7.1 Introduction
Boundary value problems arising in various applications are frequently not in
the form required by existing general purpose software packages. However,
many problems can be easily converted to such a form thus enabling the
user to take advantage of the availability of this software and avoid the need
to develop a special purpose code.
In this section, we describe the conversion to “standard” form of some
rather common “nonstandard” problems which the author has encountered
in applications in chemical engineering and mechanical engineering. Some
of the material presented is adapted from the excellent survey article [5],
where additional examples may be found.

7.2 Nonseparated Boundary Conditions


The standard form required by several boundary value problem codes is

u0 = f (x, u), x ∈ I,
(7.1)
g(u(0), u(1)) = 0,

where u, f and g have m components and f and/or g may be nonlinear.


The boundary conditions in (7.1) are said to be nonseparated.

61
While COLSYS/COLNEW can handle mixed order systems directly,
it does not accept nonseparated boundary conditions. A simple way to
convert a BVP of the form (7.1) with nonseparated boundary conditions to
one with separated boundary conditions, where each condition involves only
one point, is the following. We introduce the constant function v such that

v(x) = u(1), x ∈ I,

and obtain the equivalent boundary value problem

u0 = f (x, u), v0 = 0, x ∈ I,
g(u(0), v(0)) = 0, u(1) = v(1).

The penalty in this approach is that the transformed problem is twice the
order of the original problem. On the other hand, as we have seen, the ap-
plication of standard techniques to boundary value problems with separated
boundary conditions involves the solution of almost block diagonal linear
systems. The algebraic problem is significantly more involved when these
techniques are applied directly to the boundary value problem (7.1).

7.3 Singular Coefficients


In problems involving cylindrical or spherical geometry, ordinary differential
equations with singular coefficients often occur. A prime example is the
problem of diffusion and reaction in porous catalytic solids which gives rise
to a boundary value problem of the form
s 0
(7.2) u00 + u = R(u), x ∈ I,
x
subject to the boundary conditions

(7.3) u0 (0) = 0, u(1) = 1.

In (7.2), the quantity s, is a geometric factor which depends on the ge-


ometry of the catalyst pellets: s = 0, 1, 2 for rectangular, cylindrical, and
spherical geometries, respectively. When s = 1 or 2, codes such as DVCPR
and DTPTB cannot be used directly because of the presence of the sin-
gular coefficient in the differential equation (7.2). Since u0 (x) = 0, and
u0 (x)
limx→0 = u00 (0), the differential equation at x = 0 is replaced by
x
R(u)
u00 = .
(1 + s)

62
With this modification, the codes DVCPR (a predecessor of BVPFD) and
DTPTB (a predessor of BVPMS) have been used successfully to solve prob-
lems of the form (7.2)–(7.3); see, for example, [15, 18].

7.4 Continuation
Many problems arising in engineering applications depend on one (or more)
parameters. Consider, for example, the boundary value problem

u0 = f (x, u; λ), x ∈ I,
(7.4)
g(u(0), u(1); λ) = 0,

where λ is a parameter. For the desired value of λ, λ1 say, the problem (7.4)
might be exceedingly difficult to solve without a very good initial approx-
imation to the solution u. If the solution of (7.4) is easily determined for
another value of λ, λ0 say, then a chain of continuation steps along the ho-
motopy path (λ, u( · ; λ)) may be initiated from (λ0 , u( · ; λ0 )). At each step
of the chain, standard software can be used to determine an approximation
to u( · ; λ) which is then used, at the next step, as the initial approximation
to u( · ; λ + ∆λ). This procedure can also be used when the solution of (7.4)
is required for a sequence of values of λ, and the problems are solved in
increasing order of difficulty.
As an example, consider the boundary value problem
s 0
u00 + u + λeu = 0, x ∈ I,
(7.5) x
u0 (0) = 0, u(1) = 0,

which describes the steady–state temperature distribution in the radial di-


rection in the interior of a cylinder of unit radius when s = 1, and a sphere
of unit radius when s = 2 with heat generation according to the exponential
law [21]. There is no solution for λ > 1.7 when s = 1 and for λ > 3.0 when
s = 2. The quantity of interest in this problem is the dimensionless cen-
ter temperature u(0). The boundary value problem (7.5) has been solved
for various values of λ in [18] using DVCPR and DTPTB (replacing the
differential equation in (7.5) by
λeu
u00 + = 0,
(1 + s)

when x = 0, as described in section 3) and COLSYS , and comparisons made


with results obtained in [21]. The results for the case s = 1 are presented in

63
λ [21] COLSYS
0.1 0.252(−1) 0.254822(−1)
0.2 0.519(−1) 0.519865(−1)
0.3 0.796(−1) 0.796091(−1)
0.4 0.109 0.108461
0.5 0.138 0.138673
0.6 0.169 0.170397
0.7 0.2035 0.203815
0.8 0.238 0.239148
1.0 0.65 0.316694
1.5 1.1 0.575364
1.7 2.0 0.731577

Table 1: Dimensionless center temperature u(0) for the case s = 1.

Table 1. For each value of λ, COLSYS converged without difficulty while


it was necessary to use continuation with the other codes; DVCPR was run
starting with λ = 1.7 and decreasing λ in steps of 0.1 to λ = 0.1 but did
not converge for λ = 1.7, while DTPTB was run with λ increasing in steps
of 0.1 from 0.1. In most cases, the results produced by DVCPR (except for
λ = 1.7) and DTPTB agree to six significant figures with those obtained by
COLSYS , casting doubt on the validity of the results [21] for λ ≥ 1. Details
of these numerical experiments and a discussion of the case s = 2 are given
in [18].

7.5 Boundary Conditions at Infinity


Consider the following boundary value problem on the semi–infinite interval
[0, ∞):
u0 = f (x, u), x ∈ [0, ∞),
(7.6)
g(u(0), limx→∞ u(x)) = 0.
Usually the components of the solution u approximate their asymptotic
values quite accurately at a value x, L say, of moderate size. The boundary

64
value problem (7.6) can then be replaced by the boundary value problem

u0 = f (x, u), x ∈ [0, L],


g (u(0), u(L)) = 0.

It is usually necessary to determine the value of L experimentally; this can


be facilitated by using the transformation
x
t=
L
to map the interval [0, L] to the unit interval. The quantity L then appears
as a parameter in the differential equation and continuation as described in
Section 7.4 can be used to determine an appropriate value for L.
Consider the following example. In the study conducted by Na and
Pop [20] of free–convective flow past a vertical plate embedded in a satu-
rated porous medium, the governing equations reduce to the boundary value
problem

f 000 + 13 (m + 2)f f 00 − 31 (2m + 1)(f 0 )2 = 0, x ∈ [0, ∞),


(7.7)
f (0) = 0, f 00 (0) = −1, f 0 (∞) = 0,

where m is a parameter. The boundary condition at infinity is replaced by


f 0 (L) = 0, and, with z = x/L, (7.7) becomes

f 000 + 31 L(m + 2)f f 00 − 13 L(2m + 1)(f 0 )2 = 0, z ∈ I,


(7.8)
f (0) = 0, f 00 (0) = −L2 , f 0 (1) = 0,

which can now be solved directly by COLSYS or reduced to a first order


system and solved by DVCPR or DTPTB . The quantity of interest in this
problem is f 0 (0) and the boundary value problem (7.8) is solved for an
increasing sequence of values of L until f 0 (0) remains constant to the desired
number of significant figures; see [18].

7.6 Eigenvalue Problems


As an example of problems in this category, consider the following boundary
value problem which arises in the study of squeezing flow of a viscous fluid
between elliptic plates [6]:

f 000 + k = SF (f, g), η ∈ I,


(7.9)
g 000 + βk = SF (g, f ), η ∈ I,

65
S −0.5 0.0 1.0 25.0
(a) 1.3023 3.0000 6.2603 73.8652
(b) 1.3036 3.0018 6.2778 74.0414

Table 2: Values of k for β = 1 and various values of S.

where
1 1
F (ϕ, ψ) = 2ϕ0 + ηϕ00 + (ϕ0 )2 − ϕ00 (ϕ + ψ),
2 2
subject to the boundary conditions

f (0) = f 00 (0) = g(0) = g 00 (0) = 0,


(7.10)
f (1) + g(1) = 2, f 0 (1) = g 0 (1) = 0.

In (7.9), the parameters β and S are prescribed, and k is an unknown


constant. If we add to the boundary value problem (7.9)–(7.10) the trivial
differential equation
k 0 = 0,
the augmented boundary value problem can be solved using standard soft-
ware. In Table 2, we give the values of k for β = 1 and various values of S
determined by:

(a) solving the augmented boundary value problem using COLSYS and
DVCPR;

(b) [6].

Since COLSYS and DVCPR produce the same results to the number of
figures presented, it seems reasonable to assume that the results given by
(a) are the correct values.

7.7 Integral Constraints


In many boundary value problems, a term of the form
Z 1
G(u(t), t) dt = R,
0

where R is a given constant, occurs as a constraint. In such cases, we define


Z x
w(x) = G(u(t), t) dt,
0

66
and replace the constraint by the differential equation

w0 (x) = G(u(x), x)

and the boundary conditions

w(0) = 0, w(1) = R.

As an example, consider the following boundary value problem which arises


in a study of membrane separation processes:
1 0
(7.11) u00 + u + c1 + c2 (1 − e−ϕ(x,k) ) [ψ(u, ϕ) − 1] + c3 ψ(u, ϕ)u = 0,
x
where
ϕ(x, k) = k/(c4 − xk),
ψ(u, ϕ) = eu /[1 + c5 eϕ (eu − 1)],
subject to

(7.12) u0 (0) = 0, u(1) = 0,


Z 1 Z 1
(7.13) ψ(u, ϕ)x dx = c6 ux dx,
0 0

In the boundary value problem (7.11)–(7.13), the quantities ci , i = 1, . . . , 6,


are prescribed constants and the constant k is to be determined in addition
to the function u. To reformulate this problem, we first write (7.13) as
Z 1
F (u, x, k)x dx = 0,
0

and set Z x
w(x) = F (u, x, k)x dx.
0
Then to the original boundary value problem we add

w0 = xF (u, x, k),
w(0) = 0, w(1) = 0,

and the trivial ordinary differential equation

k 0 = 0.

This augmented boundary value problem was solved successfully by [7] using
COLSYS .

67
7.8 Interface Conditions
In problems involving layered media, for example, one has interface condi-
tions at known interior points. Such a problem is

L u = 0, x ∈ [0, β],
(7.14)
L u − ϕ21 u = 0, x ∈ [β, 1],

where
d2 u 1 du
Lu =2
+ ,
dx x + α dx
and α and ϕ1 are known constants, subject to
du du − du +
(7.15) u(0) = 1, (1) = 0, u(β − ) = u(β + ), (β ) = (β ).
dx dx dx
To obtain a standard boundary value problem, we map the problem (7.14)–
(7.15) to [0, 1], by setting z = x/β to map [0, β] to [0, 1], and then z =
(1 − x)/(1 − β) maps [β, 1] to [1, 0]. With

d2 u β du
L0 u = 2
+ 2 ,
dz β + α dz
and
d2 u β−1 du
L1 u = + ,
dz 2 (β − 1)2 + 1 + α dz
we obtain
(7.16) L0 u1 = 0 L1 u2 − ϕ21 (β − 1)2 u2 = 0
and
du2
u1 (0) = 1, (0) = 0,
dz
(7.17)
1 du1 1 du2
u1 (1) = u2 (1), (1) = (1).
β dz β − 1 dz
An additional complication arises when the location of the interface, x = β,
is unknown. Such a problem is discussed in [2], where in addition to (7.14),
(7.15), we have
(7.18) Lv − ϕ22 u = 0, x ∈ [β, 1],
and the boundary conditions
dv
(7.19) v(1) = 0, (β) = 0, v(1) = 1,
dx

68
where ϕ2 is a given constant. Equation (7.18) is transformed to

(7.20) L1 v1 − ϕ22 (β − 1)2 u2 = 0, z ∈ I,

and (7.19) becomes

dv1
(7.21) v1 (0) = 1, v1 (1) = 0, (1) = 0.
dz
Since β is unknown, to the boundary value problem consisting of (7.16),
(7.17), (7.20) and (7.21), we add the trivial ordinary differential equation


= 0, z ∈ I.
dz

69
8 Matrix Decomposition Algorithms for Poisson’s
Equation
8.1 Introduction
In this section, we consider the use of finite difference, finite element Galerkin
and orthogonal spline collocation methods on uniform partitions for the so-
lution of Poisson’s equation in the unit square subject to Dirichlet boundary
conditions:

−∆u = f (x, y), (x, y) ∈ Ω,


(8.1)
u(x, y) = 0, (x, y) ∈ ∂Ω,

where ∆ denotes the Laplacian and Ω = (0, 1)2 . Each method gives rise to
a system of linear equations of the form

(8.2) (A ⊗ B + B ⊗ A)u = f ,

where A and B are square matrices of order M , say, u and f are vectors of
order M 2 given by

(8.3) u = [u1,1 , . . . , u1,M , . . . , uM,1 , . . . , uM,M ]T ,

(8.4) f = [f1,1 , . . . , f1,M , . . . , fM,1 , . . . , fM,M ]T ,

and ⊗ denotes the tensor product; see Appendix D for the definition of ⊗
and its properties. A matrix decomposition algorithm is a fast direct method
for solving systems of the form (8.2) which reduces the problem to one of
solving a set of independent one-dimensional problems. We first develop a
framework for matrix decomposition algorithms. To this end, suppose the
real nonsingular matrix E is given and assume that a real diagonal matrix
Λ and a real nonsingular matrix Z can be determined so that

(8.5) AZ = BZΛ

and
(8.6) Z T EBZ = I,
where I is the identity matrix of order M . Premultiplying (8.5) by Z T E
and using (8.6), we obtain

(8.7) Z T EAZ = Λ.

70
The system of equations (8.2) can then be written in the form

(8.8) (Z T E ⊗ I)(A ⊗ B + B ⊗ A)(Z ⊗ I)(Z −1 ⊗ I)u = (Z T E ⊗ I)f ,

which becomes, on using the properties of ⊗, (8.6) and (8.7),

(8.9) (Λ ⊗ B + I ⊗ A)(Z −1 ⊗ I)u = (Z T E ⊗ I)f .

From the preceding, we obtain the following algorithm for solving (8.2):

MATRIX DECOMPOSITION ALGORITHM

1. Determine the matrices Λ and Z satisfying (8.5) and (8.6).

2. Compute g = (Z T E ⊗ I)f .

3. Solve (Λ ⊗ B + I ⊗ A)v = g.

4. Compute u = (Z ⊗ I)v.

In the discretization methods considered in this section, the matrices Λ


and Z are known explicitly. If Λ = diag{λi }M
i=1 , Step 2 reduces to a system
of independent problems of the form

(A + λj B)vj = gj , j = 1, . . . , M.

It is shown that, for each method, these systems correspond to discretiza-


tions of two-point boundary value problems of the form

−u00 + λu = f (x), x ∈ (0, 1),


(8.10)
u(0) = u(1) = 0,

where λ is a positive constant. The elements of the matrix Z are sines and/or
cosines and as a consequence multiplication by Z or Z T can be done using
fast Fourier transforms (FFTs). Moreover, the matrix E is sparse, so that
multiplication by E can be done very efficiently. When all of these proper-
ties are exploited, the operation count for any of the matrix decomposition
algorithms discussed in this section is O(N 2 log N ), where (N + 1)h = 1 and
h is the mesh parameter.

71
8.2 Finite Difference Method
As a simple example, consider the basic five point difference approximation
for Poisson’s equation. To describe this method, suppose h = 1/(N + 1),
where N is a positive integer, and set xm = mh, yn = nh. Denote by um,n
an approximation to u(xm , yn ) defined by the usual second order difference
equations
um−1,n − 2um,n + um+1,n um,n−1 − 2um,n + um,n+1
− − = f (xm , yn ), m, n = 1, ..., N,
h2 h2
(8.11)
where u0,n = uN +1,n = um,0 = um,N +1 = 0. If we set M = N and introduce
vectors u and f as in (8.3) and (8.4), respectively, with fm,n = f (xm , yn ),
then the finite difference equations (8.11) may be written in the form

(8.12) (A ⊗ I + I ⊗ A)u = f ,

with A = J, where J is the tridiagonal matrix of order N given by (3.13).


It is well known that (8.5) and (8.6) are satisfied if B = E = I,

(8.13) Λ = diag(λ1 , . . . , λN ),

where
4 jπh
(8.14) λj = sin2 , j = 1, . . . , N,
h2 2
and Z is the symmetric orthogonal matrix given by

(8.15) Z = S,

where 1/2  N
2 mnπ

(8.16) S= sin .
N +1 N +1 m,n=1

From the Matrix Decomposition Algorithm with Λ and Z defined by (8.13)


and (8.15), respectively, we obtain the following matrix decomposition algo-
rithm for solving (8.12).

FINITE DIFFERENCE ALGORITHM

1. Compute g = (S ⊗ I)f .

2. Solve (Λ ⊗ I + I ⊗ A)v = g.

3. Compute u = (S ⊗ I)v.

72
Note that Steps 1 and 3 can be carried out using FFTs at a cost of O(N 2 log N )
operations. Step 2 consists of N tridiagonal linear systems each of which
can be solved in O(N ) operations so that the total cost of the algorithm is
O(N 2 log N ) operations. Each tridiagonal system corresponds to the stan-
dard finite difference approximation to (8.10).

8.3 Finite Element Galerkin Methods with Piecewise Bilin-


ear Elements
To obtain the weak form of the Poisson problem (8.1), let H01 (Ω) denote
the space of all piecewise continuously differentiable functions defined on Ω
which vanish on ∂Ω. Then, for all v ∈ H01 (Ω), u(x, y) satisfies

−∆u(x, y)v(x, y) = f (x, y)v(x, y),

and Z Z
(8.17) − ∆u(x, y)v(x, y)dx dy = f (x, y)v(x, y)dx dy.
Ω Ω
On applying Green’s formula to (8.17) and using the fact that v = 0 on ∂Ω,
we obtain
∂u ∂v ∂u ∂v
Z   Z
(8.18) + dx dy = f (x, y)v(x, y)dx dy.
Ω ∂x ∂x ∂y ∂y Ω

For convenience, we introduce the notation


Z
(g1 , g2 ) = g1 (x, y) · g2 (x, y)dx dy

for functions g1 and g2 with the dot denoting scalar multiplication in the
case of vector functions. Then (8.18) can be written in the form

(8.19) (∇u, ∇v) = (f, v) for all v ∈ H01 (Ω).

This is the weak form of (8.1) on which the finite element Galerkin method
is based.
Now let {xk }N +1
k=0 be a uniform partition of the interval [0, 1], so that
xk = kh, k = 0, . . . , N + 1, where the stepsize h = 1/(N + 1). Let {wn }N
n=1
denote the standard basis for the space of piecewise linear functions defined
on this partition which vanish at 0 and 1; see (3.10). Then the C 0 piecewise
bilinear Galerkin approximation
N X
X N
uh (x, y) = um,n wm (x)wn (y)
m=1 n=1

73
to the solution u of (2.1) is obtained by requiring that

(8.20) (∇uh , ∇v) = (f, v),

for all piecewise bilinear functions v which vanish on ∂Ω. If M = N , and u


and f are as in (8.3) and (8.4), respectively, with fm,n = (f, φm φn ), then we
obtain the linear system (8.2) in which the matrices A and B are given by
(3.14). If E = I,
 N
(8.21) Λ = diag λj /[1 − 61 λj ] ,
j=1

and
n o−1/2 N
(8.22) Z = S diag h[1 − 61 λj ] ,
j=1

where λj and S are given by (8.14) and (8.16), respectively, then (8.5)
and (8.6) are satisfied. Thus, with Λ and Z defined by (8.21) and (8.22),
respectively, we have the following matrix decomposition algorithm:

FINITE ELEMENT GALERKIN ALGORITHM

1. Compute g = (Z T ⊗ I)f .

2. Solve (Λ ⊗ B + I ⊗ A)v = g.

3. Compute u = (Z ⊗ I)v.

As in the finite difference method, Step 2 involves the solution of N tridi-


agonal systems and consequently the computational cost of the algorithm
is O(N 2 log N ) operations. In this case, each tridiagonal system arises from
the Galerkin approximation of (8.10).

8.4 Orthogonal Bicubic Spline Collocation Method


Again, let {xk }N +1
k=0 be a uniform partition of the interval [0, 1], and let
3,0
{φn }M
n=1 , where M = 2N + 2 be a basis for M1 given by

(8.23) φn = vn , n = 1, . . . , N, φN +n+1 = sn , n = 0, . . . , N + 1.

74
This is the standard basis but the ordering of the basis functions is nonstan-
dard. The Hermite bicubic orthogonal collocation approximation
M X
X M
uh (x, y) = um,n φm (x)φn (y)
m=1 n=1

to the solution u of (2.1) is obtained by requiring that

(8.24) −∆uh (ξm , ξn ) = f (ξm , ξn ), m, n = 1, . . . , M.

With u and f as in (8.3) and (8.4), respectively, where fm,n = f (ξm , ξn ),


equations (8.24) can be written in the form (8.2) with

A = (amn )M
m,n=1 , amn = −φ00n (ξm ), B = (bmn )M
m,n=1 , bmn = φn (ξm ).
(8.25)
Let

(8.26) Λ = diag(λ− − + +
1 , . . . , λN , λ0 , λ1 , . . . , λN , λN +1 ),

where
!
8 + ηj ± µj
λ±
j = 12 h−2 , j = 1, . . . , N, λ0 = 36h−2 , λN +1 = 12h−2 ,
7 − ηj

and

  q
ηj = cos , µj = 43 + 40ηj − 2ηj2 .
N +1
±
To describe the matrix Z, let Λ±
α , Λβ be diagonal matrices defined by

± ±

Λ±
α = diag(α1 , . . . , αN ), Λ− − −
β = diag(β1 , . . . , βN ), Λ+
β = diag(1, β +
1 , . . . , βN
+
, 1/ 3),

where

 
αj± = (5 + 4ηj ∓ µj )νj± , βj± = 18 sin ν ±,
N +1 j
and
h i−1/2
νj± = 27(1 + ηj )(8 + ηj ∓ µj )2 + (1 − ηj )(11 + 7ηj ∓ 4µj )2 .

Then

" #
SΛ−α 0 SΛ+
α 0
(8.27) Z=3 3 ,
C̃Λ−
β CΛ+
β

75
where 0 is the N -dimensional zero column vector, S is given by (8.16), and
1/2  N +1 1/2  N +1,N
2 mnπ 2 mnπ
 
C= cos , C̃ = cos .
N +1 N +1 m,n=0 N +1 N +1 m=0,n=1

It can be shown that if A, B, Λ, Z are given by (8.25), (8.26), (8.27), and


E = B T , then equations (8.5) and (8.6) are satisfied. Thus, from the Matrix
Decomposition Algorithm, we obtain:

ORTHOGONAL SPLINE COLLOCATION ALGORITHM

1. Compute g = (Z T B T ⊗ I)f .

2. Solve (Λ ⊗ B + I ⊗ A)v = g.

3. Compute u = (Z ⊗ I)v.

Since there are at most four nonzero elements in each row of the matrix B T
the matrix-vector multiplications involving the matrix B T in step 1 require
a total of O(N 2 ) arithmetic operations. From (8.27), it follows that FFT
routines can be used to perform multiplications by the matrix Z T in step
1 and by the matrix Z in step 3, the corresponding cost of each step being
O(N 2 log N ) operations. Step 2 involves the solution of M independent
almost block diagonal linear systems with coefficient matrices of the form
(4.7) arising from the orthogonal spline collocation approximation of (8.10),
which can be solved in a total of O(N 2 ) operations. Thus the total cost of
this algorithm is also O(N 2 log N ) operations.

76
References
1. S. Agmon, Lectures on Elliptic Boundary–Value Problems, Van Nos-
trand, Princeton, New Jersey, 1965.
2. A. Akyurtlu, J. F. Akyurtlu, C. E. Hamrin Jr. and G. Fairweather, Re-
formulation and the numerical solution of the equations for a catalytic,
porous wall, gas–liquid reactor, Computers Chem. Eng., 10(1986),
361–365.
3. P. Amodio, J. R. Cash, G. Roussos, R. W. Wright, G. Fairweather, I.
Gladwell, G. L. Kraut and M. Paprzycki, Almost block diagonal linear
systems: sequential and parallel solution techniques, and applications,
Numerical Linear Algebra with Applications, to appear.
4. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov and D.
Sorensen, LAPACK Users’ Guide, SIAM Publications, Philadelphia,
1995.
5. U. Ascher and R. D. Russell, Reformulation of boundary value prob-
lems into ‘standard’ form, SIAM Rev., 23(1981), 238–254.
6. A. Aziz and T. Y. Na, Squeezing flow of a viscous fluid between elliptic
plates, J. Comput. Appl. Math., 7(1981), 115–119.
7. D. Bhattacharyya, M. Jevtitch, J. T. Schrodt and G. Fairweather,
Prediction of membrane separation characteristics by pore distribution
measurements and surface force–pore flow model, Chem.Eng. Com-
mun., 42(1986), 111–128.
8. C. de Boor, A Practical Guide to Splines, Applied Math. Sciences 27,
Springer–Verlag, New York, 1978.
9. C. de Boor and R. Weiss, SOLVEBLOK: A package for solving almost
block diagonal linear systems, ACM Trans. Math. Software, 6(1980),
80–87.
10. C. de Boor and R. Weiss, Algorithm 546: SOLVEBLOK, ACM Trans.
Math. Software, 6(1980), 88–91.
11. J.C. Diaz, G. Fairweather and P. Keast FORTRAN packages for solv-
ing certain almost block diagonal linear systems by modified alternate
row and column elimination, ACM Trans. Math. Software, 9(1983),
358–375.

77
12. J. C. Diaz, G. Fairweather and P. Keast, Algorithm 603 COLROW and
ARCECO: FORTRAN packages for solving certain almost block diag-
onal linear systems by modified alternate row and column elimination,
ACM Trans. Math. Software, 9(1983), 376–380.

13. J. Douglas Jr. and T. Dupont, Collocation Methods for Parabolic


Equations in a Single Space Variable, Lecture Notes in Mathematics
385, Springer–Verlag, New York, 1974.

14. J. Douglas Jr., T. Dupont and L. Wahlbin, Optimal L∞ error estimates


for Galerkin approximations to solutions of two–point boundary value
problems Math. Comp., 29(1975), 475–483.

15. K. S. Denison, C. E. Hamrin Jr. and G. Fairweather, Solution of


boundary value problems using software packages: DD04AD and COL-
SYS, Chem. Eng. Commun., 22(1983), 1–9.

16. E. J. Doedel, Finite difference collocation methods for nonlinear two


point boundary value problems, SIAM J. Numer. Anal., 16(1979), 173–
185.

17. G. Fairweather, Finite Element Galerkin Methods for Differential Equa-


tions, Lecture Notes in Pure and Applied Mathematics, Vol. 34, Mar-
cel Dekker, New York, 1978.

18. G. Fairweather and M. Vedha–Nayagam, An assessment of numerical


software for solving two–point boundary value problems arising in heat
transfer, Numerical Heat Transfer, 11(1987), 281-293.

19. H. B. Keller, Numerical Methods for Two–Point Boundary Value Prob-


lems, Dover, New York, 1992.

20. T. Y. Na and I. Pop, Free convection flow past a vertical flat plate
embedded in a saturated porous medium, Int.J. Engrg. Sci., 21(1983),
517–526.

21. T. Y. Na and S. C. Tang, A method for the solution of conduction heat


transfer with non–linear heat generation, Z. Angew. Math. Mech.,
49(1969), 45–52.

22. R. S. Stepleman, Tridiagonal fourth order approximations to general


two–point nonlinear boundary value problems with mixed boundary
conditions, Math. Comp., 30(1976), 92–103.

78
23. J. M. Varah, Alternate row and column elimination for solving certain
linear systems, SIAM J. Numer. Anal., 13(1976), 71–75.

24. M. F. Wheeler, A Galerkin procedure for estimating the flux for two–
point boundary value problems, SIAM J. Numer. Anal., 11(1974), 764–
768.

Appendix A. Galerkin Matrices for Piecewise Linear Functions.


For the matrix, A = ((wi0 , wj0 )):
Z 1
(wi0 , wi0 ) = [wi0 (x)]2 dx
0
Z xi 1 xi+1 Z
1
= 2
dx + dx
xi−1 (xi − xi−1 ) xi (xi+1 − xi )2
1 1
 
= +
(xi − xi−1 ) (xi+1 − xi )
1 1
= + ;
hi hi+1
Z xi+1
−1 1 1
(wi0 , wi+1
0
) = 2
dx = − =− ;
xi (x i+1 − xi ) (xi+1 − xi ) h i+1
1
(wi0 , wi−1
0
) = − .
hi
For the matrix B = ((wi , wj )) :
xi 2 xi+1 2
x − xi−1 x − xi
Z  Z 
(wi , wi ) = dx + dx
xi−1 xi − xi−1 xi xi − xi+1
1
= [hi + hi+1 ];
Z3 xi+1 
xi+1 − x x − xi
 
(wi , wi+1 ) = dx
xi xi+1 − xi xi+1 − xi
1 1 1 xi+1
Z
2 xi+1
= { (xi+1 − x)(x − xi ) |xi + (x − xi )2 dx}
h2i+1 2 2 xi
1
= hi+1 .
6
Similarly,
1
(wi , wi−1 ) = hi .
6

79
Appendix B. Galerkin Matrices for Piecewise Hermite Cubic
Functions. We consider the case in which the partition π of I is uniform
and let h = xi+1 − xi , i = 0, . . . , N . The conditioning of the matrices is
improved if the slope functions are normalized by dividing by h. Thus we
define
s̃i (x) = h−1 si (x), i = 0, . . . , N + 1.
If the Galerkin solution uh is expressed in the form
N +1
X (1) (2)
uh (x) = {αi vi (x) + αi s̃i (x)},
i=0

then
(1)
αi = uh (xi ),
as before, but now
(2)
αi = hu0h (xi ), i = 0, . . . , N + 1.

If A and B are partitioned as in (3.16) then, with the normalization of


the slope functions, we have, if A = ((wi0 , wj0 )) ≡ (Aij ) and B = ((wi , wj )) ≡
(Bij ),
" #
−1
36 3
A00 = (30h) ,
3 4
" #
−1
36 0
Aii = (15h) , i = 1, . . . , N,
0 4
−3
" #
36
AN +1,N +1 = (30h)−1 ,
−3 4
−36
" #
−1
3
Ai,i+1 = ATi+1,i = (30h) , i = 0, . . . , N,
−3 −1

and
" #
h 78 11
B00 = ,
210 11 2
" #
h 78 0
Bii = , i = 1, . . . , N,
105 0 2

80
−11
" #
h 78
BN +1,N +1 = ,
210 −11 2
54 −13
" #
T h
Bi,i+1 = Bi+1,i = , i = 0, . . . , N.
420 13 −3

Appendix C. Elements of Piecewise Hermite Cubic Orthogonal


Spline Collocation Matrices. On [xi , xi+1 ], with xi = ih,
xi+1 − x 3 xi+1 − x 2
   
vi (x) = −2 +3 ,
h h
6 6
vi0 (x) = 3
(xi+1 − x)2 − 2 (xi+1 − x),
h h
00 12 6
vi (x) = − 3 (xi+1 − x) + 2 .
h h

From (4.5), ξ2i+1 = xi + 21 h(1 + ρ1 ), where ρ1 = −1/ 3. Then, if x = ξ2i+1
1
xi+1 − x = h(1 − ρ1 )
2
and
1 3
vi (ξ2i+1 ) = − (1 − ρ1 )3 + (1 − ρ1 )2
4 4
1 2
= (1 − ρ1 ) (2 + ρ1 )
4
3 3
vi0 (ξ2i+1 ) = (1 − ρ1 )2 − (1 − ρ1 )
2h h
3 2
= − (1 − ρ1 )
2h
6 6 6ρ1
vi00 (ξ2i+1 ) = − 2 (1 − ρ1 ) + 2 = 2
h h h
Similar expressions hold for vi (ξ2i+2 ), vi0 (ξ2i+2 ), vi00 (ξ2i+2 ), where ξ2i+2 =
xi + 12 h(1 + ρ2 ), ρ2 = −ρ1 .
On [xi , xi+1 ],
x − xi 3 x − xi 2
   
vi+1 (x) = −2 +3 ,
h h
0 6 6
vi+1 (x) = − 3 (x − xi )2 + 2 (x − xi )
h h
00 12 6
vi+1 (x) = − 3 (x − xi ) + 2 .
h h

81
If x = ξ2i+1 ,
1
x − xi = h(1 + ρ1 ).
2
Therefore,
1 3
vi+1 (ξ2i+1 ) = − (1 + ρ1 )3 + (1 + ρ1 )2
4 4
1 2
= (1 + ρ1 ) (2 − ρ1 ),
4
0 3 3
vi+1 (ξ2i+1 ) = − (1 + ρ1 )2 + (1 + ρ1 )
2h h
3 2
= (1 − ρ1 ),
2h
00 6 6
vi+1 (ξ2i+1 ) = − 2 (1 + ρ1 ) + 2
h hi+1
6ρ1
= − 2,
h
0 (ξ
with similar expressions for vi+1 (ξ2i+2 ), vi+1 00
2i+2 ), vi+1 (ξ2i+2 ).
With s̃i = h−1 si as in Appendix B, on [xi , xi+1 ],
(  )
xi+1 − x 3 xi+1 − x 2
 
s̃i (x) = − − ,
h h
3 2
 
s̃0i (x) = − − 3 (xi+1 − x)2 + 2 (xi+1 − x) ,
h h
6 2
 
s̃00i (x) = − (xi+1 − x) − 2 .
h3 h

With xi+1 − x = 21 h(1 − ρ1 ), we obtain

1 1
 
s̃i (ξ2i+1 ) = − (1 − ρ1 )3 − (1 − ρ1 )2
8 4
1
= − (1 − ρ1 )2 {1 − ρ1 − 2}
8
1
= (1 − ρ1 )2 (1 + ρ1 ),
8( )
3 (1 − ρ1 )2 1
s̃0i (ξ2i+1 ) = − − + (1 − ρ1 )
h 4 h
1
= − (1 − ρ1 ){−3(1 − ρ1 ) + 4}
4h

82
1
= − (1 − ρ1 )(1 + 3ρ1 ),
4h
3 2
 
s̃00i (ξ2i+1 ) = − (1 − ρ 1 ) −
h2 h2
1
= − 2 (1 − 3ρ1 ),
h
with similar expressions for s̃i (ξ2i+2 ), s̃0i (ξ2i+2 ), and s̃00i (ξ2i+2 ).
On [xi , xi+1 ],
x − xi 3 x − xi 2
   
s̃i+1 (x) = − ,
h h
3 2
s̃0i+1 (x) = 3
(x − xi )2 − 2 (x − xi ),
h h
6 2
s̃00i+1 (x) = (x − xi ) − 2 .
h3 h
Then, with x − xi = 12 h(1 + ρ1 ), we have
1 1
 
s̃i+1 (ξ2i+1 ) = (1 + ρ1 )3 − (1 + ρ1 )2
8 4
1
= − (1 + ρ1 )2 (1 − ρ1 ),
8
3 1

0 2
s̃i+1 (ξ2i+1 ) = (1 + ρ1 ) − (1 + ρ1 )
4h h
1
= − (1 + ρ1 )(1 − 3ρ1 ),
4h
1
s̃00i+1 (ξ2i+1 ) = {3(1 + ρ1 ) − 2}
h2
1
= (1 + 3ρ1 ),
h2
with similar expressions for s̃i+1 (ξ2i+2 ), s̃0i+1 (ξ2i+2 ), s̃00i+1 (ξ2i+2 ).

Appendix D. Properties of the Tensor Product. If A = (aij ) is


an M × M matrix and B is an N × N matrix, then the tensor product of A
and B, A ⊗ B, is the M N × M N block matrix whose (i, j) element is aij B.
The tensor product has the following properties:
A ⊗ (B + C) = A ⊗ B + A ⊗ C;
(B + C) ⊗ A = B ⊗ A + C ⊗ A;
(A ⊗ B)(C ⊗ D) = AC ⊗ BD, provided the matrix products are defined.

83

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy