Global Optimization Using Interval Analysis
Second Edition, Revised and Expanded
ELDON HANSEN
Consultant
Los Altos, California
G. WILLIAM WALSTER
Sun Microsystems Laboratories
Mountain View, California, U.S.A.
Although great care has been taken to provide accurate and current information, neither the
author(s) nor the publisher, nor anyone else associated with this publication, shall be liable for
any loss, damage, or liability directly or indirectly caused or alleged to be caused by this book.
The material contained herein is not intended to provide specific advice or recommendations for
any specific situation.
Trademark notice: Product or corporate names may be trademarks or registered trademarks and
are used only for identification and explanation without intent to infringe.
Sun, Sun Microsystems, the Sun Logo, and Forte are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries.
ISBN: 0-8247-4059-9
Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016, U.S.A.
tel: 212-696-9000; fax: 212-685-4540
The publisher offers discounts on this book when ordered in bulk quantities. For more informa-
tion, write to Special Sales/Professional Marketing at the headquarters address above.
Neither this book nor any part may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, microfilming, and recording, or by any infor-
mation storage and retrieval system, without permission in writing from the publisher.
CR/SH 10 9 8 7 6 5 4 3 2 1
Take note, mathematicians. Here you will find a new extension of real arithmetic to interval arithmetic for containment sets (csets) in which there are no undefined operand-operator combinations such as the previously "indeterminate forms" 0/0, ∞ − ∞, etc.
Take note, hardware and software engineers, programmers and
computer users. Here you will find arithmetic with containment sets which
is exception free, so exception event handling is unnecessary.
The main content of the volume consists of interval algorithms for
computing guaranteed enclosures of the sets of points where constrained
global optimization occurs. The use of interval methods provides com-
putational proofs of existence and location of global optima. Computer
software implementations use outwardly rounded interval (cset) arithmetic
to guarantee that even rounding errors are bounded in the computations.
The results are mathematically rigorous.
Computer-aided proofs of theorems and long-standing conjectures in
analysis have been carried out using outwardly rounded interval arithmetic,
including, for example, the Kepler conjecture — finally proved after 300
years. See "Perspectives on Enclosure Methods", U. Kulisch, R. Lohner and A. Facius (eds.), Springer, 2001.
The earlier edition [Global Optimization Using Interval Analysis, Eldon Hansen, Marcel Dekker, Inc., 1992] has also been expanded with more recently developed methods and algorithms for global optimization problems with either (or both) inequality and equality constraints. In particular, constraint satisfaction and propagation techniques, using interval intersections for instance, discussed in the new chapter on "consistencies", are integrated with Newton-like interval methods, in a step towards bridging
Ramon Moore

CONTENTS

Foreword
Preface
1 INTRODUCTION
1.1 AN OVERVIEW
1.2 THE ORIGIN OF INTERVAL ANALYSIS
1.3 THE SCOPE OF THIS BOOK
1.4 VIRTUES AND DRAWBACKS OF INTERVAL MATHEMATICS
1.4.1 Rump’s Example
1.4.2 Real Examples
1.4.3 Ease of Use
1.4.4 Performance Benchmarks
1.4.5 Interval Virtues
1.5 THE FUTURE OF INTERVALS
5 LINEAR EQUATIONS
5.1 DEFINITIONS
5.2 INTRODUCTION
5.3 THE SOLUTION SET
5.4 GAUSSIAN ELIMINATION
5.5 FAILURE OF GAUSSIAN ELIMINATION
5.6 PRECONDITIONING
5.7 THE GAUSS-SEIDEL METHOD
5.8 THE HULL METHOD
5.8.1 Theoretical Algorithm
5.8.2 Practical Procedure
5.9 COMBINING GAUSS-SEIDEL AND HULL METHODS
5.10 THE HULL OF THE SOLUTION SET OF AI x = bI
5.11 A SPECIAL PRECONDITIONING MATRIX
5.12 OVERDETERMINED SYSTEMS
6 INEQUALITIES
6.1 INTRODUCTION
6.2 A SINGLE INEQUALITY
6.3 SYSTEMS OF INEQUALITIES
6.4 ORDERING INEQUALITIES
6.5 SECONDARY PIVOTS
6.6 COLUMN INTERCHANGES
6.7 THE PRECONDITIONING MATRIX
6.8 SOLVING INEQUALITIES
10 CONSISTENCIES
10.1 INTRODUCTION
10.2 BOX CONSISTENCY
10.3 HULL CONSISTENCY
10.4 ANALYSIS OF HULL CONSISTENCY
10.5 IMPLEMENTING HULL CONSISTENCY
10.6 CONVERGENCE
10.7 CONVERGENCE IN THE INTERVAL CASE
13 CONSTRAINED OPTIMIZATION
13.1 INTRODUCTION
13.2 THE JOHN CONDITIONS
13.3 NORMALIZING LAGRANGE MULTIPLIERS
13.4 USE OF CONSTRAINTS
13.5 SOLVING THE JOHN CONDITIONS
13.6 BOUNDING THE LAGRANGE MULTIPLIERS
13.7 FIRST NUMERICAL EXAMPLE
18 MISCELLANY
18.1 NONDIFFERENTIABLE FUNCTIONS
18.2 INTEGER AND MIXED INTEGER PROBLEMS
References
The primary purpose of this book is to describe and discuss methods using
interval analysis for solving nonlinear equations and the global optimization
problem. The overall approach is the same as in the first edition. However,
various new procedures are included. Many of them have not previously been published. The methods discussed find the global optimum and pro-
vide bounds on its value and location(s). All solution bounds are guaranteed
to be correct despite errors from uncertain input data, approximations, and
machine rounding.
The global optimization methods considered here are those developed
by the authors and their collaborators. Other methods using interval analysis
can be found in the literature. Most of the published methods use only
subsets of the procedures described herein.
In the first edition of this book, the interval Newton methods for solving
systems of nonlinear equations were the most important part of our global
optimization algorithms. In the second edition, this place is shared with
consistency methods that are used to speed up the initial convergence of
algorithms. As in the first edition, these central methods are discussed in
detail.
We show that interval Newton and consistency methods can prove the
existence and uniqueness of a solution of a system of nonlinear equations in
a given region. This has important practical implications for the global optimization algorithms discussed here. Proof of existence and/or uniqueness by
an interval Newton or consistency method follows as a by-product of ei-
ther algorithm and requires no extra computing. As before, these proofs
hold true in the presence of errors from rounding, approximation and data
• Ramon Moore, for starting the field of interval analysis; for his many
and continuing contributions to the field; for his tireless encourage-
ment and support; and for his personal friendship.
• Jeff Tupper, for creating GrafEq™ and for his generous help with
final preparation of manuscript Figures.
INTRODUCTION
1.1 AN OVERVIEW
There are several types of mathematical computing errors. Data often contain measurement errors or are otherwise uncertain; rounding errors generally occur; approximations are made; and so on. The purpose of interval analysis is to provide upper and lower bounds on the effect all such errors and uncertainties have on a computed quantity.
It is desirable to make interval bounds as narrow as possible. A major focus of interval analysis is to develop practical interval algorithms that produce sharp (or nearly sharp) bounds on the solution of numerical computing problems. However, in practical problems with interval inputs, it is often sufficient to simply compute reasonably narrow interval bounds.
Several people independently had the idea of bounding rounding errors
by computing with intervals; e.g., see Dwyer (1951), Sunaga (1958), War-
mus (1956), (1960) and Wilkinson (1980). However, interval mathematics
and analysis can be said to have begun with the appearance of R. E. Moore’s
book Interval Analysis in 1966. Moore’s work transformed this simple idea
into a viable tool for error analysis. In addition to treating rounding errors,
Moore extended the use of interval analysis to bound the effect of errors
from all sources, including approximation errors and errors in data.
All three results agree in the first seven decimal digits, and the last two results agree to thirteen digits. Nevertheless, they are all completely incorrect. Even their sign is wrong.
Loh and Walster (2001) show that both Rump's original expression and the expression for f (x, y) in (1.4.1) reduce to

f (x0, y0) = x0/(2y0) − 2, (1.4.2)

from which

f (x0, y0) = −0.827396059946821368141165095479816... (1.4.3)
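The effect is easy to reproduce. The sketch below (ours, not from the book) evaluates the Loh and Walster (2001) form of Rump's expression twice, assuming the standard operand values x0 = 77617 and y0 = 33096: once in IEEE double precision, where catastrophic cancellation leaves a plausible-looking value near +1.17, and once in exact rational arithmetic, which reproduces (1.4.3).

    from fractions import Fraction

    def rump(x, y, c1, c2):
        # Rump's expression in the algebraically equivalent form used by
        # Loh and Walster (2001); pass c1 = 333.75 and c2 = 5.5 in the
        # same number type as x and y so all arithmetic stays in that type.
        return ((c1 - x**2) * y**6
                + x**2 * (11 * x**2 * y**2 - 121 * y**4 - 2)
                + c2 * y**8
                + x / (2 * y))

    x0, y0 = 77617, 33096

    # IEEE doubles: the huge polynomial terms cancel catastrophically and
    # the rounded result (about +1.1726) is wrong, sign included.
    print(rump(float(x0), float(y0), 333.75, 5.5))

    # Exact rationals: agrees with (1.4.3), about -0.8273960599...
    exact = rump(Fraction(x0), Fraction(y0), Fraction('333.75'), Fraction('5.5'))
    print(float(exact))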
These and other failures are described at www.math.psu.edu/dna/disasters/ and by Daumas (2002). For example, the failure of the Patriot Missile battery at Dhahran was directly attributable to accumulation of roundoff errors, and the explosion of the Ariane 5 was caused by overflow. The Endeavour
US Space Shuttle maiden flight suffered a software failure in its Intelsat
satellite rendezvous maneuver and the Columbia US Space Shuttle maiden
flight had to be postponed because of a clock synchronization algorithm
failure.
Use of standard interval analysis could presumably have detected the
roundoff difficulty in the first example. The extended interval arithmetic
discussed in Chapter 4 and used in this book would have produced a correct
interval result in the second example, even in the presence of overflow.
See Walster (2003b) for an extended interval arithmetic implementation
standard in which underflow and overflow are respectively distinguished
from zero and infinity. The third failure was traced to an input-dependent
software error that was not detected in spite of extensive testing. Intervals
can be used to perform exhaustive testing that is otherwise impractical.
Finally, the fourth failure occurred after the algorithm in question had been
subjected to a three year review process and formally proved to be correct.
Unfortunately, the proof was flawed. Although it is impossible to know,
we believe that all of these and similar errors would have been detected if
interval rather than floating-point algorithms had been used.
Despite the value of interval analysis for bounding rounding errors in prob-
lems such as these, interval mathematics is less used in practice than one
might expect. There are several reasons for this. Undoubtedly, the main
reasons are the (avoidable) lack of convenience, the (avoidable) slowness
of many interval arithmetic packages, the (occasional) slowness of some
interval algorithms, and the (unavoidable) difficulty of some interval prob-
lems.
For programming convenience, an interval data type is needed to repre-
sent interval variables and interval constants as single entities rather than as
two real interval endpoints. This was made possible early in the history of
interval computations by the use of precompilers. See, for example, Yohe
(1979). However, the programs they produced were quite slow because
each arithmetic step was invoked with a subroutine call. Moreover, sub-
routines to evaluate transcendental functions were inefficient or lacking and
interval programs were available on only a few computers.
Eventually, some languages (e.g., Pascal-SC, Ada, and C++) made pro-
gramming with intervals convenient and reasonably fast by supporting user
defined types and operator overloading.
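To illustrate what operator overloading buys, here is a minimal sketch (ours, with outward rounding ignored) of an interval type in Python; once __add__ and __mul__ are defined, interval expressions are written exactly like real ones.

    class Interval:
        """Toy interval type; endpoints are floats, rounding is ignored."""
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            # Rule (2.3.2): [a,b] + [c,d] = [a+c, b+d].
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            # Extreme cross products bound the product interval.
            ps = [self.lo * other.lo, self.lo * other.hi,
                  self.hi * other.lo, self.hi * other.hi]
            return Interval(min(ps), max(ps))

        def __repr__(self):
            return f"[{self.lo}, {self.hi}]"

    X, Y = Interval(1.0, 2.0), Interval(-1.0, 3.0)
    print(X + Y, X * Y)   # reads exactly like ordinary real arithmetic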
Microprogramming can be fruitful in improving the speed of interval
arithmetic. See Moore (1980). However, this has rarely been considered.
Convenient programming of interval computations was made available
as part of ACRITH. See Kulisch and Miranker (1983) and IBM (1986a,
1986b). However, the system was designed for accuracy with exact (de-
generate interval) inputs rather than speed with interval inputs that are not
exact. Because binary-coded decimal arithmetic was used, it was quite slow.
The M77 compiler was developed at the University of Minnesota. See
Walster, et al (1980). It was available only on certain computers manu-
factured by Control Data Corp. With this compiler, interval arithmetic was
roughly five times slower than ordinary arithmetic. All the numerical re-
sults contained in the first edition of this book were computed using the
M77 compiler.
More recently compilers have been developed by Sun Microsystems
Inc. that represent the current state of the art. See Walster (2000c) and
With the inherent ability of intervals to represent errors from all sources
and to rigorously propagate their interactions, the validity of answers from
the most extensive computations can now be guaranteed. With the natural
parallel character of nonlinear interval algorithms, it will be possible to
efficiently use even the largest parallel computing architectures to safely
solve large practical problems.
Computers are attaining the speed required to replace physical experi-
ments with computer simulations. Gustafson (1998) has written that using
computers in this way might turn out to be as scientifically important as
the introduction of the experimental method in the Renaissance. One diffi-
culty is how to validate computed results from huge simulations. A second
difficulty is how to then synthesize simulation results into optimal designs.
With interval algorithms, simulation validity can be verified. Moreover,
interval global optimization can use the mathematical models derived from
validated simulations to solve for optimal designs.
X • Y = {x • y | x ∈ X, y ∈ Y } (2.3.1)
Thus the interval X • Y resulting from the operation contains every possible
number that can be formed as x • y for each x ∈ X, and each y ∈ Y.
This definition produces the following rules for generating the endpoints
of X • Y from the two intervals X = [a, b] and Y = [c, d].
X + Y = [a + c, b + d] (2.3.2)
X − Y = [a − d, b − c] (2.3.3)
X × Y is given by the following cases (2.3.4):

[ac, bd] if a ≥ 0 and c ≥ 0,
[bc, bd] if a ≥ 0 and c < 0 < d,
[bc, ad] if a ≥ 0 and d ≤ 0,
[ad, bd] if a < 0 < b and c ≥ 0,
[bc, ac] if a < 0 < b and d ≤ 0,
[ad, bc] if b ≤ 0 and c ≥ 0,
[ad, ac] if b ≤ 0 and c < 0 < d,
[bd, ac] if b ≤ 0 and d ≤ 0,
[min(bc, ad), max(ac, bd)] if a < 0 < b and c < 0 < d.
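The nine cases of (2.3.4) transcribe directly into code. The following sketch (ours; exact endpoint arithmetic assumed, outward rounding ignored) returns the endpoints of X × Y and needs min/max comparisons only in the final case, where both operands contain zero.

    def imul(a, b, c, d):
        # X = [a, b], Y = [c, d]; returns the endpoints of X * Y per (2.3.4).
        if a >= 0:
            if c >= 0:   return (a * c, b * d)
            elif d <= 0: return (b * c, a * d)
            else:        return (b * c, b * d)       # c < 0 < d
        elif b <= 0:
            if c >= 0:   return (a * d, b * c)
            elif d <= 0: return (b * d, a * c)
            else:        return (a * d, a * c)       # c < 0 < d
        else:                                        # a < 0 < b
            if c >= 0:   return (a * d, b * d)
            elif d <= 0: return (b * c, a * c)
            else:        return (min(b * c, a * d), max(a * c, b * d))

    print(imul(-1, 2, -3, 4))   # both operands straddle zero: (-6, 8)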
2.4 DEPENDENCE
X ⊖ Y = [a − c, b − d] (2.4.2)
See Sections 6.2 and 10.5 for example uses of dependent subtraction.
In addition to the dependent subtraction operation, each interval basic arithmetic operation (BAO) has a corresponding dependent form. For example, dependent division, denoted ⊘, is used to recover either A or B from X = A × B. The key requirement for using a dependent operation is: the dependent operation must be the inverse of an operation already performed on the same variable or subfunction being "removed". Dependent operations cannot be performed on interval constants, as they cannot be dependent. In this respect the distinction between constants and variables is much more important for intervals than for points.
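A small sketch (ours) of rule (2.4.2): if X was computed as A + B, dependent subtraction of B recovers A exactly, whereas ordinary interval subtraction ignores the dependence and widens the result.

    def iadd(a, b, c, d):        # [a,b] + [c,d], rule (2.3.2)
        return (a + c, b + d)

    def isub(a, b, c, d):        # ordinary subtraction, rule (2.3.3)
        return (a - d, b - c)

    def isub_dep(a, b, c, d):    # dependent subtraction, rule (2.4.2)
        return (a - c, b - d)

    A, B = (1.0, 2.0), (10.0, 30.0)
    X = iadd(*A, *B)             # X = A + B = (11.0, 32.0)
    print(isub(*X, *B))          # (-19.0, 22.0): dependence ignored, wide
    print(isub_dep(*X, *B))      # (1.0, 2.0): A recovered exactly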
FUNCTIONS OF
INTERVALS
m(X) = (a + b)/2.
The width of X is
w(X) = b − a.
The magnitude is also called the absolute value by some authors. We use
the notation |X| to denote mag (X) in the development and analysis of our
The interval version of the absolute value function abs(X) can be de-
fined in terms of the magnitude and mignitude:
We also use the notation |X| to denote abs (X) in two contexts: discussing
slope expansions of nonsmooth functions in Section 7.11; and applications
involving nondifferentiable functions in Chapter 18.
Various other real-valued functions of intervals have been defined and
used. For a discussion of many such functions, see Ris (1975).
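These real-valued functions of intervals are straightforward to compute; a minimal sketch (ours) follows, with abs(X) assembled from the magnitude and mignitude as just described.

    def m(a, b):    return (a + b) / 2          # midpoint
    def w(a, b):    return b - a                # width
    def mag(a, b):  return max(abs(a), abs(b))  # magnitude: largest |x| for x in X
    def mig(a, b):  # mignitude: smallest |x| for x in X; zero if 0 is in X
        return 0.0 if a <= 0.0 <= b else min(abs(a), abs(b))

    def iabs(a, b):
        # abs(X) = {|x| : x in X} = [mig(X), mag(X)]
        return (mig(a, b), mag(a, b))

    print(iabs(-1.0, 3.0))   # (0.0, 3.0)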
f (x1 , · · · , xn ) ∈ F (x1 , · · · , xn )
These two forms of f (x) are equivalent for an arbitrary value of c. Let X = [0, 1] and c = m(X) = 0.5. Evaluating f (X) in the form in (3.2.1), we compute f ([0, 1]) = [0, 0.25]. Now replace X = [0, 1] by X′ = [0, 0.9]. Also replace c = 0.5 by c′ = m(X′) = 0.45. We compute f (X′) = [0, 0.2925]. Thus, f (X′) is not contained in f (X) even though X′ ⊂ X. Inclusion isotonicity failed because we changed the functional form of f when we replaced c by c′.
In this example, we could say that the functional form is the same for each evaluation since c = m(X) and c′ = m(X′). However, the midpoint m(X) of an interval cannot be evaluated using only the interval arithmetic operations of addition, subtraction, multiplication, and division. A separate computation involving the endpoints of X is required for m(X).
The following Theorem shows that, for rational functions, inclusion
isotonicity is easily assured.
f (X1 , · · · , Xn ) ⊆ F (X1 , · · · , Xn ),
where the inf and sup are taken for all xi ∈ Xi (i = 1, · · · , n). The fol-
lowing theorem due to Moore (1966) shows how easy it is to bound the
range of a function. It is undoubtedly the most important theorem in inter-
val analysis. Rall (1969) aptly calls it the fundamental theorem of interval
analysis. One of its far reaching consequences is that it makes possible the
solution to the global optimization problem.
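The theorem can be checked numerically. The sketch below (ours) evaluates the natural interval extension of f (x) = x² − x over X = [0, 1] and compares it with a sampled range; the extension must contain the range, though it need not be sharp.

    def isub(a, b, c, d):                 # [a,b] - [c,d]
        return (a - d, b - c)

    def isqr(a, b):                       # X**2; sharp, since x occurs once
        lo = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
        return (lo, max(a * a, b * b))

    # Natural interval extension F(X) = X**2 - X on X = [0, 1].
    F = isub(*isqr(0.0, 1.0), 0.0, 1.0)
    print(F)                              # (-1.0, 1.0)

    # Sampled range of f(x) = x**2 - x over [0, 1]: about [-0.25, 0.0].
    ys = [(i / 1000) ** 2 - i / 1000 for i in range(1001)]
    print(min(ys), max(ys))               # contained in F, as the theorem asserts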
Also denote

d = max{w(Xi) : 1 ≤ i ≤ n}.
That is, the form fc (called the centered form by Moore) has an excess
width that is of second order in the “width” d of the box (X1 , · · · , Xn ).
This conjecture was proved to be true by Hansen (1969c).
Various centered forms have been derived. By using expansions of
appropriate orders, it is possible to derive centered forms fc for which
X = ∪_{i=1}^{m} Xi,

we have

f (X) ⊆ ∪_{i=1}^{m} f I(Xi) ⊆ f I(X).

f (X) ⊆ ∪_{i=1}^{m} F (Xi) ⊆ F (X).
Note that the lower endpoint of X + X is a function of X̲ only and the upper endpoint of X + X is a function of X̄ only. However, each endpoint of X − X is a function of both X̲ and X̄.
As we shall see, these facts tell us not only that X + X is sharp and X − X is not sharp, but also that the function f (x) = x + x is a monotonically increasing function of x. (Remember, however, that X ⊖ X = [0, 0] is sharp.)
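A quick numerical check (ours) of these endpoint facts:

    a, b = 1.0, 2.0                       # X = [a, b]
    print("X + X  =", (a + a, b + b))     # [2a, 2b]: equals the range of x + x
    print("X - X  =", (a - b, b - a))     # [a-b, b-a]: the range of x - x is {0}
    # Dependent subtraction (2.4.2) restores sharpness:
    print("X (-) X =", (a - a, b - b))    # [0, 0]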
Suppose we compute an interval extension F (X1, · · · , Xn) of a rational function f (x1, · · · , xn). Following the rules of interval arithmetic, the lower endpoint F̲ and the upper endpoint F̄ of F (X1, · · · , Xn) are computed as functions of the endpoints of the Xi (i = 1, · · · , n). However, we do not denote this fact in our notation because the form of dependence can change when the values of the Xi change. For example, let f (x) = x(x − 3). If X = [1, 2] then F̄ = X̲(X̄ − 3). However, if X = [2, 5] then F̄ = X̄(X̄ − 3).
An endpoint of the range of a function is a value of the function at a
point. Therefore, if F or F is computed using more than one endpoint of a
given variable, then it cannot be sharp. A kind of converse is expressed in
the following Theorem from Hansen (1997b).
f (a1 , · · · , am , Xm+1 , · · · , Xn )
(∂/∂xi) f (X1, · · · , Xn) ≥ 0,
with x replaced by the interval X = [1, 2]. To do so, let us define and evaluate the following functions:

f1(x) = sin(x),
f2(x) = f1(x)/x,
f3(x) = arctan[f2(x)].

Then

f (x) = x² + f3(x).
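A sketch (ours) of this composition over X = [1, 2]. It exploits monotonicity: sin(x)/x is decreasing on [1, 2], arctan is increasing everywhere, and x² is increasing for positive arguments, so each enclosure below is sharp up to rounding. A general implementation would instead call interval versions of sin and arctan.

    import math

    def f2_bounds(lo, hi):
        # sin(x)/x is decreasing on [1, 2] (its derivative is negative there),
        # so the image of [lo, hi] is [f2(hi), f2(lo)].
        return (math.sin(hi) / hi, math.sin(lo) / lo)

    def f3_bounds(lo, hi):
        # arctan is increasing on all of R.
        return (math.atan(lo), math.atan(hi))

    def f_bounds(lo, hi):
        # x**2 is increasing for positive arguments, so it maps to [lo**2, hi**2];
        # add the arctan enclosure endpoint-wise, as in rule (2.3.2).
        g_lo, g_hi = f3_bounds(*f2_bounds(lo, hi))
        return (lo * lo + g_lo, hi * hi + g_hi)

    print(f_bounds(1.0, 2.0))   # roughly (1.4266, 4.6995)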
http://physics.nist.gov/cuu/Constants
CLOSED INTERVAL
SYSTEMS
4.1 INTRODUCTION
4.2.1 Generality
If algorithms are more general, they are usually simpler and require fewer
special-case branches. The same is true for interval systems: Generality is
good.
Algorithms that use the closed interval system accept more inputs and
are therefore more general because:
The first option can be easily implemented if the given function is avail-
able for analysis. The second alternative is described in Walster (2003a) and
can be used when discontinuities arise at branch cuts. The third alternative
can be automated and does not require knowing the given function. How-
ever, an enclosure of the function’s derivative is required. These methods
can be used to eliminate the requirement for defensive code to guarantee
algorithm assumptions are satisfied when using an exception-free closed
interval system. Also see Section 4.8.5.
2 Throughout this book, the term “expression” refers to any sequence of arithmetic op-
erations, and/or compositions of single-valued functions and multivalued relations. In this
chapter, the term “function” is reserved for single-valued mappings of points onto points.
The concept of an interval function is discussed in Chapter 3.
All other things being equal, “more is better” when it comes to speed.
However, there can be “too much of a good thing” with narrow width. The
quest for narrow width or speed must never come at the cost of the failure
to contain the set of required results. This set is called the containment
set of a given expression. Failure to enclose an expression’s containment
set in a computed interval is a containment failure. Interval systems must
not produce a containment failure by violating the containment constraint.
Interval algorithms can be slow and produce wide intervals, but they must
always satisfy the containment constraint.
For all finite intervals, X and Y, interval arithmetic operations must satisfy:
X • Y ⊇ {x • y | x ∈ X, y ∈ Y } , (4.4.1)
4.5.2 A Simple Example: 1/0
A simple example illustrates how the cset and containment constraint con-
cepts permit otherwise undefined operations to be given consistent csets.
Consider the problem of defining the cset of 1/0. In the real analysis of points,
division by zero is undefined. A reason is that the resulting value turns out
to be a set, not a point.
Temporarily, ignore the fact that the cset is not defined until Section
4.5.5. Consider instead:
f (x) = x/(x + 1). (4.5.2)

g (x) = 1/(1 + 1/x). (4.5.3)
1. Because their domains are different, the functions f and g are differ-
ent. Using R to denote the set of real numbers {z | −∞ < z < +∞} ,
and M\N to denote the complement of the set N in M (or all the
points of M not in N ):
Df = {−∞ < x < −1} ∪ {−1 < x < +∞}
= R\ {−1} ,
and
Dg = {−∞ < x < −1} ∪ {−1 < x < 0} ∪ {0 < x < +∞}
= R\ {−1, 0} .
cset (f, S) .
The zero subscript on x is used to connote the fact that an expression’s cset
depends on the specific value(s) of the expression’s argument(s).
For convenience and without loss of generality, the following develop-
ment uses scalar points and sets, rather than n-dimensional vectors. When
the set S is the singleton set {x0 } and x0 ∈ Df , then
The notation {f (x0 )} denotes the value of the function f, but viewed as a
singleton set. Whether x0 is inside or outside the domain of f, it is notation-
ally convenient to permit “f (x0 )” to be understood to mean cset (f, {x0 }) .
Otherwise, a plethora of braces “{· · · }” is needed to distinguish points from
singleton sets. Therefore, for example, when x0 = 0 in (4.5.3), it is under-
stood that
The reason for using the union to define f (S) in (4.5.5) is that f (z0 ) is
now a set. Therefore, {f (z0 ) | z0 ∈ S} is properly interpreted as a set of
sets, rather than cset (f, S). The expression in (4.5.5) for f (S) is exactly analogous to the definition of f (xI) in (4.5.1) when xI is a nondegenerate interval vector. Note the difference in notation between the set S of scalars and the interval vector xI. See Section 2.2.
Finally, when it is important to distinguish between a variable x and
a value x0 of it, the zero subscript notation is used. Otherwise, let it be
understood that the point arguments of expressions in csets are specific
given values and not simply the names of variables.
4.5.4 The Containment Set of 1/0

With the above notation conventions, the value of the containment set (cset) of 1/0 is now addressed. Continue to use the expression definitions f (x) = x/(x + 1) and g (x) = 1/(1 + 1/x) in (4.5.2) and (4.5.3). The question to be answered is: What is the smallest set of values (if any) that can be assigned to h (x) = 1/x when x0 = 0 so that g (x0) = 0? In fact the only way for g (x0) to equal zero is if h (x0) = −∞ or +∞. Therefore {−∞, +∞} is the set of all possible values that the cset of h (x0) must include when x0 = 0. Moreover, when x0 = 0, if the cset of h (x0) includes any value other than −∞ or +∞, then g (x0) ≠ 0. Therefore, a mathematically consistent way for g (0) to equal zero is if

1/0 = {−∞, +∞}. (4.5.6)
h (x) = 1/x, (4.5.7)

then because h (0) is understood to mean cset (h, {0}), when h (x) is otherwise undefined,

g (0) = 1/(1 + {−∞}) ∪ 1/(1 + {+∞}) = 0.
Having informally established (4.5.6), both f (−1) and g (−1) are seen to
be {−∞, +∞} as well. That g (−1) = {−∞, +∞} can also be seen by
writing g in terms of h as defined in (4.5.7):
g (x) = h (1 + h (x)) .
Similar arguments to that given above can be developed to find the cset
of any indeterminate form in any closed cset-based system.
For a rigorous development of csets, see Walster, Pryce, and Hansen (2002).
With their development and/or analyses similar to that of 1/0 above, csets for the basic arithmetic operations (BAOs) displayed in Tables 4.1 through 4.4 have been derived and proved to be consistent in R∗, where R denotes the real numbers {z | −∞ < z < +∞} and the set of extended real numbers R∗ is the set of real numbers to which the elements −∞ and +∞ are adjoined:

R∗ = R ∪ {−∞, +∞}.
In each of Tables 4.1 through 4.4, the upper left corner cell contains the given operation in terms of specific values x0 and y0 of the variables x and y. The first column and first row in each table contain the values of x0 and y0 for which different cset values are produced.
The value of 1/0 discussed above is found in Table 4.4.
x0 − y0:
 | y0 = −∞ | y0 ∈ R | y0 = +∞
x0 = −∞ | R∗ | −∞ | −∞
x0 ∈ R | +∞ | x0 − y0 | −∞
x0 = +∞ | +∞ | +∞ | R∗

x0 × y0:
 | y0 = −∞ | y0 < 0 | y0 = 0 | y0 > 0 | y0 = +∞
x0 = −∞ | +∞ | +∞ | R∗ | −∞ | −∞
x0 < 0 | +∞ | x0 × y0 | 0 | x0 × y0 | −∞
x0 = 0 | R∗ | 0 | 0 | 0 | R∗
x0 > 0 | −∞ | x0 × y0 | 0 | x0 × y0 | +∞
x0 = +∞ | −∞ | −∞ | R∗ | +∞ | +∞

x0 ÷ y0:
 | y0 = −∞ | y0 < 0 | y0 = 0 | y0 > 0 | y0 = +∞
x0 = −∞ | [0, +∞] | +∞ | {−∞, +∞} | −∞ | [−∞, 0]
x0 ∈ R, x0 ≠ 0 | 0 | x0 ÷ y0 | {−∞, +∞} | x0 ÷ y0 | 0
x0 = 0 | 0 | 0 | R∗ | 0 | 0
x0 = +∞ | [−∞, 0] | −∞ | {−∞, +∞} | +∞ | [0, +∞]
f (x) = x/(x + 1)

and

g (x) = 1/(1 + 1/x),

the usual notation can be used for the set that an enclosure F of the expression f must contain:

F (xI) ⊇ hull(f (xI)).
All the definitions of finite interval arithmetic in Section 2.3 carry over to
closed systems because the cset of a defined function is simply the defined
function’s value, but viewed as a singleton set. The cases that require
additional analysis are those that implicitly or explicitly use undefined point operations, such as (±∞) − (±∞), 1/0, 0 × (±∞), and (±∞)/(±∞).
It is a tempting mistake to conclude that each of the following three
examples is non-negative, because there is no obvious way to produce a
negative result:
[1, 2]/[0, 1] = [1, +∞] (4.7.1b)

and

[0, 1]/[0, 1] = [0, +∞]. (4.7.1c)
Combining the rules of interval arithmetic in Section 2.3 with csets from
Tables 4.1 through 4.4 produces the correct results, which are:
[1, 2]/[0, 1] = {−∞} ∪ [1, +∞] (4.7.2b)

and

[0, 1]/[0, 1] = [−∞, +∞]. (4.7.2c)
[1, 1]/[−1, 1]. (4.7.3)
However, by splitting the denominator interval into the union of two interior intervals, both of which have zero as an endpoint, the entries in Table 4.4 and the formulas in Section 2.3 interact to produce the sharp result. From (4.5.5) and the entries in Table 4.4,
From BAO csets with extended interval operands, different closed in-
terval arithmetic systems are possible to implement. They differ in how
narrowly BAO csets are enclosed. For example, if exterior intervals are
not explicitly supported, then the result of computing (4.7.3) is the hull of
(4.7.4c), which is R∗ . Walster (2000) proposed a “Simple” closed interval
system that has been fully implemented in Sun Microsystems' Fortran 95
and C++ compilers. See Walster (2002).
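A sketch (ours) of the splitting device for (4.7.3). Each half of the denominator has zero as an endpoint, and division by that endpoint contributes an infinite endpoint, following the csets of Table 4.4; the union of the two quotients is the sharp exterior interval. In a "Simple" system that union would then be hulled to [−∞, +∞].

    import math
    INF = math.inf

    def idiv_pos_num(a, b, c, d):
        # [a, b] / [c, d] for a > 0, assuming zero is at most an endpoint
        # of [c, d]. Division by the endpoint 0 follows Table 4.4:
        # a / 0 = {-inf, +inf}, so a zero endpoint yields an infinite one.
        if c >= 0:    # d > 0
            return (a / d, (b / c) if c > 0 else INF)
        else:         # c < 0, d <= 0
            return ((b / d) if d < 0 else -INF, a / c)

    # [1,1] / [-1,1], splitting the denominator at zero:
    left  = idiv_pos_num(1.0, 1.0, -1.0, 0.0)   # (-inf, -1.0)
    right = idiv_pos_num(1.0, 1.0,  0.0, 1.0)   # (1.0, inf)
    print(left, right)  # union: the exterior interval [-inf,-1] U [1,+inf]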
Interval algorithms, including those in the remainder of this book, can
be realized in any closed interval system using the extended BAOs displayed
in Tables 4.1 through 4.4, or their equivalent in other different cset-based
systems. To accomplish this, the fundamental theorem of interval analy-
sis (Theorem 3.2.2) is generalized in Theorem 4.8.13 to include csets and
rational expressions. Finally, using the closed interval system, this more
general fundamental theorem is further extended in Theorem 4.8.14 to in-
clude irrational composite expressions.
So far in the above development, closed interval systems are limited to the
BAOs. As described in Chapter 3, repeated application of the fundamental
theorem to compositions of finite BAOs proves that over the argument
intervals, computed rational functions using interval arithmetic produce
bounds on the value of the underlying rational function. Closing interval
systems requires the following extensions to the fundamental theorem:
At the same time these extensions are made, the fundamental theorem is
further extended to any expressions, whether a single-valued continuous
function with finite domain that is a subset of the real numbers, or a multi-
valued relation that is always defined. Examples of single-valued continu-
ous functions include the log or square root functions. A simple example
of a multi-valued relation is a single-valued function to which is added a
nondegenerate interval constant.
The closure f̄ of f is always defined, but might be the empty set. Namely, for yj to exist, f (xj) must be non-empty, which, by definition of f 's domain, means that xj ∈ Df. Therefore, if x0 ∉ D̄f, or if Df = ∅, there are no sequences {xj}, hence no {yj}, and f̄ (x0) is empty. The domain of f̄ is the set of argument values x0 for which f̄ (x0) ≠ ∅, which is D̄f, the closure of f 's domain. Using the compactness of R∗, Walster, Pryce, and Hansen (2002) proved that f̄ (x0) is never empty if Df ≠ ∅ and x0 ∈ D̄f.
6 In this chapter the notation S̄ denotes the closure of the set S. Readers should not confuse this commonly used mathematical notation with the interval notation X = [X̲, X̄]. The former is only used in this chapter. The latter is used in the remaining chapters of this book to denote the infimum and supremum of the interval X.
7 Note that the f (xj) are single valued, so for xj ∈ Df, this sequence is well defined.
Example 4.8.3
The expression 0 ÷ 0 is the set of all lim (xn ÷ yn) where yn < 0 or yn > 0, and both xn → 0 and yn → 0. Any finite limit a can be achieved (e.g. xn = a/n, yn = 1/n) as well as ±∞ (e.g. xn = ±1/n, yn = 1/n²), so

0 ÷ 0 = [−∞, +∞]. (4.8.4)

Example 4.8.4
Let a > 0 be finite. Then a ÷ 0 is the set of all lim (±a/xn), where xn > 0 and xn → 0. If xn = 1/n, then lim (±a/xn) = lim (±an), so for any finite a > 0

a ÷ 0 = {−∞, +∞}. (4.8.5)

This result implies, for instance, that 1/[0, 1] = {−∞} ∪ [1, +∞].
f (x) = 1/(x − x) (4.8.6)

is undefined for all values of x. Another example is:

f (x) = x/[−1, 1], (4.8.7)
the cset of which is [−∞, −|x|] ∪ [|x|, +∞]. In these cases there is no single function, the closure of which is the cset of such an expression. For all expressions to have csets, whether the expressions are single-valued functions or multi-valued relations, additional analysis is required. In Walster, Pryce, and Hansen (2002), the concept of constant variable expressions (CVEs) is introduced. Briefly, 1/(x − x) is a CVE that can be replaced by the expression 1/y0 given that y0 = 0. Thus, the cset of 1/(x − x) is unconditionally equal to {−∞, +∞}. Note that CVEs are independent. This means that each occurrence of the same CVE must be replaced by a different variable, or by a zero-subscripted symbolic constant.
Defining the cset of x/[−1, 1] can be accomplished using composite expressions and the union of all possible function closures in (4.8.8). This same device can be used for CVEs and is described now.
Let a composite expression be given, such as

f (x | c) = g (h (x | c), x | c),

in which the elements of the vector c are fixed constants. Further, assume that Dg(y,x|c) ≠ ∅, but assume there are no values of x for which the expression h (x | c) is a single-valued function. For example, with f (x) = x/[−1, 1], let g (y, x) = x/y, and h (x | [−1, 1]) = [−1, 1]. Therefore the natural domain of h given c is the empty set. Additionally, Df(x|c) = ∅ because Dh(x|c) = ∅. Let H (x0, c) denote a set of values that depends on the value x0 of x and the constant vector c. H (x0, c) might be simply cset (h, (x0 | c)). Then

cset (f, (x0 | c)) = ∪_{h0 ∈ H(x0,c)} g (h0, x | c), (4.8.8)

where H (x0, c) = cset (h, (x0 | c)). Combining this case with the usual case in which Df(x|c) ≠ ∅ yields the four cases in (4.8.10) to be distinguished if

f (x | c) = g (h (x | c), x | c): (4.8.9)
Case 1: Df ≠ ∅ and x0 ∈ Df.
Case 2: Df ≠ ∅ and x0 ∉ Df.
Case 3: Df = ∅, x0 ∈ D̄h, and D̄g ∩ cset (h, (x0 | c)) ≠ ∅.
Case 4: Df = ∅, and x0 ∉ D̄h or D̄g ∩ cset (h, (x0 | c)) = ∅.
(4.8.10)

cset (f, (x0 | c)) =
f (x0 | c), if Case 1,
f̄ (x0 | c), if Case 2,
∪_{y0 ∈ cset(h,(x0|c))} cset (g, (y0, x0 | c)), if Case 3,
∅, if Case 4.
(4.8.11)
x0/[−1, 1] (4.8.12b)
= x0 × ([−∞, −1] ∪ [+1, +∞]) (4.8.12c)
= [−∞, −|x0|] ∪ [|x0|, +∞]. (4.8.12d)
To avoid a continuing plethora of zero subscripts, let it be understood
that given argument values of expression csets are always given points, even
though they are written without zero subscripts. Only where it is particu-
larly important to emphasize the distinction will cset (f, x) be written as
cset (f, x0 ) .
The following example uses simple special cases of (4.8.11) to illustrate the important distinction between constants and variables in defining an expression's cset. The examples all use only scalar (not vector) functions h, so f (x, y, z) = g (h (x, y), z).
Example 4.8.5
can be simplified to

x² (4.8.13b)

x² × [1/2, 2] (4.8.14b)

(a) If g1 (x, y) = f (x, y, x) = x²/y, not (x × x)/y, because multiple occurrences of x are dependent, then
f (x | c) = g (h (x | c) , x | c) , (4.8.19)
It does not matter whether the vector of constants c is a singleton set, a more
general set, or an interval vector. Note that as a consequence of (4.8.26a),
∪_{y ∈ cset(h,(x|c))} cset (g, (y, x | c)) (4.8.27)
From the definition of the cset of f at a singleton set in (4.8.21) and over
a non-singleton set in (4.8.26),
cset (f, X̄) = cset (f, X) ∪ cset (f, X̄\X) (4.8.31a)
⊇ cset (f, X), (4.8.31b)
Cases 1 through 3 are covered by simply replacing any of the given expres-
sion’s possible interval extensions (Definition 4.8.9) in Theorem 4.8.11 by
any of the given expression’s cset enclosures.
Theorem 4.8.13 Let the function f have an inclusion isotonic cset enclosure F (x0) of the expression f (x0). Then F (xI0) is an enclosure of f 's cset for all xI0 ∈ (IR∗)n. That is,

F (xI0) ⊇ hull(cset (f, xI0)). (4.8.32)
4.8.5 Continuity
When using the closed interval system and continuity is required (as, for example, in the interval version of Newton's method), there are three options:
The following theorem provides a way to automate the test for continu-
ity over a given interval. This, combined with explicit domain constraints
can be used to create fail-safe (at least with respect to assumptions of exis-
tence and continuity) algorithms for finding roots and fixed points using the
interval version of Newton’s method and the Brouwer fixed-point theorem.
See Chapters 6, 11, and 16 for the algorithms needed to apply domain and
continuity constraints.
sign (x) = −1 if x < 0, 0 if x = 0, +1 if x > 0; (4.8.35)

hv (x) = 0 if x < 0, +1 if x ≥ 0. (4.8.36)
Note that even the above notation is not completely consistent. For
example, in some cases point functions of interval arguments are required.
Two such examples are the width w (X) and the midpoint m (X) of an
interval. Explicitly identified notation overloading can improve exposition
4.10 CONCLUSION
With this chapter, the algorithms in the remaining chapters can be implemented using interval cset enclosures of any expression, whether a function
or a relation. With compiler support for interval data types and cset enclo-
sures of interval expressions, any interval algorithms can be implemented
without regard to the form of the expressions contained therein. Conse-
quently, and without loss of containment, any enclosure of a cset-equivalent
expression can be chosen to produce narrow bounds on expression values.
LINEAR EQUATIONS
5.1 DEFINITIONS
mig(Aii) ≥ Σ_{j=1, j≠i}^{n} |Aij| (i = 1, · · · , n).
5.2 INTRODUCTION
Ax = b. (5.2.1)
There are many applications in which the elements of the matrix A and/or
the components of the vector b are not precisely known. If we know an
interval matrix AI bounding A and an interval vector bI bounding b, we can
replace (5.2.1) by
AI x = bI . (5.2.2)
s = {x : Ax = b, A ∈ AI , b ∈ bI }. (5.2.3)
That is, s is the set of all solutions of (5.2.1) for all A ∈ AI and all b ∈ bI .
This set is generally not an interval vector. In fact, it is usually difficult
to describe s, as we show by example in Section 5.3. In Section 17.1, we
describe a method for approximating s as closely as desired by covering it
with boxes of arbitrarily small size. Other sections in Chapter 17 provide
a means for bounding the hull (defined below) of the solution set. See
especially Section 17.10.
Because s is generally so complicated in shape, it is usually impractical
to try to use it. Instead, it is common practice to seek the interval vector
xI containing s that has the narrowest possible interval components. This
interval vector is called the hull of the solution set or simply the hull. We
say we “solve” the system when we find the hull xI .
xI = [4, 5]/[1, 2] = [2, 5].
But, the product of AI times xI is [2, 10], which does not equal bI . That is,
we cannot substitute the solution into the given equation and get equality.
All we can say is that AI xI ⊃ bI .
To understand what happens in this example, note that xI = bI/AI and hence AI xI = AI (bI/AI). This formulation shows that AI occurs twice in the computation of AI xI. Therefore, dependence (as discussed in Section 2.4) causes loss of sharpness in the computed result.
In this scalar example, it is possible to compute AI (bI/AI) correctly to be bI using the dependent multiplication described in Section 2.4.1. However, when AI is a matrix, this does not seem to be possible.
We continue to write an equation in the form (5.2.2) wherein x occurs
as if it were a real vector. The obvious incongruity emphasizes the fact that
the “equation” requires interpretation.
To illustrate that the solution set s given by (5.2.2) is not simple, we now
give an example from Hansen (1969b). See also Deif (1986). Consider the
If x is to be a point of the solution set s, it must be such that the left member
intersects the right member in each equation. Therefore,
There are several variants of methods for solving linear equations that can
be labeled as Gaussian elimination. See the outstanding book by Wilkinson
(1965). An interval version of any of them can be obtained from one using
ordinary real arithmetic by simply replacing each real arithmetic step by
the corresponding interval arithmetic step.
One standard method involves factoring the coefficient matrix into the
product of a lower and an upper triangular matrix. An interval version of
this method with iterative improvement of the triangular factors is discussed
by Alefeld and Rokne (1984). Most papers on interval linear equations have
not used factorization and we do not do so.
If the coefficient matrix AI and the vector bI are real (i.e., noninterval),
then the interval version of Gaussian elimination applied to Ax = b simply
bounds rounding errors. If the coefficient matrix AI and/or the vector bI
have any interval elements, the interval solution vector computed using
Gaussian elimination contains the set s.
Suppose the elimination procedure does not fail because of division by
an interval containing zero. Then it produces an upper triangular matrix.
If no diagonal element of the upper triangular matrix contains zero, then
AI is regular (i.e., each real matrix contained in AI is nonsingular). If AI
is degenerate, this result proves that AI is nonsingular. That is, the interval
method can prove that a real matrix is nonsingular.
Note that regularity can be proved even when (outwardly) rounded
interval arithmetic is used because the rounding merely widens intervals.
If the widened diagonal elements of the transformed (by elimination) matrix
MI x = r I . (5.6.1)
Figure 5.6.1: Solution set s and the enlarged solution set due to preconditioning for the equations in (5.3.1)
Mi1 x1 + · · · + Min xn = Ri.

Yi = (1/Mii)(Ri − Mi1 X1 − · · · − Mi,i−1 Xi−1 − Mi,i+1 Xi+1 − · · · − Min Xn). (5.7.1)

The intersection

Xi′ = Xi ∩ Yi (5.7.2)

now replaces Xi.
The hull of the solution set of a system of interval linear equations is defined
to be the smallest box containing the solution set of the system. For brevity,
we speak of the “hull of a system”. In general, finding the hull of a system
is N P -hard. See Heindl et al (1998).
Suppose a given interval system AI x = bI has been preconditioned
using the inverse of the center of AI as described in Section 5.6. If the
preconditioned system MI x = rI is regular, its hull can be determined
exactly by a fairly simple procedure. In this section, we describe how this
is done. We refer to the procedure as the hull method.
A procedure for computing this hull was given by Hansen (1992) and
independently by Bliek (1992). An improved version was given by Rohn
(1993). The procedure we describe in this section is a further improved
version from Hansen (2000a). See also Neumaier (1999, 2000). Ning and
Kearfott (1997) gave a method for bounding the hull when the coefficient
matrix is an H-matrix. Their bounds are sharp when the center of the
H-matrix is diagonal. We do not discuss H-matrices in this book. For a
definition and discussion of H-matrices, see Neumaier (1990).
We give a procedure for computing the hull; but do not derive it. For
a derivation, see Hansen (2000). We first give a theoretical algorithm and
then a practical one.
That is, the center of MI is the identity matrix if B is the exact inverse of
Ac .
Denote MI = [M̲I, M̄I] and rI = [r̲I, r̄I]. Then for i, j = 1, · · · , n,

(M̲I)ij = −(M̄I)ij (i ≠ j), (5.8.1a)
(M̲I)ii + (M̄I)ii = 2. (5.8.1b)

0 ≤ |R̲i| ≤ R̄i. (5.8.2)
Let B0 denote the exact inverse of the exact center of A. Recall that
instead of B0 , we compute an approximation B for B0 and therefore we
must widen the approximate matrix MI to obtain a matrix MI whose center
is the identity matrix. Suppose that, using this theorem, we verify that the
Both the Gauss-Seidel method of Section 5.7 and the hull method of Section
5.8 begin by preconditioning a given system. In the resulting system
MI x = r I , (5.9.1)
We have noted that the problem of determining the hull of the solution set of
an arbitrary (non-degenerate) linear system is NP -hard. Nevertheless, the
INEQUALITIES
6.1 INTRODUCTION
We solve this inequality for x1 and obtain a new interval bound X1′ for x1. We replace X1 by X1 ∩ X1′ and repeat the procedure to get a new bound on each of x2, · · · , xn.
Note that when we solve for x2 , we can simplify the computation by
using dependent subtraction. This dependent operation is defined by (2.4.2).
When solving for x1 , we compute the sum C2 X2 + · · · + Cn Xn . If we cancel
C2 X2 from this sum and add C1 X1 , we have the needed sum to solve for
x2 . We do not have to recompute the sum of the other n − 2 terms.
In each step, we solve an inequality of the form
U +Vt ≤ 0 (6.2.2)
for a variable t where U and V are fixed intervals. That is, we solve for a
set T of values of t for which there exists at least one value of u ∈ U and
at least one value of v ∈ V such that u + vt ≤ 0. Thus,
T = {t | ∃u ∈ U, ∃v ∈ V , u + vt ≤ 0}. (6.2.3)
U + V t = [−∞, 0].
T = [−∞, −a]/[c, d].
The last entry in this list is the set of values of t such that [a, b]+[0, 0]t ≤ 0
when a > 0. In practice, we generally seek finite solutions to an inequality.
When a > 0 and c = d = 0 the set of finite solution points is empty.
Note that if a > 0 and c < 0 < d, the solution consists of two
semi-infinite intervals. This occurs because we divide by an interval whose
interior contains zero. If this solution is intersected with a finite interval (see
above), the result can be empty, a single interval or two disjoint intervals.
In the latter case, we speak of an interval with an (open) gap removed. The
gap consists of certainly infeasible points.
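A sketch (ours) of the case analysis behind (6.2.3). Since the minimum of u + vt over u ∈ [a, b] and v ∈ [c, d] is a + ct for t ≥ 0 and a + dt for t ≤ 0, T is the union of the solution sets of two linear inequalities, one per half-line; when a > 0 and c < 0 < d this yields the two semi-infinite intervals with a gap described above.

    import math
    INF = math.inf

    def halfline_sol(a, k, lo, hi):
        # Solutions of a + k*t <= 0 within the half-line [lo, hi].
        if k == 0:
            return [(lo, hi)] if a <= 0 else []
        bound = -a / k
        piece = (lo, min(hi, bound)) if k > 0 else (max(lo, bound), hi)
        return [piece] if piece[0] <= piece[1] else []

    def solve_ineq(a, b, c, d):
        # T = {t : exists u in [a,b], v in [c,d] with u + v*t <= 0}.
        # min over u, v of u + v*t is a + d*t for t <= 0 and a + c*t for t >= 0.
        return halfline_sol(a, d, -INF, 0.0) + halfline_sol(a, c, 0.0, INF)

    print(solve_ineq(2.0, 3.0, -1.0, 1.0))  # [(-inf, -2.0), (2.0, inf)]: a gap
    print(solve_ineq(2.0, 3.0, 0.0, 0.0))   # []: no finite solutions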
AI x ≤ bI (6.3.1)
QI x = cI (6.3.2)
BQI x = BcI .
We can solve this relation with less growth of interval widths than for the
original relation AI x ≤ bI .
In general, we do column interchanges to compute B. Therefore, instead
of (6.3.3), our new system is
Hereafter in this section, we assume that this condition holds for all i =
1, · · · , m.
We want to know which inequalities are more helpful in deleting cer-
tainly infeasible points of a given box xI . The corresponding question in
the noninterval case is: “Which constraints are most strongly violated at
some point?” This question is complicated by the fact that the different
inequalities might be scaled differently. In the interval case, we address
this complication by (implicitly) normalizing.
Consider the quantity

si = P̄i(xI)/(P̄i(xI) − P̲i(xI)) (6.4.2)
Assume the inequalities have been placed in the order described in Section
6.4. Because this is a desirable order, we wish to not interchange rows of
A even though interchanges for pivot selection might enhance numerical
stability. However, we are free to interchange columns. In this section,
we discuss how to do so to get a “well positioned” secondary pivot and
improve numerical stability.
A procedure was described by Hansen and Sengupta (1983) for choos-
ing a pivotal column. We describe a simpler procedure in this section.
We do not perform row interchanges unless the current row is zero. In
decreasing order of importance, we want
1. Set r = 0.
2. Replace r by r + 1.
9. Use the secondary pivot element (in position (s, r)) to zero the ele-
ments opposite in sign to it in positions (i, r) for i = r + 1, · · · , m.
11. Use the primary pivot element (in position (r, r)) to zero the elements
opposite in sign to it in positions (i, r) for i = r + 1, · · · , m.
12. Go to step 2.
13. The first m rows of A are now in upper trapezoidal form. A submatrix
of r −1 rows and n columns has been appended to A. This submatrix
is composed of secondary pivot rows and is also in upper trapezoidal
form. We now begin zeroing elements above the diagonal of each of
the two submatrices. Set r = 0.
14. Set r = r + 1.
17. Use arr as a pivot to zero any element (except the one in position
(m + r, r)) of opposite sign in column r.
19. Terminate.
AI x ≤ bI . (6.8.1)
MI x ≤ c I (6.8.3)
7.1 INTRODUCTION
f (y) ∈ f (x) + (y − x)f ′(x) + ((y − x)²/2) f ″(X) (7.2.5)

and

f (X) ⊂ f (x) + (X − x)f ′(x) + ((X − x)²/2) f ″(X′). (7.2.6)
2
We now explain why we have replaced X by X in (7.2.4) and (7.2.6).
We noted above that the quantity ξ in (7.2.1) must be in X. Therefore, we
replaced ξ by X in (7.2.3). However, this is a bound with the same numeric
value as X but is not analytically identical to X. Thus, in (7.2.4), while
X and X are numerically equal, they are not the same variable and are
therefore independent.
To illustrate this fact, consider the example in which f (x) = 1/x. Since f ′(x) = −1/x², (7.2.4) can be written as

f (X) ⊂ 1/x − (X − x)/X′². (7.2.7)

f (X) ⊂ 1/x − 1/X + x/X².

Completing the square to sharpen the bound on f (X), we rewrite this as

f (X) ⊂ x(1/X − 1/(2x))² + 3/(4x). (7.2.8)
f (y) ∈ f (x) + Σ_{i=1}^{n} (yi − xi) gi(X1, · · · , Xi, xi+1, · · · , xn). (7.3.6)

f (x1, x2, x3) = x1/(x2 + 2) + x2/(x3 + 2) + x3/(x1 + 2). (7.3.7)
f (y) ∈ f (x) + (y − x)T g(x) + (1/2)(y − x)T H(x, xI)(y − x) (7.3.8)

f1·2 (y) ∈ f1·2 (x) + Σ_{j=1}^{n} (yj − xj)[f1(j) g2j(j) + g1j(j) f2(j)]. (7.3.10)
Now suppose that we simply take the product of the expanded forms of
f1 and f2 as given by (7.3.9). By combining terms appropriately, we obtain
f1·2 (y) ∈ f1·2 (x) + f1(x) Σ_{j=1}^{n} (yj − xj) g2j(j) + f2(xI) Σ_{j=1}^{n} (yj − xj) g1j(j). (7.3.11)
The factor f2 (xI ) in the right member occurs because we have replaced the
argument y by xI in the (unexpanded) function f2 (y). Other similar forms
are possible by combining terms in other ways.
Let us now compare the two forms (7.3.10) and (7.3.11). The latter has
advantages that we list and then discuss.
f1/2 (y) = f1/2 (x) + (1/f2(xI)) Σ_{j=1}^{n} (yj − xj) g1j(j) − (f1(x)/(f2(x) f2(xI))) Σ_{j=1}^{n} (yj − xj) g2j(j). (7.3.12)
Jij = ∂gi/∂xj = ∂²f/(∂xi ∂xj) (i, j = 1, · · · , n).
As a noninterval matrix, J is symmetric. To compute J, we need only
compute the lower (or upper) triangle and use symmetry to get the elements
above the diagonal.
But the situation differs in the interval case if we want to have some
noninterval arguments as discussed in Section 7.3. Suppose we expand
each component of g as in (7.3.6) using the same pattern of real and interval
arguments given therein. The resulting Jacobian is not symmetric. A real
argument of Jij might be an interval argument in Jj i . To compute an interval
enclosure JI in this case, we must compute all n2 elements. We can have
symmetry by using intervals for all the arguments. But then the interval
elements of JI are wider than necessary.
In this section, we consider how to have symmetry without using all
interval arguments. We also consider how to compute both the Hessian of
f and the Jacobian of the gradient of f . In the interval case, they can differ
(if we want some arguments to be real) because the pattern of real versus
interval arguments can differ.
Consider the case n = 2. Let us expand g1 with respect to x2 and then
x1 . Let us expand g2 in the opposite order. Then
g(y) ∈ g(x) + J(x, xI )(y − x) (7.4.1)
where
J(x, xI) =
[ J11(X1, x2)  J12(X1, X2) ]
[ J21(X1, X2)  J22(x1, X2) ].
f (y) ∈ f (x) + (y − x)T g(x) + (1/2)(y − x)T H(x, xI)(y − x)
and
The former is a repeat of (7.3.8) and the latter is a repeat of (7.4.1). Function-
ally, H(x, xI ) and J(x, xI ) in these equations are the same matrix. However,
if we wish to have real (instead of interval) arguments everywhere possible
in their matrix elements, then they become different matrices when evalu-
ated. This is because their patterns of real and interval arguments differ.
Suppose we use an expansion of the form (7.3.6) for each component of g. Then the arguments of Jij are (X1, · · · , Xj, xj+1, · · · , xn) for all i = 1, · · · , n. These arguments are the same as those for the elements of the
be known.
We illustrate the method with a simple example. Consider the function f (x) = 1/x + x sin x. Define the primitives f1 = 1 and f2 = x and their derivatives f1′ = 0 and f2′ = 1. A code to evaluate f might involve generation of values of the functions

f3 = f1/f2,
f4 = sin x,
f5 = f2 f4,
f = f3 + f5.
In some manner or other, we must know that the derivative of sin x is cos x. Using the definitions of the functions f3, f4, and f5 and the rules in Table 7.1, we obtain

f3′ = (f1′ f2 − f1 f2′)/f2²,
f5′ = f2′ f4 + f2 f4′,
f ′ = f3′ + f5′.
Since we know the primitives f1 and f2 and their derivatives and the special
function f4 = sin x and its derivative, we can compute the derivative of
f using the above equations. Code to compute the right-hand side of
each equation can be generated automatically. Note that the evaluation is
numerical, not analytic.
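A sketch (ours) of the same mechanism using dual numbers: each quantity carries the pair (value, derivative), the differentiation rules of Table 7.1 become the arithmetic of the pairs, and running the code for f at a point automatically produces the numerical value of f ′ there.

    import math

    class Dual:
        """Carries a value and its derivative through each operation."""
        def __init__(self, val, der):
            self.val, self.der = val, der

        def __add__(self, g):
            return Dual(self.val + g.val, self.der + g.der)

        def __mul__(self, g):
            # (f*g)' = f'*g + f*g'
            return Dual(self.val * g.val, self.der * g.val + self.val * g.der)

        def __truediv__(self, g):
            # (f/g)' = (f'*g - f*g') / g**2
            return Dual(self.val / g.val,
                        (self.der * g.val - self.val * g.der) / g.val**2)

    def dsin(f):
        # the derivative of sin is cos, supplied "in some manner or other"
        return Dual(math.sin(f.val), math.cos(f.val) * f.der)

    def f(x):
        f1 = Dual(1.0, 0.0)      # the constant primitive, f1' = 0
        f2 = x                   # the identity primitive, f2' = 1
        f3 = f1 / f2
        f4 = dsin(x)
        f5 = f2 * f4
        return f3 + f5           # f(x) = 1/x + x*sin(x)

    y = f(Dual(2.0, 1.0))        # seed x' = 1 to get df/dx
    print(y.val, y.der)          # f(2) and f'(2) = -1/4 + sin(2) + 2*cos(2)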
Hopefully, the readers understand what is involved in automatic dif-
ferentiation from this abbreviated introduction. Details can be found in
the references cited above. We merely wish to make readers aware that
automatic procedures exist that are alternatives to symbolic differentiation.
The procedure we have described is suitable for computing the deriva-
tive of a function of a single variable. It can also serve for the multidimen-
sional case. However, as we showed in Section 7.3, it is better to compute
the expansion of a product or quotient by using the product or quotient of
the individual expansions. This alternative method also can be incorporated
into an automatic procedure.
Automatic differentiation can be used, for example, to evaluate the
gradient of a multivariable function. Let r denote the ratio of the amount
of work to evaluate the gradient of a function and the work to evaluate the
function itself. Wolfe (1982) observed that the value of r is usually around
1.5 in practice.
Bauer and Strassen (1983) used complexity theory to show that, for a
certain way of counting arithmetic operations, the theoretical value of r is
at most 3. For another proof, see Iri (1984). Griewank (1989) gives an
explicit algorithm and shows that, for it, r ≤ 5.
We now consider an example to illustrate the advantage of symbolic over automatic differentiation from consideration of dependence. Let f (x) = u(x)/v(x) where u(x) = x² + x and v(x) = x² − x. The derivative is
f (x1 , x2 , x3 ) = x1 x2 sin x3 .
f (y, y, y) = f (x, x, x)
+ (y − x)[g1 (ξ1 , x, x) + g2 (y, ξ2 , x) + g3 (y, y, ξ3 )].
y² − x² = (y + x)(y − x)
The right member does not generally provide sharp bounds on f (X)
despite the fact that it arises from the identity (7.7.1). To illustrate this fact,
let f (x) = x². We obtain

X² ⊂ x² + (X + x)(X − x).
For X = [−1, 3] and x = 1, the left member is [0, 9], but the right member
is the wider interval [−7, 9]. Dependence has caused widening of the latter
result.
from (7.7.3) for slopes, the interval X in g(x, X) and in the factor X − x are
identically the same. This is another advantage of a slope expansion over
a Taylor expansion because there is an opportunity to analytically reduce
multiple occurrences of the interval X.
When dependence does not cause widening, the interval value of the slope expression f (x) + g(x, X)(X − x) provides sharp bounds on the range of y for y ∈ X. The corresponding Taylor expression f (x) + f ′(X′)(X − x) does not. However, dependence generally prevents us from computing sharp results when evaluating the former expression.
For a small box, the (exact) slope expression generally yields sharper
bounds on f (X) than direct evaluation of f (X) because f (x) is a good
g(x, y) = (v(u(y)) − v(u(x)))/(y − x) (7.7.5a)
= [(v(u(y)) − v(u(x)))/(u(y) − u(x))] × [(u(y) − u(x))/(y − x)]. (7.7.5b)
Therefore, the slope of v(u(x)) is the product of two slopes. The first factor
is the slope of v with u(y) regarded as the variable and u(x) regarded as
the fixed point. The other factor is the slope of u.
We give an example illustrating the computation of the slope of a com-
posite function at the end of Section 7.8.
An extension of (7.7.5b) gives a chain rule for slopes similar to that for
derivatives. For example, if
then
We have noted that for any rational function f (x), we can analytically divide f (y) − f (x) by y − x. This can also be done for certain algebraic functions. Note that, for f (x) = x^(1/n) (n = 2, 3, 4, · · · ) we have

f (y) − f (x) = (y − x)/(Σ_{k=0}^{n−1} x^(k/n) y^((n−1−k)/n))

since

(y^n − x^n)/(y − x) = Σ_{k=0}^{n−1} x^k y^(n−1−k). (7.8.3)
gu (x, X) = x + X + 1. (7.8.5)
where u̲ and ū are the endpoints of u(X). That is, u(X) = [u̲, ū].
For a general function u, we might not be able to determine u and u
sharply. However, we can compute bounds on u(X) by simply evaluating
u over X. The result might not be sharp because of dependence. Therefore,
the bounds on the slope of f might not be sharp. If u is monotonic over X,
this fact can be used to compute u(X) sharply. See Section 3.6. Since we
use the slope of u to obtain an expression for the slope of f , we can also
bound u(X) using the slope expansion
where

g3 (y1; y2; x3, y3) = (f (y1, y2, y3) − f (y1, y2, x3))/(y3 − x3),

where

g2 (y1; x2, y2; x3) = (f (y1, y2, x3) − f (y1, x2, x3))/(y2 − x2),

and where

g1 (x1, y1; x2; x3) = (f (y1, x2, x3) − f (x1, x2, x3))/(y1 − x1).

f (y1, y2, y3) = f (x1, x2, x3)
+ (y1 − x1) g1 (x1, y1; x2; x3)
+ (y2 − x2) g2 (y1; x2, y2; x3)
+ (y3 − x3) g3 (y1; y2; x3, y3). (7.9.1)
where G(x, y) is the appropriate matrix. That is, we can generate a slope
expansion of a vector function.
Note that G(x, y) takes the place of the Jacobian in a Taylor expansion.
In Section 7.4, we discussed how to make the Jacobian symmetric when f
is the gradient of some function. Unfortunately, the slope expansion does
not seem to provide a means for making G(x, y) symmetric.
The slope g(x, y) is defined to be (f (y) − f (x))/(y − x). Therefore,

g(x, x) = lim_{y→x} (f (y) − f (x))/(y − x) = f ′(x).

That is, the slope (g(x, y) − g(x, x))/(y − x) of g becomes
For this function, we merely bound the slope. If u(x) ≥ v(x) for all x in an
interval X, then f (x) = u(x) for x ∈ X; and the slope of f over X is the
Its slope can be bounded in a way similar to that for the maximum function.
We obtain
g(x, X) = gu(x, X) if z̄ < 0; gv(x, X) if z̲ ≥ 0; gu(x, X) ∪ gv(x, X) otherwise.
provided
Hence,
Note that one of the functions f1 and f2 has argument x and the other has
argument y. If we interchange the roles of f1 and f2 , we obtain g (x, y)
in a different form. Analytically, the two forms are interchangeable, be-
cause they are containment-set equivalent (see Chapter 4) algebraic rear-
rangements of one another. Nevertheless, the effect of dependence on the
computed values might be different for the two cases; so different results
might be produced.
The slope of the quotient of two functions is unique and is given in Table
7.2. Note that the table includes the slopes of the primitives f = constant
and f = x which are necessary for starting the procedure for automatic
evaluation of slopes.
Ideally, we want merely to program the evaluation of a function and
have code generated automatically to evaluate its slope. When a slope
cannot be determined, such a program can produce a bound for the slope.
Such a bound can be in the form of a derivative. The approach is the same
as for automatic differentiation as described in Section 7.5. It automatically
yields numerical values of the slope.
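As an illustration of this idea (not the book's code), the sketch below organizes automatic slope evaluation for point arguments: each node carries the triple (f(x), f(y), g(x, y)), and each arithmetic operation combines triples. The product rule used is the form g = f₂(y)g₁ + f₁(x)g₂, one of the two interchangeable forms noted above; interval slopes would carry interval values instead of floats.

```python
# Minimal sketch of automatic slope evaluation for point arguments x, y.
# Each node carries (f(x), f(y), g(x, y)); the product rule below uses
# the form g = f2(y)*g1 + f1(x)*g2, one of the two interchangeable forms.
class Slope:
    def __init__(self, fx, fy, g):
        self.fx, self.fy, self.g = fx, fy, g   # f(x), f(y), g(x, y)

    def __add__(self, o):
        return Slope(self.fx + o.fx, self.fy + o.fy, self.g + o.g)

    def __sub__(self, o):
        return Slope(self.fx - o.fx, self.fy - o.fy, self.g - o.g)

    def __mul__(self, o):
        # f1(y)f2(y) - f1(x)f2(x) = f2(y)[f1(y)-f1(x)] + f1(x)[f2(y)-f2(x)]
        return Slope(self.fx * o.fx, self.fy * o.fy,
                     o.fy * self.g + self.fx * o.g)

def const(c):       # slope of a constant is 0 (a Table 7.2 primitive)
    return Slope(c, c, 0.0)

def var(x, y):      # slope of f(x) = x is 1 (the other primitive)
    return Slope(x, y, 1.0)

# Example: f(t) = t**2 - t**3 has slope (x + y) - (x**2 + x*y + y**2),
# in agreement with (7.12.4).
x, y = 1.0, 3.0
t = var(x, y)
f = t * t - t * t * t
print(f.g, (x + y) - (x**2 + x*y + y**2))   # both print -9.0
```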
Table 7.2: Slopes for the Basic Arithmetic Operations on Two Functions.
g(x, y) = (x + y) − (x² + xy + y²)    (7.12.4)

g(x, y) = [f(y) − f(x)] / (y − x) ∈ f'(X)
For multiplication, f(y) = f₁(y)f₂(y) and the slope can be written g(x, y) = f₂(y) g₁(x, y) + f₁(x) g₂(x, y). For division,

f(y) = f₁(y) / f₂(y)
     = f₁(x)/f₂(x) + (y − x) [f₁'(x) f₂(x) − f₁(x) f₂'(x)] / [f₂(x) f₂(y)]
       + (y − x)² [f₂(x) h₁(x, y) − f₁(x) h₂(x, y)] / [f₂(x) f₂(y)].
A salient feature of a slope expansion is that its analytic form for a ratio-
nal function is exact. Intervals enter only when terms are bounded. In
contrast, intervals enter into a Taylor expansion to bound unknown deriva-
tive values. There are other types of expansions which are equivalent to
slope expansions (and to each other) because they also use exact analytic
expansions and then bound certain terms. The oldest of these is Moore’s
(1966) centered form (see Section 3.3) which generalizes in various ways
(see Ratschek and Rokne (1984)).
Another equivalent type of expansion is the generalized interval arith-
metic introduced by Hansen (1975). It has been used to speed the process
of solving nonlinear equations by interval methods. See Hansen (1993).
QUADRATIC EQUATIONS
AND INEQUALITIES
8.1 INTRODUCTION
In this chapter, we consider the interval quadratic equation

Ax² + Bx + C = 0    (8.1.1)

where A = [A̲, Ā], B = [B̲, B̄], and C = [C̲, C̄] are intervals. The interval roots of (8.1.1) are the set of real roots x of the quadratic equation ax² + bx + c = 0 for all real a ∈ A, b ∈ B, and c ∈ C.
We can express an interval enclosure for the roots as

r± ∈ [−B ± (B² − 4AC)^(1/2)] / (2A).    (8.1.2)

We could instead use

r± ∈ 2C / [−B ∓ (B² − 4AC)^(1/2)],

but dependence degrades this form because B and C now occur twice.
Since x² ≥ 0, Ax² = [A̲x², Āx²]. The term Bx in (8.1.1) can be written

Bx = [B̲x, B̄x] if x ≥ 0,
     [B̄x, B̲x] if x ≤ 0.

Denote

F₁(x) = A̲x² + B̲x + C̲,
F₂(x) = Āx² + B̄x + C̄,
F₃(x) = A̲x² + B̄x + C̲,

and

F₄(x) = Āx² + B̲x + C̄.

We can rewrite (8.1.1) as [F₁(x), F₂(x)] = 0 when x ≥ 0 and as [F₃(x), F₄(x)] = 0 when x ≤ 0. Denote

F̲(x) = F₁(x) if x ≥ 0,    F̄(x) = F₂(x) if x ≥ 0,    (8.1.3)
       F₃(x) if x ≤ 0;            F₄(x) if x ≤ 0.
so that F(x) = Ax² + Bx + C can be expressed as F(x) = [F̲(x), F̄(x)].

An interval quadratic inequality

Ax² + Bx + C ≥ 0    (8.1.5)

can be expressed as

Ax² + Bx + C = [0, +∞]

so that it becomes

Ax² + Bx + [−∞, C̄] = 0.

This has the form of (8.1.1) and can be solved by the method that we now describe.
8.2 A PROCEDURE
We wish to know where the lower and upper real functions defining F(x) are zero. These functions are determined by the Fᵢ(x) (i = 1, 2, 3, 4). It is easily verified that the upper function F̄(x) is convex. If A̲ > 0, then the lower function F̲(x) is also convex. If A̲ < 0, the lower function F̲(x) is concave for x ≤ 0 and for x ≥ 0, but it can have a cusp at x = 0.
Let us compute the real roots of each of these real functions Fᵢ(x) (i = 1, 2, 3, 4) and place them in a list L. A double root is to be entered twice. If A̲ = 0, then F₁ and F₃ are linear and each has only a single root. The roots are computed using interval arithmetic to bound rounding errors. Thus, the entries in the list L are intervals.
Since we omit complex roots, it appears that these four functions can
have a total of 0 to 8 roots. However, these real roots are endpoints of the
interval roots and there can be no more than three interval roots. Therefore,
the list L can contain no more than six real roots.
The functions F₁(x) and F₂(x) define bounds on F(x) only when x ≥ 0. Therefore, we drop any negative root of these functions from the list L. Also, drop any negative part of the interval bounding such a root. Similarly, drop any root (or part of a root) of F₃(x) or F₄(x) that is positive.
The intervals remaining in L bound values of x that are either a lower
or an upper endpoint of an interval root. We need only determine which
they are. Before doing so, it is convenient to put the interval root endpoints
±∞ into L (if they occur).
1. Compute intervals containing the real roots of each of the real functions Fᵢ(x) (i = 1, 2, 3, 4). Put the results in a list L. A double root is to be entered twice. If C̲ = 0, both F₁(x) and F₃(x) have a root at x = 0. This root is entered only once into L. If C̄ = 0, both F₂(x) and F₄(x) have a zero at x = 0. This root is entered only once into L.
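As a rough illustration of step 1 (plain floating point; a rigorous implementation would bound each root using outwardly rounded interval arithmetic, as the text requires), the following sketch computes the candidate endpoint roots of F₁, . . . , F₄ and filters them by sign. The names real_roots and candidate_endpoints are illustrative only:

```python
import math

def real_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0; double roots listed twice."""
    if a == 0.0:
        return [] if b == 0.0 else [-c / b]
    d = b * b - 4.0 * a * c
    if d < 0.0:
        return []
    s = math.sqrt(d)
    return [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]

def candidate_endpoints(Al, Au, Bl, Bu, Cl, Cu):
    # F1, F2 bound F(x) for x >= 0;  F3, F4 bound it for x <= 0.
    F1, F2 = (Al, Bl, Cl), (Au, Bu, Cu)
    F3, F4 = (Al, Bu, Cl), (Au, Bl, Cu)
    L = []
    for coeffs in (F1, F2):
        L += [r for r in real_roots(*coeffs) if r >= 0.0]
    for coeffs in (F3, F4):
        L += [r for r in real_roots(*coeffs) if r <= 0.0]
    return sorted(L)

# Example: A = [1, 1], B = [0, 0], C = [-4, -1] gives x**2 in [1, 4],
# so the interval roots are [-2, -1] and [1, 2].
print(candidate_endpoints(1, 1, 0, 0, -4, -1))  # [-2.0, -1.0, 1.0, 2.0]
```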
NONLINEAR EQUATIONS
OF ONE VARIABLE
9.1 INTRODUCTION
Various other interval methods for this problem have been published
including more general versions of the method we describe. However, the
simple version of the interval Newton method has so many remarkable
properties (see Section 9.6) and is so efficient that no other methods or
variations are needed.
The quotation below points up the value of some of the properties of the
method. Section 2.1 of the excellent book by Dennis and Schnabel (1983)
is entitled “What is not possible”. As we shall see, properties of the interval
Newton method make this title erroneous.
They state that “It would be wonderful if we had a general purpose computer routine that would tell us: ‘The roots of f₁(x) are 0, 3, 4, and 5; the real roots of f₂(x) are x = 1 and x ≅ 0.888; f₃(x) has no real roots’ ”.
They continue: “It is unlikely that there will ever be such a routine. In
general the questions of existence and uniqueness — does a given problem
have a solution, and is it unique? — are beyond the capabilities one can
expect of algorithms that solve nonlinear problems”.
As described in this chapter, the general purpose computer routine that
Dennis and Schnabel say “would be wonderful” does, in fact, exist. It was
used by one of the authors to solve the problems listed. It produced precisely
the information requested in the above quotation, including answers to the
questions of existence and uniqueness.
We derive this “wonderful” algorithm in the next section and give a version of it in Section 9.3. We now list some of its properties in informal terms; the properties are proved formally as theorems in Section 9.6.
After discussing stopping criteria in Section 9.4, we list the steps of the
algorithm in Section 9.5. We then state and prove several theorems about
the properties of the algorithm in Section 9.6 and give a numerical example
in Section 9.7. In Section 9.8, we describe a variant of the method using the
slope function (discussed in Section 7.7). An illustrative example of this
variant is given in Section 9.9. We close the chapter with a brief discussion
of perturbed problems in Section 9.10.
The interval Newton method was derived by Moore (1966) in the following manner. From the mean value theorem,

f(x) − f(x*) = (x − x*) f'(ξ)    (9.2.1)

for some ξ between x and x*. If x* is a zero of f, then f(x*) = 0 and

x* = x − f(x) / f'(ξ).

Since ξ is unknown, we bound f'(ξ) by an interval evaluation f'(X) over an interval X containing x and x*, and define

N(x, X) = x − f^I(x) / f'(X),

which must then contain x*. The iteration is

xₙ = m(Xₙ),
N(xₙ, Xₙ) = xₙ − f^I(xₙ) / f'(Xₙ),    (9.2.2)
Xₙ₊₁ = Xₙ ∩ N(xₙ, Xₙ).
We call xn the point of expansion for the Newton method. It is not necessary
to choose xn to be the midpoint of Xn . We require only that xn ∈ Xn to
assure that x ∗ ∈ N(xn , Xn ) whenever x ∗ ∈ Xn . However, it is convenient
and efficient to choose xₙ = m(Xₙ). Later in this section, we discuss a useful result of this choice. In Section 9.3, we discuss a case in which the point xₙ is an endpoint of Xₙ.
In his original derivation of the interval Newton method, Moore (1966) assumed that 0 ∉ f'(X₀). Alefeld (1968) and (independently, but much later) Hansen (1978b) extended the algorithm to include the case 0 ∈ f'(X₀). We consider this more general case in this section.

If 0 ∉ f'(X₀), then 0 ∉ f'(Xₙ) for all n = 1, 2, · · · . This follows from inclusion isotonicity and the fact that Xₙ ⊂ X₀ for all n = 1, 2, .... However, if 0 ∈ f'(X₀), then evaluating N(x₁, X₁) requires the use of extended interval arithmetic (see Chapter 4). If x* is a multiple zero of f, then f'(x*) = 0 and so 0 ∈ f'(X) for any interval X containing x*. Even though N(xₙ, Xₙ) is not finite in such a case, the intersection Xₙ₊₁ = Xₙ ∩ N(xₙ, Xₙ) is finite.

When we evaluate f(xₙ), we use interval arithmetic to bound rounding errors and denote this fact by a superscript “I”; thus f^I(xₙ) = [aₙ, bₙ]. If 0 ∈ f^I(xₙ), then xₙ is a zero of f or else is “near” to one (if one exists). If 0 ∈ f^I(xₙ) and 0 ∉ f'(Xₙ), a step of the interval Newton method using the interval Xₙ might or might not yield a smaller interval than Xₙ.
Denote f'(Xₙ) = [cₙ, dₙ] and suppose 0 ∈ [cₙ, dₙ]. If aₙ > 0, then

N(xₙ, Xₙ) = [−∞, qₙ] ∪ {+∞}        if cₙ = 0,
            [pₙ, +∞] ∪ {−∞}        if dₙ = 0,    (9.2.3)
            [−∞, qₙ] ∪ [pₙ, +∞]    if cₙ < 0 < dₙ,

where

pₙ = xₙ − aₙ/cₙ,
qₙ = xₙ − aₙ/dₙ.

If bₙ < 0, then

N(xₙ, Xₙ) = [qₙ, +∞] ∪ {−∞}        if cₙ = 0,
            [−∞, pₙ] ∪ {+∞}        if dₙ = 0,    (9.2.4)
            [−∞, pₙ] ∪ [qₙ, +∞]    if cₙ < 0 < dₙ,

where

pₙ = xₙ − bₙ/cₙ,
qₙ = xₙ − bₙ/dₙ.
[Figure: f(x) = x² − 2 over an interval X; the extended interval Newton step deletes a gap from X, leaving two subintervals.]
9.3 A PROCEDURE WHEN 0 ∉ f'(X)
0. Set n = 0.
1. If flag Fa = 1 go to step 4.
4. If flag Fb = 1, go to step 7.
9. Replace n by n + 1.
Criteria for stopping iteration of an interval Newton method must assure that
iteration is continued until an interval bound on a zero of f is sufficiently
narrow. Also, the criteria should avoid needless iteration. We discuss these
issues in this section. Throughout this section, we assume 0 ∈ f'(X) for any interval X that is a candidate for a final bound on a zero of f. Otherwise, we use the stopping procedure of Section 9.3.
A simple criterion is to terminate when the width w(X) of the current
interval X is small. However, this can create a difficulty. Consider a hypo-
thetical computer that uses three significant decimal digits in its arithmetic.
Suppose we require w(X) < 0.001 for any final interval. We consider two
examples using this criterion on such a computer.
Criterion 9.4.3: w(X)/|X| < ε_X for some given ε_X > 0.
Note that this criterion is in relative rather than absolute form; it is used when 0 ∉ X.
2. If the list L is empty, stop. Otherwise, select the interval from L that
has been in L for the shortest time. Denote the interval by X. Delete
X from L.
3. If 0 ∈ f'(X), go to step 5.

4. Iterate the Newton method until either the result is empty or else 0 ∈ f^I(m(X)). In the latter case, apply the procedure described in Section 9.3. If the result is empty, go to step 2. Otherwise, record the solution interval that the procedure produces and go to step 2.
5. If 0 ∈ f^I(x), go to step 7.

6. If w(X)/|X| < ε_X and w(f(X)) < ε_f, record X as a final bound and go to step 2. Otherwise, go to step 8.
9. If the Newton step reduced the width of the interval by at least half,
go to step 3.
10. Split the current interval in half. Put one half in list L. Designate the
other half as the current interval and go to step 3.
When the algorithm stops (see step 2), each zero of f in X0 is in one
of the intervals recorded in step 4, step 6, or step 7. Intervals might have
been recorded that do not contain a zero of f . However, every zero in X0
is in one of the output intervals. As noted in Section 9.3, the algorithm
might prove the existence (using Theorem 9.6.8 below) of a zero of f in a
recorded interval.
In step 2, we process the interval that has been in the list L for the
shortest time. This tends to keep the list L short and conserve memory.
This choice of interval is easily implemented using a stack. An alternative
choice that keeps the list short is to choose the narrowest interval in L. Our
choice tends to do this.
In the next chapter, we discuss a procedure that we have called “hull
consistency”. Before the above Newton method is used, hull consistency
is applied. This can reduce the region of search for zeros of f .
The interval X0 to which an interval Newton method is applied generally
contains more than one zero of f . Steps 7 and 10 split the current interval
and serve to separate different zeros into different intervals. This enables
rapid convergence to each zero separately.
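The following condensed sketch (our illustration, in ordinary floating point rather than outwardly rounded interval arithmetic) combines the pieces described above: a Newton step with extended division when 0 ∈ f'(X), intersection with X, and bisection when the step makes no progress. The example is f(x) = x² − 2, whose two zeros ±√2 are isolated into separate narrow intervals:

```python
import math

EPS = 1e-10

def f(lo, hi):                 # interval extension of f(x) = x**2 - 2
    c = [lo * lo, hi * hi]
    sq_lo = 0.0 if lo <= 0.0 <= hi else min(c)
    return (sq_lo - 2.0, max(c) - 2.0)

def fp(lo, hi):                # interval extension of f'(x) = 2x
    return (2.0 * lo, 2.0 * hi)

def newton_step(xlo, xhi):
    """One step: return the pieces of X intersected with N(m(X), X)."""
    m = 0.5 * (xlo + xhi)
    a = f(m, m)[0]             # f(m), a point value here
    clo, chi = fp(xlo, xhi)
    if clo > 0.0 or chi < 0.0:                    # 0 not in f'(X)
        qs = [m - a / c for c in (clo, chi)]
        lo, hi = max(xlo, min(qs)), min(xhi, max(qs))
        return [(lo, hi)] if lo <= hi else []
    if a == 0.0:                                  # f(m) = 0: just bisect
        return [(xlo, m), (m, xhi)]
    # Extended division: N(m, X) excludes an open gap around m.
    qc = a / clo if clo != 0.0 else (-math.inf if a > 0.0 else math.inf)
    qd = a / chi if chi != 0.0 else (math.inf if a > 0.0 else -math.inf)
    glo, ghi = m - max(qc, qd), m - min(qc, qd)   # the deleted gap
    out = []
    if xlo < glo:
        out.append((xlo, min(glo, xhi)))
    if ghi < xhi:
        out.append((max(ghi, xlo), xhi))
    return out                 # a full code would also split on no progress

work, done = [(-2.0, 2.0)], []
while work:
    xlo, xhi = work.pop()
    flo, fhi = f(xlo, xhi)
    if flo > 0.0 or fhi < 0.0:                    # no zero of f in X
        continue
    if xhi - xlo < EPS:
        done.append((xlo, xhi))
        continue
    work.extend(newton_step(xlo, xhi))

print(done)          # two narrow intervals around -1.41421 and 1.41421
```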
Theorem 9.6.2 Let an initial interval X₀ be given and assume that f and f' have a finite number of zeros in X₀. Denote the intervals in the list L at the i-th stage by Xⱼ^(i) (j = 1, · · · , Nᵢ) where Nᵢ is the number of intervals in L at stage i. Assume that at the i-th stage, a step of the interval Newton method is applied to the interval of greatest width in L. Then for arbitrary ε > 0, and all sufficiently large i, we have wᵢ < ε where

wᵢ = Σ_{j=1}^{Nᵢ} w(Xⱼ^(i)).
Suppose Xₙ is the union of two intervals and that xₙ = m(Xₙ). This midpoint xₙ is not in N(xₙ, Xₙ). Therefore, each of the two subintervals generated in the n-th step is of width less than (1/2) w(Xₙ).

It might happen that Xₙ₊₁ = Xₙ when 0 ∈ f'(Xₙ). That is, the Newton step makes no progress. In this case, we split Xₙ in half. (See step 10 of the algorithm in Section 9.5.) Thus, in all cases, any new interval generated in a step is narrower than the old one, and progress is made.
This well-known theorem was first proved by Moore (1966) (see also Alefeld and Herzberger (1983)). The theorem states that if 0 ∉ f'(Xₙ), then convergence is rapid asymptotically (i.e., quadratic) while Theorem 9.6.6 says that the rate can be reasonably fast (i.e., geometric) even for wide intervals.
This theorem was first proved by Hansen (1969a). His proof is for the
case in which f is a thin function (as defined in Section 3.8). The proof
contained herein follows as a special case of Theorem 9.6.9 below.
Note that evaluation of N(x, X) = x − f(x)/f'(X) involves division by f'(X). If 0 ∈ f'(X), then N(x, X) is not finite and the hypothesis N(x, X) ⊂ X of the theorem cannot be satisfied. If X contains a multiple zero or more than one isolated zero of f, then 0 ∈ f'(X). Therefore, Theorem 9.6.8 can prove existence of simple zeros only.
Theorem 9.6.9 below is a generalization of Theorem 9.6.8. It is par-
ticularly useful in practice because it holds when f is a thick function (as
defined in Section 3.8).
Let f depend on an interval parameter C. To emphasize this dependence, we rewrite f(x) as f(x, C) and f'(X) as f'(X, C). Assume that f(x, C) is a continuously differentiable function of x for each c ∈ C. The function N(x, X) becomes

N(x, X, C) = x − f(x, C) / f'(X, C).
This theorem (and the proof that follows) holds equally well when C is
a vector of more than one interval parameter. We assume that C is a single
parameter merely to simplify exposition.
Proof. We develop a proof by showing that f(x, c) changes sign in X for each c ∈ C.

Let c be a point in C and let x and y be points in X. From the mean value theorem, for each c ∈ C,

f(y, c) − f(x, c) = (y − x) f'(ξ(c), c)    (9.6.2)

for some ξ(c) between x and y. Since x and y are in X, it follows that ξ(c) ∈ X. Therefore,

f(y, c) − f(x, c) ∈ (y − x) f'(X, C)

for each c ∈ C.
Note that if 0 ∈ f'(X, c), then N(x, X, C) is not finite. Hence, the hypothesis N(x, X, C) ⊂ X of the theorem can be true only if 0 ∉ f'(X, C) since X is finite. Note that the condition 0 ∉ f'(X, C) implies that any zero of f(x, c) in X must be simple for each c ∈ C.
Denote f'(X, C) = [p, q]. Then 0 ∉ [p, q]. Since we can change the sign of both f and f' without changing the algorithm, there is no loss of generality in assuming f'(X, C) > 0. Therefore, we assume that p > 0.

Since C is a nondegenerate interval, so is f(x, C) even though x is degenerate. Denote f(x, C) = [f̲(x, C), f̄(x, C)]. Also, denote X = [X̲, X̄] and N(x, X, C) = [N̲(x, X, C), N̄(x, X, C)].

We show that f(X̲, c) ≤ 0 and f(X̄, c) ≥ 0 for each c ∈ C, which implies that f(x, c) has a zero in X for each c ∈ C. Note that the assumption p > 0 implies that f(x, c) is monotonically increasing in X for each c ∈ C. Hence, if f̄(x, C) < 0, then f(X̲, c) < 0 for each c ∈ C, as we wished to show.
Suppose instead that f̄(x, C) ≥ 0. From the hypothesis N(x, X, C) ⊂ X, we have X̲ ≤ N̲(x, X, C). Since p > 0,

N̲(x, X, C) = x − f̄(x, C)/p

so that

X̲ ≤ x − f̄(x, C)/p.

That is,

f̄(x, C) + (X̲ − x) p ≤ 0

for each c ∈ C.

From (9.6.2),

f(X̲, c) = f(x, c) + (X̲ − x) f'(ξ(c), c) ≤ f̄(x, C) + (X̲ − x) p

and, hence, f(X̲, c) ≤ 0 for each c ∈ C. The argument showing f(X̄, c) ≥ 0 at the upper endpoint is similar.
We now describe the slope interval Newton method. To obtain it, we modify
the above interval Newton method by replacing the derivative f by the slope
function g discussed in Section 7.7.
From (7.7.1),

f(y) = f(x) + (y − x) g(x, y).    (9.8.1)

[Figure: f(x) and the linear bound f^I(m(X)) + g(m(X), X)(x − m(X)) over X.]
If we compare this relation with the relation (9.2.2) for the interval
Newton method, the only apparent difference is that we have replaced f (X)
by g(x, X). Actually, there is another difference. To assure that any zero
We now consider a simple example to illustrate the virtue of the slope form
of the interval Newton method. Consider the function
g(x, X) = X³ + (x + 3)X² + (x² + 3x − 96)X + x³ + 3x² − 96x − 388.
CONSISTENCIES
10.1 INTRODUCTION
f (x1 , · · · , xn ) = 0.
N(a, Y) = a − q(a) / [(∂/∂xᵢ) q(Y)].
For the interval Y = [−4, 2], Figure 10.2.1 illustrates how a Newton step about the point a = −4 produces a (small) reduction in the width of Y. The slopes of the slanting lines in Figure 10.2.1 equal the lower and upper bounds of (∂/∂x) q(Y).
Similarly, we use q(b) to obtain a Newton result. Expanding about b, the Newton result is

N(b, Z) = b − q(b) / [(∂/∂xᵢ) q(Z)].
[Figure 10.2.1: a Newton step about a for Y = [a, b]; the bounding lines q^I(a) + q'(Y)(x − a) determine N(a, Y).]

[Figure: a Newton step about b for Z = [a, b]; the bounding lines q^I(b) + q'(Z)(x − b) determine Z ∩ N(b, Z).]
1. Set β = 1/4 and w₀ = b − a.

10. Replace a by Y̲.

12. Record the final interval [a, b] and terminate the algorithm.
f (x) = x − h(x) = 0.
can let g(x) = ax⁴ and compute X' = ±[−(bX + c)/a]^(1/4). We can also separate x⁴ into x²·x² and solve for one of the factors as X' = ±[−(bX + c)/(aX²)]^(1/2). We consider a particular method for choosing g in Section 10.6.
A virtue of consistency methods is that they can work well “in the
large”. When we seek a solution of f (x) = 0, we often start the search
over a large interval to assure that it contains the solution. When the solution
is not where |x| is large, we must somehow eliminate large values. For this
purpose, HC is very useful.
As an example, suppose we seek a solution of x⁴ + x − 2 = 0 in the interval X = [−100, 100]. Solving for x⁴ and replacing x in the remaining terms by the interval X, we obtain (X')⁴ = 2 − [−100, 100] = [−98, 102]. Since (X')⁴ must be non-negative, we replace this equation by (X')⁴ = [0, 102] and conclude that X' = ±[0, 102]^(1/4), so X' = [−3.18, 3.18] approximately. This is a substantial reduction of the original interval.
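This step is easy to check numerically; a minimal sketch, assuming plain floating point with no outward rounding:

```python
lo, hi = -100.0, 100.0
rhs_lo, rhs_hi = 2.0 - hi, 2.0 - lo        # 2 - X = [-98, 102]
rhs_lo = max(rhs_lo, 0.0)                  # (X')**4 must be non-negative
r = rhs_hi ** 0.25                         # 102**(1/4) ~ 3.178
# X' = +/-[0, 102]**(1/4) is the union [-r, r]; intersect with X:
print(max(lo, -r), min(hi, r))             # ~ -3.178  3.178
```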
[Figure: hull consistency for the choice g(x) = x⁴ over X ⊂ [−2, 5], showing g(X), h(X), and the reduced interval X ∩ X'.]
f(x) = x⁴ + x² − x − 1 = 0.
f(x) = Σ_{i=1}^{m} gᵢ(x) − h(x) = 0    (10.5.1)
Let X be given and evaluate g₁(X), g₂(X), h(X), and then f(X). Suppose we have used g₁ as the function to solve using HC and obtained a new interval X'.

We now want to solve for g₂ as g₂(X'') = h(X') − g₁(X'). But if f (and hence h) is quite complicated, this is a lengthy computation. Instead of computing h(X'), we can use h(X), which has already been computed. This saves computing at the expense of loss of sharpness. Therefore, we want to obtain
f(x, y) = x⁴y − x²y² − 4x − 2eˣ = 0

If we replace both x and y by their bounds except for the x² term and solve for x² (and then x), we obtain
Here and in what follows we record results to four significant digits. This
greatly improved bound is independent of the size of a.
(x − y)² + x²y² − 22xy + 13 = 0

and solve for 22xy. We obtain the slightly better result xy = [0.5909, a²]. We can do still better by writing f(x, y) = 0 in the form
10.6 CONVERGENCE
The step to solve for g(x) (and then x) can be iterated. Thus we can define an iteration in which the new point x' satisfies g(x') = h(x). Expanding g and h about the solution x*, a step of this iteration satisfies

(x' − x*) g'(ξ) = (x − x*) h'(η)    (10.6.1)

where ξ ∈ X and η ∈ X. Instead of solving for x' using (10.6.1), we now add a function v to both g and h and use the iterative step satisfying

(x' − x*)[g'(ξ) + v'(ξ)] = (x − x*)[h'(x*) + v'(x*)] + (1/2)(x − x*)²[h''(η) + v''(η)]

where ξ ∈ X and η ∈ X.
Suppose we choose v so that

h'(x*) + v'(x*) = 0.    (10.6.6)

Then

x' − x* = (x − x*)² [h''(η) + v''(η)] / (2[g'(ξ) + v'(ξ)]).    (10.6.7)

The asymptotic error constant is therefore

C(x*) = [h''(x*) + v''(x*)] / (2[g'(x*) + v'(x*)]).
Let us first choose g(x) so that it is large when |x| is large. Let g(x) = x 4 .
Then C(10) = 0.152 and C(0.001) = 1500. That is, convergence to the
large zero is much more rapid than to the small zero. Next, choose g =
10.001x so that g(x) dominates the other powers of x when |x| is small.
We find C(10) = −0.303 and C(0.001) = 0.003. Now convergence is
more rapid to the smaller zero.
These differences in values of the asymptotic constant are qualitatively the same for this example if we use v(x) = α[g(x)]² or v(x) = α[g(x)]^(1/2) instead of v(x) = αg(x). That is, if g is chosen so that HC is efficient in deleting values of x that are (say) large in magnitude, the procedure remains so no matter what form we use for v.
It can be argued that there is no need for a quadratically convergent form of HC because the interval Newton method has this property (when converging to a simple zero of f). For best performance for both small and large values of |x|, more than one form of HC must be used. However, the Newton method requires use of a single form only.
The data needed for HC to exhibit quadratic convergence is essentially the same as for Newton's method. For Newton's method, we need to evaluate f'(X) and we can compute it by separately computing g'(X) and h'(X). If we define v(x) = αg(x), then for HC, we want α = −h'(x*)/g'(x*). Knowing g'(X) and h'(X), we can approximate this value by α = −m(h'(X))/m(g'(X)). Therefore, one can use both methods with very little extra computing.

It can be shown that the asymptotic constant for Newton's method is

C_N(x*) = f''(x*) / [2 f'(x*)].
10.8 SPLITTING
We can now solve this equation for the single variable xi . This is the case
we have been discussing. The difference is that now the equation involves
interval constants.
A subset of the equations in a system of nonlinear equations often
contains terms that are linear in some of the variables. In this case, we can
use HC to solve for linear combinations of such variables and then solve
the linear system. We can also solve for linear combinations of simple
nonlinear functions.
In the multidimensional case, we can solve for a term involving more
than one variable. We then have a two stage process. For example, suppose
f(x, y) = 1/(x + y) − h(x, y) = 0.

Let x ∈ X = [1, 2] and y ∈ Y = [0.5, 2]. Suppose we find that h(X, Y) = [0.5, 1]. Then 1/(x + y) ∈ [0.5, 1] so x + y ∈ [1, 2]. Now we replace y by Y = [0.5, 2] and obtain the bound [−1, 1.5] on X. Intersecting this interval with the given bound X = [1, 2] on x, we obtain the new bound X' = [1, 1.5].

We can use X' to get a new bound on h; but this might require extensive computing if h is a complicated function; so suppose we do not. Suppose that we do, however, use this bound in our intermediate result x + y = [1, 2]. Solving for y as [1, 2] − X', we obtain the bound [−0.5, 1]. Intersecting this interval with Y, we obtain the new bound Y' = [0.5, 1] on y. Thus, we improve the bounds on both x and y by solving for a single term of f.
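The endpoint arithmetic of this two-stage example can be reproduced directly. A small sketch, taking h(X, Y) = [0.5, 1] as given and representing intervals as (lo, hi) tuples:

```python
X, Y, H = (1.0, 2.0), (0.5, 2.0), (0.5, 1.0)

# 1/(x+y) in H  =>  x + y in [1/Hu, 1/Hl] = [1, 2]
S = (1.0 / H[1], 1.0 / H[0])

# x in S - Y = [-1, 1.5]; intersect with X:
Xp = (max(X[0], S[0] - Y[1]), min(X[1], S[1] - Y[0]))
print(Xp)                       # (1.0, 1.5)

# y in S - X'; intersect with Y:
Yp = (max(Y[0], S[0] - Xp[1]), min(Y[1], S[1] - Xp[0]))
print(Yp)                       # (0.5, 1.0)
```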
For a system of equations, we apply HC by cycling through the equa-
tions and variables as described at the end of Section 10.2. Suppose we
have solved once for each variable from each equation. We can now repeat
the process. In our optimization algorithms, we do so only if sufficient
progress is made in a given cycle. Otherwise, we apply other procedures.
If they also fail to make sufficient progress we split the box.
f(x, y) = xy − 10 = 0.

x + y = 0,
x − y = 0
We write it in analytic form with terms collected and arranged to produce the
sharpest interval values when evaluated with interval arguments. Afterward,
we substitute numerical values for Bi,j (i, j = 1, · · · , n).
If the box is not small, the tangent planes can be poor approximations for
the functions. In this case, this procedure might not be helpful. Moreover,
a linear combination of functions is more complicated than the original
component functions of f(x). Therefore, it is likely that dependence causes
greater loss of sharpness when applying HC to the transformed functions.
Therefore, this procedure for using linear combinations of functions is best
used only when the box is small. It is for small boxes that we want to use an
interval Newton method; and it is for the Newton method that we compute
the matrix B needed for the HC procedure just described.
After a Newton step is applied, we apply HC and BC to the linear
combination of nonlinear functions as described above. This involves more
computation than application to the original system. If the box is small,
it is for this step that the quadratically convergent form of HC (described
in Section 10.6) is of value. This is because it is applied to an equation
that depends strongly on only one variable and because the box is small so
behavior of the procedure is approximately asymptotic.
Proof. Since h(x) ∈ h^I(xI) for all x ∈ xI, the function h(x) maps the convex, compact set xI into itself. Therefore, the Brouwer fixed point theorem (see Theorem 5.3.13 of Neumaier (1990)) assures that this function has a fixed point x* in the interior of xI. That is, x* = h(x*) and hence f(x*) = 0.
To apply this theorem, we evaluate each component of h over the same
box xI . In practice, we use a reduced component of xI as soon as it is
computed. We can prove existence using this more efficient form.
We illustrate the procedure for a system of two equations of two vari-
ables. Assume we are able to write the equations in the form
f1 (x1 , x2 ) = x1 − h1 (x1 , x2 ) = 0,
f2 (x1 , x2 ) = x2 − h2 (x1 , x2 ) = 0.
X₁' = H₁(X₁, X₂),
X₂' = H₂(X₁', X₂).
Conclusion 10.12.3 For each x1 ∈ X1 , there exists x2∗ ∈ X2 such that
f2 (x1 , x2∗ ) = 0.
and hence x ∈ [−22.24, 22.24]. If we iterate this step, the limiting interval
bound on x is approximately [−13.25, 13.25].
Suppose we use BC by applying one Newton step to increase the lower
bound and one Newton step to decrease the upper bound. We obtain
[−66.42, 66.42] approximately. Thus, we perform more work than a step
of HC and obtain less sharp bounds. To get bounds as good as that from one
step of HC, we must apply ten Newton steps when using BC. However, if we
iterate BC, the limiting interval bound is approximately [−6.824, 6.824],
which is narrower than the best possible HC result.
BC can usually produce bounds that are at least as narrow as those from
HC. However, this requires more computing effort. In fact, it might be true
only in the limit after an infinite number of BC steps.
Consider a function of the form x₁^m − h(x₂, · · · , xₙ) = 0 where m ≥ 2 and where x₁ does not occur in h. To solve for x₁ using either HC or BC, we replace x₂, · · · , xₙ by their interval bounds and solve for x₁. Using HC, we get the best possible result in one step. Using BC, we must iterate and generally stop with a result that is not as good as that produced by HC.
f(x) = x³ − 4x² + 15x.
The expression

r± = [−B ± (B² − 4AC)^(1/2)] / (2A)    (10.15.1)

expresses the roots of a quadratic equation

Ax² + Bx + C = 0

in terms of the square root of the discriminant

D = B² − 4AC.
We noted in Section 8.1 that generally one should not use the explicit
expression (8.1.2) to find roots of an interval quadratic equation. Instead,
the method of Section 8.2 should be used. This procedure reduces loss of
sharpness in computed roots resulting from dependence.
Nevertheless, (10.15.1) is useful because of the implied condition
B 2 − 4AC ≥ 0, (10.15.2)
which is necessary for the roots to be real. This condition can be used in
various ways when solving problems in which an (appropriate) equation
must be satisfied. The inequality in (10.15.2) is an example of a domain
constraint discussed in Chapter 4 on page 71.
We now consider some illustrative examples. First, consider the equa-
tion
x²y² − 20xy + x² + y² + 10 = 0.    (10.15.3)
Suppose we wish to find all solutions of this equation in the box given
by X = Y = [−100, 100]. We can apply HC by regarding this equation
as a quadratic in the product xy. Solving the quadratic by the method of
Section 8.2 does not reduce the box. Similarly, if we regard the equation
as a quadratic in x (or in y) we do not reduce the box.
However, regarding the equation as a quadratic in xy, the discriminant
is
D₁ = 360 − 4(x² + y²).
x⁴y − x²y² − 4x − 20 = 0    (10.15.4)

must hold in some box. We can obviously regard this equation as a quadratic in y. Also, we can think of x in the linear term as a separate variable and regard the equation as a quadratic in x². Also, we can think of x⁴ in the leading term as a separate variable and regard the equation as a quadratic in x. The relevant discriminants for these cases are

D₁ = x⁸ − 4x²(4x + 20) ≥ 0,
D₂ = y⁴ + 4y(4x + 20) ≥ 0,
D₃ = 16 + 4y²(x⁴y − 20) ≥ 0.
Suppose (10.15.4) must be satisfied in a box given by X = Y = [−5, 5].
If we apply HC directly to (10.15.4), we are unable to reduce the box.
However, applying HC to D1 ≥ 0, we can delete the gap (−1.91, 2.20) from
X. Applying HC to D3 ≥ 0, we can reduce the interval Y to [−0.448, 5].
Thus, use of the discriminant relations is fruitful.
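The quoted gap for D₁ ≥ 0 can be checked numerically. Since D₁ = x²(x⁶ − 16x − 80), its sign changes occur at the real roots of the sextic factor; a quick sketch using numpy:

```python
import numpy as np

# Real roots of x**6 - 16x - 80, the nontrivial factor of D1:
roots = np.roots([1, 0, 0, 0, 0, -16, -80])
real = sorted(r.real for r in roots if abs(r.imag) < 1e-9)
# The sextic is negative between its two real roots, so D1 < 0 there
# (apart from the isolated zero of D1 at x = 0).
print(real)   # approximately [-1.914, 2.203], matching the gap (-1.91, 2.20)
```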
In Section 9.5, we give the steps of an interval Newton method for solving
nonlinear equations in one variable. We now give the steps of that algorithm
after incorporating hull consistency. This makes the new steps somewhat
more complicated than the original ones in Section 9.5. However, they are
more efficient, especially when the initial interval is wide.
At any stage of the algorithm, the current interval is denoted by X even
though it changes from step to step.
4. If 0 ∈ f'(X), go to Step 6.

6. If 0 ∈ f^I(x), go to Step 8.

7. If w(X)/|X| < ε_X and w(f(X)) < ε_f, record X as a final bound and go to Step 2. Otherwise, go to Step 9.
10. If the width of X is less than half that of X (1) (defined in Step 3), go
to Step 3.
11. Split X using the procedure given in Section 10.8. Put the resulting
subintervals into list L and go to Step 2.
SYSTEMS OF NONLINEAR
EQUATIONS
11.1 INTRODUCTION
This set contains any point y ∈ xI for which f(y) = 0. We could use the
notation {s} to emphasize the solution set is not a point, but we choose not
to do so.
The smaller the box xI , the smaller the set s. The object of an interval
Newton method is to reduce xI until s is as small as desired so that a solution
point y ∈ xI is tightly bounded. Note that s is generally not a box. (See
Section 5.3).
(k = 0, 1, 2, · · · )
As before for (11.2.4b), the intersecting in (11.2.6b) is done for a given
component as soon as it is computed.
We frequently make reference to the interval Newton method in which
we solve (11.2.6a) using a step of the Gauss-Seidel method described in
Section 5.7. We write the iteration in succinct form by dropping the super-
script k and letting x and xI denote x(k+1) and xI (k+1) , respectively. We also
replace M(x, xI ) by MI . The iteration for the i-th element of N(x(k) , xI (k) )
is simply denoted by
Nᵢ = xᵢ + (1/M^I_ii) [Rᵢ − Σ_{j=1}^{i−1} M^I_ij (Xⱼ' − xⱼ) − Σ_{j=i+1}^{n} M^I_ij (Xⱼ − xⱼ)],
Xᵢ' = Nᵢ ∩ Xᵢ.    (11.2.7)

Kᵢ = xᵢ + Rᵢ + Σ_{j=1}^{i−1} P^I_ij (Xⱼ' − xⱼ) − Σ_{j=i}^{n} P^I_ij (Xⱼ − xⱼ),
Xᵢ' = Kᵢ ∩ Xᵢ.    (11.2.8)
Type II: Equation (11.2.6a) is solved to obtain bounds on its solution set;
but the bounds are not sharp, in general.
Type III: A method of type I or a method of type II (or both) is used in each
iteration depending on criteria designed to enhance overall efficiency.
The first procedure, which does not sharply bound s, was introduced
(independently) by both Kahan (1968) and Krawczyk (1969). It is given by
(11.2.8). It is commonly called the Krawczyk method and has been studied
thoroughly. For example, see Alefeld (1999), Moore (1979) and Neumaier
(1990).
In Moore’s method the interval matrix V(x, xI ) in (11.3.1) is computed
using interval Gaussian elimination. This can fail because of division by
an interval containing zero. At the time of inception, the Krawczyk method
was an important development because it avoided applying Gaussian elim-
ination to an interval matrix. In fact, the Krawczyk method avoids solving
any set of interval linear equations. Instead, only a real (i.e., noninterval)
matrix inverse is computed. This was the motivating factor for its introduc-
tion. For a recent discussion of the Krawczyk method, see Alefeld (1999).
A minor weakness of the Krawczyk and Gauss-Seidel methods is that
even if x∗ is a zero of f so that f(x∗ ) = 0, the solution of (11.2.6a) does not
yield a result precisely equal to the degenerate box x∗ . Instead, a nondegen-
erate box containing x∗ is produced. Partly for this reason, these methods
are not as rapidly convergent as some other interval Newton methods.
Hansen and Sengupta (1981) noted that the Gauss-Seidel method is
more efficient than the Krawczyk method. See also Hansen and Green-
berg (1983). Neumaier (1990), discusses the Gauss-Seidel and Krawczyk
methods in the form in which intersecting is done only after all new compo-
nents have been computed. He notes (page 177) that in this form N(x, xI) ⊂ K(x, xI) where Nᵢ, the elements of N(x, xI), are given by Equation (11.2.7) and Kᵢ, the elements of K(x, xI), are given by (11.2.8). Therefore, between the two methods, the Gauss-Seidel method is preferred.
For variations of algorithms using the Gauss-Seidel method, see Hansen and Sengupta (1981) and Hansen and Greenberg (1983).
In this section, we describe an “inner iteration” that has been used in the
past to improve the convergence of interval Newton methods. It can be used
in the way described. In Section 11.4.1, we describe an alternative way to
apply an inner iteration. We prefer the latter form. We use this alternative
form in the algorithms given in Section 11.12 and 11.14.
The purpose of an inner iteration is to find an approximation for a
solution of f = 0 in the current box xI . This approximation can be used as
the point of expansion x in (11.2.1). The closer x is to a solution point of
f = 0, the smaller the solution set of (11.2.6a).
Later in this section, we describe an inner iteration that generally obtains a point x ∈ xI where ||f(x)|| is smaller than at the center of xI. The
expectation is that by obtaining this better point of expansion, we require
fewer iterations of the main algorithm. This means that fewer evaluations
of the Jacobian are required. Therefore, less overall computation is re-
quired to solve the system of nonlinear equations. This has been verified
by experiment. For example, see Hansen and Greenberg (1983).
In Section 11.2, we note that the first step in solving (11.2.3) is to
multiply by an approximate inverse B of the center of the coefficient matrix
J(x, xI ). Hansen and Greenberg (1983) pointed out that since B is available,
we can use it to perform a real Newton step (or steps) to try to obtain a better
approximation for a zero of f than the center m(xI ) of xI .
The initial point z(0) can be chosen to be the center of the current box. At
the final point, f is as small or smaller in norm than at z(0) . The final point
is used as the point of expansion x in (11.2.1). This final point must be in
xI so that the expansion (11.2.1) is valid for the ensuing interval Newton
step using xI .
The iteration is discontinued if z^(i+1) is outside xI. Suppose this is the case. Let z' denote the point between z^(i) and z^(i+1) where the line segment joining these two points crosses the boundary of xI. If ||f(z')|| < ||f(z^(i))||, we choose z' as our approximation for a zero (and hence as our point of expansion). Otherwise, we choose z^(i). The vector norm we use in this book is
The inner iteration is also stopped if ||f(z^(i+1))|| > ||f(z^(i))||. In this case, we let x = z^(i). Otherwise, the iteration (11.4.1) is stopped after three steps and we set x = z^(3). Further iteration might converge to a zero of f. However, the convergence rate is only linear if B is fixed. Hence, it is not efficient to perform too many iterations.
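A sketch of the inner iteration follows (an illustration, not the book's code; numpy's Euclidean norm stands in for the unspecified vector norm, and the refinement of clipping an iterate back to the boundary of the box is omitted, keeping the last accepted point instead):

```python
import numpy as np

def inner_iteration(f, B, z0, lo, hi, max_steps=3):
    z = z0
    for _ in range(max_steps):
        z_new = z - B @ f(z)                       # real Newton step
        if np.any(z_new < lo) or np.any(z_new > hi):
            break                                  # left the box xI
        if np.linalg.norm(f(z_new)) > np.linalg.norm(f(z)):
            break                                  # norm grew
        z = z_new
    return z           # point of expansion x; ||f(x)|| <= ||f(z0)||

# Hypothetical example: zeros of (x**2 - y, x + y - 2); B approximates
# the inverse Jacobian at the box center (1.5, 1.5).
f = lambda z: np.array([z[0] ** 2 - z[1], z[0] + z[1] - 2.0])
B = np.linalg.inv(np.array([[3.0, -1.0], [1.0, 1.0]]))
x = inner_iteration(f, B, np.array([1.5, 1.5]),
                    np.array([0.0, 0.0]), np.array([2.0, 2.0]))
print(x)               # moves toward the solution at (1, 1)
```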
It is more important to have f (x) small when solving (11.2.6a) using
either the hull method or Gaussian elimination than using a step of the
Gauss-Seidel method. This is because the former two methods yield con-
vergence in one step of (11.2.6a) if f (x) = 0 (and exact interval arithmetic
is used). A Gauss-Seidel step does not.
Therefore, a different number of steps using (11.4.1) can be used de-
pending on which method is used to solve the linearized system (11.2.3).
In the algorithm given below in this section, we have used the same upper
limit (i.e., 3) on the number of inner iterations using (11.4.1).
The amount of work to do the inner iteration is small compared to that
required to compute J(x, xI ) and B and then to solve the preconditioned
linear system (11.2.6a). Therefore, the inner iteration is worth doing even
0. Initialize:
Criterion A:
As noted in Section 9.4, care must be taken when choosing εX for use in an
absolute criterion of this form.
A user might want to have ||f(x)|| small for all x in a final box xI bounding a zero of f. Therefore, we define:
Criterion B:
Since f I (x) and J(x, xI ) are computed when performing a Newton step, the
only extra effort required to compute a bound on f(xI ) is a matrix-vector
multiplication and a vector addition. Moreover, when w(xI ) is small, this
method generally bounds the range of f over xI more sharply than directly
evaluating f(xI ).
When we use the relation in (11.5.3) to bound f(xI), we use the already computed J(x, xI) to obtain a new box xI′ in xI. Thus, we use f(xI′) ⊂ f^I(x) + J(x, xI)(xI′ − x). Therefore, J(x, xI) has wider elements than the correct Jacobian. As a result, the computed bound on f(xI′) is wider than necessary. However, if we simply evaluate f(xI′), then dependence causes widening so computed bounds are not sharp in either case.
To see if Criterion B is satisfied, another option when evaluating f(xI )
is to use hull consistency to bound f(xI ) as described in Section 10.10. The
box might be reduced in the process.
In our termination procedure, we check whether Criterion A is satisfied
before checking to see if Criterion B is satisfied. If Criterion A is not
(See (11.2.4a).) To precondition the system, the first step in solving this
equation is to multiply by an approximate inverse B of the center of J(x, xI ).
(See Sections 5.6 and 11.2.) The resulting coefficient matrix is M(x, xI ) =
BJ(x, xI ). We discuss another way to compute M(x, xI ) in Section 11.9.
Assume that J(x, xI ) is regular (i.e., does not contain a singular matrix).
Then the inverse B exists and M(x, xI) can be computed. If w(xI) is small, then the interval elements of M(x, xI) are generally small in width and M(x, xI) approximates the identity matrix.
For our current purpose, we are interested only in whether M(x, xI ) is
regular or not. Below, we see that for another reason M(x, xI ) is checked
for regularity. Therefore, no extra computation is required to make this
check for our current purpose.
Note that regularity of M(x, xI ) assures that any zero of f in xI is simple.
In this case, we are able to decide when xI approximates xI ∗ . We use
All we really need is a condition that assures that xI is not large when
Condition D holds. However, a choice must be made on how to proceed.
We describe our choice in this section. When M(x, xI ) = BJ(x, xI ) is
regular, we can determine the hull of the solution set of the preconditioned
system
by the hull method given in Section 5.8. When M(x, xI ) is irregular, this
method is not applicable; and neither is Gaussian elimination. Therefore,
we use the Gauss-Seidel method described in Section 5.7. A difficulty of
the Gauss-Seidel method (from the perspective of termination) is that even
when 0 ∈ f I (x) and xI is large, it is possible to have N(x, xI ) ⊃ xI .
Suppose that in our algorithm, we have applied hull and box consisten-
cies to a box (say yI ) and have obtained the box xI to which the Newton
method is applied. Suppose yI ≠ xI. Then the consistency methods have
reduced yI . In this case, we assume xI is not as small as it can be made to
be, even if the Newton method fails to reduce it.
In our algorithm, we rely upon Criteria A and B to stop iteration on
a box for which Condition D holds. However, we now point out some
alternatives that can be used, that we have not yet tried.
We can compare ||f (x)|| with ||f (xI )|| for x ∈ xI . Note that ||f (xI )||
approaches ||f (x)|| as the width of xI approaches zero with x ∈ xI .
3. Continue processing xI .
4. Continue processing xI .
Note that our termination processes do not take special measures when
Condition D of Section 11.5 holds. In this case we assume that the toler-
ances in Criteria A and B are chosen adequately to cause termination in a
suitable manner.
It might happen that two or more abutting boxes are output as final
results and it is their union that contains a solution (or multiple nearly
coincident solutions). However, some of these output boxes might not
contain a solution.
for some constant α where 0 < α < 1. But suppose xi is the narrowest
component of xI . Then this condition is satisfied when there is little decrease
in the distance between extreme points of xI .
We could also require that

w(xI′) < β w(xI)    (11.7.2)

for some constant β where 0 < β < 1. In this case, we compare the widest component of xI′ with the widest component of xI. But even if every component of xI′ except the widest is reduced to zero width, this criterion says that insufficient progress has been made.
for some constant γ where 0 < γ < 1. This assumes that at least one
component of xI is reduced in width by an amount related to the widest
component of xI .
We choose γ = 0.25. Thus, we define
We do not use this criterion form (see (11.7.2)) for determining when
progress is barely sufficient. However, we use it in the present case when
it indicates that progress is more than just sufficient.
V = (Σ_{i=1}^{n} Tᵢ²)^(1/2)

Tⱼ ≥ Tⱼ₊₁ (j = 1, · · · , n − 1).

α_{n−1} = 1/2 + 3Tₙ² / (8T_{n−1}²)

αₖ = (1/2)[1 + α_{k+1}(2 − α_{k+1}) T_{k+1}²/Tₖ²]    (k = 1, · · · , n − 2)

Each αᵢ, and thus wᵢ (i = 1, · · · , n − 2), can be found by recurring backward.
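The backward recurrence is directly programmable. A small sketch (the list T is assumed already sorted so that Tⱼ ≥ Tⱼ₊₁; the helper name alphas is illustrative):

```python
def alphas(T):
    n = len(T)
    a = [0.0] * n                # a[k] holds alpha_{k+1} from the text
    a[n - 2] = 0.5 + 3.0 * T[n - 1]**2 / (8.0 * T[n - 2]**2)
    for k in range(n - 3, -1, -1):       # recur backward
        a[k] = 0.5 * (1.0 + a[k + 1] * (2.0 - a[k + 1])
                      * T[k + 1]**2 / T[k]**2)
    return a[:n - 1]

print(alphas([4.0, 2.0, 1.0]))   # alpha_1, alpha_2 for n = 3
```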
Suppose we are splitting a particular component of a box by the process
we have described. Since the component is not split in half, we must decide
whether to have the split occur in the lower half or the upper half of the
component.
We have no information concerning how well a Newton step will per-
form in each new subbox. In fact, since we use the Jacobian in deciding
how to split, it is reasonable to assume that a Newton step (which uses the
Jacobian) will be equally effective in any one of the generated subboxes.
However, the consistency methods (which do not use the Jacobian) provide
a small amount of guidance.
Note that if the width of the gap is ≥ 0.2 w(xI ), then (11.8.2) is satisfied.
In this case, we are willing to shave off a thin slice of xI because we also
delete a relatively wide gap. Any gap we discuss subsequently is assumed
to satisfy (11.8.2). If gaps exist that satisfy condition (11.8.2) for use in
splitting, their use takes precedence over simply splitting a component of
xI at a point. The parameter Tj given by (11.8.1) is used to select the gaps
to be used. That is, we do not simply use the widest gaps.
Suppose we are using the procedure given above in this section to split
in n dimensions. When a particular component is to be split and it contains
a gap suitable for splitting, we split using the gap rather than the computed
point. We store the smaller part and use the larger part for splitting as the
procedure prescribes. This does not alter the relative sizes of subboxes
formed subsequently by the procedure.
When solving an optimization problem using methods given in subse-
quent chapters, we sometimes use the splitting procedure described above.
In this case, the matrix J(x, xI ) whose elements occur in (11.8.1) is the
Jacobian of the gradient of an objective function. In some constrained
problems, this Jacobian does not involve some of the variables defining the
box to be split. In this case, we proceed as follows:
Let SJ denote the set of indices of variables that occur in the definition
of J(x, xI ). Let S0 denote the set of remaining variable indices.
We order all components of the box in order of decreasing width. If the
Jacobian has not yet been evaluated, we split the three leading components
in the list. If the Jacobian has been evaluated and the three leading compo-
nents all have indices in the set SJ , we use the above procedure to split all
the components with indices in SJ . If the index of at least one of the three
leading components in the list is in S0 , we split only the three components
leading the list.
If parallel processors are available, each can have a copy of the interval
Newton algorithm. The initial box can be split before processing starts so
that each processor has a box to process. When any box is split, a resulting
subbox can be passed to an available processor. A processor becomes
We have noted that the search for solutions to the system of nonlinear
equations is confined to a box xI (0) . Generally, this box is specified by the
user. Actually the region of search can be a set of boxes. The boxes can be
disjoint or overlap. However, if they overlap, a solution at a point that is
common to more than one box is separately found in each box containing
it. In this case, computing effort is wasted.
If the user does not specify an initial box (or boxes), we use a “default box”. Let N denote the largest floating point number that can be represented in the number system used on the computer. Our default box has components xᵢ^(0) = [−N, N] for all i = 1, · · · , n. We assume that any finite solution outside the default box is of no interest. To find any solution outside the default box requires higher precision arithmetic.
w(xI) ≤ (1/2)(w_R + w_I).    (11.11.1)
In this section, we list the steps of our interval Newton method. The al-
gorithm contains the procedures described earlier. Note that we refer to it
simply as a Newton method. However, it involves application of hull and
box consistencies and two forms of Newton methods.
The selected features of the algorithm were chosen using both exper-
imentation and theoretical considerations. There is undoubtedly room for
improvement. In particular our selection of various numerical parameters
was often made with too little experimental work to obtain the best values.
For example, it might be better to let some parameters vary with dimen-
sionality. Despite its shortcomings, the algorithm works well in practice.
We assume that an initial box xI (0) is given. We seek all zeros of f in
this box. However, as discussed in Section 11.10, more than one box can
be given. As the algorithm proceeds, it usually generates various subboxes
1. If the list L is empty, stop. Otherwise, choose the box most recently
put in L to be the current box. Delete it from L.
2. For future reference, store a copy of the current box xI . Call the
copy xI (1) . If hull consistency has been applied n times in succes-
sion without applying a Newton step, go to Step 9. (In making this
count, ignore any applications of box consistency.) Otherwise, apply
hull consistency to the equation fi (x) = 0 (i = 1, · · · , n) for each
variable xj (j = 1, · · · , n). To do so, cycle through equations and
variables as described at the end of Section 10.2. Use more general
hull consistency methods if desired. See Section 10.5. If the result
is empty, go to Step 1.
4. If the box xI (1) (see Step 2) was sufficiently reduced (as defined using
(11.7.4)) in Step 2, repeat Step 2.
6. Repeat Step 3.
10. For later reference, denote the current box by xI (2) . Compute J(x, xI )
using a Taylor expansion based on (7.3.6) or else using slopes (see
(7.9.1)). Use the point determined in Step 9 as the point of expansion.
Compute an approximation Jc for the center of J(x, xI ). Compute an
approximate inverse B of Jc . If Jc is singular, compute B as described
in Section 5.11. Compute M(x, xI ) = BJ(x, xI ) and r(x) = −Bf(x).
If M̲ᵢᵢ(x, xI) ≤ 0 for any i = 1, · · · , n, then M(x, xI) is irregular; so update wI as described in Section 11.11 and go to Step 12.
14. If the box was sufficiently reduced (as defined using (11.7.4)) by the
single pass of the Gauss-Seidel method of Step 12, update wR as if
M(x, xI ) were regular for the box to which the Gauss-Seidel method
was applied in Step 12 and return to Step 12. If the box was not
sufficiently reduced in Step 12, go to Step 15.
16. Use the hull method (see Section 5.8.2) to solve M(x, xI )(y − x) =
r(x). If the result is empty, go to Step 1.
19. Record the fact that the matrix B computed the last time Step 10 was
applied is to be used whenever Step 9 is applied to any subbox of xI .
20. If w(xI) ≤ (1/8) w(xI(2)) (note xI(2) was defined in Step 10), go to Step 9.
21. Note: The user might wish to bypass analytic preconditioning. See
the comment in Section 11.9. If so, go to Step 25.
Additional note: This step as well as Steps 22 and Step 25 are written
for the case in which the first method of analytic preconditioning
described in Section 11.9 is used. If the alternative method in Section
11.9.1 is used, these steps must be altered accordingly. In either
case, determine the analytically preconditioned function Bf I (x) as
described in Section 11.9.
22. If hull consistency has been applied n times to the analytically pre-
conditioned equation Bf(x) = 0 without changing B, go to Step
25. Otherwise, apply hull consistency to solve the i-th equation of
Bf(x) = 0 to bound xi for i = 1, · · · , n. If the result is empty, go to
Step 1.
24. If the box xI is sufficiently reduced (see (11.7.4)) in Step 16, go again
to Step 22.
After termination (in Step 1), bounds on all solutions of f(x) = 0 in the initial box xI(0) have been recorded. A bounding box xI recorded in Step 3 satisfies the conditions w(xI) ≤ ε_X and ||f(xI)|| ≤ ε_f specified by the user.
A box xI recorded in Step 14 approximates the best possible bounds that
can be computed with the number system used.
where M(x, xI ) and rI (x) are defined in (11.2.5). In this case, a repeat of
Step 16 cannot improve the result without recomputing this matrix. How-
ever, if M(x, xI ) is irregular, a Gauss-Seidel step is used and it might be
possible to compute a sharper solution by repeating Step 12 using the same
matrix. This explains Step 14.
When the Newton method exhibits rapid convergence, there is little
point in using other methods. Therefore, in Step 20, we bypass hull con-
sistency and box consistency when the Newton method suffices.
5. Use the hull method (see Section 5.8.2) to solve M(x, xI )(y − x) =
r(x). Record the result for use in the main program; and return to the
main program.
See Alefeld (1984) for a proof of this theorem. See also Alefeld (1999).
In our algorithm in section 11.12, we always precondition the system
(11.9.1). Alefeld’s proof holds whether the system is preconditioned or
not.
The following theorem shows that, under certain conditions, the volume
of the current box is reduced by at least half in a step of the interval Newton
method using Gaussian elimination.
Any box that does not contain the center of xI (k) must intersect less than
half of xI (0) . Therefore, the theorem states, in effect, that the volume of
xI (k+1) is less than half that of xI (k) for all k.
This theorem is proved by Alefeld (1984). Compare Corollary 5.2.9 of
Neumaier (1990).
To discuss how interval Newton methods can prove the existence of a
solution, we introduce the following proposition.
This proposition has been proved for various interval Newton methods.
Recall that interval Newton methods differ in how the bound N(x, xI ) is
computed from (11.2.6a). Each proof of Proposition 11.15.5 has been for a
method using a specific procedure for computing N(x, xI ). The first proof
of Proposition 11.15.5 was by Kahan (1968) for a method he derived and
which is now called Krawczyk’s method after its independent derivation by
Krawczyk (1969). See (11.2.8). Proof for this method can also be found
in Moore (1977, 1979). See also Alefeld (1999).
Proofs for various methods can be found in Alefeld (1999), Krawczyk
(1986), Nickel (1971), Qi(1982), Shearer and Wolfe (1985a), and especially
Neumaier (1990).
When the Gaussian elimination or the Gauss-Seidel method is used to
compute N(x, xI ), an intersection process is used to compute each new
component. See (11.2.6b). Therefore, we always have xI (k+1) ⊂ xI (k) . To
invoke Proposition 11.15.5, we must assume that the box N(x, xI ) is the
same one that results if no intersecting is done.
Alternatively, authors usually impose the slightly stronger condition
that xI (k+1) be strictly in the interior of xI (k) . Thus, we state a relevant
theorem as follows.
For a proof of this theorem, see Neumaier (1985, 1990) or Alefeld (1999).
In practice, when rounding is present, we compute a box N′(x, xI) containing N(x, xI). If N′(x, xI) ⊂ xI, then N(x, xI) ⊂ xI. Hence, we can invoke Theorem 11.15.6 even when rounding occurs and guarantee the existence and uniqueness of a zero of f in a box xI.
Hansen and Walster (1990b) conjectured as follows that Proposition
11.15.5 is true for all interval Newton methods.
fᵢ(x) = xᵢ(2 + 5xᵢ²) + 1 − Σ_{j∈Jᵢ} xⱼ(1 + xⱼ) = 0    (i = 1, · · · , n)
UNCONSTRAINED
OPTIMIZATION
12.1 INTRODUCTION
f(x) = x⁴ − 4x²
Thus, we know that f (x) ≥ 17 for all x ∈ [3, 4] including such transcen-
dental points as π = 3.14.... Since f (1) = −3, the minimum value of f is
no larger than −3. Therefore, the minimum value of f cannot occur in the
interval [3, 4]. We have proved this fact using only two evaluations of f .
In general the evaluation of f at a point involves rounding. Suppose that
(outward) rounding and widening of intervals from dependence occurred
in our example and we somehow obtained
f I (1) = [−3.1, −2.9] and f I ([3, 4]) = [16.9, 220.1].
Because we have bounded all errors, we know that f (1) ≤ −2.9 and that
f (x) ≥ 16.9 for all x ∈ [3, 4]. Hence, as before, we know the minimum
value of f is not in [3, 4]. Rounding and dependence have not prevented
us from infallibly drawing this conclusion.
By eliminating subintervals that are proved to not contain the global
minimum, we eventually isolate the minimum point. We describe various
ways to do the elimination.
An algorithm for global optimization was introduced by Hansen (1980).
The algorithm provides guaranteed bounds on the globally minimum value
f ∗ of an objective function and on the point(s) x∗ where it occurs. It
guarantees that the global solution(s) in some given box has been found. A
one dimensional version of the algorithm can be found in Hansen (1979).
In this chapter, we describe an improved version of the algorithm. The
primary improvement is the introduction of hull and box consistency. Var-
ious other changes are also made.
Because of computational limitations on accuracy, our algorithm might
also find “near global” minima when rounding and/or dependence prevents
determination of which of two or more candidates is the true minimum.
However, if the termination criteria are chosen stringently enough, our
algorithm always eliminates a local minimum from consideration if f is
sufficiently larger than f* at the local minimum. Obviously, “sufficiently larger” depends on the wordlength used in the computations.
12.2 AN OVERVIEW
1. Begin with a box xI (0) in which the global minimum is sought. Be-
cause we restrict our search to this particular box, our problem is
really constrained. We discuss this aspect in Section 12.3.
The user of our algorithm can specify a box or boxes in which the solution
is sought. Any number of finite boxes can be prescribed to define the region
of search. The boxes can be disjoint or overlap. However, if they overlap,
a minimum at a point that is common to more than one box is separately
found as a solution in each box containing it. In this case, computing effort
is wasted. For simplicity, assume the search is made in a single box that
we denote by xI (0) .
If the user does not specify xI (0) , we search in a “default box” described
in Section 11.10. The smaller the initial box, the faster the algorithm can
solve the problem. Therefore, it is better if the user can specify a smaller
box than the default.
For the box used, we have written g₁(x, y) in a form from which its interval value can be sharply computed:

g₁(x, y) = 4x − 4.2x³ + x⁵ − y.

Suppose we solve for the term 4x. We write the remaining terms containing x in factored form so that

4X' = Y − X³(X² − 4.2).
We use various procedures to find the zeros of the gradient of the objective
function. We have noted that these zeros can be stationary points that are
not the global minimum. Therefore, we want to avoid spending the effort
to closely bound such points when they are not the desired solution. In this
section we consider procedures that help in this regard.
As our algorithm proceeds, we evaluate f at various points in the orig-
inal box xI (0) . The computed upper bound on each such value is an upper
bound for the globally minimum value f* of f. We use the smallest bound f̄ obtained in this way.
$$f(x) = \sum_{i=1}^{3} \left[ (x_1 - x_i^2)^2 + (x_i - 1)^2 \right].$$
$$(x_1 - x_j^2)^2 \le \overline f - \sum_{\substack{i=1 \\ i \ne j}}^{3} (X_1 - X_i^2)^2 - \sum_{i=1}^{3} (X_i - 1)^2$$
for j = 1, 2, 3, and
$$(x_i - 1)^2 \le \overline f - \sum_{j=1}^{3} (X_1 - X_j^2)^2 - \sum_{\substack{j=1 \\ j \ne i}}^{3} (X_j - 1)^2$$
for i = 1, 2, 3.
When new bounds on both variables have been obtained in this way,
we get a new bound f on f ∗ by evaluating f at the center of the new box.
We iterate these steps. The procedure converges to the solution at (1, 1, 1)
in only seven steps.
The first method (just described) applies hull consistency to f (x) ≤ f.
Note that it does not involve use of derivatives or slopes. The methods
we now discuss use Taylor expansions. However, better results can be ob-
tained using slope expansions. Because expansions are used, the following
methods require more computing than the first method, which uses only
hull consistency.
$$N(a_i, x^I) = a_i - \frac{q^I(a_i)}{g_i(x^I)}. \qquad (12.5.1)$$
However, we do not apply the Newton step if 0 ∈ q^I(a_i). See the algorithm in Section 10.2. Therefore, assume 0 ∉ q^I(a_i).
In the denominator of (12.5.1), the function g_i(x^I) is the derivative of F with respect to x_i. That is, it is the i-th component of the gradient g of f. If 0 ∉ g_i(x^I), then there is no stationary point of f in x^I and we delete x^I. Therefore, when we apply box consistency to F, we always have 0 ∈ g_i(x^I).
We now know that the denominator interval in (12.5.1) contains zero
but the numerator does not. Therefore, the quotient in (12.5.1) is computed
as the union of two semi-infinite intervals. The endpoint ai of Xi is in the
interior of the gap between these intervals. This implies that, if the Newton
step is applied, it always deletes part of Xi . However, the deleted part can
be vanishingly small.
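A minimal sketch of this extended division and of the slice of X_i that the Newton step deletes; the numbers in the example are arbitrary:

```python
# A minimal sketch of the extended division just described: the numerator
# interval N excludes zero while the denominator D contains it, so N/D is the
# complement of a finite open gap, and subtracting from a_i gives the slice
# of X_i that the Newton step deletes around a_i.  Numbers are arbitrary.

def extended_div_gap(N, D):
    """Gap (g1, g2) with N/D = (-inf, g1] U [g2, +inf); assumes
    d1 < 0 < d2 and 0 not in N."""
    n1, n2 = N
    d1, d2 = D
    if n1 > 0.0:                             # N entirely positive
        return (n1 / d1, n1 / d2)
    return (n2 / d2, n2 / d1)                # N entirely negative

def newton_deleted_slice(a, qa, gX):
    """Open subinterval deleted around a by N(a, x^I) = a - qa/gX."""
    g1, g2 = extended_div_gap(qa, gX)
    return (a - g2, a - g1)                  # subtraction reverses endpoints

print(newton_deleted_slice(1.0, (2.0, 3.0), (-1.0, 4.0)))   # (0.5, 3.0)
```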
Box consistency can be effective when applied to f (x) ≤ f even when
it is applied over a large box. We now consider methods that are usually
effective only when the box is small.
$$g^I_i = g_i(X_1, \cdots, X_i, x_{i+1}, \cdots, x_n),$$
where $g^I_i$ is the i-th component of the gradient of f. From (7.3.6), the first-order Taylor expansion of f gives
$$f(y) \in f(x) + \sum_{i=1}^{n} (y_i - x_i)\, g^I_i. \qquad (12.5.2)$$
Note that f (y) > f if the left endpoint of the right member of (12.5.4)
exceeds f.
Define t = y_j − x_j and the intervals
$$U = f(x) + \sum_{\substack{i=1 \\ i \ne j}}^{n} (X_i - x_i)\, g^I_i - \overline f, \qquad V = g^I_j.$$
The points y retained are those for which
$$U + Vt \le 0. \qquad (12.5.5)$$
Let T denote the set of values of t for which (12.5.5) holds. This set can
be an interior or an exterior interval as given by (6.2.4).
Note that if T is the empty set, we have proved that f (y) > f for all
y ∈ xI so we delete xI .
Having computed T for a particular value of j, the set of retained values of y_j is T + x_j. Since we are interested only in values of y_j within X_j, we retain only the intersection
$$Y_j = X_j \cap (T + x_j).$$
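A minimal sketch of this linear method, under the simplifying assumption that the retained set is characterized by inf(U + Vt) ≤ 0; plain floats, no outward rounding:

```python
# A minimal sketch (plain floats) of the linear method: T is the set of t
# with inf(U + V t) <= 0.  Since inf(U + V t) = u1 + min(v1*t, v2*t) is
# concave and piecewise linear in t, T is all of R, one or two half-lines,
# or empty; Y_j is then X_j intersected with T + x_j.

import math

def retained_T(U, V):
    """Return T as a list of closed pieces; [] means the box is deleted."""
    u1 = U[0]
    v1, v2 = V
    if u1 <= 0.0:
        return [(-math.inf, math.inf)]           # t = 0 already satisfies it
    pieces = []
    if v2 > 0.0:
        pieces.append((-math.inf, -u1 / v2))     # v2*t <= -u1 for small t
    if v1 < 0.0:
        pieces.append((u1 / -v1, math.inf))      # v1*t <= -u1 for large t
    return pieces

def new_Yj(U, V, Xj, xj):
    out = []
    for lo, hi in retained_T(U, V):
        lo, hi = max(Xj[0], lo + xj), min(Xj[1], hi + xj)
        if lo <= hi:
            out.append((lo, hi))
    return out                                   # a gap may remain in Y_j

print(new_Yj((1.0, 2.0), (-1.0, 1.0), (0.0, 5.0), 2.0))   # [(0,1), (3,5)]
```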
From a Taylor expansion through quadratic terms,
$$f(y) \in f(x) + (y - x)^T g(x) + \frac{1}{2} (y - x)^T H(x, x^I)(y - x).$$
Setting z = y − x, we can delete the points y for which
$$f(x) + z^T g(x) + \frac{1}{2} z^T H(x, x^I) z > \overline f.$$
To simplify presentation, we describe the case n = 2. If we delete the
points indicated, we retain the complementary set of points y where
$$f(x) + z_1 g_1 + z_2 g_2 + \frac{1}{2}\left(z_1^2 H_{11} + z_1 z_2 H_{21} + z_2^2 H_{22}\right) \le \overline f. \qquad (12.5.6)$$
We first solve (12.5.6) for z_1. Therefore, we replace z_2 by Z_2 = X_2 − x_2. That is, we replace y_2 by its bounding interval X_2. Thus, (12.5.6) becomes
$$A + B z_1 + C z_1^2 \le 0 \qquad (12.5.7)$$
where
$$A = f(x) - \overline f + Z_2 g_2 + \frac{1}{2} Z_2^2 H_{22}, \qquad B = g_1 + \frac{1}{2} Z_2 H_{21}, \qquad C = \frac{1}{2} H_{11}.$$
Suppose we solve the quadratic relation (12.5.7) (as described in Section 8.2) for z_1 and obtain an interval Z_1 and thus the interval X_1 = Z_1 + x_1. We can solve (12.5.6) for z_2 in the same way, replacing z_1 by Z_1; we obtain
$$A + B z_2 + C z_2^2 \le 0 \qquad (12.5.8)$$
where
$$A = f(x) - \overline f + Z_1 g_1 + \frac{1}{2} Z_1^2 H_{11}, \qquad B = g_2 + \frac{1}{2} Z_1 H_{21}, \qquad C = \frac{1}{2} H_{22}.$$
Whether we are solving (12.5.7) or (12.5.8), we must determine the
solution points t of a quadratic inequality of the form
$$A + Bt + Ct^2 \le 0 \qquad (12.5.9)$$
where A, B, and C are intervals. The solution set of (12.5.9) was derived
by Hansen (1980) in explicit form. Although his analysis was correct, there
are errors in his listed results. Denote A = [a1 , a2 ]. His errors are for the
case a1 = 0. A correct form (with some simplifications) was given in the
first edition of this book. To list all the cases required almost a full page.
We gave a simpler procedure in Section 8.2. See also Hansen and Walster
(2001).
12.5.5 An Example
We have described four procedures for deleting points x where f(x) > f. Generally, none of these methods can delete all the points x of a box x^I. Consider the function f(x) = x^2(2 + sin πx). It is easily shown that the derivative f′(x) of this function has a zero in the interval [n, n + 1] for all n = ±2, ±3, · · · . To see this, note that f′(0) = 0, f′(±n) > 0 for n even and nonzero, and f′(±n) < 0 for n odd and n ≥ 3. Also, f′(x) = 0 for x = 1.118, approximately. Thus, f′ has at least 2n + 1 zeros in the interval [−n, n] for all n = 1, 2, ....
Suppose we have sampled the value of f at x = 1. Since f(1) = 2, we have the upper bound f = 2 on f*. Consider the interval X = [2, 10^30]. To solve the relation x^2(2 + sin πx) ≤ f = 2 using hull consistency, we can replace sin πx by sin πX = [−1, 1] and solve for the factor x^2. We obtain x^2 ∈ 2/(2 + [−1, 1]) = [2/3, 2], so that |x| ≤ 2^{1/2}. Since X ∩ [−2^{1/2}, 2^{1/2}] is empty, all of X is deleted in a single step.
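A minimal numeric sketch of the step just described:

```python
# A minimal numeric sketch (plain floats, ignoring outward rounding) of the
# hull-consistency step just described.

import math

X = (2.0, 1e30)
sin_range = (-1.0, 1.0)                      # range of sin(pi*x) over wide X
denom_lo = 2.0 + sin_range[0]                # 2 + sin(pi*x) lies in [1, 3]
x_sq_hi = 2.0 / denom_lo                     # x^2 <= 2/1 = 2
bound = math.sqrt(x_sq_hi)                   # |x| <= sqrt(2)
new_X = (max(X[0], -bound), min(X[1], bound))
print(new_X[0] > new_X[1])                   # True: all of X is deleted
```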
12.7 CONVEXITY
which is positive for |x| < 0.5628 and for |x| > 2.384, approximately. Therefore, hull or box consistency applied to f″(x^I) ≥ 0 is not able to delete any part of an interval X if |X| < 0.5628 or if |X| > 2.384. However, f″(x) < 0 in the intervals ±[0.5628, 2.384]. If X intersects one of these intervals, box consistency (when iterated) can delete this intersection. Depending on how hull consistency is implemented, it can delete all or part of this intersection. For example, if X = [−1, 2] and we solve for the term in the left member that is dominant over [−1, 2], we have
for y. See Section 11.2. But the Jacobian J of g is the Hessian of the objective function f. In Sections 7.4 and 12.7, we noted that the diagonal elements of the Hessian must be nonnegative at a minimum. That is, the diagonal elements of J^I(x, x^I) must be nonnegative when expanding the gradient of f.
Note that certain arguments of elements of J^I(x, x^I) are real (rather than interval). See Section 7.4. That is why we denote the Jacobian by J^I(x, x^I) rather than J^I(x^I). However, one element of J^I(x, x^I) in each of its rows must have all its arguments as intervals. The sequential expansions to obtain a row of J^I(x, x^I) can be ordered differently for each row so that it is the diagonal element which has intervals for all its arguments.
We can now conclude that there is no minimum of f in x^I if a diagonal element of J(x, x^I) is negative. In our minimization algorithm, this fact is useful because we sometimes check to see if any diagonal element of J(x, x^I) is negative before we compute the remaining elements of J(x, x^I). See Step 20 of the algorithm in Section 12.14. However, we can delete any negative part of a computed interval value of a diagonal element of J(x, x^I). This is a valid step whether or not x^I contains a minimum of f.
Note that this modification of Jii (xI ) is not valid if J is obtained using
slopes. This is one of the few cases in which derivatives have an advantage
over slopes.
12.9 TERMINATION
Because our optimization algorithm often splits a box into subboxes, the number of stored unprocessed subboxes can grow. However, the algorithm can also entirely eliminate a given subbox. This generally keeps the number of stored boxes from growing too large.
Splitting and reducing boxes eventually causes any remaining box to be
“small”. We require that two conditions be satisfied before a box is deemed
to be small enough to be included in the set of solution boxes. First, a box
x^I must satisfy a condition
$$w(x^I) \le \varepsilon_X \qquad (12.9.1)$$
for a user-specified tolerance ε_X. Second, we require
$$w[f(x^I)] \le \varepsilon_f \qquad (12.9.2)$$
for a tolerance ε_f, so that
$$\overline f - \varepsilon_f \le f^* \le \overline f.$$
That is, f ∗ is bounded to the required accuracy. Note that with this ap-
proach, we can dispense with computations required to check termination
conditions.
The algorithm provides both lower and upper bounds on the global mini-
mum f ∗ . After termination, the solution box (or boxes) must contain the
global minimum. Suppose that some number s of boxes remain. Denote
them by xI (i) (i = 1, · · · , s).
The algorithm evaluates f(x^I(i)) for each i = 1, · · · , s. Denote the result by $[\underline f(x^{I(i)}), \overline f(x^{I(i)})]$, and denote
$$\underline F = \min_{1 \le i \le s} \underline f(x^{I(i)}).$$
Then
$$\underline f(x^{I(i)}) \le \overline f \le \overline f(x^{I(i)}) \qquad (12.10.1)$$
for all i = 1, · · · , s.
Since the global minimum must be in one of the final boxes,
$$\underline F \le f^*. \qquad (12.10.2)$$
$$f^* \le \overline f \le \overline f(x^{I(j)}). \qquad (12.10.3)$$
$$\underline F \le f^* \le \underline F + \varepsilon_f. \qquad (12.10.5)$$
$$\underline F \le \overline f \le \underline F + \varepsilon_f.$$
From this relation and (12.10.5), we see that f and f* are in the same interval of width ε_f. Therefore, using (12.10.3),
$$f^* \le \overline f \le f^* + \varepsilon_f. \qquad (12.10.6)$$
That is, the upper bound f differs from the global minimum by no more
than εf .
Note that f ∗ ≤ f. This might be a sharper upper bound on f ∗ than that
given by (12.10.5).
From (12.10.1), (12.10.4), and (12.10.6), we conclude that f (x)−f ∗ ≤
2εf for each point x in each final box.
The accuracy specified in the above relations is guaranteed to be correct
for the results computed using our algorithm. This is because we use interval
arithmetic to bound rounding errors. In contrast, noninterval algorithms
generally cannot guarantee accuracy. This fact is illustrated by a published paper in which the run time needed by a noninterval algorithm to obtain eight-digit accuracy for a given problem is reported. However, the reported solution is correct to only four digits.
Our optimization algorithm normally begins its search in a single given box
xI (0) . For simplicity, our discussion throughout this book usually assumes
this to be the case. We can also begin with a set of boxes wherein we seek
the global minimum. This is no disadvantage even if the region formed by the boxes is not connected. Any box for which
$$w(x^I) \le \varepsilon_X \qquad (12.11.1a)$$
and
$$w[f(x^I)] \le \varepsilon_f \qquad (12.11.1b)$$
(where f is the objective function) is put in a list L_2. Any box for which
these criteria are not both satisfied is placed in a list L1 of boxes yet to be
processed. Assuming both εX and εf are small, boxes in L2 are small and
f varies very little over any one of them.
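A minimal sketch of this bookkeeping; `f_interval` is a hypothetical helper returning an enclosure of f over a box:

```python
# A minimal sketch of the two-list bookkeeping; `f_interval` is a hypothetical
# helper returning an enclosure [lower, upper] of f over a box (a list of
# component intervals).

def route_boxes(boxes, f_interval, eps_X, eps_f):
    L1, L2 = [], []
    for box in boxes:
        w_box = max(hi - lo for lo, hi in box)       # w(x^I)
        flo, fhi = f_interval(box)
        if w_box <= eps_X and fhi - flo <= eps_f:    # (12.11.1a) and (12.11.1b)
            L2.append(box)                           # accepted solution box
        else:
            L1.append(box)                           # still to be processed
    return L1, L2
```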
In this section, we list the steps of our algorithm for solving the uncon-
strained optimization problem. The algorithm computes guaranteed bounds
on the globally minimum value f ∗ of the objective function f (x) and guar-
anteed bounds on the point(s) where f (x) is a global minimum.
Assume a given box or boxes in which the solution is sought is placed
in a list L1 of boxes to be processed. Set wR = 0 (see Section 11.11).
If a single box x^I(0) is given, set w_I = w(x^I(0)). If more than one box is
given, set wI equal to the width of the largest one. If an upper bound on the
minimum value f ∗ of f (x) is known, set f equal to this value. Otherwise,
set f = +∞.
A box size tolerance ε_X and a function width tolerance ε_f must be specified by the user, as described in Section 12.9.
Let wH be defined as in Section 12.7 on page 305. The algorithm sets
the initial value of wH = +∞. It also sets wR and wI .
We do the following steps in the order given except as indicated by
branching. For each step, the current box is denoted by xI even though it
might be altered in one or more steps.
4. If hull consistency has been applied (in Step 6 ) n times to the relation
f (x) ≤ f without having done Step 9, go to Step 8. (The integer n is
the number of variables on which f depends.)
20. Compute the Jacobian J(x, x^I) of the gradient g. Order the variables in each row of J(x, x^I) so that it is the diagonal element for which all arguments are intervals, as described in Section 12.7 (and Section 7.4). If a diagonal element of J(x, x^I) is strictly negative, go to Step 3. Otherwise, delete any negative part of any diagonal element of J(x, x^I). Compute an approximate inverse B of the approximate center of J(x, x^I) and the matrix M(x, x^I) = BJ(x, x^I).
21. If the matrix M(x, xI ) is regular, find the hull of the solution set of
the linear system determined in Step 20 as described in Section 5.8.
If M(x, xI ) is irregular, apply one pass of the Gauss-Seidel method
to the linear system. See Section 5.7. Update wI or wR as prescribed
in Section 11.11. If the result of the Newton step is empty, go to Step
3. If the interval Newton step proves the existence of a solution in xI
(see Proposition 11.15.5), record this information.
23. If the width of the box was reduced by a factor of at least 8 in the
Newton step (Step 21), go to Step 20.
25. Use the gradient value g(x) and the matrix B computed in Step 20 to
compute the point y = x − Bg(x). See Section 12.6. Use the value
of f (y) to update f.
29. Using the matrix B computed in Step 20, analytically determine the
system Bg(x). Apply hull consistency to solve the i-th component
of Bg(x) for the i-th variable xi for i = 1, · · · , n. If this procedure
proves the existence of a solution in xI (as discussed in Section 10.12),
record this information. Note: If the user prefers not to do analytic
preconditioning, go to Step 36.
32. Apply box consistency to solve the i-th component of Bg(x) (as
determined in Step 29) for the i-th variable for i = 1, · · · , n.
35. Split the box x^I as prescribed in Section 12.13. If gaps that satisfy (11.8.2) have been generated in any of these components, use the gaps to do the splitting. Evaluate f at the center of each new box and use the results to update f. Then go to Step 3. Note that if multiple processors are used, the number of components to split might be more than three. See Section 11.8.
36. Delete any box x^I from list L_2 for which f(x^I) > f. Denote the remaining boxes by x^I(1), · · · , x^I(s) where s is the number of boxes remaining. Determine the lower bound for the global minimum f* as
$$\underline F = \min_{1 \le i \le s} \underline f(x^{I(i)}).$$
37. Terminate.
$$||\hat x - x^*|| \le \varepsilon_1 \qquad (12.15.1)$$
and/or
$$f(\hat x) - f^* \le \varepsilon_2 \qquad (12.15.2)$$
for some ε_1 and ε_2. Recall that x* is a point such that f(x*) = f* is the globally minimum value of the objective function f. Our algorithm might or might not determine a point x̂ that fully satisfies (12.15.1) and (12.15.2). If x̂ is any point in any final box, then f(x̂) − f* ≤ 2ε_f. See (12.10.6). Therefore, (12.15.2) can always be satisfied by choosing ε_f = ε_2/2.
If there is only one final box x^I, the algorithm assures that it contains x*. Therefore, we can choose x̂ to be any point in x^I. Since w(x^I) ≤ ε_X, (12.15.1) is satisfied by choosing ε_X = ε_1. Also, f(x̂) − f* ≤ ε_f for any x̂ ∈ x^I because of the termination condition (12.9.2).
If there is more than one final box, we cannot necessarily satisfy equation (12.15.1). Let x̂ be any point in any final box. All we can assure is that x̂ is no farther from x* than the maximum distance from x̂ to any point in any final box. However, f(x̂) − f* ≤ 2ε_f because this is true for every point in every final box. Decreasing ε_X and ε_f and/or using higher precision arithmetic might improve the bound on ||x̂ − x*||.
The algorithm in Section 12.14 begins with procedures that involve the least amount of computing. We use hull consistency first because it does not involve derivatives or expansions and requires relatively little computing.
Van Hentenryck et al. (1997) solved this problem using their algorithm
Numerica. Their initial box was given by X = Y = [−10^6, 10^6] and the stopping criterion was given by ε_X = 10^{−8}. We used these same parameters and chose ε_f large so that it did not affect our stopping procedure.
As a comparison criterion, we counted the number of boxes generated by
splitting. Numerica generated 356 boxes. The algorithm given in Section
12.14 generated 36 boxes. This is not a definitive comparison because the
computational effort per box is not compared.
Walster, Hansen, and Sengupta (1985) solved this problem beginning with the much smaller box given by X = Y = [−4.5, 4.5] and obtained a bounding box of width 10^{−11}. This required 315 applications of the interval Newton method (as well as other procedures). For the much larger initial box of width 2 × 10^6, the algorithm of Section 12.14 needed only 18
Newton applications. Again, this is an incomplete comparison. However,
it illustrates the virtue of hull and box consistency when used together with
the interval Newton method.
There are applications in which one wants to find all stationary points of a
function. There are other applications in which one wants to find all local
minima whether they are global or not. In this section, we discuss how our
procedures can be applied to compute such results.
Note that all stationary points of a function in a box can be found by
applying the procedure in Section 11.12 to solve the system of equations
formed by setting to zero the components of the gradient of the given
function. However, our optimization algorithm can also be used for this
purpose.
In Section 12.5, we discussed how an upper bound on the global min-
imum can be used to delete local minima. If we omit the procedures of
Section 12.5 from the algorithm of Section 12.14, the resulting algorithm
finds all (global or local) minima of the objective function in the initial box.
If we wish to find all stationary points of the objective function, we
can do so by omitting an additional procedure. We also omit the procedure
described in Section 12.7 that deletes points of the box at which the objective
function is not convex.
CONSTRAINED
OPTIMIZATION
13.1 INTRODUCTION
$$u_0 \nabla f(x) + \sum_{i=1}^{m} u_i \nabla p_i(x) + \sum_{i=1}^{r} v_i \nabla q_i(x) = 0, \qquad (13.2.1a)$$
$$u_i p_i(x) = 0 \quad (i = 1, \cdots, m), \qquad (13.2.1b)$$
$$q_i(x) = 0 \quad (i = 1, \cdots, r), \qquad (13.2.1c)$$
$$u_i \ge 0 \quad (i = 0, \cdots, m) \qquad (13.2.1d)$$
u 0 + · · · + um + E 1 v1 + · · · + E r vr = 1 (13.2.2a)
u0 + · · · + um = 1.
0 ≤ ui ≤ 1 (i = 0, · · · , m) .
u0 + · · · + um + v1 + · · · + vr = 1
since the left member might be zero for the solution to a given problem.
A possible alternative normalization is
S = u0 + · · · + um + v1 + · · · + vr . (13.3.2)
u0 + · · · + um + e1 v1 + · · · + er vr = 1 (13.3.3)
u0 + · · · + um + E(v1 + · · · + vr ) = 1.
u0 + · · · + um + e(v1 + · · · + vr ) = 1.
Therefore, we must not use the factored form. By using the non-factored
form (13.2.2a), we use the fact that interval arithmetic is not distributive.
This is an example of the need to carefully distinguish between interval
variables that are independent and those that are not. (See Chapter 4.)
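The effect can be seen numerically. A minimal sketch of subdistributivity with generic intervals (not the E_i and v_i above): the factored product X(Y + Z) is contained in, and here strictly narrower than, XY + XZ:

```python
# A minimal sketch of the subdistributivity being used: for intervals,
# X*(Y + Z) is contained in X*Y + X*Z and can be strictly narrower.  The
# intervals are generic illustrations, not the E_i and v_i of (13.2.2a).

def imul(a, b):
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

X, Y, Z = (1.0, 2.0), (-1.0, 1.0), (1.0, 1.0)
print(imul(X, iadd(Y, Z)))                   # (0.0, 4.0): factored form
print(iadd(imul(X, Y), imul(X, Z)))          # (-1.0, 4.0): unfactored, wider
```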
Suppose we apply our optimization algorithm to a subbox xI of the
initial box xI (0) . Suppose also that, for any solution in xI , the interval
bounds uIi on ui (i = 0, · · · , m) and bounds viI on vi (i = 1, · · · , r) hold
and that
0 ≤ ui ≤ 1 (i = 0, · · · , m) ,
−1 ≤ vi ≤ 1 (i = 1, · · · , r) . (13.3.4)
and
which is valid for all u ∈ u^I and all v ∈ v^I, where u^I and v^I are interval vectors and V_i (i = 1, · · · , r) is the i-th component of v^I. The real vectors u ∈ u^I and v ∈ v^I are fixed. Using the bounds (13.3.4), we see that we can replace condition (13.2.2b) by an expansion of the form
By now, readers have observed that it is far simpler to use the normal-
ization u0 = 1 than to normalize as we have done. This can be done. It
produces the Kuhn-Tucker-Karush conditions. It simplifies the formulation
and slightly improves the efficiency of the algorithm. The only difficulty
with using u0 = 1 is that we might fail to find the global minimum in
the rather rare case in which the constraints are linearly dependent at the
solution. Consistent with the promise of our interval algorithms to never fail to produce bounds on the set of all problem solutions, we choose not to
risk failing to find the global solution in this special circumstance. Con-
sequently, we do not use the normalization u0 = 1. Instead, we avoid this
risk by performing the extra computation required to use either (13.2.2a)
or (13.2.2b).
There is another simple expedient. We can write the equality constraints q_i(x) = 0 (i = 1, · · · , r) as two inequality constraints q_i(x) ≤ 0 and
−qi (x) ≤ 0. Now all constraints are inequality constraints. Therefore, we
can use the simple normalization
u0 + · · · + us = 1
$$J_{ij}(t, t^I) = \frac{\partial}{\partial t_j}\, \phi_i(T_1, \cdots, T_j, t_{j+1}, \cdots, t_N). \qquad (13.5.2)$$
We linearize φ^I as
When we use the linear normalization condition (13.2.2a), then some forms
of the interval Newton method do not need initial bounds on the Lagrange
multipliers. However, there are three ways in which computed bounds can
be useful. First, suppose we obtain a bound Ui on a Lagrange multiplier ui
and find that Ui < 0. Then condition (13.2.1d) is violated for all ui ∈ Ui ;
and there cannot be a solution in whatever box xI is used to compute Ui .
$$A(x)w = e_1 \qquad (13.6.1)$$
where the vector w of Lagrange multipliers has m + r + 1 components.
Consider the set of vectors w satisfying (13.6.1) as x ranges over xI .
This set contains the vector of Lagrange multipliers that satisfy the John
conditions for any x ∈ xI . We replace x by xI in (13.6.1) and obtain
A(xI )w = e1 . (13.6.2)
The solution of this equation provides the desired bounds on the Lagrange
multipliers.
We wish to apply Gaussian elimination to this equation to transform the
(nonsquare) coefficient matrix into upper trapezoidal form. To do so, we
precondition the problem by multiplying by a real transformation matrix as
described for the square matrix case in Section 5.6.
Let Ac denote the center of A(xI ). Using (real) Gaussian elimination
with row interchanges, we determine a matrix B that transforms Ac into up-
per trapezoidal form. We then apply interval Gaussian elimination (without
row interchanges) to the preconditioned equation
BA(xI )w = Be1
where R^I is a square upper triangular matrix of order m + r + 1. The vectors b^I_1 and b^I_2 have m + r + 1 and n − m − r components, respectively. The zero block in the new coefficient matrix has n − m − r rows and m + r + 1 columns. It is absent if m + r = n.
Consider the case m + r < n. From (13.6.3),
$$R^I w = b^I_1 \qquad (13.6.4a)$$
$$0 = b^I_2. \qquad (13.6.4b)$$
The solution is at
$$x_1^* = -\left(\frac{5^{1/2} - 1}{2}\right)^{1/2} \approx -0.786, \qquad x_2^* = \frac{5^{1/2} - 1}{2} \approx 0.618.$$
Since there are no equality constraints, our normalization for the La-
grange multipliers is
u0 + u1 + u2 = 1. (13.7.2)
$$u_0^* = \frac{2x_1^*}{2x_1^* - 1} \approx 0.611, \qquad u_1^* = \frac{1}{(1 - 2x_1^*)\left[1 + 2(x_1^*)^2\right]} \approx 0.174,$$
1 − u1 − u2 + 2x1 (u1 + u2 ) = 0,
2x2 u1 − u2 = 0, (13.7.3)
u1 p1 (x) = 0,
u2 p2 (x) = 0.
No new bounds on the solution point are obtained by this procedure. There-
fore, we cannot improve the bounds on uI by iterating.
Next, consider the box xI with components X1 = [−0.9, −0.8] and
X2 = [0.5, 0.6]. This box does not contain a solution. Using the good
approximations u1 = 0.174 and u2 = 0.215 (and, implicitly, u0 = 0.611),
one interval Newton step yields a solution box disjoint from xI . This proves
that no solution of the optimization problem (13.7.1) exists in xI .
We now consider a final case for this example. This time, we do not
eliminate u0 using the normalization condition. Consider the box xI with
components X1 = [−0.7, −0.6] and X2 = [0.7, 0.8]. This box does not
contain a solution of (13.7.1). Evaluating p2 over the box, we obtain
p2 (xI ) = [−0.44, −0.21]. Since p2 (xI ) < 0, the constraint p2 ≤ 0 is
not active for any point in xI . However, 0 ∈ p1 (xI ). Dropping the inactive
constraint, equation (13.6.2) becomes
$$\begin{pmatrix} 1 & 1 \\ 1 & [-1.4, -1.2] \\ 0 & [1.4, 1.6] \end{pmatrix} \begin{pmatrix} u_0 \\ u_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
The third component of the right member does not contain zero. This proves
that no solution of the optimization problem (13.7.1) exists in xI .
Equation (13.5.1), which expresses (part of) the John conditions, be-
comes
$$\phi(t) = \begin{pmatrix} u_0 + u_1 + E v_1 - 1 \\ u_0 + 2u_1 x_1 + 2v_1 x_1 \\ 2u_1 x_2 - v_1 \\ u_1(x_1^2 + x_2^2 - 1) \\ x_1^2 - x_2 \end{pmatrix} = 0$$
where t = (x_1, x_2, u_0, u_1, v_1)^T.
Let the box x^I have components X_1 = [−0.8, −0.7] and X_2 = [0.6, 0.7]. One step of an interval Newton method applied to φ(t) = 0 then yields a box t^I containing the vector t for any solution with x* ∈ x^I. The first two components of t^I are improved bounds for the solution point x*. The last three components of t^I are bounds for the Lagrange multipliers.
Since the last component of tI bounds v1 , we now know that v1 > 0
(for any solution with x∗ ∈ xI ). Hence, we can replace E by 1 in the
first component of φ(t) for any subsequent iterations using the new box
(because it is contained in xI ).
In this example, we started with bounds X1 and X2 and approxima-
tions for the Lagrange multipliers. Using (13.5.1), we computed improved
bounds on x∗ while producing bounds on the Lagrange multipliers. Iterat-
ing the procedure can produce sharper bounds on all these quantities.
Next, we consider use of the method described in Section 13.6 for the
same problem using the same box xI . Now, we do not need the approxima-
tions for the Lagrange multipliers. For this problem, the coefficient matrix
A(xI ) in equation (13.6.2) is square and (13.6.2) becomes
$$\begin{pmatrix} 1 & 1 & E \\ 1 & 2X_1 & 2X_1 \\ 0 & 2X_2 & -1 \end{pmatrix} w = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.$$
Hull consistency and box consistency can be applied to the John conditions.
To do so, we need bounds on the Lagrange multipliers. We have discussed
how bounds can be computed.
However, we do not apply consistency methods to the John conditions.
In our optimization algorithms, we apply consistency methods to each con-
straint individually. See the algorithms in Sections 14.8 and 15.12. Little
is gained by also applying them to the equations expressing the John con-
ditions.
INEQUALITY
CONSTRAINED
OPTIMIZATION
14.1 INTRODUCTION
In Chapter 13 we dealt primarily with the John conditions and the Lagrange
multipliers that are introduced to provide conditions for a solution. In Chap-
ters 14 and 15, we discuss procedures for solving constrained optimization
problems.
For pedagogical reasons, we consider inequality and equality con-
strained problems separately. In this chapter, we discuss the optimization
problem in which only inequality constraints are present. In the next chap-
ter, we discuss the problem in which only equality constraints occur.
By separating the cases, we hope to make clear which aspects of the
constrained problem are peculiar to the particular kind of constraints. There
is no difficulty in combining the algorithms for the two cases into a single
algorithm for problems in which both kinds of constraints occur. We do so
in Chapter 16.
Suppose that a given problem has inequality constraints but no equality
constraints. Then it might be possible to show that a given box is certainly
strictly feasible. If so, we know that any minimum in the box is a stationary
[Figure: graph of f(x) for x in [−10, 10], marking a sampled value of f and the minimum value of f in X.]
u0 + · · · + um = 1. (14.2.1)
See Sections 13.2 and 13.3. Therefore, the function given by (13.5.1),
which expresses (part of) the John conditions, becomes
$$\phi(t) = \begin{pmatrix} u_0 + \cdots + u_m - 1 \\ u_0 \nabla f(x) + u_1 \nabla p_1(x) + \cdots + u_m \nabla p_m(x) \\ u_1 p_1(x) \\ \vdots \\ u_m p_m(x) \end{pmatrix}. \qquad (14.2.2)$$
The remaining part of the John conditions not in (14.2.2) is that the
Lagrange multipliers are nonnegative. (See (13.2.1d).) Therefore, the
normalization equation (14.2.1) provides the bounds
0 ≤ ui ≤ 1 (i = 0, · · · , m). (14.2.3)
These bounds are useful when solving the John conditions using the form
of the interval Newton method in which the linearized equations are solved
by the Gauss-Seidel method.
Suppose we solve the linearized John conditions by Gaussian elimina-
tion or by the “hull method” of Section 5.8. Then we do not need bounds on the Lagrange multipliers.
In Section 12.5, we discussed how to obtain and use an upper bound f on the
globally minimum value f ∗ of the objective function f . We do the same
thing for the constrained case. We can delete any point x where f (x) > f.
To compute f, we evaluate f at various certainly feasible points obtained
by the algorithm; and we set f equal to the smallest upper bound found in
this way.
When constraints are present, we must assure that each point used to
update f is feasible. We must assure this feasibility despite the fact that
rounding makes the correct value of a constraint function uncertain. We do
this by requiring that the point is certainly feasible as defined in Section
6.1.
Having computed an upper bound f, we use it to delete subboxes of x^I(0) in the same way described in Section 12.5. Points deleted in this way
can be feasible or infeasible. The inequality f (x) ≤ f can also be added to
the John conditions as if it were an ordinary constraint.
We try to reduce f at various stages of the algorithm. Whenever the
algorithm produces a new subbox of xI (0) (see Section 14.8), we check to
see if the center of the box is certainly feasible. If so, we evaluate f at this
point and update f.
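A minimal sketch of this update; `p_list` and `f_interval` are hypothetical helpers returning enclosures of the constraints p_i and the objective f:

```python
# A minimal sketch of the update of the upper bound f_bar at a certainly
# feasible point: the center of a box is accepted only if every constraint
# is provably nonpositive there.  `p_list` and `f_interval` are hypothetical
# enclosure helpers.

def update_upper_bound(box, p_list, f_interval, f_bar):
    center = [(lo + hi) / 2.0 for lo, hi in box]
    pt = [(c, c) for c in center]                # degenerate interval vector
    if all(p(pt)[1] <= 0.0 for p in p_list):     # certainly feasible point
        f_bar = min(f_bar, f_interval(pt)[1])    # rounded upper bound on f
    return f_bar
```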
Suppose our algorithm has generated a subbox xI of the initial box xI (0) .
(See Section 14.8 to see how this might be done.) Let x denote the center
of xI . Under appropriate conditions (to be given), we search along a half-
line beginning at x for an approximate solution point of the optimization
problem (14.1.1).
If we have already found a certainly feasible point in xI (0) , we have a
finite upper bound f on the globally minimum value f ∗ . (See Section 14.3.)
Otherwise, f = +∞. We perform the line search to try to reduce f only if
f (x) < f. This decision is made regardless of whether x is a certainly
feasible point or not.
0. Initialize:
From the above discussion, we see that the i-th constraint is not violated in
xI if si ≤ 0, and there is no feasible point in xI if si > 1. For small values of
si the i-th constraint is only “slightly violated” by points in xI and is likely
to delete only a small part of xI . Therefore, to reduce effort, we use the i-th
constraint only if
$$\frac{\overline f(x^I) - \overline f}{\overline f(x^I) - \underline f(x^I)} > 0.25. \qquad (14.6.3)$$
In so doing, we obtain a(x^I) and h(x^I). Denote by X_j′ the intersection of X_j with the newly computed interval for x_j. The function g is chosen to be simple. Therefore, we can easily evaluate g(X_j′). One extra subtraction and one extra multiplication yields a(x^I)g(X_j′) − h(x^I).
This is an adequate approximation to pi (xI ) for the purpose of determining
si as defined in (14.6.1). Therefore, we save the work of evaluating pi (xI )
(i = 1, · · · , m).
Suppose that hull consistency is applied to another constraint after it
was last applied to the i-th. If so, the box can change and the computed
value for pi (xI ) is not for the final box. However, when we cycle through the
constraints and through the variables when applying hull consistency, most
of the change in the box occurs in the early stages. Therefore, we expect the
value of pi (xI ) (computed as described) to be a reasonable approximation
for the value pi over the final box.
p(x0 ) + J(y − x0 ) ≤ 0
n − m ≤ s/2. (14.6.4)
In this section, we list the steps of our algorithm for computing the global
solution to the inequality constrained problem (14.1.1).
Generally, we seek a solution in a single box specified by the user.
However, any number of boxes can be specified. The boxes can be disjoint
or overlap. However, if they overlap, a minimum at a point that is common
to more than one box is separately found as a solution in each box containing
it. In this case, computing effort is wasted. If the user does not specify an
initial box or boxes, we use a default box as described in Section 12.3. The
algorithm finds the global minimum in the set of points formed by the set
of boxes. We assume these initial boxes are placed in a list L1 of boxes to
be processed.
Suppose the user of our algorithm knows a point x that is guaranteed to
be feasible. If so, we use this point to compute an initial upper bound f on
the global minimum f ∗ . If x cannot be represented exactly on the computer,
we input a representable interval vector xI containing x. We evaluate f (xI )
and obtain [f (xI ), f (xI )]. Even if rounding and/or dependence are such
that xI cannot be numerically proven to be certainly feasible, we rely upon
the user and assume that xI contains a feasible point. Therefore, we set
f = f (xI ).
Also the user might know an upper bound f on f ∗ even though he might
not know where (or even if) f takes on such a value. If so, we set f equal to
this known bound. If the known bound is not representable on the computer,
we round the value up to a larger value that is representable.
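In floating point this upward rounding can be done with a single step to the next representable number; a minimal sketch:

```python
# A minimal sketch: step the bound up to the next representable float
# (math.nextafter requires Python 3.9+), so it remains a valid upper bound.

import math

known_bound = 1.0 / 3.0                          # stand-in for a known bound
f_bar = math.nextafter(known_bound, math.inf)    # round up, still an upper bound
```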
If no feasible point is known and no upper bound on f ∗ is known, we
set f = +∞.
1. For each box in the list L1 , apply hull consistency to each of the
inequality constraints as described in Section 10.5.
2. If f < +∞, then for each box in L1 , apply hull consistency to the
inequality f ≤ f.
4. If x^I is certainly feasible, go to Step 13.
5. Skip this step if xI has not changed since Step 1. Apply hull consis-
tency over xI to each constraint inequality. If xI is deleted, go to Step
3.
10. Apply box consistency (as described in Section 10.2) to each con-
straint inequality. If f < +∞, also apply box consistency to the
inequality f (x) ≤ f. If xI is deleted, go to Step 3.
11. If pi (xI ) ≥ 0 for any i = 1, · · · , m (i.e., if xI is not certainly strictly
feasible), go to Step 28.
12. Apply hull consistency to gi = 0 for i = 1, · · · , n where g is the
gradient of the objective function f . See Section 12.4. If the result
for any i = 1, · · · , n is empty, go to Step 3.
13. Evaluate f at the center m(x^I) of x^I; that is, compute f(m(x^I)). Use the result to update f.
14. If f < +∞, apply hull consistency to the relation f (x) ≤ f. If the
result is empty, go to Step 3.
15. If w(xI ) ≤ wH , go to Step 20. Otherwise, apply hull consistency to
the relation Hii (x) ≥ 0 for i = 1, · · · , n where Hii is an element of
the Hessian of f . See Section 12.7. If the result is empty, go to Step
3.
21. Generate the interval Jacobian J(x, x^I) of the gradient g and compute the approximate inverse B of the center of J(x, x^I). See Section 5.6. Compute M(x, x^I) = BJ(x, x^I) and r^I(x) = −Bg^I(x). Update w_I^g and w_R^g. (See Section 14.7.) Apply one step of an interval Newton method to solve g = 0. If the result is empty, go to Step 3.
23. The user might wish to bypass use of analytic preconditioning (see
Section 11.9). If so, go to Step 27. To apply analytic preconditioning,
use the matrix B found in Step 21 to obtain Bg in analytic form.
Apply hull consistency to solve the i-th equation of Bg = 0 for the
i-th variable xi for i = 1, · · · , n. If the result is empty, go to Step 3.
25. Use box consistency to solve the i-th equation of Bg (as obtained in Step 23) for the i-th variable for i = 1, · · · , n. If the result is empty, go to Step 3.
27. Use the matrix B found in Step 21 in the search method of Section
12.6 to try to reduce the upper bound f.
31. Use the linear method of Section 12.5.3 to try to reduce x^I using the inequality f(x) ≤ f. Update w_I^f and w_S^f. If x^I is deleted, go to Step 3. Otherwise, if this application of the linear method does not sufficiently reduce (as defined using (11.7.4)) the box considered in Step 30, go to Step 35.
33. If w(x^I) > (w_I^S + w_S^S)/2 (see Section 14.7), go to Step 36.
34. Use the quadratic method of Section 12.5.4 to try to reduce x^I using the inequality f(x) ≤ f. Update w_I^S and w_S^S (see Section 14.7). If x^I is deleted, go to Step 3.
41. Use box consistency to solve the same inequalities for the same vari-
ables as in Step 39.
44. Modify the John conditions by omitting those constraints pi for which
pi (xI ) < 0 (since they are not active in xI ). Apply one pass of
the interval Newton method of Section 11.14 to the (modified) John
conditions. Update wIJ and wRJ . If the result is empty, go to Step 3.
46. In various previous steps, gaps might have been generated in components of x^I. If so, merge any of these gaps that overlap. Use the procedure described in Section 11.8 to split x^I. Note that the vector of functions used in defining the Jacobian in Section 11.8 is now the gradient. Note also that when the Newton method has been used, the Jacobian elements needed in (11.8.1) will have been determined in Step 21 or Step 44.
Put the generated subboxes in L_1 and go to Step 3.
48. For each box x^I in list L_2, do the following: If p_i(x^I) > ε_p for any i = 1, · · · , m, put x^I in list L_1.
$$\underline F = \min_{1 \le i \le s} \underline f(x^{I(i)}) \quad\text{and}\quad \overline F = \max_{1 \le i \le s} \overline f(x^{I(i)}).$$
51. Terminate.
After termination, w(xI ) < εX and w[f (xI )] < εf for each remaining box
xI . Also, F ≤ f (x) ≤ F for every point x in all remaining boxes. If,
after termination, f < +∞, we know there is a feasible point in the initial
box(es). Therefore, we know that
F ≤ f ∗ ≤ min{f, F }.
F ≤ f∗ ≤ F.
$$||\hat x - x^*|| \le \varepsilon_1 \qquad (14.9.1)$$
and/or
$$f(\hat x) - f^* \le \varepsilon_2 \qquad (14.9.2)$$
for some ε1 and ε2 . Recall that x∗ is a (feasible) point such that f (x∗ ) = f ∗
is the globally minimum value of the objective function f . Our algorithm
might or might not fully provide such a point. We distinguish four cases.
Case 1. There is only one final box xI and x ∈ xI and f < +∞. (Recall
that x is the feasible point where the smallest upper bound f = f (x)
on f ∗ was determined by the algorithm.)
$$||\hat x - x^*|| \le \varepsilon_X \quad\text{and}\quad f(\hat x) - f^* \le \varepsilon_f.$$
$$p_i(\hat x) \le \varepsilon_p \quad (i = 1, \cdots, m) \qquad (14.9.3)$$
for some ε_p > 0. We do not (and should not) use (14.9.3) to delete points in the optimization algorithm. However, we can add a convergence condition of this kind to the algorithm without altering the correctness of the algorithm. The condition can be useful for the present purpose of finding a suitable point x̂ near x*. The convergence condition that we can use is given by (14.9.4).
Case 3. There is more than one final box and f < +∞.
In this case, we do not know if there is any feasible point in the initial
box in which the algorithm began its search. However, if we assure that
(14.9.4) is satisfied and accept a point as feasible when it is satisfied, then
any point in any final box is “feasible” and the condition f (# x) − f ∗ ≤ εf
is satisfied for any point #
x in any final box.
It is possible that, for a given problem, the feasible region does not have
an interior. In this case, the algorithm will probably not find a certainly
feasible point. As a result, the algorithm will not be able to delete local
minima. In this case, we can proceed as follows.
Let xI be a final “solution” box produced by the algorithm. Evaluate the
constraints over xI . We will not find that pi (xI ) > 0 for any i = 1, · · · , m
because otherwise, the box would have been deleted by the algorithm. If
pi (xI ) < 0 (i = 1, · · · , m) the i-th constraint is disregarded in what follows
(while we are considering the box xI ). The remaining constraints probably
pass through xI and the stopping criteria assure that xI is small. Therefore,
there is a reasonable chance that the remaining constraints have a common
point in xI .
We now try to prove that there is a point in xI satisfying the remaining
constraints written as equalities. A procedure for doing so is given in Sec-
tions 15.4 through 15.6. If this procedure is successful, it proves existence
of a solution in a box x^I′ contained in x^I. Now f(x^I′) is an upper bound
on the global minimum. If we do this process for each of the boxes that
remains after the optimization algorithm terminates, we are likely to obtain
an upper bound on the global minimum in at least one of the final boxes.
The stopping criteria in Step 9 require that a box x^I satisfy w(x^I) ≤ ε_X, w[f(x^I)] ≤ ε_f, and p_i(x^I) ≤ ε_p (i = 1, · · · , m). It would be possible to check that all three conditions are satisfied each time Step 9 is used.
However, if there are several (or many) inequality constraints, the first two
conditions require less work to check than the third. A box that satisfies the first two conditions might eventually be deleted using a procedure such as one that uses the upper bound f.
14.11 PEELING
The solution to this problem must occur where at least one of the constraint
functions is equal to zero. If the constraints are all simple bound constraints,
this solution occurs on the boundary of the feasible region (which in this
case is the initial box). The problem can be solved by the algorithm given
in Section 15.12.
Suppose that at least one prescribed constraint is not a simple bound
constraint. Then a minimum on one constraint might not satisfy another
constraint. Therefore, we formulate the problem as
Let S denote the region satisfying all the prescribed inequality con-
straints pi (x) ≤ 0. This is the feasible region of the original problem
(14.1.1). Adding the constraint q(x) = 0 prevents the solution of (14.11.3)
from occurring in the interior of S. Therefore, the region of search is
reduced more quickly for (14.11.3) than for (14.1.1).
The global minimum is the smaller of the solutions of (14.11.1) and
(14.11.3). We can solve for either of these values first. However, it is best
∇f (x) + v∇q(x) = 0,
q(x) = 0
For simplicity, assume there are only two inequality constraints so that
q = p1 p2 . Then
∇q = p1 ∇p2 + p2 ∇p1 .
We noted above that, for a given box xI , a factor pi does not occur in q if
pi (xI ) < 0 and the box is infeasible if pi (xI ) > 0. Therefore 0 ∈ ∇q(xI )
and the column of the Jacobian given by (14.11.4) contains the zero vector.
That is, the Jacobian is singular.
To avoid this, we can expand the John conditions in such a way that ∇q
is not evaluated over the entire box. We can do so by using the sequential
expansion of Section 7.3 and choosing the Lagrange multiplier v to be the
last variable in the sequence. As a result, ∇q in (14.11.4) is still evaluated
and
Frequently, we want only crude bounds on the range of f over the box
xI . Simply evaluating the function over the box provides bounds. (This
follows from the fundamental theorem of interval analysis.) However, such
bounds are often too far from sharp because of dependence. Beginning
with Moore (1966), methods have been sought that are better than simple
evaluation but less work than obtaining sharp bounds. Various methods of
this kind can be found in Ratschek and Rokne (1984). Recently, methods
using Bernstein polynomials have been studied. For example, see Garloff,
J. and Smith, A. P. (2000).
In this section, we provide a method of the “crude bound type” which
can be regarded as a simplification of peeling. It generally provides sharper
bounds than most “crude bound methods”; but falls short of providing sharp
bounds. Pillow functions are particularly helpful in providing an efficient
Note that the graph of p(x) = 0 passes through all the corners of the box
and otherwise is outside the box.
For m = 1, the equation p(x) = 0 defines an ellipsoid. For higher
values of m, it approximates the box more closely. Because of the vague
resemblance of the graph of p(x) = 0 to a pillow, we call p(x) a pillow
function.
Because any point x ∈ xI satisfies p(x) ≤ 0, the range of f (x) for
x ∈ xI is contained in the range of f (x) as x varies such that p(x) ≤ 0.
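The explicit form of p(x) does not survive in this copy. For the unit box x^I = [0, 1]^n, a family consistent with the two equations displayed below is the following; we record it only as a plausible reconstruction, not as the book's verbatim definition:
$$p_m(x) = \sum_{i=1}^{n}\left(x_i - \tfrac12\right)^{2m} - \frac{n}{4^m} \le 0.$$
For m = 1 the boundary p_1(x) = 0 is the sphere through all 2^n corners of the box; larger m hugs the box more closely.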
1 There is a mistake in the cited paper: In the second set of optimization problems, the
objective function should have a negative sign. Text surrounding equation (25) should be
corrected to read:
By simply reversing the signs in (17) of the objective function and the func-
tions in the inequality constraints, these optimization problems can be con-
verted into:
$$\max_{x \in x^I} f(x) \quad \text{subject to} \quad \text{c) } f(x) \ge 0 \quad\text{or}\quad \text{d) } f(x) > 0.$$
and
$$\sum_{i=1}^{n}\left(x_i - \tfrac12\right)^2 = \frac{n}{4} \qquad\text{and}\qquad \sum_{i=1}^{n}\left(x_i - \tfrac12\right)^4 = \frac{n}{16}.$$
These functions each pass through the corners of xI , but do not intersect
elsewhere. The problem is now expressed in terms of continuous rather
than discrete variables.
In this chapter, we have assumed that the constraint functions are contin-
uously differentiable and that the objective function is twice continuously
differentiable. In this section, we briefly consider how the algorithm of
Section 14.8 must be altered to solve problems in which these assumptions
are not satisfied.
If the objective function is not twice continuously differentiable, we
cannot use the Newton method to solve g = 0 nor to solve the John condi-
tions. Also, we cannot use box consistency to solve g = 0. If the objective
EQUALITY CONSTRAINED
OPTIMIZATION
15.1 INTRODUCTION
For problem (15.1.1), the function φ(t) given by (13.5.1) used to express (part of) the John conditions becomes
$$\phi(t) = \begin{pmatrix} R(u_0, v) \\ u_0 \nabla f(x) + v_1 \nabla q_1(x) + \cdots + v_r \nabla q_r(x) \\ q_1(x) \\ \vdots \\ q_r(x) \end{pmatrix} \qquad (15.2.1)$$
where
$$R(u_0, v) = u_0 + E_1 v_1 + \cdots + E_r v_r - 1. \qquad (15.2.2)$$
h(z) = 0 (15.4.1)
where
Kearfott (1996) reports that a method similar to this works well in practice.
We could use epsilon-inflation to try to prove existence of a feasible
point. This procedure was introduced by Rump (1980) and discussed in
detail by Mayer (1995). See, also, Kearfott (1996). A first step in the pro-
cedure is designed to prove existence of a solution of a system of equations
at or near a tentative solution found by a noninterval algorithm. Thus, it is
an intensive effort to prove existence of a point that is reasonably certain
to exist. Our case is different. We try to prove existence in many boxes
that might or might not contain a solution. Therefore we use a simpler
procedure. By applying our relatively simple procedure to many boxes, we
increase the likelihood of success.
If we fix any one of the three variables, we can solve the constraint equations for the other two, provided x ≥ 11/4. That is, we can determine a feasible point directly.
For other problems, we might be able to determine a subset of the variables by fixing one or more appropriate variables. This reduces the number of remaining constraints. However, to simplify the discussion, we assume all the constraints remain.
We now consider an illustrative example to show that care must be taken
in choosing which variables to fix. Consider a two-dimensional problem
in which there is a single constraint
q(x1 , x2 ) = x2 − 1 (15.5.1)
$$M_{ij}(x) = \frac{\partial q_i(x)}{\partial x_j} \quad (i = 1, \cdots, r;\ j = 1, \cdots, n). \qquad (15.5.2)$$
(We ignore the fact that if we fix x_1 and x_2, we can solve for x_3 and x_4.) The matrix given by (15.5.2) is
$$M(x) = \begin{pmatrix} 3x_1^2 & -1 & 2x_3 & 0 \\ 2x_1 & -1 & 0 & -2x_4 \end{pmatrix}.$$
That is,
We indicate this result using four significant decimal digits. Higher preci-
sion was used in the computations.
Since z^I′ ⊂ z^I, there exists (by Proposition 11.15.5) a solution of (15.7.1) in z^I′. That is, there is a feasible point x with −0.9352 ≤ x_1 ≤ −0.9173, 0.1856 ≤ x_2 ≤ 0.2380, x_3 = 1, and x_4 = 0.8. The actual feasible point in x^I with x_3 = 1 and x_4 = 0.8 is at (approximately) x_1 = −0.9234 and x_2 = 0.2126.
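The existence test invoked here (Proposition 11.15.5) can be illustrated in one dimension. A minimal sketch, using a Krawczyk-form operator rather than the book's exact procedure; `q` and `dq` are hypothetical stand-ins for a single constraint and an interval enclosure of its derivative:

```python
# A minimal one-dimensional sketch of an interval-Newton-style existence
# test in Krawczyk form (not the book's exact procedure); `q` is a
# hypothetical constraint and `dq` an interval enclosure of its derivative
# (assumed to have nonzero midpoint).  If the operator maps X into its own
# interior, a zero of q exists in X.

def exists_zero(q, dq, X):
    m = 0.5 * (X[0] + X[1])
    r = 0.5 * (X[1] - X[0])
    d1, d2 = dq(X)
    c = 2.0 / (d1 + d2)                       # approximate inverse slope
    e = max(abs(1.0 - c * d1), abs(1.0 - c * d2))
    k_mid = m - c * q(m)
    return X[0] < k_mid - e * r and k_mid + e * r < X[1]

# e.g. q(x) = x^2 - 2 on X = (1.3, 1.5): proves a zero (sqrt(2)) exists.
print(exists_zero(lambda x: x * x - 2.0,
                  lambda X: (2.0 * X[0], 2.0 * X[1]),
                  (1.3, 1.5)))                # True
```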
The main program for solving the equality constrained optimization problem generates a sequence of subboxes of the initial box. We can apply hull and box consistencies to each constraint equation q_i(x) = 0 (i = 1, · · · , r)
to delete infeasible points from such a subbox. We can also linearize and
“solve” the constraint equations in the same way as described in Section
15.4.1. In that procedure, we try to prove existence of a feasible point
in a box. Here we use the procedure to try to eliminate infeasible points
from a box. To do so, we fix n − r of the n variables so that we have the
same number r of variables as constraints. In that procedure, the variables
chosen to be fixed are given point values. Now, however, we fix them by
replacing them by their interval bounds. This causes a slight change in the
procedure. When we replace variables by points, it can be worthwhile to
repeat the process of expanding the constraint equations. This is because
the derivatives defining the expansion are narrowed by the replacement.
If we replace variables by their interval bounds, this narrowing does not
occur. Therefore, re-expanding is of no value.
Before continuing, we explain why we do not use a rather obvious alter-
native way of choosing which variables to fix. Suppose we have expanded
the equality constraints with respect to all the variables. Then we have the
information needed to (roughly) determine which variables cause a given
constraint to change the most (or least) over the current box. See Section
11.8. We might choose to fix those variables that cause little change.
However, the procedure in Section 15.4.1 is designed to cause a Newton
step to make a larger reduction of a box irrespective of how much change
is made in the range of the constraints. We consider the former aspect to
be of more value than the latter.
Another topic deserves mention. Suppose we fix some variables before
we linearize the constraints. Then we have fewer derivatives to determine;
and we thus save some work. However, we do not linearize the constraints
unless there is reason to believe that the box is so small that a linearized
form will yield sharper bounds on the range over the box than the original
unexpanded form. See Section 14.7. In this case, the extra effort should
provide greater reduction of the box used.
where
$$J_{ij}(x, x^I) = \frac{\partial}{\partial x_j}\, q_i(X_1, \cdots, X_j, x_{j+1}, \cdots, x_n). \qquad (15.9.2)$$
Note that, in theory, the choice of which variables are real in (15.9.2) could
be related to our present question of which variables to fix. However, we do
not know which variables to fix until after we have obtained the expansion
(15.9.1). We do the following steps. Compare the algorithm in Section
11.12.
1. Compute a real matrix Jc , which is the approximate center of the r
by n matrix J(x, xI ).
2. Do Gaussian elimination on Jc using both row and column pivoting
to transform it into a form in which elements in positions (i, j ) are
zero for 1 ≤ i ≤ r and 1 ≤ j ≤ r except for i = j. In the process, do
the final column pivoting so that the final element in position (r, r)
is largest in magnitude among the elements in positions (r, j ) for
j = r, · · · , n.
3. Choose the variables corresponding to those now in the final columns r + 1, · · · , n to be the ones replaced by their interval bounds, as in the sketch below.
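A minimal sketch of the column selection in step 2, using QR factorization with column pivoting (scipy) as a stand-in for the Gaussian elimination with row and column pivoting described above; the example matrix is M(x) of Section 15.5 evaluated at x = (1, 1, 1, 1), purely for illustration:

```python
# A minimal sketch of step 2's column selection: QR with column pivoting
# orders the columns of the r-by-n center matrix Jc by dominance, so the
# variables behind the trailing n - r columns are the ones to fix.

import numpy as np
from scipy.linalg import qr

def variables_to_fix(Jc):
    r, _ = Jc.shape
    _, _, piv = qr(Jc, pivoting=True)    # columns ordered by dominance
    keep = sorted(piv[:r])               # variables solved for
    fix = sorted(piv[r:])                # variables replaced by their bounds
    return keep, fix

Jc = np.array([[3.0, -1.0, 2.0, 0.0],
               [2.0, -1.0, 0.0, -2.0]])
print(variables_to_fix(Jc))
```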
Assume we have done these three steps. For simplicity, assume it is
variables with indices r + 1, · · · , n that we fix. After fixing these variables,
the constraint equations become
$$Q_i + \sum_{j=1}^{r} J_{ij}(x, x^I)(y_j - x_j) = 0 \quad (i = 1, \cdots, r) \qquad (15.9.3)$$
where
where wIJ and wRJ are defined as in Section 14.7 (see (14.7.1)).
We also use a Newton method when trying to prove the existence of a
feasible point. See Section 15.4. However, this procedure reduces the size
of the box when necessary so there is no need to decide whether the box in
which the procedure is applied is large or not.
As explained in Section 14.7, we avoid using linearization when the
box is so large that linearization is not effective. Thus, besides avoiding the
$$w(x^I) \le \frac{1}{2}\left(w_I^f + w_S^f\right). \qquad (15.11.2)$$
$$w(x^I) \le \frac{1}{2}\left(w_I^q + w_S^q\right). \qquad (15.11.3)$$
This relation is defined in Section 14.7 for inequality constraints. Now the
constraints are equalities. Similarly, when the box is large, we avoid using
the method of Section 12.5.4, which involves a Taylor expansion of the
relation f (x) ≤ f through quadratic terms.
In this section, we list the steps of our algorithm for computing global
solution(s) to the equality constrained optimization problem (15.1.1).
Generally, we seek a solution in a single box specified by the user.
However, any number of boxes can be specified. The boxes can be disjoint
or overlap. However, if they overlap, a minimum at a point that is common
to more than one box is separately found as a solution in each box containing
it. In this case, computing effort is wasted.
If the user does not specify an initial box or boxes, we use a default box
described in Section 11.10. We assume the box(es) are placed in a list L1
of boxes to be processed.
Suppose the user of our algorithm knows a point x that is guaranteed
to be feasible. If so, we use this point to compute an initial upper bound
f on the global minimum f ∗ . If x cannot be represented exactly in the
computer’s number system, we input a representable box xI containing x.
We evaluate f (xI ) and obtain [f (xI ), f (xI )]. We set f = f (xI ), which is
guaranteed to be an upper bound on the global minimum f ∗ . If no feasible
point is known we set f = +∞ as our upper bound for f ∗ .
2. If f < +∞, delete any box xI from L1 for which f (xI ) > f. This
can be done while applying hull consistency. See Section 10.10.
3. If L1 is empty, go to Step 47. Otherwise, find the box xI in L1 for
which f (xI ) is smallest. For later reference, call this box xI (1) . This
box is processed next by the algorithm. Delete xI (1) from L1 .
4. If flag F = 0, go to Step 4(a). If flag F = 1, go to Step 4(b).
6. Repeat Step 4.
7. If xI (1) (as defined in Step 3) has been sufficiently reduced (as defined
using (11.7.4)), put xI in the list L1 and go to Step 3.
10. For later reference call the current box xI (2) . Use the procedure
described in Sections 15.4 through 15.6 to try to reduce the upper
bound f.
11. If f was not changed in Step 10, go to Step 13. Otherwise, apply
hull consistency (See Chapter 10) to the relation f (x) ≤ f. If the
result is empty, go to Step 3.
13. If xI (1) (as defined in Step 3) has been sufficiently reduced (as defined
using (11.7.4)), put xI in L1 and go to Step 3.
16. If the current box is the same box xI (2) defined in Step 10, go to Step
18.
17. Use the procedure described in Section 15.4 through 15.6 to try to
reduce the upper bound f.
19. Apply box consistency to the relation f (x) ≤ f. If the result is empty,
go to Step 3.
21. If xI (1) (as defined in Step 3) has been sufficiently reduced, put xI in
the list L1 and go to Step 3.
23. If w(x^I) > (w_S^f + w_I^f)/2, go to Step 27. (See Section 15.11.)
24. Denote the current box by x^I(3). Apply the linear method of Section 12.5.3 to try to reduce x^I(3) using f(x) ≤ f. Update w_S^f and w_I^f as described in Section 14.7. If the result is empty, go to Step 3.
26. If xI (3) (as defined in Step 24) was sufficiently reduced (as defined
using (11.7.4)) in the single Step 24, go to Step 30. Otherwise, go to
Step 32.
30. If xI (1) (as defined in Step 3) has been sufficiently reduced, put xI in
L1 and go to Step 3.
31. If w(x^I) > (w_S^q + w_I^q)/2, go to Step 43. (See Section 15.11.)
32. If condition (14.6.4) is not satisfied, go to Step 40. Otherwise, do the
following as described in Section 15.9. Replace n−r of the variables
by their interval bounds and find the preconditioning matrix B for the
system involving the remaining r variables.
33. Precondition the linearized system. If the preconditioned coefficient matrix is regular (see Theorem 5.8.1), find the hull of the linearized system by the method of Section 5.8. If the matrix is not regular, solve the system by the Gauss-Seidel method (see Section 5.7). Update w_S^q and w_I^q as described in Section 14.7. If the result is empty, go to Step 3.
35. The user might wish to bypass analytic preconditioning (see Section
11.9). If so go to Step 40. If analytic preconditioning is to be used,
analytically multiply the nonlinear system of constraint equations
by the matrix B computed in Step 32. Do so without replacing any
variables by their interval bounds (so that appropriate combinations
and cancellations can be made). After the analytic multiplication is
complete, replace the fixed variables (as chosen in Step 32) by their
interval bounds.
39. Apply box consistency to solve the i-th nonlinear equation of the
preconditioned nonlinear system for the i-th (renamed) variable for
i = 1, · · · , r. If the result is empty, go to Step 3.
42. Apply one step of the interval Newton method of Section 11.14 for solving the John conditions (15.2.1). Update w_R^J and w_I^J as described in Section 14.7. If the result is empty, go to Step 3. If the existence of a solution of the John conditions is proved as discussed in Section 15.3, then update f (as discussed in Section 15.3).
43. If the box xI (1) (as defined in Step 3) has been sufficiently reduced,
put xI in L1 and go to Step 3.
44. Any previous step that used hull consistency, a Newton step, or a
Gauss-Seidel step might have generated gaps in the interval compo-
nents of xI . Merge any such gaps when possible. Split the box as
described in Section 11.8. This might involve deleting gaps. Place
the subboxes (generated by splitting) in the list L1 and go to Step 3.
45. If the list L2 is empty, print “There is no feasible point in xI (0) ” and
go to Step 52.
52. Denote the remaining boxes by x^I(1), · · · , x^I(s) where s is the number of boxes remaining. Determine
$$\underline F = \min_{1 \le i \le s} \underline f(x^{I(i)}) \quad\text{and}\quad \overline F = \max_{1 \le i \le s} \overline f(x^{I(i)}).$$
53. Terminate.
At termination, if the list L2 is empty, then all of the initial box xI (0) has been
eliminated. This provides proof that the initial box xI (0) does not contain a
feasible point.
Assume that at least one box remains in the list L2 . What we have
proved in this case depends on the final value of f. If f < +∞, then we
know that a feasible point exists in the initial box xI (0) . If f = +∞, there
might or might not be a feasible point in xI (0) .
Consider the case f < +∞. No matter how poor the bound f on f ∗ , we
know that a global solution exists in xI (0) ; and it is in one of the remaining
boxes. Also, we know that
F ≤ f∗ ≤ F.
Therefore,
f (x) − f ∗ ≤ εf
If more than one box remains, it is possible that one contains a local
solution at which f is less than our upper bound f. Also, there might be
more than one global solution occurring in separate boxes. We know only
that
F ≤ f ∗ ≤ min{f, F }
and that the global minimum point(s) are in the remaining boxes.
If the final value of f is ∞ and xI (0) is not entirely deleted, then xI (0)
might or might not contain a feasible point. We do not know. It is highly
probable that a solution exists since, otherwise, we expect all of xI (0) to be
deleted. However, we do know that if a feasible point does exist in xI (0) ,
then,
F ≤ f∗ ≤ F
$$||\hat x - x^*|| \le \varepsilon_1 \qquad (15.13.1)$$
and/or
$$f(\hat x) - f^* \le \varepsilon_2 \qquad (15.13.2)$$
for some ε1 and ε2 . Recall that x∗ is a (feasible) point such that f (x∗ ) = f ∗
is the globally minimum value of the objective function f .
Generally, no algorithm can assure that a single point is certainly feasi-
ble because of rounding errors in evaluating the equality constraints. There-
fore, we cannot provide a point x̂ that is guaranteed to be feasible. In Section 15.3, we described how we could prove that some (unknown) feasible point exists in a given box. However, we can assure that for a given point x̂ the equality constraints satisfy

|qi(x̂)| ≤ εq. (15.13.3)
Case 1. There is a single final box and f̄ < +∞.

Since f̄ < +∞, we know that a feasible point exists in the initial box. Since x∗ is never deleted by the algorithm, it must be in the single remaining box xI. We can choose x̂ to be any point in xI. Then the stopping criteria of the algorithm assure that

||x̂ − x∗|| ≤ εX,
f(x̂) − f∗ ≤ εf, and
|qi(x̂)| ≤ εq (i = 1, · · · , r).
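These three criteria translate directly into an acceptance test for the candidate point x̂. The sketch below only illustrates the logic; the argument names are ours, and a rigorous implementation would evaluate f and the qi in outwardly rounded interval arithmetic.

    def certify(box_widths, q_at_xhat, f_upper_at_xhat, F_lower,
                eps_X, eps_f, eps_q):
        """True if the stopping criteria certify the candidate point.

        box_widths     : widths of the components of the remaining box
        q_at_xhat      : rigorous bounds on |q_i(x_hat)| for i = 1..r
        f_upper_at_xhat: upper bound of f(x_hat); F_lower <= f* holds
        """
        if max(box_widths) > eps_X:                # ||x_hat - x*|| <= eps_X
            return False
        if f_upper_at_xhat - F_lower > eps_f:      # f(x_hat) - f* <= eps_f
            return False
        return all(q <= eps_q for q in q_at_xhat)  # |q_i(x_hat)| <= eps_q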
Case 2. There is more than one final box and f̄ < +∞.

Since f̄ < +∞, we know that a feasible point exists in at least one of the final boxes; but we do not know which one(s). All or part of the box in which we proved the existence of a feasible point and obtained the final value of f̄ might have been deleted. A given point in a final box might be far from x∗ because x∗ is in another box. However, any such point x̂ satisfies |qi(x̂)| ≤ εq (i = 1, · · · , r), so it must be “almost feasible”. Suppose we pick x̂ to be an arbitrary point in an arbitrary final box. From Step 52 of the algorithm, we have F̲ ≤ f(x̂) ≤ F̄.
Case 3. The final value of f̄ is +∞.

In this case, we do not know if there is any feasible point in the initial box in which the algorithm began its search. However, if there is, then Case 1 or Case 2 applies.
So far in this chapter, we have assumed that the objective function and the
equality constraint functions are twice continuously differentiable. We now
consider how the algorithm in Section 15.12 must be altered when these
assumptions do not hold.
If the constraints are not continuously differentiable, then Steps 10 and 17 of the algorithm in Section 15.12 cannot be used. That is, we cannot guarantee the existence of a feasible point as discussed in Sections 15.3 and 15.4.1.
An alternative might be to assume a point x is feasible if all the con-
straints are satisfied to within some error tolerance. We discussed this
possibility in Section 15.10. This would not produce guaranteed bounds
on the solution. If the constraints are continuous, it is possible to prove
existence of a feasible point using hull consistency as discussed in Sec-
tion 15.4.2. If hull consistency is used for this purpose, the quadratically
converging implementation should be applied in Step 39.
If the objective function is not twice continuously differentiable, we
cannot apply a Newton method to solve the John conditions. Therefore,
Step 44 of the algorithm cannot be used.
Dropping procedures such as Newton’s method that require differentia-
bility degrades the performance of the algorithm. However, the algorithm
solves the optimization problem even when continuity is lacking. Hull
consistency provides the means.
Some nondifferentiable functions can be replaced by differentiable
functions (plus constraints). See Chapter 18. This can resolve the diffi-
culty and facilitate proving existence of a feasible point.
It is always better to use expansions in slopes rather than derivatives
because slopes produce sharper bounds. We noted in Section 7.11 that
some nondifferentiable functions have slope expansions. This can obviate
concerns regarding differentiability.
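As an illustration of a slope expansion for a nondifferentiable function, the sketch below computes an interval enclosure of the slope s(x, c) = (|x| − |c|)/(x − c) of the absolute value function over an interval X. This is a minimal sketch of ours in exact arithmetic, not a rigorous implementation; it uses the facts that s is monotone in x on each sign region and that |s| ≤ 1 because abs is Lipschitz with constant 1.

    def abs_slope(X, c):
        """Enclosure of the slope s(x, c) = (|x| - |c|)/(x - c) of abs(x)
        for x in X = (lo, hi). Valid although abs is not differentiable at 0."""
        lo, hi = X
        cands = []
        for x in (lo, hi):               # s is monotone on each sign region,
            if x != c:                   # so endpoint slopes are extreme there
                cands.append((abs(x) - abs(c)) / (x - c))
        if lo < 0.0 < hi and c != 0.0:   # slope taken at the kink x = 0
            cands.append(abs(c) / c)
        if not cands:                    # degenerate case X = [c, c]
            return (-1.0, 1.0)
        # abs is 1-Lipschitz, so the enclosure may be clamped to [-1, 1].
        return (max(-1.0, min(cands)), min(1.0, max(cands)))

    # Example: abs_slope((-2.0, 3.0), 1.0) returns (-1/3, 1), and
    # |x| is contained in |c| + S*(x - c) for every x in X.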
16.1 INTRODUCTION
16.3.1 Case 1
Assume that we fix a number k of the variables and are able to determine all of the others from the equality constraints. To do so, we choose these k variables to have their values at the center of xI. This serves to determine a partially prescribed and partially computed point x̂. However, it might be necessary to make rounding errors in computing the unprescribed variables so, in practice, the “point” might have interval components. To emphasize this fact, we denote it by x̂I. If x̂I satisfies the inequality constraints, then any point x̂ ∈ x̂I is a feasible point. Therefore, f̄(x̂I) is an upper bound on the global minimum f∗.
Note that x̂I might not be in the current box xI. We consider this to be irrelevant. We are searching for any point that gives a good upper bound on the global minimum. The current box merely serves to begin the determination of x̂I. Note also that x̂I might not be in the initial box xI(0).
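Schematically, this case might be organized as follows. The helpers solve_equalities, p_list, and f_upper are hypothetical placeholders for interval routines (our assumptions, not the book's code); the sketch shows only the logic of fixing k components, computing the rest, and testing the inequality constraints.

    def midpoint(iv):
        return 0.5 * (iv[0] + iv[1])

    def try_upper_bound(box, fixed_idx, solve_equalities, p_list, f_upper):
        """Try to produce an upper bound on f* starting from the current box.
        solve_equalities, p_list and f_upper are assumed interval routines."""
        prescribed = {i: midpoint(box[i]) for i in fixed_idx}
        solved = solve_equalities(prescribed)  # interval components, or None
        if solved is None:
            return None
        # Assemble x_hat^I: prescribed components are degenerate intervals.
        x_hat = {**{i: (v, v) for i, v in prescribed.items()}, **solved}
        # x_hat^I is certainly feasible only if every p_i is certainly <= 0.
        if any(p(x_hat)[1] > 0.0 for p in p_list):
            return None
        return f_upper(x_hat)  # an upper bound on the global minimum f*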
16.3.2 Case 2
We now consider the case in which we fix a number k of the variables and
determine some, but not all, of the others so that a number s < n of the
variables is either prescribed or computed. For simplicity, assume they
are the first s components. Thus, we know the components X1 , · · · , Xs
of some “point”. The computed components might be intervals (to bound
rounding errors) so we denote (all of) them as intervals with capital letters.
If Xi ⊄ Xi(0) for some i = 1, · · · , s, we abandon our effort to bound a feasible point when starting from the current box. It might happen that no point with components X1, · · · , Xs intersects the current box. We consider this to be irrelevant.
16.3.3 Case 3
In our final case, no variables are fixed and no equality constraints are
satisfied before we try to prove existence of a feasible point. Now we
reverse the order in which the equality and inequality constraints are used.
It is generally easier to check whether the inequality constraints are satisfied
than to try to prove existence of a point satisfying the equality constraints.
Therefore, we first find a point satisfying the inequality constraints and then try to prove existence of a point satisfying the equality constraints in a
box centered at this point. If there are few equality constraints and many
inequality constraints, it might be more economical to reverse the order as
in the two previous cases. We shall not do so.
When the main program generates a new box, we do a line search proceeding from the center of the box in the direction of the negative gradient of the objective function as described in Section 14.4. The purpose of the line search is to find a point where the value of the objective function is relatively small.
4. Set n = 0.
8. Use the procedure in Sections 15.4 through 15.6 to try to prove that
there exists a point in zI that satisfies the equality constraints.
Note that, for a given problem, it might not be possible to find a non-
degenerate box that satisfies the inequality constraints. For example, there
might be only a single point satisfying them. In Section 14.9, we discuss
how to treat this situation when there are no equality constraints. We treat
a subset of the inequality constraints as equalities. In the current case, we
simply add the equality constraints to the set of inequality constraints that
are treated as equalities. We then try to prove existence of a point satisfying the combined set of equalities using the procedure discussed in Sections 15.4 through 15.6.
We now give the steps of the algorithm for the case in which both inequality
and equality constraints occur.
To initialize, we require that the user specify a box size tolerance εX , a
function width tolerance εf , an inequality function width tolerance εp , an
equality function width tolerance εq , and the initial box(es). Any tolerance
not specified is set to +∞ by the program. However, a finite value must
be specified for at least one of them. The initial box(es) are put in list L1 .
The algorithm provides the parameters needed to perform any linearization test. (See Section 14.7.) It sets wS^f, wSS, wSC, wR^q, and wR^J to zero and sets wI^f, wIS, wIC, wI^q, and wI^J equal to w(xI(0)). It also sets the flag F = 0.
The steps of the algorithm are to be performed in the order given except
as indicated by branching. The current box is denoted by xI throughout the
algorithm even though it changes from step to step.
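The initialization just described is easy to mechanize. The following is a minimal sketch under our own naming conventions, not the book's code: unspecified tolerances default to +∞, at least one must be finite, and the “solved” widths start at zero while the “initial” widths start at w(xI(0)).

    import math

    def initialize(w_x0, eps_X=None, eps_f=None, eps_p=None, eps_q=None):
        # Any tolerance not specified is set to +inf by the program.
        tols = {k: (math.inf if v is None else v)
                for k, v in dict(eps_X=eps_X, eps_f=eps_f,
                                 eps_p=eps_p, eps_q=eps_q).items()}
        if all(math.isinf(v) for v in tols.values()):
            raise ValueError("a finite value is required for at least one tolerance")
        # Width parameters for the linearization tests of Section 14.7
        # (names transliterated from the text; w_x0 = w(xI(0))).
        widths = dict.fromkeys(["wSf", "wSS", "wSC", "wRq", "wRJ"], 0.0)
        widths.update(dict.fromkeys(["wIf", "wIS", "wIC", "wIq", "wIJ"], w_x0))
        return tols, widths, 0  # the trailing 0 is the flag F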
1. For each initial box xI in the list L1, evaluate f(xI). Denote the result by [f̲(xI), f̄(xI)].
2. If f̄ < +∞, delete any box xI from L1 for which f̲(xI) > f̄.
7. Repeat Step 4.
8. If xI(1) (as defined in Step 3) has been sufficiently reduced (as defined using (11.7.4)), put xI in list L1 and go to Step 3.
11. For later reference, call the current box xI(2). Use the procedure described in Section 16.3 to try to reduce the upper bound f̄. Note: The box xI could be the same one used the last time this step was used. If so, do not repeat the procedure.
13. Apply hull consistency (see Chapter 10) to the relation f(x) ≤ f̄. If the result is empty, go to Step 3.
15. If xI(1) (as defined in Step 3) has been sufficiently reduced (as defined using (11.7.4)), put xI in list L1 and go to Step 3.
18. If the current box is the same box xI (2) defined in Step 11, go to Step
20.
19. Use the procedure described in Section 16.3 to try to reduce the upper bound f̄.
21. Apply box consistency to the relation f(x) ≤ f̄. If the result is empty, go to Step 3.
23. If xI(1) (as defined in Step 3) has been sufficiently reduced, put xI in the list L1 and go to Step 3.
28. If xI(3) (defined in Step 26) was sufficiently reduced (as defined using (11.7.4)) in the single Step 26, go to Step 32. Otherwise, go to Step 33.
29. If w(xI) > ½(wSS + wIS), go to Step 33. See Section 14.7.
30. Apply the quadratic method of Section 12.5.4 to try to reduce the current box using f(x) ≤ f̄. Update wSS and wIS. If the result is empty, go to Step 3.
32. If xI(1) (as defined in Step 3) has been sufficiently reduced (as defined using (11.7.4)), put xI in L1 and go to Step 3.
33. If w(xI) > ½(wSC + wIC), go to Step 42. (See Section 15.11.)
37. The user might wish to bypass analytic preconditioning (see Section 11.9). If so, go to Step 42. If analytic preconditioning is to be used, analytically multiply the nonlinear system of equality and inequality constraints by the preconditioning matrix described in Section 16.2 and computed in the combined Steps 34 and 35. Do so without replacing any variables by their interval bounds (so that appropriate combinations and cancellations can be made).
38. Apply hull consistency to the relations derived in Step 37. Solve
the equalities only for the variables that were solved for in Step 34.
Solve the inequalities only for the variables that were solved for in
Step 35. If the result is empty, go to Step 3.
40. Apply box consistency to the relations derived in Step 37. Solve the equalities only for the variables that were solved for in Step 34. Solve the inequalities only for the variables that were solved for in Step 35. If the result is empty, go to Step 3.
42. If w(xI) > ½(wR^J + wI^J), go to Step 45 (where wR^J and wI^J are defined as in Section 14.7; see (14.7.1)).
43. Apply one step of the interval Newton method of Section 11.14 for solving the John conditions (13.5.1). Update wR^J and wI^J. If the result is empty, go to Step 3. If the existence of a solution of the John conditions is proved as discussed in Section 15.3, then update f̄ (as discussed in Section 15.3).
44. If xI(1) (as defined in Step 3) has been sufficiently reduced, put xI in L1 and go to Step 3.
45. Any previous step that used hull consistency, a Newton step, or a
Gauss-Seidel step might have generated gaps in the interval compo-
nents of xI . Merge any such gaps when possible. Split the box as
described in Section 11.8. This might involve deleting gaps. Place
the subboxes (generated by splitting) in the list L1 and go to Step 3.
48. For each box xI in list L2 do the following. If pi(xI) > εp for any i = 1, · · · , m or if |qi(xI)| > εq for any i = 1, · · · , r, put the box in list L1.
51. For each box xI in L2, if f[m(xI)] < +∞, try to prove existence of a feasible point using the method described in Section 16.3. Use the results to update f̄.
54. Terminate.
PERTURBED PROBLEMS AND SENSITIVITY ANALYSIS

17.1 INTRODUCTION
We wish to bound f ∗ (cI ) and x∗ (cI ). The width of the interval f ∗ (cI )
is a measure of the sensitivity of the problem to variation of c over cI .
The solution set x∗ (cI ) contains the global minimum for every specific
(real) choice of the parameters c satisfying the interval bounds c ∈ cI . The
size of the set is a measure of the sensitivity of the solution point to variation
of the parameters within their interval bounds. Therefore, either or both of
f ∗ (cI ) and x∗ (cI ) are of interest in sensitivity analysis.
If the intervals bounding the parameters are narrow, and if the problem
is not especially sensitive to perturbation, the solution set x∗ (cI ) is small.
Therefore, it can be covered by a small number of boxes whose width is
less than the tolerance εX used in a termination process.
As we point out in the next section, the optimization algorithms given in
previous chapters all solve perturbed problems. They produce a box or set
of boxes containing the solution whether it is a single point or an extended
set.
If the perturbations are large, it can require many boxes to cover the
solution set, especially if the box size tolerance is small. In this case, the
number of boxes (and the computing time) can be excessive.
In low dimensional problems, we might want to cover the solution set
by small “pixel” boxes to obtain a kind of map of the solution region. In
fact, we might want to subdivide the intervals containing the parameter to
sharpen the “map” of the region. The “pixel” size of the covering boxes
can be determined by specifying the box size tolerance εX appropriately.
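A paving of this kind can be produced by recursive bisection. The sketch below is a generic illustration rather than the algorithm of this book: a box is a tuple of (lo, hi) component intervals, cannot_exclude stands for any interval test that returns False only when a box provably contains no solution point, and rounding is ignored.

    def pave(box, cannot_exclude, eps_X):
        """Cover the unexcluded part of `box` with pixel boxes of width <= eps_X."""
        stack, pixels = [box], []
        while stack:
            b = stack.pop()
            if not cannot_exclude(b):
                continue                          # b provably holds no solution
            widths = [hi - lo for (lo, hi) in b]
            k = max(range(len(b)), key=widths.__getitem__)
            if widths[k] <= eps_X:
                pixels.append(b)                  # small enough: keep as a pixel
                continue
            lo, hi = b[k]
            mid = 0.5 * (lo + hi)                 # bisect the widest component
            stack.append(b[:k] + ((lo, mid),) + b[k + 1:])
            stack.append(b[:k] + ((mid, hi),) + b[k + 1:])
        return pixels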
Dependence resulting from multiple occurrences of a parameter can cause
loss of sharpness in defining the boundary of the solution set. In this case,
it can be desirable to split the parameter interval into small subintervals
and repeatedly solve the optimization problem using each subinterval of
the parameter.
However, a primary purpose of this chapter is to show how we can
bound the solution set without covering it by a large number of small boxes.
Instead, we compute a single box bounding the solution set x∗(cI).
For termination, the optimization algorithms require that the width of each
output box be less than a tolerance εX . Also, the width of the range of
the objective function over each box must not be greater than a function
width tolerance εf . The latter tolerance assures that the final bound on the
minimum value of the objective function is in error by no more than εf .
If the tolerances are too large, the solution set x∗ (cI ) is poorly defined
and the bounds on the interval f ∗ (cI ) of solution values are far from sharp.
As parameter values change, the location of the global minimum can change
discontinuously. This can happen when a local minimum becomes global
while the global minimum becomes local. Examples are given in Sections
17.7 and 17.8.
A virtue of the interval algorithms given in the previous chapters is
that such a case does not affect their behavior. Any point that is a global
solution for any values of the parameters within their bounding intervals is
contained in the solution set. The solution set can be composed of disjoint
subsets.
This problem differs from the original problem (17.1.1) in that the com-
ponents of c have become independent variables. In addition, the constraint
c ∈ cI has been added. The globally minimum value of f for this problem
max_{c∈cI} min_{x∈xI} f(x, c)    (17.5.1)

subject to pi(x, c) ≤ 0 (i = 1, ..., m),
           qi(x, c) = 0 (i = 1, ..., r).
The box xI computed in the first phase of our procedure contains the
global solution to problem (17.1.1) for every c ∈ cI . Assume that xI does
not also contain a local (i.e., nonglobal) solution of (17.1.1) for any c ∈ cI .
A solution of (17.1.1) satisfies the John conditions φ(t) = 0 where φ(t) is given by (13.5.1) and

t = (x, u, v)^T,

where u and v are vectors of Lagrange multipliers.
The solutions to problems (17.5.3) and (17.5.4) yield lower and upper bounds on x∗i(cI), respectively. However, if Assumption 17.5.1 is not valid, these bounds might not be sharp.
The variable t takes the place of the variable x used when discussing Corollary 17.6.2. Let J(t, c) denote the Jacobian of φ(t, c) as a function of t.
as described above to prove that any John point in xI is a global minimum
for some c ∈ cI . Proof is obtained only if the result of the Newton step is
contained in the input box. See Proposition 11.15.5.
Note that a subroutine is available for evaluating the Jacobian (and
multiplying by an approximate inverse of its center) when doing the first
stage of the algorithm in Section 17.5. Therefore, only a small amount of
coding is needed to test the hypothesis of Corollary 17.6.2.
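In one dimension the containment test is easy to state in code. The following is a minimal sketch under simplifying assumptions of ours (exact arithmetic, a user-supplied derivative enclosure), not the book's implementation: if the Newton image N(X) = c − f(c)/F′(X) is contained in X, a zero of f exists in X.

    def newton_existence(f, dF, X):
        """True if N(X) = c - f(c)/F'(X) is contained in X = (lo, hi),
        which proves a zero of f exists in X (cf. Proposition 11.15.5)."""
        lo, hi = X
        c = 0.5 * (lo + hi)
        dlo, dhi = dF(X)          # interval enclosure of f' over X
        if dlo <= 0.0 <= dhi:     # enclosure contains zero:
            return False          # this simple test is inconclusive
        q = (f(c) / dlo, f(c) / dhi)
        n_lo, n_hi = c - max(q), c - min(q)
        return lo <= n_lo and n_hi <= hi

    # Example: newton_existence(lambda x: x*x - 2.0,
    #                           lambda X: (2.0*X[0], 2.0*X[1]),
    #                           (1.0, 2.0)) proves sqrt(2) lies in [1, 2].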
In this and the next two sections, we discuss examples that illustrate our method for solving perturbed problems.
As a first example, we consider the unconstrained minimization problem
with objective function
For c = 1, this is (the negative of) the so-called three hump camel function.
See (for example) Dixon and Szegö (1975).
Let the coefficient (i.e., parameter) c vary over the interval [0.9, 1]. For all c for which 0.945 < c ≤ 1, the global minimum is the single point at the origin.
The smallest value of f ∗ for c ∈ [0.9, 1] occurs for c = 0.9 for which
f ∗ = −1.8589, approximately.
Consider perturbing the problem continuously by letting c increase from
an initial value of 0.9. Initially, there are two global solution points that
move along (separate) straight lines in the x1 , x2 plane until c = 0.945. As
c passes through this value, the global minimum jumps to the origin and
remains there for all c ∈ [0.945, 1].
A traditional perturbation analysis can detect the changes in the global
minimum as c increases slightly from c = 0.9 (say). However, an expansion
about this point cannot reveal the nature of the discontinuous change in
position of the global minimum as c varies at and near the value 0.945.
The algorithm in Section 12.14 solves this problem without difficulty.
Unfortunately, if termination tolerances are small, the output consists of an
undesirably large number of boxes.
For cI = [0.9, 1], the set x∗(cI) of solution points consists of three parts. One part is the origin. Another is the line segment joining the two points of the form (y, −y/2) where y = (10/3)^{1/2} at one endpoint and y = {[21 + (126)^{1/2}]/9}^{1/2} at the other endpoint. The third part of the set is the reflection of this line segment in the origin.
The interval algorithm does not reveal the value of c at which the global
point jumps discontinuously. However, it bounds the solution set for all
c ∈ [0.9, 1] as closely as prescribed by the tolerances.
This solution set was covered by five “solution” boxes. The smallest box containing these five boxes is

([1.739, 1.896], [−0.968, −0.856])^T.
The interval components of this box are too large by roughly the size of the
tolerance εX .
The reflection (in the origin) of the exact solution set (17.7.3) is also
a solution set. It was covered by five “solution” boxes in much the same
way.
The third subset of the exact solution is the origin. It remains an isolated solution point as c varies. It is a global solution for all c ∈ [0.945, 1]. Its location was computed precisely and is guaranteed to be exact because the bounding interval components are degenerate.
The bounds on the set of values of f ∗ were computed to be the interval
[−4.781, 0]. The correct interval is [−1.859, 0]. Since we chose εf = 10^5,
the algorithm was not configured to produce good bounds on the interval
f ∗ (cI ). The more stringent box size tolerance εX = 0.1 kept the bounds on
The value of the minimum at these points depends very little on c. For c = 0.9, the global minimum is f∗ = 0.199035280 and for c = 1, f∗ = 0.199035288, approximately. The value of x1 for c = 0.9 is 0.06604161 and for c = 1 it is 0.066041626, approximately.
The smallest boxes containing the set of points of global minimum as c varies over [0.9, 1] are (when outwardly rounded)

±([0.0660416, 0.0660417], [−0.192896, −0.192895])^T

and

±([1.81787, 1.89224], [−0.946118, −0.908935])^T.
Another isolated box is approximately the negative of this one. They contain
the minima on the boundary of the feasible region.
One set of 76 contiguous boxes is isolated from all the other output boxes. The smallest box containing all of them, when outwardly rounded, is

yI = ([1.7472, 1.8923], [−0.9611, −0.8679])^T.
Minimize (globally) f(x) = Σ_{i=1}^{18} (xi − ci)²
subject to x1 x9 = x2 x14 + x3 x4
x1 x10 = x2 x15 + x3 x5
x1 x11 = x2 x16 + x3 x6
x1 x12 = x2 x17 + x3 x7
x1 x13 = x2 x18 + x3 x8
x4 + x5 + x6 + x7 + x8 = 1
x9 + x10 + x11 + x12 + x13 = 1
x14 + x15 + x16 + x17 + x18 = 1
x14 = 66.67x4
x15 = 50x5
x16 = 0.015x6
x17 = 100x7
x18 = 33.33x8 .
i ci ± εi i ci ± εi
1 100 ± 1.11 10 0.66 ± 0.017
2 89.73 ± 1.03 11 0.114 ± 0.0046
3 10.27 ± 0.51 12 0.002 ± 0.0001
4 0.0037 ± 0.00018 13 0.004 ± 0.00012
5 0.0147 ± 0.0061 14 0.245 ± 0.0067
6 0.982 ± 0.032 15 0.734 ± 0.02
7 0±0 16 0.0147 ± 0.0061
8 0.0001 ± 0 17 0.0022 ± 0.0001
9 0.22 ± 0.0066 18 0.0044 ± 0.0013
i Xi i Xi
1 [96.2, 103.8] 10 [0.6046, 0.7207]
2 [87.7, 91.7] 11 [0.0603, 0.1638]
3 [8.47, 12.07] 12 [−0.0174, 0.0237]
4 [0.0035, 0.00348] 13 [−0.0109, 0.0205]
5 [0.0142, 0.0152] 14 [0.2338, 0.2559]
6 [0.98142, 0.98147] 15 [0.7122, 0.7556]
7 [−0.000205, 0.000249] 16 [0.0147, 0.0148]
8 [−0.000384, 0.000645] 17 [−0.0205, 0.0249]
9 [0.1983, 0.2440] 18 [−0.0128, 0.0215]
In Section 12.5, and elsewhere, we discuss how we can use an upper bound f̄ on the global minimum to improve the performance of our global optimization algorithm. In this section, we consider an artifice in which we preset f̄ to zero in certain examples. We then give an example that shows why this is particularly helpful in the perturbed case.
For simplicity, we consider the unconstrained case. Suppose we wish
to minimize f (x, cI ) where cI is a nondegenerate interval vector. We apply
the algorithm given in Section 12.14.
Assume we know that f (x, c) is nonnegative for all values of x and c of
interest. Suppose we also know that f (x∗ , c) = 0 for all c ∈ cI , where x∗
is the point of global minimum. Then we know that f ∗ (cI ) = 0. Therefore,
we set f̄ = 0.
Least squares problems are sometimes of this type. It can be known that the squared functions are consistent and, hence, that f∗(cI) = 0. For example, see Walster (1988).
As described in Section 12.5, our algorithm deletes points of an original box in which f(x, cI) > f̄. The smaller f̄ is, the more points can be deleted in a given application of the procedure. Thus, it is advantageous to know f∗(cI) when the algorithm is first applied. Often, the first value of f̄ computed by the algorithm is much larger than f∗(cI). Values of f̄ closer to f∗(cI) are computed as the algorithm proceeds.
We know f∗(cI) = 0. In addition to speeding up the algorithm, this also saves the effort of repeatedly trying to improve f̄. But, in the perturbed case, there is another advantage. When we evaluate f(x, cI) at some point x, we obtain an interval. The upper endpoint of this interval is an upper bound for f∗(cI). But even if we evaluate f(x, cI) at a global minimum point x∗, the upper endpoint of the interval is not f∗. It is larger because f(x∗, c) varies as c varies over cI, so the interval bounds a whole range of values rather than the single minimum.
If we rely only on the procedure that uses f̄ to delete points, the computed result is much larger than what can be computed by also including the other procedures. The other procedures in the optimization algorithm delete whatever remaining points they can that are outside the solution set.
If we set f̄ = 0, then deleting points where f(x, cI) > f̄ can delete all points not in the solution set. That is, the subroutine using f̄ can contribute to the progress of the algorithm as long as points remain that are not in the solution set.
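With the preset bound, the deletion test is a single comparison. A minimal sketch (f_lower is an assumed routine, not from the book, that returns a rigorous lower bound of f(x, c) over all x in a box and all c ∈ cI):

    def delete_by_preset_bound(boxes, f_lower, cI):
        """Keep only boxes that can meet f(x, cI) = 0, using f_bar = 0."""
        return [b for b in boxes if f_lower(b, cI) <= 0.0]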
f(x, cI) = 0 (17.11.1)
MISCELLANY
The presence of the constraint (18.1.5) causes xn+1 to take on the re-
quired value of |t (x)| at the solution point. Lemaréchal’s use of inequality
constraints allows slack at the solution, in general.
Using this relation and our procedure for replacing the absolute value function, we can replace max{t1(x), t2(x)} by 0.5[t1(x) + t2(x) + xn+1] if we add the constraints xn+1 ≥ 0 and

(xn+1)² = [t1(x) − t2(x)]².

This enables us to treat the max of two functions. If there are more than two functions, we can use the relation

max{t1, t2, t3} = max{t1, max{t2, t3}}

recursively.
Note that the minimum of two or more functions can be treated by using the relation

min{t1, t2} = −max{−t1, −t2}.
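The replacement can be packaged compactly. The helper below is an illustrative sketch of ours, not from the book: given t1 and t2 it returns the smooth objective together with the two constraints added for the new variable y (the xn+1 of the text).

    def replace_max(t1, t2):
        """Replace max{t1(x), t2(x)} by a smooth objective plus constraints."""
        objective = lambda x, y: 0.5 * (t1(x) + t2(x) + y)
        equality  = lambda x, y: y * y - (t1(x) - t2(x)) ** 2  # must equal 0
        bound     = lambda x, y: y                             # must be >= 0
        return objective, equality, bound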
where R is the set of integers {1, · · · , r} for some integer r and for any
k ∈ R, Qk is a set of integers {1, · · · , qk }.
These constraints force the variables to be integers. The problem can now
be treated as if there are no conditions that variables take integer values.
Solving an integer or mixed integer problem in this form can be a
slow process. The constraints (18.2.1) are of little use unless the intervals
bounding the variables have width less than (say) 1. However, the method
produces the global solution.
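As a hedged illustration (equation (18.2.1) is not reproduced in this excerpt; a common choice, which we assume here, is the smooth equality constraint sin(πxi) = 0):

    import math

    def integrality_residual(x):
        """Zero exactly when x is an integer (assumed form of (18.2.1))."""
        return math.sin(math.pi * x)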
15. Bliek, C., Jermann, C., and Neumaier, A. (eds.) (2003). Global Optimization and Constraint Satisfaction: First International Workshop on Global Constraint Optimization and Constraint Satisfaction, COCOS 2002, Valbonne-Sophia Antipolis, France, October 2–4, 2002, Lecture Notes in Computer Science, Vol. 2861, Springer-Verlag.
17. Boggs, P. T., Byrd, R. H., and Schnabel, R. B. (eds.) (1985). Nu-
merical Optimization 1984, SIAM Publications.
95. Jacobs, D. (ed.) (1976). The state of the art in numerical analysis,
in Proc. Conference on the State of the Art in Numerical Analysis,
University of York.
96. Jaulin, L., Kieffer, M., Didrit, O., and Walter, É. (2001) Applied
Interval Analysis, Springer-Verlag, London.
112. Levy, A. V., and Gomez, S. (1985). The tunneling method applied
to global optimization, in Boggs, Byrd, and Schnabel (1985), pp.
213–244.
113. Levy, A. V., Montalvo, A., Gomez, S., and Calderon, A. (1981). Topics in Global Optimization, Lecture Notes in Mathematics No. 909, Springer-Verlag, New York.
117. McAllester, D., Van Hentenryck, P., and Kapur, D. (1995), Three cuts
for accelerated interval propagation, Massachusetts Inst. of Tech.
Artificial Intelligence Lab. memo no. 1542.
160. Ratschek, H., and Rokne, J. (1984). Computer Methods for the Range
of Functions, Halstead Press, New York.
161. Ratschek, H., and Rokne, J. (1988). New Computer Methods for
Global Optimization, Wiley, New York.
162. Ratz, D. (1994). Box splitting strategies for the interval Gauss-Seidel
step in a global optimization method, Computing, 53, 337–353.
170. Rohn, J. (1993). Cheap and tight bounds: The recent result of E.
Hansen can be made more efficient, Interval Computations, 4, 13–21.
172. Rokne, J. G., and Bao, P. (1987). Interval Taylor forms, Computing
39, 247–259.
204. Walster, G., Pryce, J. D., and Hansen, E. R. (2002). Practical, exception-free interval arithmetic on the extended reals. SIAM Journal on Scientific Computing. The paper was provisionally accepted for publication; it has since been significantly revised and will be resubmitted for publication.
209. Wilkinson, J. H. (1980). Turing’s work at the NPL and the Pilot ACE,
DEUCE, and ACE, in Metropolis, et al. (1980).