
CHAPTER 4

Modeling with Nonlinear Programming
By nonlinear programming we mean the solution of the general class of problems
that can be formulated as
min f (x)
subject to the inequality constraints

gi (x) ≤ 0

for i = 1, . . . , p and the equality constraints

hi (x) = 0

for i = 1, . . . , q. We consider here methods that search for the solution using
gradient information, i.e., we assume that the function f is differentiable.

EXAMPLE 4.1
Given a fixed area of cardboard A, construct a box of maximum volume. The
nonlinear program for this is

max xyz

subject to

2xy + 2xz + 2yz = A

EXAMPLE 4.2
Consider the problem of determining locations for two new high schools in a set of P subdivisions Nj. Let w1j be the number of students going to school A and w2j be the number of students going to school B from subdivision Nj. Assume that the student capacity of school A is c1 and the capacity of school B is c2 and that the total number of students in each subdivision is rj. We would like to minimize the total distance traveled by all the students given that they may attend either school A or B. It is possible to construct a nonlinear program to determine the locations (a, b) and (c, d) of high schools A and B, respectively, assuming the location of each subdivision Nj is modeled as a single point denoted (xj, yj).

$$\min \sum_{j=1}^{P} w_{1j}\left[(a - x_j)^2 + (b - y_j)^2\right]^{1/2} + w_{2j}\left[(c - x_j)^2 + (d - y_j)^2\right]^{1/2}$$


subject to the constraints

$$\sum_j w_{ij} \le c_i, \qquad i = 1, 2,$$

$$w_{1j} + w_{2j} = r_j$$
for j = 1, . . . , P .

EXAMPLE 4.3
Neural networks have provided a new tool for approximating functions where the
functional form is unknown. The approximation takes on the form

$$f(x) = \sum_j b_j\, \sigma(a_j x - \alpha_j) - \beta$$

and the corresponding sum of squares error term is

$$E(a_j, b_j, \alpha_j, \beta) = \sum_i \left(y_i - f(x_i)\right)^2$$

The problem of minimizing the error function is, in this instance, an unconstrained
optimization problem. An efficient means for computing the gradient of E is known
as the backpropagation algorithm.
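To make the notation concrete, here is a small Python sketch of this approximation and its error function; the choice of σ, the data (xi, yi), and all parameter values are illustrative assumptions, not specified above.

```python
import numpy as np

def sigma(t):
    """One common choice of sigmoidal activation."""
    return 1.0 / (1.0 + np.exp(-t))

def f(x, a, b, alpha, beta):
    """Network approximation f(x) = sum_j b_j * sigma(a_j x - alpha_j) - beta."""
    return np.sum(b * sigma(a * x - alpha)) - beta

def E(a, b, alpha, beta, xs, ys):
    """Sum-of-squares error over the sample points (x_i, y_i)."""
    return sum((y - f(x, a, b, alpha, beta)) ** 2 for x, y in zip(xs, ys))

# Illustrative data and parameters, chosen only to exercise the formulas.
xs = np.linspace(0.0, 1.0, 20)
ys = np.sin(2 * np.pi * xs)          # the "unknown" function to approximate
a = np.ones(5)
b = np.ones(5)
alpha = np.linspace(0.0, 1.0, 5)
beta = 0.0
print(E(a, b, alpha, beta, xs, ys))
```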

4.1 UNCONSTRAINED OPTIMIZATION IN ONE DIMENSION


Here we begin by considering a significantly simplified (but nonetheless important)
nonlinear programming problem, i.e., the domain and range of the function to be
minimized are one-dimensional and there are no constraints. A necessary condition
for a minimum of a function was developed in calculus and is simply

f′(x) = 0

Note that higher derivative tests can determine whether the function is a max or a
min, or the value f (x + δ) may be compared to f (x).
Note that if we let
g(x) = f′(x)
then we may convert the problem of finding a minimum or maximum of a function to the
problem of finding a zero.

4.1.1 Bisection Algorithm


Let x∗ be a root, or zero, of g(x), i.e., g(x∗ ) = 0. If an initial bracket [a, b] is known
such that x∗ ∈ [a, b], then a simple and robust approach to determining the root is
to bisect this interval into two intervals [a, c] and [c, b] where c is the midpoint, i.e.,
$$c = \frac{a+b}{2}$$

If
g(a)g(c) < 0
then we conclude
x∗ ∈ [a, c]
while if
g(b)g(c) < 0
then we conclude
x∗ ∈ [c, b]
This process may now be iterated such that the size of the bracket (as well as the
actual error of the estimate) is being divided by 2 every iteration.
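A minimal Python sketch of the bisection iteration (the tolerance and the test function are illustrative choices):

```python
def bisect(g, a, b, tol=1e-10):
    """Bisection for a root of g, assuming g(a) and g(b) have opposite signs."""
    while b - a > tol:
        c = 0.5 * (a + b)        # midpoint of the current bracket
        if g(a) * g(c) < 0:      # root lies in [a, c]
            b = c
        else:                    # root lies in [c, b]
            a = c
    return 0.5 * (a + b)

# Example: the root of g(x) = x**2 - 2 in [1, 2] is sqrt(2).
print(bisect(lambda x: x**2 - 2, 1.0, 2.0))
```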

4.1.2 Newton’s Method


Note that in the bisection method the actual value of the function g(x) was only
being used to determine the correct bracket for the root. Root finding is accelerated
considerably by using this function information more effectively.
For example, imagine we were seeking the root of a function that is a straight
line, i.e., g(x) = ax + b, and our initial guess for the root is x0. If we extend this
straight line from the point x0 it is easy to determine where it crosses the axis:
calling that point x1, we have ax1 + b = 0, so x1 = −b/a. Of course, if the function were truly linear then no first guess
would be required. So now consider the case that g(x) is nonlinear but may be
approximated locally about the point x0 by a line. Then the point of intersection
of this line with the x-axis is an estimate, or second guess, for the root x∗ . The
linear approximation comes from Taylor’s theorem, i.e.,
$$g(x) = g(x_0) + g'(x_0)(x - x_0) + \frac{1}{2} g''(x_0)(x - x_0)^2 + \cdots$$
So the linear approximation to g(x) about the point x0 can be written
l(x) = g(x0) + g′(x0)(x − x0)
If we take x1 to be the root of the linear approximation we have
l(x1) = 0 = g(x0) + g′(x0)(x1 − x0)
Solving for x1 gives
$$x_1 = x_0 - \frac{g(x_0)}{g'(x_0)}$$

or at the nth iteration

$$x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}.$$
The iteration above is for determining a zero of a function g(x). To determine
a maximum or minimum value of a function f we employ the condition f′(x) = 0.
Now the iteration is modified to

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}.$$
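A small Python sketch of this minimization iteration (the test function and starting point are illustrative):

```python
def newton_min(df, d2f, x0, tol=1e-10, max_iter=100):
    """Newton iteration x_{n+1} = x_n - f'(x_n)/f''(x_n) for a stationary point."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x**4 - x, so f'(x) = 4x**3 - 1 and f''(x) = 12x**2;
# the minimum is at x = (1/4)**(1/3), roughly 0.62996.
print(newton_min(lambda x: 4*x**3 - 1, lambda x: 12*x**2, 1.0))
```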

4.2 UNCONSTRAINED OPTIMIZATION IN HIGHER DIMENSIONS


Now we consider the problem of minimizing (or maximizing) a scalar function of
many variables, i.e., a scalar-valued function of a vector variable. We consider the
extension of Newton's method presented in the previous section as well as a classical
approach known as steepest descent.

4.2.1 Taylor Series in Higher Dimensions


Before we extend the search for extrema to higher dimensions we present Taylor
series for functions of more than one domain variable. To begin, the Taylor series
for a function of two variables is given by

$$g(x, y) = g(x^{(0)}, y^{(0)}) + \frac{\partial g}{\partial x}(x - x^{(0)}) + \frac{\partial g}{\partial y}(y - y^{(0)}) + \frac{\partial^2 g}{\partial x^2}\frac{(x - x^{(0)})^2}{2} + \frac{\partial^2 g}{\partial y^2}\frac{(y - y^{(0)})^2}{2} + \frac{\partial^2 g}{\partial x\,\partial y}(x - x^{(0)})(y - y^{(0)}) + \text{higher order terms}$$

where all derivatives are evaluated at (x^{(0)}, y^{(0)}).

In n variables x = (x1 , . . . , xn )T the Taylor series expansion becomes


$$g(x) = g(x^{(0)}) + \nabla g(x^{(0)})(x - x^{(0)}) + \frac{1}{2}(x - x^{(0)})^T Hg(x^{(0)})(x - x^{(0)}) + \cdots$$

where the Hessian matrix is defined as

$$\left(Hg(x)\right)_{ij} = \frac{\partial^2 g(x)}{\partial x_i \partial x_j}$$

and the gradient is written as a row vector, i.e.,

$$\left(\nabla g(x)\right)_i = \frac{\partial g(x)}{\partial x_i}$$

4.2.2 Roots of a Nonlinear System


We saw that Newton’s method could be used to develop an iteration for determining
the zeros of a scalar function. We can extend those ideas for determining roots of
the nonlinear system
g1 (x1 , . . . , xn ) = 0
g2 (x1 , . . . , xn ) = 0
⋮
gn (x1 , . . . , xn ) = 0
or, more compactly,
g(x) = 0.
Now we apply Taylor’s theorem to each component gi (x1 , . . . , xn ) individually, i.e.,
retaining only the first order terms we have the linear approximation to gi about
the point x(0) as
li (x) = gi (x(0) ) + ∇gi (x(0) )(x − x(0) )

for i = 1, . . . , n. We can write these components together as a vector equation

l(x) = g(x(0) ) + Jg(x(0) )(x − x(0) )

where now
$$\left(Jg(x)\right)_{ij} = \frac{\partial g_i(x)}{\partial x_j}$$
is the n × n–matrix whose rows are the gradients of the components gi of g. This
matrix is called the Jacobian of g.
As in the scalar case we base our iteration on the assumption that

l(x(k+1) ) = 0

Hence,
g(x(k) ) + Jg(x(k) )(x(k+1) − x(k) ) = 0
and given x(k) we may determine the next iterate x(k+1) by solving an n × n system
of equations.
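A sketch of this iteration in Python, performing one linear solve per step (the example system and starting point are illustrative):

```python
import numpy as np

def newton_system(g, Jg, x0, tol=1e-10, max_iter=50):
    """Solve g(x) = 0 via Jg(x_k)(x_{k+1} - x_k) = -g(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(Jg(x), -g(x))   # one n x n linear solve per step
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: intersect the circle x^2 + y^2 = 1 with the line y = x.
g = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]])
Jg = lambda v: np.array([[2.0*v[0], 2.0*v[1]], [-1.0, 1.0]])
print(newton_system(g, Jg, [1.0, 0.5]))      # tends to (1/sqrt(2), 1/sqrt(2))
```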

4.2.3 Newton’s Method


In this chapter we are interested in minimizing functions of several variables. Analogously with the scalar variable case we may modify the above root finding method
to determine maxima (or minima) of a function f (x1 , . . . , xn ). To compute an
extreme point we require that ∇f = 0, hence we set

$$g(x) = \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)^T.$$

Substituting
$$g_i(x) = \frac{\partial f(x)}{\partial x_i}$$
into
g(x(k) ) + Jg(x(k) )(x(k+1) − x(k) ) = 0
produces
∇f (x(k) ) + Hf (x(k) )(x(k+1) − x(k) ) = 0
where
$$\left(Hf(x)\right)_{ij} = \left(Jg(x)\right)_{ij} = \frac{\partial g_i(x)}{\partial x_j} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$$

Again we have a linear system for x(k+1) .
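With g = ∇f and Jg = Hf this is the same linear-solve iteration as before, now driving the gradient to zero. A self-contained Python sketch (the objective is an illustrative choice):

```python
import numpy as np

def newton_minimize(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization: solve Hf(x_k) dx = -grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(hess_f(x), -grad_f(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: f(x, y) = x**4 + y**4 + x*y; stationary points satisfy grad f = 0.
grad_f = lambda v: np.array([4*v[0]**3 + v[1], 4*v[1]**3 + v[0]])
hess_f = lambda v: np.array([[12*v[0]**2, 1.0], [1.0, 12*v[1]**2]])
print(newton_minimize(grad_f, hess_f, [1.0, -1.0]))   # tends to (1/2, -1/2)
```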

4.2.4 Steepest Descent


Another form for Taylor’s formula in n-variables is given by

f (x + th) = f (x) + t∇f (x)h + higher order terms



where again (∇f (x))i = ∂f (x)/∂xi . Now t is a scalar and x + th is a ray emanating
from the point x in the direction h. We can compute the derivative of the function
f (x + th) w.r.t. t as
$$\frac{df}{dt}(x + th) = \nabla f(x + th)\,h.$$

Evaluating the derivative at the point t = 0 gives

$$\left.\frac{df}{dt}(x + th)\right|_{t=0} = \nabla f(x)\,h.$$
This quantity, known as the directional derivative of f , indicates how the function is
changing at the point x in the direction h. Recall from calculus that the direction of
maximum increase (decrease) of a function is in the direction of the gradient (neg-
ative gradient). This is readily seen from the formula for the directional derivative
using the identity

$$\nabla f(x)\,h = \|\nabla f(x)\|\,\|h\| \cos(\theta)$$

where θ is the angle between the vectors ∇f(x) and h. Here ‖a‖ denotes the
Euclidean norm of a vector a. We can assume without loss of generality that h is
of unit length, i.e., ‖h‖ = 1. So the quantity on the right is a maximum when the
vectors h and ∇f(x) point in the same direction, i.e., θ = 0.
This observation may be used to develop an algorithm for unconstrained func-
tion minimization. With an appropriate choice of the scalar step-size α, the itera-
tions
x(k+1) = x(k) − α∇f (x(k) ) (4.1)
will converge (possibly slowly) to a minimum of the function f (x).
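A sketch of the iteration (4.1) in Python (the objective, step size, and stopping rule are illustrative choices):

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Iterate x_{k+1} = x_k - alpha * grad f(x_k), equation (4.1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g
    return x

# Example: f(x, y) = x**2 + 2*y**2 with minimum at the origin.
print(steepest_descent(lambda v: np.array([2*v[0], 4*v[1]]), [3.0, -2.0]))
```

In practice the fixed step size α trades speed against stability: too large and the iterates overshoot, too small and convergence becomes very slow.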

4.3 CONSTRAINED OPTIMIZATION AND LAGRANGE MULTIPLIERS


Consider the constrained minimization problem

min f (x)

subject to
ci (x) = 0
for i = 1, . . . , p. It can be shown that a necessary condition for a solution to this
problem is provided by solving

∇f = λ1 ∇c1 + · · · + λp ∇cp

where the λi are referred to as Lagrange multipliers. Consider the case of f, c being
functions of two variables and consider their level curves. In Section 4.4 we will
demonstrate that an extreme value of f on a single constraint c is given when the
gradients of f and c are parallel. The equation above generalizes this to several
constraints ci : an extreme value is given if the gradient of f is a linear combination
of the gradients of the ci .
We demonstrate a solution via this procedure by recalling our earlier example.

EXAMPLE 4.4
Given a fixed area of cardboard A, construct a box of maximum volume. The
nonlinear program for this is

max xyz

subject to

2xy + 2xz + 2yz = A
Now f (x, y, z) = xyz and c(x, y, z) = 2xy + 2yz + 2xz − A. Substituting these
functions into our condition gives

∇f = λ∇c

which produces the system of equations

yz − λ(2y + 2z) = 0
xz − λ(2x + 2z) = 0
xy − λ(2y + 2x) = 0

These equations together with the constraint provide four equations for (x, y, z, λ).
If we divide the first equation by the second we find x = y. Similarly, if the second
equation is divided by the third we obtain y = z. From the constraint it follows
then that 6x² = A, hence the solution is

$$x = y = z = \sqrt{\frac{A}{6}}.$$
In this special case the nonlinear system could be solved by hand. Typically this is
not the case and one must resort to numerical techniques such as Newton’s method
to solve the resulting (n + m) × (n + m) system

g(x1 , . . . , xn , λ1 , . . . , λm ) = 0.
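For instance, the four equations of Example 4.4 can be handed to a generic root finder. A sketch using scipy.optimize.fsolve, with the illustrative choice A = 6 so that the exact solution is x = y = z = 1 (and λ = 1/4); the starting guess is picked by hand:

```python
import numpy as np
from scipy.optimize import fsolve

A = 6.0   # illustrative cardboard area; the exact solution is then x = y = z = 1

def lagrange_system(v):
    """The three Lagrange conditions plus the area constraint."""
    x, y, z, lam = v
    return [y*z - lam*(2*y + 2*z),
            x*z - lam*(2*x + 2*z),
            x*y - lam*(2*x + 2*y),
            2*x*y + 2*x*z + 2*y*z - A]

print(fsolve(lagrange_system, [0.5, 0.5, 0.5, 0.1]))   # ~ [1, 1, 1, 0.25]
```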

4.4 GEOMETRY OF CONSTRAINED OPTIMIZATION


4.4.1 One Equality Constraint
Consider a two variable optimization problem

min f (x, y)

subject to
c(x, y) = 0.
Geometrically the constraint c = 0 defines a curve C in the (x, y)–plane, and the
function f (x, y) is restricted to that curve. If we could solve the constraint equation
for y as y = h(x), the problem would reduce to an unconstrained, single variable
optimization problem
min f (x, h(x)).

From calculus we know that a necessary condition for a minimum is


$$\frac{d}{dx} f(x, h(x)) = \frac{\partial f}{\partial x}(x, h(x)) + \frac{\partial f}{\partial y}(x, h(x))\,h'(x) = 0. \tag{4.2}$$

Since c(x, h(x)) = 0, we also have

$$\frac{d}{dx} c(x, h(x)) = \frac{\partial c}{\partial x}(x, h(x)) + \frac{\partial c}{\partial y}(x, h(x))\,h'(x) = 0. \tag{4.3}$$

A necessary condition for equations (4.2) and (4.3) to hold simultaneously is

$$\frac{\partial f}{\partial x}\frac{\partial c}{\partial y} - \frac{\partial f}{\partial y}\frac{\partial c}{\partial x} = 0. \tag{4.4}$$
From elementary linear algebra we know that if an equation ad − bc = 0 holds then
the vectors (a, b) and (c, d) are linearly dependent, i.e. collinear, and so one of them
is a multiple of the other. Thus there exists a constant λ such that

∇f = λ∇c. (4.5)

Now let’s look more closely at the curve C. The tangent of the curve y = h(x)
at a point (x0 , y0 ) = (x0 , h(x0 )) is given by

y = (x − x0)h′(x0) + y0.

We set x − x0 = t and write this equation in vector form as

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} + t \begin{pmatrix} 1 \\ h'(x_0) \end{pmatrix}.$$

The vector T = [1, h′(x0)]^T points in the direction of the tangent line and is
called a tangent vector of C at (x0, y0). Equation (4.3) tells us that T is orthogonal to
∇c(x0 , y0 ). Thus at every point on C the gradient ∇c is orthogonal to the tangent
of C.
For level contours f (x, y) = f0 at level f0 (an arbitrary constant) the situation
is analogous, i.e., at each point on the contour the gradient ∇f is orthogonal to the
tangent. Moreover, it is shown in multivariable calculus that ∇f points into the
region in which f is increasing as illustrated in Figure 4.1. Note that the vector
(∂f /∂y, −∂f /∂x) is orthogonal to ∇f and so is a tangent vector.
At a point (x0 , y0 ) on C for which (4.5) holds, the level contour of f0 =
f (x0 , y0 ) intersects the curve C. Since the gradients of f and c are collinear at this
point, the tangents of the contour f = f0 and the curve c = 0 coincide, hence the
two curves meet tangentially. Thus the condition (4.5) means geometrically that
we search for points at which a level contour and the constraint curve C have a
tangential contact.

FIGURE 4.1: The gradient of f is orthogonal to the tangent of a level contour and points into the region of increasing f.

EXAMPLE 4.5
Consider the problem of finding all maxima and minima of

f(x, y) = x² − y²

subject to
x2 + y 2 = 1. (4.6)
The equation (4.5) becomes

2x = 2λx (4.7)
2y = −2λy, (4.8)

and (4.6)–(4.8) are three equations for (x, y, λ). Equation (4.7) has the solution
x = 0 and the solution λ = 1 if x ≠ 0. If x = 0, (4.6) leads to y = ±1 giving
the solution points (0, ±1) with values f(0, ±1) = −1. If x ≠ 0 and λ = 1, (4.8)
implies y = 0 and so x = ±1 from (4.6). This leads to the solution points (±1, 0)
with values f(±1, 0) = 1. Hence the points (0, ±1) yield minima and (±1, 0) yield
maxima.
In Figure 4.2 (a) some level contours of f and the constraint circle (4.6) are
shown. The contours f = 1 and f = −1 are the only contours that meet this circle
tangentially. The points of tangency are the maximum and minimum points of f
restricted to the unit circle.
A slightly more complicated objective function is

f (x, y) = x3 + y 2 .

Again we seek all maxima and minima of f subject to the constraint (4.6). The
equation (4.5) now results in

3x2 = 2λx (4.9)


2y = 2λy. (4.10)

Equation (4.9) has the solution x = 0 and λ = 3x/2 if x ≠ 0. If x = 0 we find y = ±1
from (4.6) giving the solutions (0, ±1) with values f(0, ±1) = 1. If λ = 3x/2 ≠ 0,
equation (4.10) has the solutions y = 0 and λ = 1 if y ≠ 0. Now if y = 0 we find
x = ±1 from (4.6) giving the solutions (±1, 0) with values f(±1, 0) = ±1. If y ≠ 0
it follows that λ = 1, hence x = 2/3, and so y = ±√5/3 from (4.6). The f-values of
the solution points (2/3, ±√5/3) are both 23/27 < 1. Thus there is a single global
minimum f = −1 at (−1, 0), and three global maxima f = 1 at (0, ±1) and (1, 0).

Some level contours of f and the constraint curve (4.6) are shown in Figure
4.2 (b). Note that the zero contour forms a cusp, y = ±(−x)^{3/2}, x ≤ 0. The
points of tangency of a level contour and the constraint curve are again identified
with extreme points. Since the points (2/3, ±√5/3) are located between the global
maximum points they must correspond to local minima.

FIGURE 4.2: Unit circle x² + y² = 1 (dashed) and level contours of (a): f(x, y) = x² − y², (b): f(x, y) = x³ + y². The points of tangency are the extreme points of f(x, y) restricted to the unit circle.

In three dimensions the equation ∇f = λ∇c, resulting from an optimization
problem with a single constraint, implies that at a solution point a level surface
f(x, y, z) = f0 is tangent to the constraint surface c(x, y, z) = 0.

EXAMPLE 4.6
Find the maxima and minima of

f (x, y, z) = 5x + y 2 + z

subject to
x2 + y 2 + z 2 = 1. (4.11)
The equation ∇f = λ∇c now leads to

5 = 2λx (4.12)
2y = 2λy (4.13)
1 = 2λz. (4.14)

From (4.12) and (4.14) we infer that x = 5z, and (4.13) has the solutions y = 0 and
λ = 1 if y ≠ 0.

Assume first y = 0. The constraint (4.11) implies x² + z² = 26z² = 1,
hence z = ±1/√26, x = ±5/√26, and f(±5/√26, 0, ±1/√26) = ±√26.

Now assume y ≠ 0, hence λ = 1, and so x = 5/2, z = 1/2. The constraint
(4.11) then yields y² = 1 − 26/4, which has no solution. Thus there is a unique
maximum at (5/√26, 0, 1/√26) and a unique minimum at (−5/√26, 0, −1/√26).

EXAMPLE 4.7
Find the maxima and minima of

f (x, y, z) = 8x2 + 4yz − 16z (4.15)

subject to the constraint


4x2 + y 2 + 4z 2 = 16. (4.16)
Note that (4.16) defines an ellipsoid of revolution. The equation ∇f = λ∇c yields

16x = 8λx (4.17)


4z = 2λy (4.18)
4y − 16 = 8λz. (4.19)

From (4.18) we find z = λy/2 and then from (4.19) 4y − 16 = 4λ²y, i.e.

$$y = \frac{4}{1 - \lambda^2}, \qquad z = \frac{2\lambda}{1 - \lambda^2}.$$
Equation (4.17) has the solutions x = 0 and λ = 2 if x ≠ 0. Assume first x = 0.
Substituting y, z and x = 0 into (4.16) yields a single equation for λ which can be
manipulated to λ²(3 − λ²) = 0, i.e. λ = 0 or λ² = 3. Setting λ = 0 leads to y = 4,
z = 0, and f(0, 4, 0) = 0. For λ = ∓√3 we find y = −2 and z = ±√3, with values
f(0, −2, ±√3) = ∓24√3.
If x ≠ 0 we have λ = 2 and so y = z = −4/3. The missing value of x is again
found from (4.16) as x = ±4/3. The values of f at these points are both 128/3.
Thus the maxima and minima of f are

$$f_{\max} = f(\pm 4/3, -4/3, -4/3) = 128/3, \qquad f_{\min} = f(0, -2, \sqrt{3}) = -24\sqrt{3}.$$

The level surfaces for the minimum and maximum values of f and the constraint
ellipsoid are shown in Figure 4.3. We see in this figure that the solution points are
points of tangency of a level surface and the constraint surface.

FIGURE 4.3: Level surfaces f = fmin ≈ −41.6 and f = fmax ≈ 42.7 for f(x, y, z) defined by equation (4.15). Both level surfaces have a tangential contact with the constraint ellipsoid (4.16).

4.4.2 Several Equality Constraints

If several constraints are present, the situation is trivial when the number of
(independent) constraints equals the number of variables. In this case all constraints
typically are satisfied only by a finite number of points, if any, and one merely
has to evaluate the objective function at these points to find the global maxima
or minima. Lagrange multipliers are needed if the number of constraints is smaller
than the number of variables.

Consider for simplicity the case of three variables (x, y, z) and two constraints
c1 (x, y, z) = 0, c2 (x, y, z) = 0. Each of the two constraints defines a surface in
three dimensional (x, y, z)–space, and both constraints together define a curve C,
the intersection of the two constraint surfaces. (Two non–parallel planes in three
dimensional space intersect in a straight line. Likewise, two curved surfaces typically
intersect in a curve.) Now a level set f (x, y, z) = f0 also defines a surface, and the
condition for f to have an extreme point when restricted to C is again that a level
surface and C meet tangentially at some point (x0 , y0 , z0 ). This condition means
that the tangent line of C at the point of contact is entirely in the tangent plane
of the level surface. Since the tangent line of C is the intersection of the tangent
planes of the two constraint surfaces, the tangency condition means that all three
tangent planes intersect in a line. This is a special condition because in general
three planes in three dimensional space have only a single point in common.

As in two dimensions, the gradient ∇f(x0, y0, z0) is orthogonal to the tangent
plane of the level surface f(x, y, z) = f(x0, y0, z0) at (x0, y0, z0). The same holds
for the tangent planes of the constraint surfaces c1 = 0 and c2 = 0. The condition
that these planes intersect in a line implies that the three gradient vectors to which
they are orthogonal are all located in the normal plane of that line and hence are
linearly dependent as illustrated in Figure 4.4. Thus one of these gradient vectors
is a linear combination of the other two, which we write as ∇f = λ1 ∇c1 + λ2 ∇c2 .
For more variables and constraints the situation is similar.
76 Chapter 4 Modeling with Nonlinear Programming

FIGURE 4.4: At a solution point of a three–variable optimization problem with two constraints the
tangent plane of the level surface of f and the tangent planes of the two intersecting constraint surfaces
c1 = 0 and c2 = 0 intersect in the tangent of the constraint curve C. As a consequence all three
gradients are in the normal plane of C and so are linearly dependent.

EXAMPLE 4.8
Find the maxima and minima of

f (x, y, z) = x2 + y 2 − z

subject to

x² + y² = 1
x² + z² = 1.

Here we can find a parametric representation of the constraint curve C. Substituting
x² = 1 − z² from the second constraint equation into the first constraint
equation yields y² = z², i.e. z = ±y. The first constraint defines a circle which
we parametrize as x = cos ϕ, y = sin ϕ, where −π ≤ ϕ ≤ π. Thus the constraints
define two curves

C± : (x, y, z) = (cos ϕ, sin ϕ, ± sin ϕ).
Note that the two curves intersect if z = 0, i.e., at ϕ = 0 and ϕ = π.
To solve the constrained optimization problem we substitute the parametric
representation of C± into f and set

f± (ϕ) = 1 ∓ sin ϕ.

The extreme points are determined by df± /dϕ = ∓ cos ϕ = 0, hence ϕ = ±π/2, with
values f± (∓π/2) = 2 and f± (±π/2) = 0. Thus there are two maxima at (0, ±1, −1)
and two minima at (0, ±1, 1) with values 2 and 0, respectively. The intersecting
constraint cylinders and the level surfaces for the maximum and minimum values
are shown in Figure 4.5. It can be easily verified a posteriori that the gradient of
f and the gradients of the two constraint functions are linearly dependent at the
four extreme points.

FIGURE 4.5: Intersecting constraint cylinders and level surfaces for Example 4.8.

4.4.3 Inequality Constraints


Finally consider the case of inequality constraints for a problem with n variables.
Inequality constraints define a feasible region S in n–dimensional space, and the
objective function is restricted to S. Extreme points can be located in the interior
of S as well as on the boundary. If there are no solutions to ∇f = 0 in the interior,
all extreme points are on the boundary. Assume that c(x) ≥ 0 is one of the
inequality constraints. The boundary of this constraint is the hypersurface defined
by c(x) = 0. Finding an extreme point on this boundary amounts to solving an
optimization problem with a single equality constraint (and possibly an additional
set of inequality constraints). If two inequality constraints c1 ≥ 0, c2 ≥ 0 are
present, the optimal solution may also be located on the intersection of the two
boundary hypersurfaces c1 = c2 = 0 which leads to a problem with two equality
constraints etc. The situation is naturally much more complicated than in linear
programming. Linear programming problems do not have solutions in the interior
of the feasible region.

EXAMPLE 4.9

Consider the problem of minimizing the objective function

$$f(x, y) = \frac{x^4}{4} - \frac{x^2}{2} + \frac{y^2}{2}.$$


FIGURE 4.6: (a): Three dimensional plot of f (x, y) = x4 /4 − x2 /2 + y 2 /2. (b): Level contours of
f . (c): Contours of f in the right half plane and the constraint boundary x + y = 2.

Unconstrained optimization leads to the equations

$$\frac{\partial f}{\partial x} = x^3 - x = 0 \;\Rightarrow\; x = 0 \text{ or } x = \pm 1$$

$$\frac{\partial f}{\partial y} = y = 0.$$

To check the types of the extreme points (0, 0) and (±1, 0) we compute the Hessian
matrices,

$$Hf(0, 0) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad Hf(\pm 1, 0) = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.$$
From the form of these matrices it follows that (±1, 0) are minimum points (f =
−1/4), and (0, 0) is a saddle point (f = 0). A three–dimensional surface plot of f
is shown in Figure 4.6 (a), and some level contours are displayed in Figure 4.6 (b).
Now consider the problem of minimizing f (x, y) subject to the inequality
constraint
c(x, y) = x + y ≥ 2.

Since c(±1, 0) < 2, the global minima of f are not in the feasible region, hence
the optimal solution must be on the boundary. We are then led to the problem of
minimizing f subject to the equality constraint

x + y = 2.

The equation (4.5) leads to

x3 − x = λ, y = λ ⇒ x3 − x − y = 0.

Substituting y = 2 − x from the constraint equation into this equation gives
x³ − 2 = 0, with the solution x = 2^{1/3} ≈ 1.2599, and hence y = 2 − 2^{1/3} ≈ 0.7401.
The numerical value of f at this point is 0.11012. Note that the equation for x
also follows directly from the unconstrained, single variable optimization problem
associated with f(x, 2 − x).
In Figure 4.6 (c) the constraint line and some level contours are shown. The
solution point is again revealed as a point of tangency.
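The single-variable reduction is easy to check numerically; a two-line sketch reproducing the values above:

```python
# Minimize f(x, 2 - x) directly: f'(x) reduces to x**3 - 2 = 0.
f = lambda x, y: x**4/4 - x**2/2 + y**2/2
x = 2 ** (1/3)                    # root of x**3 - 2
print(x, 2 - x, f(x, 2 - x))      # ~ 1.2599, 0.7401, 0.1101
```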

4.5 MODELING EXAMPLES

EXAMPLE 4.10

A manufacturerer of colored TV’s is planning the introduction of two new products:


a 19–inch stereo color set with a manufacturerer’s suggested retail price of $339 per
year, and a 21–inch stereo color set with a suggested retail price of $339 per year.
The cost of the company is $195 per 19–inch set and $225 per 21–inch set, plus
additional fixed costs of $400, 000 per year. In the competitive market the number
of sales will affect the sales price. It is estimated that for each type of set, the sales
price drops by one cent for each additional unit sold. Furthermore, sales of the
19–set will affect sales of the 21–inch set and vice versa. It is estimated that the
price for the 19–inch set will be reduced by an additional 0.3 cents for each 21–inch
sold, and the price for 21–inch sets will decrease for by 0.4 cents for each 19–inch set
sold. The company believes that when the number of units of each type produced
is consistent with these assumptions all units will be sold. How many units of each
type of set should be manufactured such the profit of the company is maximized?
The relevant variables of this problem are:

s1: number of units of the 19-inch set produced per year,
s2: number of units of the 21-inch set produced per year,
p1: sales price per unit of the 19-inch set ($),
p2: sales price per unit of the 21-inch set ($),
C: manufacturing costs ($ per year),
R: revenue from sales ($ per year),
P: profit from sales ($ per year).

The market estimates result in the following model equations,

p1 = 339 − 0.01 s1 − 0.003 s2
p2 = 399 − 0.004 s1 − 0.01 s2
R = s1 p1 + s2 p2
C = 400,000 + 195 s1 + 225 s2
P = R − C.

The profit then becomes a nonlinear function of (s1, s2),

P(s1, s2) = −400,000 + 144 s1 + 174 s2 − 0.01 s1² − 0.007 s1 s2 − 0.01 s2². (4.20)

If the company has unlimited resources, the only constraints are s1 , s2 ≥ 0.

Unconstrained Optimization. We first solve the unconstrained optimization
problem. If P has a maximum in the first quadrant this yields the optimal solution.
The condition for an extreme point of P leads to a linear system of equations for
(s1, s2),

$$\frac{\partial P}{\partial s_1} = 144 - 0.02 s_1 - 0.007 s_2 = 0$$

$$\frac{\partial P}{\partial s_2} = 174 - 0.007 s_1 - 0.02 s_2 = 0.$$
The solution of these equations is s∗1 = 4735, s∗2 = 7043 with profit value P∗ =
P(s∗1, s∗2) = 553,641. Since s∗1, s∗2 are positive, the inequality constraints are satisfied.
To determine the type of the extreme point we inspect the Hessian matrix,

$$HP(s_1^*, s_2^*) = \begin{pmatrix} -0.02 & -0.007 \\ -0.007 & -0.02 \end{pmatrix}.$$

A sufficient condition for a maximum is that (HP )11 < 0 and det(HP ) > 0. Both
of these conditions are satisfied and so our solution point is indeed a maximum,
in fact a global maximum. In Figure 4.7 (a) a three–dimensional plot of P (s1 , s2 )
is shown. Some level contours are displayed in Figure 4.7 (b). The level contours
play here the role of isoprofit lines. Because P is a nonlinear function, the isoprofit
lines form closed curves that surround the maximum at (s∗1 , s∗2 ).
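The 2 × 2 linear system above can be solved directly; a short numerical sketch reproducing the values just stated:

```python
import numpy as np

# Stationary-point conditions dP/ds1 = dP/ds2 = 0 as a linear system.
M = np.array([[0.02, 0.007],
              [0.007, 0.02]])
rhs = np.array([144.0, 174.0])
s1, s2 = np.linalg.solve(M, rhs)

P = (-400_000 + 144*s1 + 174*s2
     - 0.01*s1**2 - 0.007*s1*s2 - 0.01*s2**2)
print(s1, s2, P)   # ~ 4735, 7043, P* ~ 553,641
```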

Constrained Optimization. Now assume the company has limited resources
which restrict the number of units of each type produced per year to

s1 ≤ 5,000, s2 ≤ 8,000, s1 + s2 ≤ 10,000.

The first two constraints are satisfied by (s∗1 , s∗2 ), however s∗1 + s∗2 = 11, 778. The
global maximum point of P is now no longer in the feasible region, thus the optimal
solution must be on the boundary. We therefore solve the constrained optimization
problem
max P
subject to

c(s1, s2) = s1 + s2 − 10,000 = 0.

FIGURE 4.7: (a): Three dimensional plot of P(s1, s2), equation (4.20). (b): Level contours of P. (c): Level contours of P and feasible region for the constrained optimization problem.
We can either substitute s2 or s1 from the constraint equation into P and solve
an unconstrained one–variable optimization problem, or use Lagrange multipliers.
Choosing the second approach, the equation ∇P = λ∇c becomes

144 − 0.02s1 − 0.007s2 = λ


174 − 0.007s1 − 0.02s2 = λ,

which reduces to a single equation for s1 , s2 . Together with the constraint equation
we then have again a system of two linear equations,

−0.013s1 + 0.013s2 = 30
s1 + s2 = 10, 000.

The solution is s∗1 = 3846, s∗2 = 6154, with profit value P∗ = 532,308. In Figure
4.7 (c) the feasible region and some level contours are shown. The optimal solution
is revealed as the point of tangency of the isoprofit line P = P∗ and the constraint line.
It is also clear from the figure that the solution point cannot be located on one of
the two other boundary lines s1 = 5,000 or s2 = 8,000.

FIGURE 4.8: Geometry of the problem of Example 4.11.

EXAMPLE 4.11
A fish farm has a fish lake on a square area. The length of the diagonal of the
square is 2L. The fish lake has the shape of an ellipse with semi–axes a and b.
The center of the lake is at the center of the square and the semi–axes are on
the diagonals. The owner of the fish farm has fencing material of length l where
l < 4√2 L. She wants to surround the lake by a fence in the form of a quadrilateral
whose corner points are on the diagonals of the square. In order that the owner has
enough space to work at the lake, the distance between fence and lake must not be
smaller than a given distance dm . What is the position of the corner points of the
fence such that the enclosed area is maximal?
To formulate this problem as a nonlinear program, we introduce an (x, y)–coordinate
system whose origin is at the center of the square. The corner points of the
square are (±L, 0) and (0, ±L). The equation of the fish lake's boundary is

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$
The corner points of the fence’s quadrilateral have coordinates (s1 , 0), (0, s2 ),
(−s3 , 0), and (0, −s4 ) (0 ≤ sj ≤ L) with (s1 , s2 , s3 , s4 ) to be determined, see Figure
4.8.
To invoke the distance restriction, we have to compute the minimal distance
between the ellipse and the four edges of the quadrilateral. Consider the edge in

the first quadrant. The equation of this edge is y = (s2/s1)(s1 − x). Some thought
reveals that the minimal distance between this straight line and the ellipse is given
by

$$\left(s_1 s_2 - d(s_1, s_2)\right)\Big/\sqrt{s_1^2 + s_2^2}, \tag{4.21}$$

where

$$d(s_1, s_2) = \sqrt{a^2 s_2^2 + b^2 s_1^2},$$

provided s1 s2 ≥ d(s1, s2). Thus the minimum distance condition for this edge can
be formulated as

$$s_1 s_2 - d(s_1, s_2) \ge d_m \sqrt{s_1^2 + s_2^2}.$$
The minimum distance conditions for the other three edges are obtained by replacing
(s1, s2) in this inequality by (s3, s2), (s3, s4), and (s1, s4), respectively.
The area enclosed by the fence is

$$A(s_1, s_2, s_3, s_4) = \frac{1}{2}(s_1 s_2 + s_2 s_3 + s_3 s_4 + s_4 s_1).$$
Now the optimization problem can be formulated as

max A(s1 , s2 , s3 , s4 )

subject to the inequality constraints

$$s_1 s_2 - d(s_1, s_2) \ge d_m \sqrt{s_1^2 + s_2^2}$$
$$s_3 s_2 - d(s_3, s_2) \ge d_m \sqrt{s_3^2 + s_2^2}$$
$$s_3 s_4 - d(s_3, s_4) \ge d_m \sqrt{s_3^2 + s_4^2}$$
$$s_1 s_4 - d(s_1, s_4) \ge d_m \sqrt{s_1^2 + s_4^2}$$
$$s_j \le L \quad (1 \le j \le 4),$$

and the equality constraint

$$\sqrt{s_1^2 + s_2^2} + \sqrt{s_2^2 + s_3^2} + \sqrt{s_3^2 + s_4^2} + \sqrt{s_4^2 + s_1^2} = l.$$

Note that we don’t need to impose the constraints s1 ≥ a and s2 ≥ b. The


minimum distance requirement for (s1 , s2 ) implies s1 ss ≥ d(s1 , s2 ) and this can be
only satisfied if s1 ≥ a and s2 ≥ b.
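A sketch of how this nonlinear program could be handed to a numerical solver, here scipy.optimize.minimize with the SLSQP method (Matlab's fmincon would play the same role); the data values are borrowed from Problem 4.24(a) below and the starting point is a guess:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (from Problem 4.24(a)); the starting point is a guess.
L, a, b, l, dm = 4.0, 1.5, 2.5, 20.0, 0.3

d = lambda u, v: np.sqrt(a**2 * v**2 + b**2 * u**2)
# Distance margin of one edge; the constraint requires gap >= 0.
gap = lambda u, v: u*v - d(u, v) - dm*np.sqrt(u**2 + v**2)

def area(s):
    s1, s2, s3, s4 = s
    return 0.5*(s1*s2 + s2*s3 + s3*s4 + s4*s1)

perimeter = lambda s: sum(np.hypot(s[i], s[(i + 1) % 4]) for i in range(4))

cons = [{'type': 'ineq', 'fun': lambda s, i=i, j=j: gap(s[i], s[j])}
        for i, j in [(0, 1), (2, 1), (2, 3), (0, 3)]]
cons.append({'type': 'eq', 'fun': lambda s: perimeter(s) - l})

res = minimize(lambda s: -area(s),           # maximize the enclosed area
               x0=[3.5]*4, bounds=[(0, L)]*4,
               constraints=cons, method='SLSQP')
print(res.x, -res.fun)
```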

PROBLEMS
4.1. Extend Example 4.2 for a collection of S schools.
4.2. Show how Newton's method for root finding can be used to calculate √3. Compute
numerically an iterated sequence that converges to this value. Stop the
iteration if |xn+1 − xn| ≤ 10⁻⁵. What is the effect of changing the initial condition?
4.3. Use Newton’s method to find the positive root of

g (x) = x − tanh(2x) = 0

up to five decimal places.


4.4. Plot f (x) = x sin(x) in 0 ≤ x ≤ 15 and convince yourself that f (x) has three
local maxima in that range. Compute these maxima up to five decimal places
using Newton’s method.
4.5. Let
f (x, y ) = x4 + y 3 + xy 2 + x2 − y + 1.

Find the quadratic approximation of f (x, y ) at the points


(a) x0 = y0 = 0,
(b) x0 = 1, y0 = 0,
(c) x0 = y0 = 2.
4.6. Compute the Jacobian of

$$g(x) = \begin{pmatrix} x_1 x_2 - x_1 - 1 \\ x_1 x_2 x_3 - 2x_2 \\ e^{-x_1^2} - 3x_3 - 1 \end{pmatrix}$$

at x0 = [0, 0, 0]^T and x0 = [1, 1, 1]^T.


4.7. Minimize the objective function

$$f(x_1, x_2) = 7x_1^2 + 2x_1 x_2 + x_2^2 + x_1^4 + x_2^4$$

using 50 iterations of
(a) Newton’s method
(b) Steepest Descent
with starting value x0 = (3, 3)T . Plot the values of the iterates for each method
on the same graph. You may experiment with the value of α in Equation (4.1).
Hint: start small.
4.8. Consider the system of equations

$$g_1(x, y) \equiv x^3 + y^3 - 1 = 0,$$
$$g_2(x, y) \equiv y e^{-x} - \sin(y) - a = 0.$$

Apply Newton’s method to find two different solutions for each of the values
a = 0.5 and a = 1. Use at most 101 iterations and truncate the computation if

ε ≡ |g1(x, y)| + |g2(x, y)| < 10⁻¹⁰.

Provide the solutions, the starting values, the numbers of iterations, and the final
values of ε in your answer.

4.9. Find the minimum of the function

$$f(x, y) = 7x^2 + 2xy + y^2 + x^4 + y^4 + x - y$$

using Newton's method. Use at most 101 iterations and truncate the computation if

$$\varepsilon \equiv \left|\frac{\partial f(x, y)}{\partial x}\right| + \left|\frac{\partial f(x, y)}{\partial y}\right| < 10^{-10}.$$
Provide the solution, the starting value, the number of iterations, and the final
value of ε in your answer.
4.10. Find the minimum of f(x, y) given in Problem 4.9 using the steepest descent
method with α = 0.04, α = 0.06, α = 0.08, α = 0.1 and α = 0.12. Choose
(x0, y0) = (1, 1) as starting value. Summarize the final values of ε as defined
in Problem 4.9 and the approximate solutions for each of the five values of α
in a table. What is the effect of the magnitude of α on the performance of the
steepest descent method?
4.11. Assume a farmer has L feet of fencing for a rectangular area with lengths x and
y . Determine these lengths such that the enclosed area is a maximum.
4.12. Consider an ellipse with semi-axes a ≥ b. The area enclosed by the ellipse is
A = πab and the circumference is L = 4aE(e), where e = √(1 − b²/a²) is the
eccentricity and E(e) is the complete elliptic integral of the second kind, a
given function of e. Show that the constrained optimization problem

max (πab)

subject to

4aE(e) = L

leads to the following equation for e,

$$\frac{e}{1 - e^2} = -\frac{2E'(e)}{E(e)},$$

where E′(e) = dE(e)/de. Note: It turns out that the only solution of this equation
is e = 0, i.e. a = b. Thus the area of an ellipse with prescribed circumference is
a maximum if the ellipse degenerates to a circle.
4.13. Find all extreme points (local maxima and minima) of

f (x, y ) = x3 + y 2

subject to
y 2 − x2 = 1.

Make a sketch showing the constraint curve, some level curves of f , and the
extreme points as points of tangencies.
4.14. Find the minimum distance of the surface

2x2 + y 2 − z 2 = 1

to the origin.

4.15. Find the points on the unit sphere

x2 + y 2 + z 2 = 1,

for which the function

f (x, y, z ) = 2x2 + y 2 − z 2 − x

has a global maximum and a global minimum, respectively.


4.16. A manufacturer of personal computers currently sells 10,000 units per month of
a basic model. The manufacturing cost per unit is $700 and the current sales price
is $950. During the last quarter the manufacturer lowered the price by $100 in
a few test markets, and the result was a 50% increase in orders. The company
has been advertising its product nationwide at a cost of $50,000 per month.
The advertising agency claims that increasing the advertising budget by $10,000
per month would result in a sales increase of 200 units per month. Management
has agreed to consider an increase in the advertising budget to no more than
$100,000 per month.
Determine the price and the advertising budget that will maximize the profit.
Make a table comparing the maximal profit and the corresponding values of the
price, the advertising budget, and the number of sales to their current values,
and to the optimal values that would result without advertisement.
Hint: Let N be the number of sales per month. Write N = N0 + ∆Np + ∆Na ,
where N0 is the current value of N , ∆Np is the increase of N due to price
reduction, and ∆Na is the increase of N due to increasing the advertising budget.
Note: If you don’t find a solution in the interior of the feasible region, the optimal
solution is on a boundary.
4.17. A local newspaper currently sells for $1.50 per week and has a circulation of
80,000 subscribers. Advertising sells for $250/page, and the paper currently sells
350 pages per week (50 pages/day). The management is looking for ways to
increase profit. It is estimated that an increase of 10 cents/week in the subscription
price will cause a drop of 5,000 subscribers. Increasing the price of
advertising by $100/page will cause the paper to lose approximately 50 pages of
advertising per week. The loss of advertising will also affect circulation, since
one of the reasons people buy the newspaper is the advertisement. It is estimated
that a loss of 50 pages of advertisement per week will reduce circulation by 1,000
subscribers.
(a) Find the weekly subscription price and advertisement price that will maximize
the profit.
(b) Same as (a), but now with the constraint that the advertising price cannot
be increased beyond $400.
Hint: Let M be the number of advertising pages per week. Write M = M0 +∆Ma ,
where M0 is the current value of M , and ∆Ma is the change caused by increasing
the advertising price. Proceed similarly for N , the number of subscribers. Here
you have to consider two causes of change.
4.18. Verify the expression (4.21) in Example 4.11.
In Exercises 4.19–4.25 use an optimization software such as the fmincon function of
Matlab to find the optimal solution.
4.19. Redo the problem of Example 4.10, but now choose as objective function the
marginal profit, i.e., the ratio (R−C )/C of the profit and the total manufacturing
costs.

4.20. Maximize the volume xyz of a cardboard box subject to the equality constraint
xy + xz + yz = 4 and the inequality constraints

0 ≤ x ≤ 0.5
2 ≤ y ≤ 3
z ≥ 1.

4.21. Find the (unconstrained) minimum of

$$f(x, y, z) = x^6 + x^2 y^2 + y^4 + z^4 + e^{-z^2} \sin(x + y).$$

4.22. Find the minimum and maximum of

f (x, y ) = x3 + y 2 − xy

subject to
x2 + 4y 2 ≤ 2.
4.23. Find the minimum of

$$f(x, y, z) = \sin(x + y) + \cos(y + z) - e^{-x^2}$$

subject to
(a)
x2 + y 2 = 1, z 2 ≤ 1, x2 ≥ y 2 .
(b) constraints as in (a) and in addition

x ≥ 0, y ≤ 0.

4.24. Solve the fencing problem of Example 4.11 for L = 4, a = 1.5, b = 2.5, and
(a) l = 20, dm = 0.3,
(b) l = 20, dm = 0.4,
(c) l = 17, dm = 0.1.
Hint: A good starting value for s1 is (a + L)/2.
4.25. Solve the school problem of Example 4.2 for five districts with coordinates

xj : 0, 0, 0, −100, 100
yj : 0, 100, −100, 0, 0

and
(a) r1 = 200, r2 = 300, r3 = 200, r4 = 500, r5 = 300, c1 = 1500, c2 = 1500,
(b) r1 = 200, r2 = 400, r3 = 200, r4 = 500, r5 = 300, c1 = 700, c2 = 2000.
Hint: A reasonable starting value for wij is rj /2. For the coordinates (a, b, c, d)
you may try (0, 0, 0, 0), (100, 0, −100, 0), or (50, 50, −50, −50).
