Numerical Optimization - Solutions Manual
Numerical Optimization - Solutions Manual
NUMERICAL OPTIMIZATION
by J. Nocedal and S.J. Wright
Second Edition
Solution Manual Prepared by:
Frank Curtis
Long Hei
Gabriel Lopez-Calva
Jorge Nocedal
Stephen J. Wright
1
Contents
1 Introduction 6
2 Fundamentals of Unconstrained Optimization 6
Problem 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Problem 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Problem 2.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Problem 2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Problem 2.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Problem 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Problem 2.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Problem 2.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Problem 2.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Problem 2.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Problem 2.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Problem 2.15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Problem 2.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Line Search Methods 14
Problem 3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Problem 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Problem 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Problem 3.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Problem 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Problem 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Problem 3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Problem 3.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Trust-Region Methods 20
Problem 4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Problem 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Problem 4.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Problem 4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Problem 4.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5 Conjugate Gradient Methods 23
Problem 5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Problem 5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Problem 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2
Problem 5.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Problem 5.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Problem 5.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Problem 5.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 Quasi-Newton Methods 28
Problem 6.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Problem 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7 Large-Scale Unconstrained Optimization 29
Problem 7.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Problem 7.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Problem 7.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Problem 7.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8 Calculating Derivatives 31
Problem 8.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Problem 8.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Problem 8.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
9 Derivative-Free Optimization 33
Problem 9.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Problem 9.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
10 Least-Squares Problems 35
Problem 10.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Problem 10.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Problem 10.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Problem 10.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Problem 10.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
11 Nonlinear Equations 39
Problem 11.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Problem 11.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Problem 11.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Problem 11.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Problem 11.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Problem 11.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Problem 11.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3
12 Theory of Constrained Optimization 43
Problem 12.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Problem 12.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Problem 12.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Problem 12.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Problem 12.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Problem 12.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Problem 12.18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Problem 12.21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
13 Linear Programming: The Simplex Method 49
Problem 13.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Problem 13.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Problem 13.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Problem 13.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
14 Linear Programming: Interior-Point Methods 52
Problem 14.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Problem 14.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Problem 14.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Problem 14.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Problem 14.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Problem 14.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Problem 14.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Problem 14.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Problem 14.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Problem 14.13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Problem 14.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15 Fundamentals of Algorithms for Nonlinear Constrained Op-
timization 62
Problem 15.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Problem 15.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Problem 15.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Problem 15.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Problem 15.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Problem 15.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4
16 Quadratic Programming 66
Problem 16.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Problem 16.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Problem 16.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Problem 16.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Problem 16.15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Problem 16.21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
17 Penalty and Augmented Lagrangian Methods 70
Problem 17.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Problem 17.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Problem 17.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
18 Sequential Quadratic Programming 72
Problem 18.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Problem 18.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
19 Interior-Point Methods for Nonlinear Programming 74
Problem 19.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Problem 19.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Problem 19.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5
1 Introduction
No exercises assigned.
2 Fundamentals of Unconstrained Optimization
Problem 2.1
f
x
1
= 100 2(x
2
x
2
1
)(2x
1
) + 2(1 x
1
)(1)
= 400x
1
(x
2
x
2
1
) 2(1 x
1
)
f
x
2
= 200(x
2
x
2
1
)
=f(x) =
_
400x
1
(x
2
x
2
1
) 2(1 x
1
)
200(x
2
x
2
1
)
_
2
f
x
2
1
= 400[x
1
(2x
1
) + (x
2
x
2
1
)(1)] + 2 = 400(x
2
3x
2
1
) + 2
2
f
x
2
x
1
=
2
f
x
1
x
2
= 400x
1
2
f
x
2
2
= 200
=
2
f(x) =
_
400(x
2
3x
2
1
) + 2 400x
1
400x
1
200
_
1. f(x
) =
_
0
0
_
and x
=
_
1
1
_
is the only solution to f(x) = 0
2.
2
f(x
) =
_
802 400
400 200
_
is positive denite since 802 > 0, and det(
2
f(x
)) =
802(200) 400(400) > 0.
3. f(x) is continuous.
(1), (2), (3) imply that x
=
_
4
3
_
.
This is the only point satisfying the rst order necessary conditions.
2
f(x) =
_
2 0
0 4
_
is not positive denite, since det(
2
f(x)) = 8 < 0.
Therefore, x
2
[f(x)] is also not positive denite. Therefore x
is NOT a maximizer.
Thus x
i=1
a
i
x
i
f
1
(x) =
_
_
f
1
x
1
. . .
f
1
xn
_
_ =
_
_
a
1
. . .
a
n
_
_
= a
2
f
1
(x) =
_
_
2
f
1
x
2
1
2
f
1
x
2
x
1
. . .
.
.
.
.
.
.
.
.
.
_
_
=
_
2
P
i
a
i
x
i
xsxt
_
s = 1 n
t = 1 n
= 0
7
6 5.5 5 4.5 4 3.5 3 2.5 2
1
1.5
2
2.5
3
3.5
4
4.5
5
Figure 1: Contour lines of f(x).
(2)
f
2
(x) = x
T
Ax =
n
i=1
n
j=1
A
ij
x
i
x
j
f
2
(x) =
_
f
2
xs
_
s=1n
=
_
j
A
sj
x
j
+
i
A
is
x
i
s=1n
=
_
2
n
j=1
A
sj
x
j
s=1n
(since A is symmetric)
= 2Ax
2
f
2
(x) =
_
2
f
2
xsxt
_
s = 1 n
t = 1 n
=
_
2
P
i
P
j
A
ij
x
i
x
j
xsxt
_
s = 1 n
t = 1 n
=
_
A
st
+A
ts
s = 1 n
t = 1 n
= 2A
8
Problem 2.4
For any univariate function f(x), we know that the second oder Taylor
expansion is
f(x + x) = f(x) +f
(1)
(x)x +
1
2
f
(2)
(x +tx)x
2
,
and the third order Taylor expansion is
f(x + x) = f(x) +f
(1)
(x)x +
1
2
f
(2)
(x)x
2
+
1
6
f
(3)
(x +tx)x
3
,
where t (0, 1).
For function f
1
(x) = cos (1/x) and any nonzero point x, we know that
f
(1)
1
(x) =
1
x
2
sin
1
x
, f
(2)
1
(x) =
1
x
4
_
cos
1
x
+ 2xsin
1
x
_
.
So the second order Taylor expansion for f
1
(x) is
cos
1
x+x
= cos
1
x
+
_
1
x
2
sin
1
x
_
x
1
2(x+tx)
4
_
cos
1
x+tx
2(x +tx) sin
1
x+tx
_
x
2
,
where t (0, 1). Similarly, for f
2
(x) = cos x, we have
f
(1)
2
(x) = sin x, f
(2)
2
(x) = cos x, f
(3)
2
(x) = sin x.
Thus the third order Taylor expansion for f
2
(x) is
cos (x + x) = cos x (sin x)x
1
2
(cos x)x
2
+
1
6
[sin (x +tx)]x
3
,
where t (0, 1). When x = 1, we have
cos (1 + x) = cos 1 (sin 1)x
1
2
(cos 1)x
2
+
1
6
[sin (1 +tx)]x
3
,
where t (0, 1).
9
Problem 2.5
Using a trig identity we nd that
f(x
k
) =
_
1 +
1
2
k
_
2
(cos
2
k + sin
2
k) =
_
1 +
1
2
k
_
2
,
from which it follows immediately that f(x
k+1
) < f(x
k
).
Let be any point in [0, 2]. We aim to show that the point (cos , sin )
on the unit circle is a limit point of x
k
.
From the hint, we can identify a subsequence
k
1
,
k
2
,
k
3
, . . . such that
lim
j
k
j
= . Consider the subsequence x
k
j
j=1
. We have
lim
j
x
k
j
= lim
j
_
1 +
1
2
k
__
cos k
j
sin k
j
_
= lim
j
_
1 +
1
2
k
_
lim
j
_
cos
k
j
sin
k
j
_
=
_
cos
sin
_
.
Problem 2.6
We need to prove that isolated local min strict local min. Equiv-
alently, we prove the contrapositive: not a strict local min not an
isolated local min.
If x
is not a
strict local min, there is some other point x
A
^ such that f(x
) = f(x
A
).
Hence x
A
is also a local min of f in the neighborhood ^ that is dierent
from x
within which x
is a local min, x
< < 90
cos > 0.
p
k
f
|p
k
||f|
= cos > 0 p
k
f < 0.
f =
_
2(x
1
+x
2
2
)
4x
2
(x
1
+x
2
2
)
_
p
k
f
k
x=
0
@
1
0
1
A
=
_
1
1
_
_
2
0
_
= 2 < 0
which implies that p
k
is a descent direction.
p
k
=
_
1
1
_
, x =
_
1
0
_
f(x
k
+
k
p
k
) = f((1 , )
T
) = ((1 ) +
2
)
2
=
d
d
f(x
k
+
k
p
k
) = 2(1 +
2
)(1 + 2) = 0 only when =
1
2
.
It is seen that
d
2
d
2
f(x
k
+
k
p
k
)
=
1
2
= 6(2
2
2 + 1)
=
1
2
= 3 > 0, so
=
1
2
is indeed a minimizer.
Problem 2.10
Note rst that
x
j
=
n
i=1
S
ji
z
i
+s
j
.
11
By the chain rule we have
z
i
f(z) =
n
j=1
f
x
j
x
j
z
i
=
n
j=1
S
ji
f
x
j
=
_
S
T
f(x)
i
.
For the second derivatives, we apply the chain rule again:
2
z
i
z
k
f(z) =
z
k
n
j=1
S
ji
f(x)
x
j
=
n
j=1
n
l=1
S
ji
2
f(x)
x
j
x
l
x
l
z
k
S
lk
=
_
S
T
2
f(x)S
ki
.
Problem 2.13
x
= 0
|x
k+1
x
|
|x
k
x
|
=
k
k + 1
< 1 and
k
k + 1
1.
For any r (0, 1), k
0
such that k > k
0
,
k
k + 1
> r.
This implies x
k
is not Q-linearly convergent.
Problem 2.14
|x
k+1
x
|
|x
k
x
|
2
=
(0.5)
2
k+1
((0.5)
2
k
)
2
=
(0.5)
2
k+1
(0.5)
2
k+1
= 1 < .
Hence the sequence is Q-quadratic.
Problem 2.15
x
k
=
1
k!
x
= lim
n
x
k
= 0
|x
k+1
x
|
|x
k
x
|
=
k!
(k + 1)!
=
1
k + 1
k
0.
12
This implies x
k
is Q-superlinearly convergent.
|x
k+1
x
|
|x
k
x
|
2
=
k!k!
(k + 1)!
=
k!
k + 1
.
This implies x
k
is not Q-quadratic convergent.
Problem 2.16
For k even, we have
|x
k+1
x
|
|x
k
x
|
=
x
k
/k
x
k
=
1
k
0,
while for k odd we have
|x
k+1
x
|
|x
k
x
|
=
(1/4)
2
k
x
k1
/k
= k
(1/4)
2
k
(1/4)
2
k1
= k(1/4)
2
k1
0,
Hence we have
|x
k+1
x
|
|x
k
x
|
=0,
so the sequence is Q-superlinear. The sequence is not Q-quadratic because
for k even we have
|x
k+1
x
|
|x
k
x
|
2
=
x
k
/k
x
2
k
=
1
k
4
2
k
.
The sequence is however R-quadratic as it is majorized by the sequence
z
k
= (0.5)
2
k
, k = 1, 2, . . . . For even k, we obviously have
x
k
= (0.25)
2
k
< (0.5)
2
k
= z
k
,
while for k odd we have
x
k
< x
k1
= (0.25)
2
k1
= ((0.25)
1/2
)
2
k
= (0.5)
2
k
= z
k
.
A simple argument shows that z
k
is Q-quadratic.
13
3 Line Search Methods
Problem 3.2
Graphical solution
We show that if c
1
is allowed to be greater than c
2
, then we can nd a
function for which no steplengths > 0 satisfy the Wolfe conditions.
Consider the convex function depicted in Figure 2, and let us choose c
1
=
0.99.
sufficient decrease line
slope = -1
slope = -1/2
()
()
Figure 2: Convex function and sucient decrease line
We observe that the sucient decrease line intersects the function only once.
Moreover for all points to the left of the intersection, we have
t
()
1
2
.
Now suppose that we choose c
2
= 0.1 so that the curvature condition requires
t
() 0.1. (1)
Then there are clearly no steplengths satisfying the inequality (1) for which
the sucient decrease condition holds.
14
Problem 3.3
Suppose p is a descent direction and dene
() = f(x +p), 0.
Then any minimizer
of () satises
t
(
) = f(x +
p)
T
p = 0. (2)
A strongly convex quadratic function has the form
f(x) =
1
2
x
T
Qx +b
T
x, Q > 0,
and hence
f(x) = Qx +b. (3)
The one-dimensional minimizer is unique, and by Equation (2) satises
[Q(x +
p) +b]
T
p = 0.
Therefore
(Qx +b)
T
p +
p
T
Qp = 0
which together with Equation (3) gives
=
(Qx +b)
T
p
p
T
Qp
=
f(x)
T
p
p
T
Qp
.
Problem 3.4
Let f(x) =
1
2
x
T
Qx+b
T
x+d, with Q positive denite. Let x
k
be the current
iterate and p
k
a non-zero direction. Let 0 < c <
1
2
.
The one-dimensional minimizer along x
k
+ p
k
is (see the previous ex-
ercise)
k
=
f
T
k
p
k
p
T
k
Qp
k
Direct substitution then yields
f(x
k
) + (1 c)
k
f
T
k
p
k
= f(x
k
)
(f
T
k
p
k
)
2
p
T
k
Qp
k
+c
(f
T
k
p
k
)
2
p
T
k
Qp
k
15
Now, since f
k
= Qx
k
+b, after some algebra we get
f(x
k
+
k
p
k
) = f(x
k
)
(f
T
k
p
k
)
2
p
T
k
Qp
k
+
1
2
(f
T
k
p
k
)
2
p
T
k
Qp
k
,
from which the rst inequality in the Goldstein conditions is evident. For
the second inequality, we reduce similar terms in the previous expression to
get
f(x
k
+
k
p
k
) = f(x
k
)
1
2
(f
T
k
p
k
)
2
p
T
k
Qp
k
,
which is smaller than
f(x
k
) +c
k
f
T
k
p
k
= f(x
k
) c
(f
T
k
p
k
)
2
p
T
k
Qp
k
.
Hence the Goldstein conditions are satised.
Problem 3.5
First we have from (A.7)
|x| = |B
1
Bx| |B
1
| |Bx|,
Therefore
|Bx| |x|/|B
1
|
for any nonsingular matrix B.
For symmetric and positive denite matrix B, we have that the matrices
B
1/2
and B
1/2
exist and that |B
1/2
| = |B|
1/2
and |B
1/2
| = |B
1
|
1/2
.
Thus, we have
cos =
f
T
p
|f| |p|
=
p
T
Bp
|Bp| |p|
p
T
Bp
|B| |p|
2
=
p
T
B
1/2
B
1/2
p
|B| |p|
2
=
|B
1/2
p|
2
|B| |p|
2
|p|
2
|B
1/2
|
2
|B| |p|
2
=
1
|B
1
| |B|
1
M
.
16
We can actually prove the stronger result that cos 1/M
1/2
. Dening
p = B
1/2
p = B
1/2
f, we have
cos =
p
T
Bp
|f| |p|
=
p
T
p
|B
1/2
p| |B
1/2
p|
=
| p|
2
|B
1/2
| | p| |B
1/2
| | p|
=
1
|B
1/2
| |B
1/2
|
1
M
1/2
.
Problem 3.6
If x
0
x
+Qx
b
= Q(x
0
x
) +f(x
)
= (x
0
x
)
for the corresponding eigenvalue . From here, it is easy to get
f
T
0
f
0
=
2
(x
0
x
)
T
(x
0
x
),
f
T
0
Qf
0
=
3
(x
0
x
)
T
(x
0
x
),
f
T
0
Q
1
f
0
= (x
0
x
)
T
(x
0
x
).
Direct substitution in equation (3.28) yields
|x
1
x
|
2
Q
= 0 or x
1
= x
.
Therefore the steepest descent method will nd the solution in one step.
Problem 3.7
We drop subscripts on f(x
k
) for simplicity. We have
x
k+1
= x
k
f,
so that
x
k+1
x
= x
k
x
f,
By the denition of | |
2
Q
, we have
|x
k+1
x
|
2
Q
= (x
k+1
x
)
T
Q(x
k+1
x
)
= (x
k
x
f)
T
Q(x
k
x
f)
= (x
k
x
)
T
Q(x
k
x
) 2f
T
Q(x
k
x
) +
2
f
T
Qf
= |x
k
x
|
2
Q
2f
T
Q(x
k
x
) +
2
f
T
Qf
17
Hence, by substituting f = Q(x
k
x
) and = f
T
f/(f
T
Qf), we
obtain
|x
k+1
x
|
2
Q
= |x
k
x
|
2
Q
2f
T
f +
2
f
T
Qf
= |x
k
x
|
2
Q
2(f
T
f)
2
/(f
T
Qf) + (f
T
f)
2
/(f
T
Qf)
= |x
k
x
|
2
Q
(f
T
f)
2
/(f
T
Qf)
= |x
k
x
|
2
Q
_
1
(f
T
f)
2
(f
T
Qf)|x
k
x
|
2
Q
_
= |x
k
x
|
2
Q
_
1
(f
T
f)
2
(f
T
Qf)(f
T
Q
1
f)
_
,
where we used
|x
k
x
|
2
Q
= f
T
Q
1
f
for the nal equality.
Problem 3.8
We know that there exists an orthogonal matrix P such that
P
T
QP = = diag
1
,
2
, ,
n
.
So
P
T
Q
1
P = (P
T
QP)
1
=
1
.
Let z = P
1
x, then
(x
T
x)
2
(x
T
Qx)(x
T
Q
1
x)
=
(z
T
z)
2
(z
T
z)(z
T
1
z)
=
(
i
z
2
i
)
2
(
i
z
2
i
)(
1
i
z
2
i
)
=
1
P
i
i
z
2
i
P
i
z
2
i
P
i
1
i
z
2
i
P
i
z
2
i
.
Let u
i
= z
2
i
/
i
z
2
i
, then all u
i
satisfy 0 u
i
1 and
i
u
i
= 1. Therefore
(x
T
x)
2
(x
T
Qx)(x
T
Q
1
x)
=
1
(
i
u
i
i
)(
i
u
i
1
i
)
=
(u)
(u)
, (4)
where (u) =
1
P
i
u
i
i
and (u) =
i
u
i
1
i
.
Dene function f() =
1
, and let
=
i
u
i
i
. Note that
[
1
,
n
].
Then
(u) =
1
i
u
i
i
= f(
). (5)
18
Let h() be the linear function tting the data (
1
,
1
1
) and (
n
,
1
n
). We
know that
h() =
1
n
+
1
1
n
1
(
n
).
Because f is convex, we know that f() h() holds for all [
1
,
n
].
Thus
() =
i
u
i
f(
i
)
i
u
i
h(
i
) = h(
i
u
i
i
) = h(
). (6)
Combining (4), (5) and (6), we have
(x
T
x)
2
(x
T
Qx)(x
T
Q
1
x)
=
(u)
(u)
f(
)
h(
)
min
1
n
f()
h()
(since
[
1
,
n
])
= min
1
n
1
1
n
+
n
1
n
=
1
n
min
1
n
1
(
1
+n)
=
1
1
+n
2
(
1
+n
1
+n
2
)
(since the minimum happens at d =
1
+n
2
)
=
4
1
n
(
1
+n)
2
.
This completes the proof of the Kantorovich inequality.
Problem 3.13
Let
q
() = a
2
+b+c. We get a, b and c from the interpolation conditions
q
(0) = (0) c = (0),
t
q
(0) =
t
(0) b =
t
(0),
q
(
0
) = (
0
) a = ((
0
) (0)
t
(0)
0
)/
2
0
.
This gives (3.57). The fact that
0
does not satisfy the sucient decrease
condition implies
0 < (
0
) (0) c
1
t
(0)
0
< (
0
) (0)
t
(0)
0
,
where the second inequality holds because c
1
< 1 and
t
(0) < 0. From here,
clearly, a > 0. Hence,
q
is convex, with minimizer at
1
=
t
(0)
2
0
2 [(
0
) (0)
t
(0)
0
]
.
19
Now, note that
0 < (c
1
1)
t
(0)
0
= (0) +c
1
t
(0)
0
(0)
t
(0)
0
< (
0
) (0)
t
(0)
0
,
where the last inequality follows from the violation of sucient decrease at
0
. Using these relations, we get
1
<
t
(0)
2
0
2(c
1
1)
t
(0)
0
=
0
2(1 c
1
)
.
4 Trust-Region Methods
Problem 4.4
Since liminf |g
k
| = 0, we have by denition of the liminf that v
i
0,
where the scalar nondecreasing sequence v
i
is dened by v
i
= inf
ki
|g
k
|.
In fact, since v
i
is nonnegative and nondecreasing and v
i
0, we must
have v
i
= 0 for all i, that is,
inf
ki
|g
k
| = 0, for all i.
Hence, for any i = 1, 2, . . . , we can identify an index j
i
i such that
|g
j
i
| 1/i, so that
lim
i
|g
j
i
| = 0.
By eliminating repeated entries from j
i
i=1
, we obtain an (innite) subse-
quence o of such that lim
iS
|g
i
| = 0. Moreover, since the iterates x
i
iS
are all conned to the bounded set B, we can choose a further subsequence
o such that
lim
i
S
x
i
= x
,
for some limit point x
)| = 0, so
g(x
) = 0, so we are done.
Problem 4.5
Note rst that the scalar function of that we are trying to minimize is
()
def
= m
k
(p
S
k
) = m
k
(
k
g
k
/|g
k
|) = f
k
k
|g
k
|+
1
2
2
k
g
T
k
B
k
g
k
/|g
k
|
2
,
20
while the condition |p
S
k
|
k
and the denition p
S
k
=
k
g
k
/|g
k
| to-
gether imply that the restriction on the scalar is that [1, 1].
In the trivial case g
k
= 0, the function is a constant, so any value will
serve as the minimizer; the value = 1 given by (4.12) will suce.
Otherwise, if g
T
k
B
k
g
k
= 0, is a linear decreasing function of , so its
minimizer is achieved at the largest allowable value of , which is = 1, as
given in (4.12).
If g
T
k
B
k
g
k
,= 0, has a parabolic shape with critical point
=
k
|g
k
|
2
k
g
T
k
B
k
g
k
/|g
k
|
2
=
|g
k
|
3
k
g
T
k
B
k
g
k
.
If g
T
k
B
k
g
k
0, this value of is negative and is a maximizer. Hence, the
minimizing value of on the interval [1, 1] is at one of the endpoints of
the interval. Clearly (1) < (1), so the solution in this case is = 1, as
in (4.12).
When g
T
k
B
k
g
k
0, the value of above is positive, and is a minimizer
of . If this value exceeds 1, then must be decreasing across the interval
[1, 1], so achieves its minimizer at = 1, as in (4.12). Otherwise, (4.12)
correctly identies the formula above as yielding the minimizer of .
Problem 4.6
Because |g|
2
= g
T
g, it is sucient to show that
(g
T
g)(g
T
g) (g
T
Bg)(g
T
B
1
g). (7)
We know from the positive deniteness of B that g
T
Bg > 0, g
T
B
1
g > 0,
and there exists nonsingular square matrix L such that B = LL
T
, and thus
B
1
= L
T
L
1
. Dene u = L
T
g and v = L
1
g, and we have
u
T
v = (g
T
L)(L
1
g) = g
T
g.
The Cauchy-Schwarz inequality gives
(g
T
g)(g
T
g) = (u
T
v)
2
(u
T
u)(v
T
v) = (g
T
LL
T
g)(g
T
L
T
L
1
g) = (g
T
Bg)(g
T
B
1
g).
(8)
Therefore (7) is proved, indicating
=
|g|
4
(g
T
Bg)(g
T
B
1
g)
1. (9)
21
The equality in (8) holds only when L
T
g and L
1
g are parallel. That is,
when there exists constant ,= 0 such that L
T
g = L
1
g. This clearly
implies that g = LL
T
g = Bg,
1
g = L
T
L
1
g = B
1
g, and hence the
equality in (9) holds only when g, Bg and B
1
g are parallel.
Problem 4.8
On one hand,
2
() =
1
1
|p()|
and (4.39) gives
t
2
() =
d
d
1
|p()|
=
d
d
(|p()|
2
)
1/2
=
1
2
(|p()|
2
)
3/2
d
d
(|p()|
2
)
=
1
2
|p()|
3
d
d
n
j=1
(q
T
j
g)
2
(
j
+)
2
= |p()|
3
n
j=1
(q
T
j
g)
2
(
j
+)
3
where q
j
is the j-th column of Q. This further implies
2
()
t
2
()
=
|p()|
1
|p()|
|p()|
3
n
j=1
(q
T
j
g)
2
(
j
+)
3
=
|p()|
2
|p()|
n
j=1
(q
T
j
g)
2
(
j
+)
3
. (10)
On the other hand, we have from Algorithm 4.3 that q = R
T
p and R
1
R
T
=
(B +I)
1
. Hence (4.38) and the orthonormality of q
1
, q
2
, . . . , q
n
give
|q|
2
= p
T
(R
1
R
T
)p = p
T
(B +I)
1
p = p
T
n
j=1
q
T
j
q
j
j
+
p
=
_
_
n
j=1
q
T
j
g
j
+
q
T
j
_
_
_
_
n
j=1
q
T
j
q
j
j
+
_
_
_
_
n
j=1
q
T
j
g
j
+
q
j
_
_
=
n
j=1
(q
T
j
g)
2
(
j
+)
3
. (11)
Substitute (11) into (10), then we have that
2
()
t
2
()
=
|p|
2
|q|
2
|p|
. (12)
Therefore (4.43) and (12) give (in the l-th iteration of Algorithm 4.3)
(l+1)
=
(l)
+
|p
l
|
2
|q
l
|
2
|p
l
|
=
(l)
+
_
|p
l
|
|q
l
|
_
2
_
|p
l
|
_
.
This is exactly (4.44).
22
Problem 4.10
Since B is symmetric, there exist an orthogonal matrix Q and a diagonal
matrix such that B = QQ
T
, where = diag
1
,
2
, . . . ,
n
and
1
2
. . .
n
are the eigenvalues of B. Now we consider two cases:
(a) If
1
> 0, then all the eigenvalues of B are positive and thus B is
positive denite. In this case B +I is positive denite for = 0.
(b) If
1
0, we choose =
1
+ > 0 where > 0 is any xed
real number. Since
1
is the most negative eigenvalue of B, we know that
i
+ > 0 holds for all i = 1, 2, . . . , n. Note that B+I = Q(+I)Q
T
,
and therefore 0 <
1
+
2
+ . . .
n
+ are the eigenvalues of B+I.
Thus B +I is positive denite for this choice of .
5 Conjugate Gradient Methods
Problem 5.2
Suppose that p
0
, . . . , p
l
are conjugate. Let us express one of them, say p
i
,
as a linear combination of the others:
p
i
=
0
p
0
+ +
l
p
l
(13)
for some coecients
k
(k = 0, 1, . . . , l). Note that the sum does not include
p
i
. Then from conjugacy, we have
0 = p
T
0
Ap
i
=
0
p
T
0
Ap
0
+ +
l
p
T
0
Ap
l
=
0
p
T
0
Ap
0
.
This implies that
0
= 0 since the vectors p
0
, . . . , p
l
are assumed to be
conjugate and A is positive denite. The same argument is used to show
that all the scaler coecients
k
(k = 0, 1, . . . , l) in (13) are zero. Equation
(13) indicates that p
i
= 0, which contradicts the fact that p
i
is a nonzero
vector. The contradiction then shows that vectors p
0
, . . . , p
l
are linearly
independent.
Problem 5.3
Let
g() = (x
k
+p
k
)
=
1
2
2
p
T
k
Ap
k
+(Ax
k
b)
T
p
k
+(x
k
).
23
Matrix A is positive denite, so
k
is the minimizer of g() if g
t
(
k
) = 0.
Hence, we get
g
t
(
k
) =
k
p
T
k
Ap
k
+ (Ax
k
b)
T
p
k
= 0,
or
k
=
(Ax
k
b)
T
p
k
p
T
k
Ap
k
=
r
T
k
p
k
p
T
k
Ap
k
.
Problem 5.4
To see that h() = f(x
0
+
0
p
0
+ +
k1
p
k1
) is a quadratic, note that
0
p
0
+ +
k1
p
k1
= P
where P is the n k matrix whose columns are the n 1 vectors p
i
, i.e.
P =
_
_
[ . . . [
p
0
. . . p
k1
[ . . . [
_
_
and is the k 1 matrix
=
_
0
k1
T
.
Therefore
h() =
1
2
(x
0
+P)
T
A(x
0
+P) +b
T
(x
0
+P)
=
1
2
x
T
0
Ax
0
+x
T
0
AP +
1
2
T
P
T
AP +b
T
x
0
+ (b
T
P)
=
1
2
x
T
0
Ax
0
+b
T
x
0
+ [P
T
A
T
x
0
+P
T
b]
T
+
1
2
T
(P
T
AP)
= C +
b
T
+
1
2
T
A
where
C =
1
2
x
T
0
Ax
0
+b
T
x
0
,
b = P
T
A
T
x
0
+P
T
b and
A = P
T
AP.
If the vectors p
0
p
k1
are linearly independent, then P has full column
rank, which implies that
A = P
T
AP
is positive denite. This shows that h() is a strictly convex quadratic.
24
Problem 5.5
We want to show
span r
0
, r
1
= span r
0
, Ar
0
= span p
0
, p
1
. (14)
From the CG iteration (5.14) and p
0
= r
0
we know
r
1
= Ax
1
b = A(x
0
+
0
p
0
) b = (Ax
0
b)
0
Ar
0
= r
0
0
Ar
0
. (15)
This indicates r
1
span r
0
, Ar
0
and furthermore
span r
0
, r
1
span r
0
, Ar
0
. (16)
Equation (15) also gives
Ar
0
=
1
0
(r
0
r
1
) =
1
0
r
0
0
r
1
.
This shows Ar
0
span r
0
, r
1
and furthermore
span r
0
, r
1
span r
0
, Ar
0
. (17)
We conclude from (16) and (17) that spanr
0
, r
1
= span r
0
, Ar
0
.
Similarly, from (5.14) and p
0
= r
0
, we have
p
1
= r
1
+
1
p
0
=
1
r
0
r
1
or r
1
=
1
p
0
p
1
.
Then span r
0
, r
1
span p
0
, p
1
, and span r
0
, r
1
span p
0
, p
1
. So
span r
0
, r
1
= span p
0
, p
1
. This completes the proof.
Problem 5.6
By the denition of r, we have that
r
k+1
= Ax
k+1
b = A(x
k
+
k
p
k
) b
= A
k
x
k
+
k
Ap
k
b = r
k
+
k
Ap
k
.
Therefore
Ap
k
=
1
k
(r
k+1
r
k
). (18)
Then we have
p
T
k
Ap
k
= p
T
k
(
1
k
(r
k+1
r
k
)) =
1
k
p
T
k
r
k+1
k
p
T
k
r
k
.
25
The expanding subspace minimization property of CG indicates that p
T
k
r
k+1
=
p
T
k1
r
k
= 0, and we know p
k
= r
k
+
k
p
k1
, so
p
T
k
Ap
k
=
1
k
(r
T
k
+
k
p
T
k1
)r
k
=
1
k
r
T
k
r
k
k
p
T
k1
r
k
=
1
k
r
T
k
r
k
. (19)
Equation (18) also gives
r
T
k+1
Ap
k
= r
T
k+1
(
1
k
(r
k+1
r
k
))
=
1
k
r
T
k+1
r
k+1
k
r
T
k+1
r
k
=
1
k
r
T
k+1
r
k+1
k
r
T
k+1
(p
k
+
k
p
k1
)
=
1
r
T
k+1
r
k+1
+
1
r
T
k+1
p
k
k
r
T
k+1
p
k1
=
1
r
T
k+1
r
k+1
.
This equation, together with (19) and (5.14d), gives that
k+1
=
r
T
k+1
Ap
k
p
T
k
Ap
k
=
1
r
T
k+1
r
k+1
1
k
r
T
k
r
k
=
r
T
k+1
r
k+1
r
T
k
r
k
.
Thus (5.24d) is equivalent to (5.14d).
Problem 5.9
Minimize
( x) =
1
2
x
T
(C
T
AC
1
) x(C
T
b)
T
x solve (C
T
AC
1
) x =
C
T
b. Apply CG to the transformed problem:
r
0
=
A x
0
b = (C
T
AC
1
)Cx
0
C
T
b = C
T
(Ax
0
b) = C
T
r
0
.
_
p
0
= r
0
= C
T
r
0
My
0
= r
0
_
= p
0
= C
T
(My
0
) = C
T
C
T
Cy
0
= Cy
0
.
26
=
0
=
r
T
0
r
0
p
T
0
A p
0
=
r
T
0
C
1
C
T
r
0
y
T
0
C
T
C
T
AC
1
Cy
0
=
r
T
0
M
1
r
0
y
T
0
Ay
0
=
r
T
0
y
0
p
T
0
Ay
0
=
0
.
x
1
= x
0
+
0
p
0
Cx
1
= Cx
0
+
r
T
0
y
0
p
T
0
Ay
0
(Cy
0
)
= x
1
= x
0
r
T
0
y
0
p
T
0
Ay
0
y
0
= x
0
+
0
p
0
r
1
= r
0
+
0
A p
0
C
T
r
1
= C
T
r
0
+
r
T
0
y
0
p
T
0
Ay
0
C
T
AC
1
(Cy
0
)
= r
1
= r
0
+
r
T
0
y
0
p
T
0
Ay
0
A(y
0
) = r
0
+
0
Ap
0
1
=
r
T
1
r
1
r
T
0
r
0
=
r
T
1
C
1
C
T
r
1
r
T
0
C
1
C
T
r
0
=
r
T
1
M
1
r
1
r
T
0
M
1
r
0
=
r
T
1
y
1
r
T
0
y
0
=
1
p
1
= r
1
+
1
p
0
Cy
1
= C
T
r
1
+
1
(Cy
0
)
= y
1
= M
1
r
1
+
1
y
0
p
1
= y
1
+
1
p
0
( because p
1
= Cp
1
).
By comparing the formulas above with Algorithm 5.3, we can see that
by applying CG to the problem with the new variables, then transforming
back into original variables, the derived algorithm is the same as Algorithm
5.3 for k = 0. Clearly, the same argument can be used for any k; the key is
to notice the relationships:
_
_
x
k
= Cx
k
p
k
= Cp
k
r
k
= C
T
r
k
_
_
.
Problem 5.10
From the solution of Problem 5.9 it is seen that r
i
= C
T
r
i
and r
j
= C
T
r
j
.
Since the unpreconditioned CG algorithm is applied to the transformed
27
problem, by the orthogonality of the residuals we know that r
T
i
r
j
= 0 for
all i ,= j. Therefore
0 = r
T
i
r
j
= r
T
i
C
1
C
T
r
j
= r
T
i
M
1
r
j
.
Here the last equality holds because M
1
= (C
T
C)
1
= C
1
C
T
.
6 Quasi-Newton Methods
Problem 6.1
(a) A function f(x) is strongly convex if all eigenvalues of
2
f(x) are positive
and bounded away from zero. This implies that there exists > 0 such that
p
T
2
f(x)p |p|
2
for any p. (20)
By Taylors theorem, if x
k+1
= x
k
+
k
p
k
, then
f(x
k+1
) = f(x
k
) +
_
1
0
[
2
f(x
k
+z
k
p
k
)
k
p
k
]dz.
By (20) we have
k
p
T
k
y
k
=
k
p
T
k
[f(x
k+1
f(x
k
)]
=
2
k
_
1
0
_
p
T
k
2
f(x
k
+z
k
p
k
)p
k
dz
|p
k
|
2
2
k
> 0.
The result follows by noting that s
k
=
k
p
k
.
(b) For example, when f(x) =
1
x + 1
, we have g(x) =
1
(x + 1)
2
. Obviously
f(0) = 1, f(1) =
1
2
, g(0) = 1, g(1) =
1
4
.
So
s
T
y = (f(1) f(0)) (g(1) g(0)) =
3
8
< 0
and (6.7) does not hold in this case.
28
Problem 6.2
The second strong Wolfe condition is
f(x
k
+
k
p
k
)
T
p
k
c
2
f(x
k
)
T
p
k
which implies
f(x
k
+
k
p
k
)
T
p
k
c
2
f(x
k
)
T
p
k
= c
2
f(x
k
)
T
p
k
since p
k
is a descent direction. Thus
f(x
k
+
k
p
k
)
T
p
k
f(x
k
)
T
p
k
= (c
2
1)f(x
k
)
T
p
k
> 0
since we have assumed that c
2
< 1. The result follows by multiplying both
sides by
k
and noting s
k
=
k
p
k
, y
k
= f(x
k
+
k
p
k
) f(x
k
).
7 Large-Scale Unconstrained Optimization
Problem 7.2
Since s
k
,= 0, the product
H
k+1
s
k
=
_
I
s
k
y
T
k
y
T
k
s
k
_
s
k
= s
k
y
T
k
s
k
y
T
k
s
k
s
k
= 0
illustrates that
H
k+1
is singular.
29
Problem 7.3
We assume line searches are exact, so f
T
k+1
p
k
= 0. Also, recall s
k
=
k
p
k
.
Therefore,
p
k+1
= H
k+1
f
k+1
=
__
I
s
k
y
T
k
y
T
k
s
k
__
I
y
k
s
T
k
y
T
k
s
k
_
+
s
k
s
T
k
y
T
k
s
k
_
f
k+1
=
__
I
p
k
y
T
k
y
T
k
p
k
__
I
y
k
p
T
k
y
T
k
p
k
_
+
k
p
k
p
T
k
y
T
k
p
k
_
f
k+1
=
_
I
p
k
y
T
k
y
T
k
p
k
_
f
k+1
= f
k+1
+
f
T
k+1
y
k
y
T
k
p
k
p
k
,
as given.
Problem 7.5
For simplicity, we consider (x
3
x
4
) as an element function despite the fact
that it is easily separable. The function can be written as
f(x) =
3
i=1
i
(U
i
x)
where
i
(u
1
, u
2
, u
3
, u
4
) = u
2
u
3
e
u
1
+u
3
u
4
,
(v
1
, v
2
) = (v
1
v
2
)
2
,
(w
1
, w
2
) = w
1
w
2
,
and
U
1
= I,
U
2
=
_
0 1 0 0
0 0 1 0
_
,
U
3
=
_
0 0 1 0
0 0 0 1
_
.
30
Problem 7.6
We nd
Bs =
_
ne
i=1
U
T
i
B
[i]
U
i
_
s
=
ne
i=1
U
T
i
B
[i]
s
[i]
=
ne
i=1
U
T
i
y
[i]
= y,
so the secant equation is indeed satised.
8 Calculating Derivatives
Problem 8.1
Supposing that L
c
is the constant in the central dierence formula, that is,
f
x
i
_
f(x +e
i
) f(x e
i
)
2
_
L
c
2
,
and assuming as in the analysis of the forward dierence formula that
[comp(f(x +e
i
)) f(x +e
i
))[ L
f
u,
[comp(f(x e
i
)) f(x e
i
))[ L
f
u,
the total error in the central dierence formula is bounded by
L
c
2
+
2uL
f
2
.
By dierentiating with respect to , we nd that the minimizer is at
=
_
L
f
u
2L
c
_
1/3
,
so when the ratio L
f
/L
c
is reasonable, the choice = u
1/3
is a good one.
By substituting this value into the error expression above, we nd that both
terms are multiples of u
2/3
, as claimed.
31
1 2
4
3
6
5
Figure 3: Adjacency Graph for Problem 8.6
Problem 8.6
See the adjacency graph in Figure 3.
Four colors are required; the nodes corresponding to these colors are 1,
2, 3, 4, 5, 6.
Problem 8.7
We start with
x
1
=
_
_
1
0
0
_
_
, x
2
=
_
_
0
1
0
_
_
, x
3
=
_
_
0
0
1
_
_
.
32
By applying the chain rule, we obtain
x
4
= x
1
x
2
+x
2
x
1
=
_
_
x
2
x
1
0
_
_
,
x
5
= (cos x
3
)x
3
=
_
_
0
0
cos x
3
_
_
,
x
6
= e
x
4
x
4
= e
x
1
x
2
_
_
x
2
x
1
0
_
_
,
x
7
= x
4
x
5
+x
5
x
4
=
_
_
x
2
sin x
3
x
1
sin x
3
x
1
x
2
cos x
3
_
_
,
x
8
= x
6
+x
7
= e
x
1
x
2
_
_
x
2
x
1
0
_
_
+
_
_
x
2
sin x
3
x
1
sin x
3
x
1
x
2
cos x
3
_
_
,
x
9
=
1
x
3
x
8
x
8
x
2
3
x
3
.
9 Derivative-Free Optimization
Problem 9.3
The interpolation conditions take the form
( s
l
)
T
g = f(y
l
) f(x
k
) l = 1, . . . , q 1, (21)
where
s
l
_
(s
l
)
T
, s
l
i
s
l
j
i<j
,
_
1
2
(s
l
i
)
2
__
T
l = 1, . . . , m1,
and s
l
is dened by (9.13). The model (9.14) is uniquely determined if and
only if the system (21) has a unique solution, or equivalently, if and only if
the set s
l
: l = 1, . . . , q 1 is linearly independent.
Problem 9.10
It suces to show that for any v, we have max
j=1,2,...,n+1
v
T
d
j
(1/4n)|v|
1
.
Consider rst the case of v 0, that is, all components of v are nonnegative.
33
We then have
max
j=1,2,...,n+1
v
T
d
j
v
T
d
n+1
1
2n
e
T
v =
1
2n
|v|
1
.
Otherwise, let i be the index of the most negative component of v. We have
that
|v|
1
=
v
j
<0
v
j
+
v
j
0
v
j
n[v
i
[ +
v
j
0
v
j
.
We consider two cases. In the rst case, suppose that
[v
i
[
1
2n
v
j
0
v
j
.
In this case, we have from the inequality above that
|v|
1
n[v
i
[ + (2n)[v
i
[ = (3n)[v
i
[,
so that
max
dT
k
d
T
v d
T
i
v
= (1 1/2n)[v
i
[ + (1/2n)
j,=i
v
j
(1 1/2n)[v
i
[ (1/2n)
j,=i,v
j
<0
v
j
(1 1/2n)[v
i
[ (1/2n)n[v
i
[
(1/2 1/2n)[v
i
[
(1/4)[v
i
[
(1/12n)|v|
1
,
which is sucient to prove the desired result. We now consider the second
case, for which
[v
i
[ <
1
2n
v
j
0
v
j
.
We have here that
|v|
1
n
1
2n
v
j
0
v
j
+
v
j
0
v
j
3
2
v
j
0
v
j
,
34
so that
max
dT
k
d
T
v d
T
n+1
v
=
1
2n
v
j
0
v
j
+
1
2n
v
j
0
v
j
1
2n
n[v
i
[ +
1
2n
v
j
0
v
j
=
1
2
[v
i
[ +
1
2n
v
j
0
v
j
1
4n
v
j
0
v
j
+
1
2n
v
j
0
v
j
=
1
4n
v
j
0
v
j
1
6n
|v|
1
.
which again suces.
10 Least-Squares Problems
Problem 10.1
Recall:
(i) J has full column rank is equivalent to Jx = 0 x = 0;
(ii) J
T
J is nonsingular is equivalent to J
T
Jx = 0 x = 0;
(iii) J
T
J is positive denite is equivalent to x
T
J
T
Jx 0(x) and
x
T
J
T
Jx = 0 x = 0.
(a) We want to show (i) (ii).
(i) (ii). J
T
Jx = 0 x
T
J
T
Jx = 0 |Jx|
2
2
= 0 Jx = 0
(by (i)) x = 0.
(ii) (i). Jx = 0 J
T
Jx = 0 (by (ii)) x = 0.
(b) We want to show (i) (iii).
(i) (iii). x
T
J
T
Jx = |Jx|
2
2
0(x) is obvious. x
T
J
T
Jx = 0
|Jx|
2
2
= 0 Jx = 0 (by (i)) x = 0.
35
(iii) (i). Jx = 0 x
T
J
T
Jx = |Jx|
2
2
= 0 (by (iii)) x = 0.
Problem 10.3
(a) Let Q be a nn orthogonal matrix and x be any given n-vector. Dene
q
i
(i = 1, 2, , n) to be the i-th column of Q. We know that
q
T
i
q
j
=
_
|q
i
|
2
= 1 (if i = j)
0 (if i ,= j).
(22)
Then
|Qx|
2
= (Qx)
T
(Qx)
= (x
1
q
1
+x
2
q
2
+ +x
n
q
n
)
T
(x
1
q
1
+x
2
q
2
+ +x
n
q
n
)
=
n
i=1
n
j=1
x
i
x
j
q
T
i
q
j
(by (22))
=
n
i=1
x
2
i
= |x|
2
.
(b) If = I, then J
T
J = (Q
1
R)
T
(Q
1
R) = R
T
R. We know that the
Cholesky decomposition is unique if the diagonal elements of the upper
triangular matrix are positive, so
R = R.
Problem 10.4
(a) It is easy to see from (10.19) that
J =
n
i=1
i
u
i
v
T
i
=
i:
i
,=0
i
u
i
v
T
i
.
Since the objective function f(x) dened by (10.13) is convex, it suces to
show that f(x
) = 0, where x
) = J
T
(Jx
y)
= J
T
_
_
i:
i
,=0
i
u
i
v
T
i
_
_
i:
i
,=0
u
T
i
y
i
v
i
+
i:
i
=0
i
v
i
_
_
y
_
_
= J
T
_
_
i:
i
,=0
i
(u
T
i
y)u
i
(v
T
i
v
i
) y
_
_
=
_
_
i:
i
,=0
i
v
i
u
T
i
_
_
_
_
i:
i
,=0
(u
T
i
y)u
i
y
_
_
=
i:
i
,=0
i
(u
T
i
y)v
i
(u
T
i
u
i
)
i:
i
,=0
i
v
i
(u
T
i
y)
=
i:
i
,=0
i
(u
T
i
y)v
i
i:
i
,=0
i
v
i
(u
T
i
y) = 0.
(b) If J is rank-decient, we have
x
i:
i
,=0
u
T
i
y
i
v
i
+
i:
i
=0
i
v
i
.
Then
|x
|
2
2
=
i:
i
,=0
_
u
T
i
y
i
_
2
+
i:
i
=0
2
i
,
which is minimized when
i
= 0 for all i with
i
= 0.
37
Problem 10.5
For the Jacobian, we get the same Lipschitz constant:
|J(x
1
) J(x
2
)|
= max
|u|=1
|(J(x
1
) J(x
2
))u|
= max
|u|=1
_
_
_
_
_
_
_
_
_
_
(r
1
(x
1
) r
1
(x
2
))
T
u
.
.
.
(r
m
(x
1
) r
m
(x
2
))
T
u
_
_
_
_
_
_
_
_
_
_
max
|u|=1
max
j=1,...,m
[(r
j
(x
1
) r
j
(x
2
))
T
u[
max
|u|=1
max
j=1,...,m
|r
j
(x
1
) r
j
(x
2
)||u|[cos(r
j
(x
1
) r
j
(x
2
), u)[
L|x
1
x
2
|.
For the gradient, we get
L = L(L
1
+L
2
), with L
1
= max
xT
|r(x)|
1
and
L
2
= max
xT
m
j=1
|r
j
(x)|:
|f(x
1
) f(x
2
)|
=
_
_
_
_
_
_
m
j=1
r
j
(x
1
)r
j
(x
1
)
m
j=1
r
j
(x
2
)r
j
(x
2
)
_
_
_
_
_
_
=
_
_
_
_
_
_
m
j=1
(r
j
(x
1
) r
j
(x
2
))r
j
(x
1
) +
m
j=1
r
j
(x
2
)(r
j
(x
1
) r
j
(x
2
))
_
_
_
_
_
_
j=1
|r
j
(x
1
) r
j
(x
2
)| [r
j
(x
1
)[ +
m
j=1
|r
j
(x
2
)| [r
j
(x
1
) r
j
(x
2
)[
L|x
1
x
2
|
m
j=1
[r
j
(x
1
)[ +L|x
1
x
2
|
m
j=1
|r
j
(x
2
)|
L|x
1
x
2
|.
38
Problem 10.6
If J = U
1
SV
T
, then (J
T
J +I) = V (S
2
+I)V
T
. From here,
p
LM
= V (S
2
+I)
1
SU
T
1
r
=
n
i=1
2
i
+
(u
T
i
r)v
i
=
i:
i
,=0
2
i
+
(u
T
i
r)v
i
.
Thus,
|p
LM
|
2
=
i:
i
,=0
_
i
2
i
+
(u
T
i
r)v
i
_
2
,
and
lim
0
p
LM
=
i:
i
,=0
u
T
i
r
i
v
i
.
11 Nonlinear Equations
Problem 11.1
Note s
T
s = |s|
2
2
is a scalar, so it sucies to show that |ss
T
| = |s|
2
2
. By
denition,
|ss
T
| = max
|x|
2
=1
|(ss
T
)x|
2
.
Matrix multiplication is associative, so (ss
T
)x = s(s
T
x), and s
T
x is a scalar.
Hence,
max
|x|
2
=1
|s(s
T
x)|
2
= max
|x|
2
=1
[s
T
x[|s|
2
.
Last,
[s
T
x[ = [|s|
2
|x|
2
cos
s,x
[ = |s|
2
[ cos
s,x
[,
which is maximized when [ cos
s,x
[ = 1. Therefore,
max
|x|
2
=1
[s
T
x[ = |s|
2
,
which yields the result.
39
Problem 11.2
Starting at x
0
,= 0, we have r
t
(x
0
) = qx
q1
0
. Hence,
x
1
= x
0
x
q
0
qx
q1
0
=
_
1
1
q
_
x
0
.
A straghtforward induction yields
x
k
=
_
1
1
q
_
k
x
0
,
which certainly converges to 0 as k . Moreover,
x
k+1
x
k
= 1
1
q
,
so the sequence converges Q-linearly to 0, with convergence ratio 1 1/q.
Problem 11.3
For this function, Newtons method has the form:
x
k+1
= x
k
r(x)
r
t
(x)
= x
k
x
5
+x
3
+ 4x
5x
4
+ 3x
2
+ 4
.
Starting at x
0
= 1, we nd
x
1
= x
0
x
5
0
+x
3
0
+ 4x
0
5x
4
0
+ 3x
2
0
+ 4
= 1
4
2
= 1,
x
2
= x
1
x
5
1
+x
3
1
+ 4x
1
5x
4
1
+ 3x
2
1
+ 4
= 1 +
4
2
= 1,
x
3
= 1,
.
.
.
.
.
.
.
.
.
as described.
A trivial root of r(x) is x = 0, i.e.,
r(x) = (x 0)(x
4
x
2
4).
The remaining roots can be found by noticing that f(x) = x
4
x
2
4 is
quadratic in y = x
2
. According to the quadratic equation, we have the roots
y =
1
17
2
= x
2
x =
17
2
.
40
As a result,
r(x) = (x)
0
@
x
s
1
17
2
1
A
0
@
x
s
1 +
17
2
1
A
0
@
x +
s
1
17
2
1
A
0
@
x +
s
1 +
17
2
1
A
.
Problem 11.4
The sum-of-squares merit function is in this case
f(x) =
1
2
(sin(5x) x)
2
.
Moreover, we nd
f
t
(x) = (sin(5x) x) (5 cos(5x) 1) ,
f
tt
(x) = 25 sin(5x) (sin(5x) x) + (5 cos(5x) 1)
2
.
The merit function has local minima at the roots of r, which as previously
mentioned are found at approximately x S = 0.519148, 0, 0.519148.
Furthermore, there may be local minima at points where the Jacobian is
singular, i.e., x such that J(x) = 5 cos(5x) 1 = 0. All together, there are
an innite number of local minima described by
x
S
_
x [ 5 cos(5x) = 1 f
tt
(x) 0
_
.
Problem 11.5
First, if J
T
r = 0, then () = 0 for all .
Suppose J
T
r ,= 0. Let the singular value decomposition of J '
mn
be
J = USV
where U '
mn
and V '
nn
are orthogonal. We nd (let z = S
T
U
T
r):
() = |(J
T
J +I)
1
J
T
r|
= |(V
T
S
T
U
T
USV +V
T
V )
1
V
T
z|
= |(V
T
(S
T
S +I)V )
1
V
T
z|
= |V
T
(S
T
S +I)
1
V V
T
z| (sinceV
1
= V
T
)
= |V
T
(S
T
S +I)
1
z|
= |(S
T
S +I)
1
z| (sinceV
T
is orthogonal)
= |(D())
1
z|
41
where D() is a diagonal matrix having
[D()]
ii
=
_
2
i
+, i = 1, . . . , min(m, n)
, i = min(m, n) + 1, . . . , max(m, n).
Each entry of y() = (D())
1
z is of the form
y
i
() =
z
i
[D()]
ii
.
Therefore, [y
i
(
1
)[ < [y
i
(
2
)[ for
1
>
2
> 0 and i = 1, . . . , n, which implies
(
1
) < (
2
) for
1
>
2
> 0.
Problem 11.8
Notice that
JJ
T
r = 0 r
T
JJ
T
r = 0.
If v = J
T
r, then the above implies
r
T
JJ
T
r = v
T
v = |v|
2
= 0
which must mean v = J
T
r = 0.
Problem 11.10
The homotopy map expands to
H(x, ) =
_
x
2
1
_
+ (1 )(x a)
= x
2
+ (1 )x
1
2
(1 +).
For a given , the quadratic formula yields the following roots for the above:
x =
1
_
(1 )
2
+ 2(1 +)
2
=
1
1 + 3
2
2
.
By choosing the positive root, we nd that the zero path dened by
_
= 0 x = 1/2,
(0, 1] x =
1+
1+3
2
2
,
connects (
1
2
, 0) to (1, 1), so continuation methods should work for this choice
of starting point.
42
12 Theory of Constrained Optimization
Problem 12.4
First, we show that local solutions to problem 12.3 are also global solutions.
Take any local solution to problem 12.3, denoted by x
0
. This means that
there exists a neighborhood N(x
0
) such that f(x
0
) f(x) holds for any
x N(x
0
) . The following proof is based on contradiction.
Suppose x
0
is not a global solution, then we take a global solution x ,
which satises f(x
0
) > f( x). Because is a convex set, there exists [0, 1]
such that x
0
+ (1 ) x N(x
0
) . Then the convexity of f(x) gives
f(x
0
+ (1 ) x) f(x
0
) + (1 )f( x)
< f(x
0
) + (1 )f(x
0
)
= f(x
0
),
which contradicts the fact that x
0
is the minimum point in N(x
0
) . It
follows that x
0
must be a global solution, and that any local solution to
problem 12.3 must also be a global solution.
Now, let us prove that the set of global solutions is convex. Let
S = x [ x is a global solution to problem 12.3,
and consider any x
1
, x
2
S such that x
1
,= x
2
and x = x
1
+ (1 )x
2
,
(0, 1). By the convexity of f(x), we have
f(x
1
+ (1 )x
2
) f(x
1
) + (1 )f(x
2
)
= f(x
1
) + (1 )f(x
1
)
= f(x
1
).
Since x , the above must hold as an equality, or else x
1
would not be a
global solution. Therefore, x S and S is a convex set.
Problem 12.5
Recall
f(x) = |v(x)|
= max [v
i
(x)[, i = 1, . . . , m.
43
Minimizing f is equivalent to minimizing t where [v
i
(x)[ t, i = 1, . . . , m;
i.e., the problem can be reformulated as
min
x
t
s.t. t v
i
(x) 0, i = 1, . . . , m,
t +v
i
(x) 0, i = 1, . . . , m.
Similarly, for f(x) = max v
i
(x), i = 1, . . . , m, the minimization problem
can be reformulated as
min
x
t
s.t. t v
i
(x) 0, i = 1, . . . , m.
Problem 12.7
Given
d =
_
I
c
1
(x)c
T
1
(x)
|c
1
(x)|
2
_
f(x),
we nd
c
T
1
(x)d = c
T
1
(x)
_
I
c
1
(x)c
T
1
(x)
|c
1
(x)|
2
_
f(x)
= c
T
1
(x)f(x) +
(c
T
1
(x)c
1
(x))(c
T
1
(x)f(x))
|c
1
(x)|
2
= 0.
Furthermore,
f
T
(x)d = f
T
(x)
_
I
c
1
(x)c
T
1
(x)
|c
1
(x)|
2
_
f(x)
= f
T
(x)f(x) +
(f
T
(x)c
1
(x))(c
T
1
(x)f(x))
|c
1
(x)|
2
= |f(x)|
2
+
(f
T
(x)c
1
(x))
2
|c
1
(x)|
2
The Holder Inequality yields
[f
T
(x)c
1
(x)[ |f
T
(x)||c
1
(x)|
(f
T
(x)c
1
(x))
2
|f
T
(x)|
2
|c
1
(x)|
2
,
44
and our assumption that (12.10) does not hold implies that the above is
satised as a strict inequality. Thus,
f
T
(x)d = |f(x)|
2
+
(f
T
(x)c
1
(x))
2
|c
1
(x)|
2
< |f(x)|
2
+
|f(x)|
2
|c
1
(x)|
2
|c
1
(x)|
2
= 0.
Problem 12.13
The constraints can be written as
c
1
(x) = 2 (x
1
1)
2
(x
2
1)
2
0,
c
2
(x) = 2 (x
1
1)
2
(x
2
+ 1)
2
0,
c
3
(x) = x
1
0,
so
c
1
(x) =
_
2(x
1
1)
2(x
2
1)
_
, c
2
(x) =
_
2(x
1
1)
2(x
2
+ 1)
_
, c
3
(x) =
_
1
0
_
.
All constraints are active at x
)
is not a linearly independent set and LICQ does not hold. However, for
w = (1, 0), c
i
(x
)
T
w > 0 for all i A(x
x
L(x, ) = 2x a
xx
L(x, ) = 2I.
Notice that the second order sucient condition
xx
L(x, ) = 2I > 0 is
satised at all points.
The KKT conditions
x
L(x
) = 0,
c(x
) = 0,
0 imply
x
2
a
and
= 0 or a
T
x
+ =
|a|
2
2
+ = 0.
There are two cases. First, if 0, then the latter condition implies
= 0, so the solution is (x
) =
_
|a|
2
a,
2
|a|
2
_
Problem 12.16
Eliminating the x
2
variable yields
x
2
=
_
1 x
2
1
There are two cases:
Case 1: Let x
2
=
_
1 x
2
1
. The optimization problem becomes
min
x
1
f(x
1
) = x
1
+
_
1 x
2
1
.
The rst order condition is
f = 1
x
1
_
1 x
2
1
= 0,
which is satised by x
1
= 1/
2, 1/
2).
46
Case 2: Let x
2
=
_
1 x
2
1
. The optimization problem becomes
min
x
1
f(x
1
) = x
1
_
1 x
2
1
.
The rst order condition is
f = 1 +
x
1
_
1 x
2
1
= 0,
which is satised by x
1
= 1/
2, 1/
2).
Each choice of sign leads to a distinct solution. However, only case 2 yields
the optimal solution
x
=
_
2
,
1
2
_
.
Problem 12.18
The problem is
min
x,y
(x 1)
2
+ (y 2)
2
s.t. (x 1)
2
5y = 0.
The Lagrangian is
L(x, y, ) = (x 1)
2
+ (y 2)
2
((x 1)
2
5y)
= (1 )(x 1)
2
+ (y 2)
2
+ 5y,
which implies
x
L(x, y, ) = 2(1 )(x 1)
y
L(x, y, ) = 2(y 2) + 5.
The KKT conditions are
2(1
)(x
1) = 0
2(y
2) + 5
= 0
(x
1)
2
5y
= 0.
47
Solving for x
, y
, and
, we nd x
= 1, y
= 0, and
=
4
5
as the only
real solution. At (x
, y
,y
)
=
_
2(x 1)
5
_
(x
,y
)
=
_
0
5
_
,=
_
0
0
_
,
so LICQ is satised.
Now we show that (x
, y
= 4.
We nd
w F
2
(
) w = (w
1
, w
2
) satises [c(x
, y
)]
T
w = 0
_
0 5
_
w
1
w
2
_
= 0
w
2
= 0,
then for all w = (w
1
, 0) where w
1
,= 0,
w
T
2
L(x
, y
)w =
_
w
1
0
_
2(1
4
5
) 0
0 2
_ _
w
1
0
_
=
2
5
w
2
1
> 0 (for w
1
,= 0).
Thus from the second-order sucient condition, we nd (1, 0) is the optimal
solution.
Finally, we substitute (x 1)
2
= 5y into the objective function and get
the following unconstrained optimization problem:
min
y
5y + (y 2)
2
= y
2
+y + 4.
Notice that y
2
+y +4 = (y +
1
2
)
2
+
15
4
15
4
, so y = 1/2 yields an objective
value of 15/4 < 4. Therefore, optimal solutions to this problem cannot yield
solutions to the original problem.
Problem 12.21
We write the problem in the form:
min
x
1
,x
2
x
1
x
2
s.t. 1 x
2
1
x
2
2
0.
48
The Lagrangian function is
L(x
1
, x
2
, ) = x
1
x
2
(1 x
2
1
x
2
2
).
The KKT conditions are
x
2
(2x
1
) = 0
x
1
(2x
2
) = 0
0
(1 x
2
1
x
2
2
) = 0.
We solve this system to get three KKT points:
(x
1
, x
2
, )
_
(0, 0, 0),
_
2
2
,
2
2
,
1
2
_
,
_
2
2
,
2
2
,
1
2
__
Checking the second order condition at each KKT point, we nd
(x
1
, x
2
)
__
2
2
,
2
2
_
,
_
2
2
,
2
2
__
are the optimal points.
13 Linear Programming: The Simplex Method
Problem 13.1
We rst add slack variables z to the constraint A
2
x +B
2
y b
2
and change
it into
A
2
x +B
2
y +z = b
2
, z 0.
Then we introduce surplus variables s
1
and slack variables s
2
into the two-
sided bound constraint l y u:
y s
1
= l, y +s
2
= u, s
1
0, s
2
0.
Splitting x and y into their nonnegative and nonpositive parts, we have
x = x
+
x
, x
+
= max(x, 0) 0, x
= max(x, 0) 0,
y = y
+
y
, y
+
= max(y, 0) 0, y
= max(y, 0) 0.
49
Therefore the objective function and the constraints can be restated as:
max c
T
x +d
T
y min c
T
(x
+
x
) d
T
(y
+
y
)
A
1
x = b
1
A
1
(x
+
x
) = b
1
A
2
x +B
2
y b
2
A
2
(x
+
x
) +B
2
(y
+
y
) +z = b
2
l y u y
+
y
s
1
= l, y
+
y
+s
2
= u,
with all the variables (x
+
, x
, y
+
, y
, z, s
1
, s
2
) nonnegative. Hence the stan-
dard form of the given linear program is:
minimize
x
+
,x
,y
+
,y
,z,s
1
,s
2
_
_
c
c
d
d
0
0
0
_
_
T
_
_
x
+
x
y
+
y
z
s
1
s
2
_
_
subject to
_
_
A
1
A
1
0 0 0 0 0
A
2
A
2
B
2
B
2
I 0 0
0 0 I I 0 I 0
0 0 I I 0 0 I
_
_
_
_
x
+
x
y
+
y
z
s
1
s
2
_
_
=
_
_
b
1
b
2
l
u
_
_
x
+
, x
, y
+
, y
, z, s
1
, s
2
0.
Problem 13.5
It is sucient to show that the two linear programs have identical KKT
systems. For the rst linear program, let be the vector of Lagrangian
multipliers associated with Ax b and s be the vector of multipliers asso-
ciated with x 0. The Lagrangian function is then
L
1
(x, , s) = c
T
x
T
(Ax b) s
T
x.
The KKT system of this problem is given by
A
T
+s = c
Ax b
x 0
0
s 0
T
(Ax b) = 0
s
T
x = 0.
50
For the second linear program, we know that max b
T
min b
T
. Simi-
larly, let x be the vector of Lagrangian multipliers associated with A
T
c
and y be the vector of multipliers associated with 0. By introducing
the Lagrangian function
L
2
(, x, y) = b
T
x
T
(c A
T
) y
T
,
we have the KKT system of this linear program:
Ax b = y
A
T
c
0
x 0
y 0
x
T
(c A
T
) = 0
y
T
= 0.
Dening s = c A
T
and noting that y = Ax b, we can easily verify that
the two KKT systems are identical, which is the desired argument.
Problem 13.6
Assume that there does exist a basic feasible point x for linear program
(13.1), where m n and the rows of A are linearly dependent. Also as-
sume without loss of generality that B( x) = 1, 2, . . . , m. The matrix
B = [A
i
]
i=1,2,...,m
is nonsingular, where A
i
is the i-th column of A.
On the other hand, since m n and the rows of A are linearly dependent,
there must exist 1 k m such that the k-th row of A can be expressed as a
linear combination of other rows of A. Hence, with the same coecients, the
k-th row of B can also expressed as a linear combination of other rows of B.
This implies that B is singular, which obviously contradicts the argument
that B is nonsingular. Then our assumption that there is a basic feasible
point for (13.1) must be incorrect. This completes the proof.
Problem 13.10
By equating the last row of L
1
U
1
to the last row of P
1
L
1
B
+
P
T
1
, we have
the following linear system of 4 equations and 4 unknowns:
l
52
u
33
= u
23
l
52
u
34
+ l
53
u
44
= u
24
l
52
u
35
+ l
53
u
45
+ l
54
u
55
= u
25
l
52
w
3
+ l
53
w
4
+ l
54
w
5
+ w
2
= w
2
.
51
We can either successively retrieve the values of l
52
, l
53
, l
54
and w
2
from
l
52
= u
23
/u
33
l
53
= (u
24
l
52
u
34
)/u
44
l
54
= (u
25
l
52
u
35
l
53
u
45
)/u
55
w
2
= w
2
l
52
w
3
l
53
w
4
l
54
w
5
,
or calculate these values from the unknown quantities using
l
52
= u
23
/u
33
l
53
= (u
24
u
33
u
23
u
34
)/(u
33
u
44
)
l
54
= (u
25
u
33
u
44
u
23
u
35
u
44
u
24
u
33
u
45
+u
23
u
34
u
45
)/(u
33
u
44
u
55
)
w
2
= w
2
w
3
u
23
u
33
w
4
u
24
u
33
u
23
u
34
u
33
u
44
w
5
u
25
u
33
u
44
u
23
u
35
u
44
u
24
u
33
u
45
+u
23
u
34
u
45
u
33
u
44
u
55
.
14 Linear Programming: Interior-Point Methods
Problem 14.1
The primal problem is
min
x
1
,x
2
x
1
s.t. x
1
+x
2
= 1
(x
1
, x
2
) 0,
so the KKT conditions are
F(x, , s) =
_
_
_
_
_
_
x
1
+x
2
1
+s
1
1
+s
2
x
1
s
1
x
2
s
2
_
_
_
_
_
_
= 0,
with (x
1
, x
2
, s
1
, s
2
) 0. The solution to the KKT conditions is
(x
1
, x
2
, s
1
, s
2
, ) = (0, 1, 1, 0, 0),
but F(x, , s) also has the spurious solution
(x
1
, x
2
, s
1
, s
2
, ) = (1, 0, 0, 1, 1).
52
Problem 14.2
(i) For any (x, , s) N
2
(
1
), we have
Ax = b (23a)
A
T
+s = c (23b)
x > 0 (23c)
s > 0 (23d)
|XSe e|
2
1
. (23e)
Given 0
1
<
2
< 1, equation (23e) implies
|XSe e|
2
1
<
2
. (24)
From equations (23a)(23d),(24), we have (x, , s) N
2
(
2
). Thus
N
2
(
1
) N
2
(
2
) when 0
1
<
2
< 1.
For any (x, , s) N
(
1
), we have
Ax = b (25a)
A
T
+s = c (25b)
x > 0 (25c)
s > 0 (25d)
x
i
s
i
1
, i = 1, 2, . . . , n. (25e)
Given 0 <
2
1
1, equation (25d) implies
x
i
s
i
1
2
. (26)
We have from equations (25a)(25d),(26) that (x, , s) N
(
2
).
This shows that N
(
1
) N
(
2
) when 0 <
2
1
1.
(ii) For any (x, , s) N
2
(), we have
Ax = b (27a)
A
T
+s = c (27b)
x > 0 (27c)
s > 0 (27d)
|XSe e|
2
. (27e)
53
Equation (27e) implies
n
i=1
(x
i
s
i
)
2
2
2
. (28)
Suppose that there exists some k 1, 2, . . . , n satisfying
x
k
s
k
< where 1 . (29)
We have
x
k
s
k
< (1 )
= x
k
s
k
< < 0
= (x
k
s
k
)
2
>
2
2
.
Obviously, this contradicts equation (28), so we must have x
k
s
k
for all k = 1, 2, . . . , n. This conclusion, together with equations (27a)
(27d), gives (x, , s) N
(). Therefore N
2
() N
() when
1 .
Problem 14.3
For ( x,
, s) ^
, s) T
0
, (30)
x
i
s
i
, i = 1, . . . , n. (31)
Therefore, for an arbitrary point (x, , s) T
0
we have (x, , s) ^
()
if and only if condition (31) holds. Notice that
x
i
s
i
x
i
s
i
nx
i
s
i
x
T
s
.
Therefore, the range of such that (x, , s) ^
_
x
1
s
1
x
2
s
2
2
_
2
+
_
x
2
s
2
x
1
s
1
2
_
2
>
_
x
1
s
1
+x
2
s
2
2
_
2
2(x
1
s
1
x
2
s
2
)
2
> (x
1
s
1
+x
2
s
2
)
2
2(x
1
s
1
x
2
s
2
) > x
1
s
1
+x
2
s
2
x
1
s
1
x
2
s
2
>
2 + 1
2 1
5.8284,
which holds, for example, when
x =
_
6
1
_
and s =
_
1
1
_
.
Problem 14.5
For (x, , s) ^
i=1
x
i
s
i
> n
x
T
s
n
> > ,
which is a contradiction. Therefore, x
i
s
i
= for i = 1, . . . , n. Along with
condition (32), this coincides with the central path (.
For (x, , s) ^
2
(0) the following conditions hold:
(x, , s) T
0
(34)
n
i=1
(x
i
s
i
)
2
0. (35)
55
If x
i
s
i
,= for some i = 1, . . . , n, then
n
i=1
(x
i
s
i
)
2
> 0,
which contradicts condition (35). Therefore, x
i
s
i
= for i = 1, . . . , n which,
along with condition (34), coincides with (.
Problem 14.7
Assuming
lim
x
i
s
i
0
= lim
x
i
s
i
0
x
T
s/n ,= 0,
i.e., x
k
s
k
> 0 for some k ,= i, we also have
lim
x
i
s
i
0
x
T
s ,= 0 and lim
x
i
s
i
0
log x
T
s ,= .
Consequently,
lim
x
i
s
i
0
= lim
x
i
s
i
0
_
log x
T
s
n
i=1
log x
i
s
i
_
= lim
x
i
s
i
0
log x
T
s lim
x
i
s
i
0
log x
1
s
1
lim
x
i
s
i
0
log x
n
s
n
= c lim
x
i
s
i
0
log x
i
s
i
= ,
as desired, where c is a nite constant.
Problem 14.8
First, assume the coecient matrix
M =
_
_
0 A
T
I
A 0 0
S 0 X
_
_
is nonsingular. Let
M
1
=
_
0 A
T
I
, M
2
=
_
A 0 0
, M
3
=
_
S 0 X
,
then the nonsingularity of M implies that the rows of M
2
are linearly inde-
pendent. Thus, A has full row rank.
56
Second, assume A has full row rank. If M is singular, then certain rows
of M can be expressed as a linear combination of its other rows. We denote
one of these such rows as row m. Since I, S, X are all diagonal matrices
with positive diagonal elements, we observe that m is neither a row of M
1
nor a row of M
3
. Thus m must be a row of M
2
. Due to the structure of
I, S, and X, m must be expressed as a linear combination of rows of M
2
itself. However, this contradicts our assumption that A has full row rank,
so M must be nonsingular.
Problem 14.9
According to the assumptions, the following equalities hold
Ax = 0 (36)
A
T
+ s = 0. (37)
Multiplying equation (36) on the left by
T
and equation (37) on the left
by x
T
yields
T
Ax = 0 (38)
x
T
A
T
+ x
T
s = 0. (39)
Subtracting equation (38) from (39) yields
x
T
s = 0,
as desired.
Problem 14.12
That AD
2
A
T
is symmetric follows easily from the fact that
_
AD
2
A
T
_
T
=
_
A
T
_
T
_
D
2
_
T
(A)
T
= AD
2
A
T
since D
2
is a diagonal matrix.
Assume that A has full row rank, i.e.,
A
T
y = 0 y = 0.
Let x ,= 0 be any vector in
m
and notice:
x
T
AD
2
A
T
x = x
T
ADDA
T
x
=
_
DA
T
x
_
T
_
DA
T
x
_
= v
T
v
= [[v[[
2
2
,
57
where v = DA
T
x is a vector in
m
. Due to the assumption that A has full
row rank it follows that A
T
x ,= 0, which implies v ,= 0 (since D is diagonal
with all positive diagonal elements). Therefore,
x
T
AD
2
A
T
x = [[v[[
2
2
> 0,
so the coecient matrix AD
2
A
T
is positive denite whenever A has full row
rank.
Now, assume that AD
2
A
T
is positive denite, i.e.,
x
T
AD
2
A
T
x > 0
for all nonzero x
m
. If some row of A could be expressed as a linear
combination of other rows in A, then A
T
y = 0 for some nonzero y
m
.
However, this would imply
y
T
AD
2
A
T
y =
_
y
T
AD
2
_ _
A
T
y
_
= 0,
which contradicts the assumption that AD
2
A
T
is positive denite. There-
fore, A must have full row rank.
Finally, consider replacing D by a diagonal matrix in which exactly m of
the diagonal elements are positive and the remainder are zero. Without loss
of generality, assume that the rst m diagonal elements of m are positive.
A real symmetric matrix M is positive denite if and only if there exists a
real nonsingular matrix Z such that
M = ZZ
T
. (40)
Notice that
C = AD
2
A
T
= (AD)(AD)
T
=
_
BD
t
_ _
BD
t
_
T
,
where B is the submatrix corresponding to the rst m columns of A and D
t
is the m m diagonal submatrix of D with all positive diagonal elements.
Therefore, according to (40), the desired results can be extended in this case
if and only if BD
t
is nonsingular, which is guaranteed if the resulting matrix
B has linearly independent columns.
58
Problem 14.13
A Taylor series approximation to H near the point (x, , s) is of the form:
_
x(),
(), s()
_
=
_
x(0),
(0), s(0)
_
+
_
x
t
(0),
t
(0), s
t
(0)
_
+
1
2
2
_
x
tt
(0),
tt
(0), s
tt
(0)
_
+ ,
where
_
x
(j)
(0),
(j)
(0), s
(j)
(0)
_
is the jth derivative of
_
x(),
(), s()
_
with respect to , evaluated at = 0. These derivatives can be deter-
mined by implicitly dierentiating both sides of the equality given as the
denition of H. First, notice that
_
x
t
(),
t
(), s
t
()
_
solves
_
_
0 A
T
I
A 0 0
S() 0
X()
_
_
_
_
x
t
()
t
()
s
t
()
_
_
=
_
_
r
c
r
b
XSe
_
_
. (41)
After setting = 0 and noticing that
X(0) = X and
S(0) = S, the linear
system in (41) reduces to
_
_
0 A
T
I
A 0 0
S 0 X
_
_
_
_
x
t
(0)
t
(0)
s
t
(0)
_
_
=
_
_
r
c
r
b
XSe
_
_
, (42)
which is exactly the system in (14.8). Therefore,
_
x
t
(0),
t
(0), s
t
(0)
_
=
_
x
a
,
a
, s
a
_
. (43)
Dierentiating (41) with respect to yields
_
_
0 A
T
I
A 0 0
S() 0
X()
_
_
_
_
x
tt
()
tt
()
s
tt
()
_
_
=
_
_
0
0
2
X
t
()
S
t
()e
_
_
. (44)
If we let (x
corr
,
corr
, s
corr
) be the solution to the corrector step, i.e.,
when the right-hand-side of (14.8) is replaced by
_
0, 0, X
a
S
a
e
_
,
then after setting = 0 and noting (43) we can see that
_
x
tt
(0),
tt
(0), s
tt
(0)
_
=
1
2
_
x
corr
,
corr
, s
corr
_
. (45)
59
Finally, dierentiating (44) with respect to yields
_
_
0 A
T
I
A 0 0
S() 0
X()
_
_
_
_
x
ttt
()
ttt
()
s
ttt
()
_
_
=
_
_
0
0
3
_
X
tt
()
S
t
() +
S
tt
()
X
t
()
_
e
_
_. (46)
Setting \alpha = 0 and noting (43) and (45), we find
    \begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix}
    \begin{bmatrix} x'''(0) \\ \lambda'''(0) \\ s'''(0) \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ 0 \\ -6 \left( \Delta X^{corr} \Delta S^{aff} + \Delta S^{corr} \Delta X^{aff} \right) e \end{bmatrix}.          (47)
In total, a Taylor series approximation to H is given by
    (x(\alpha), \lambda(\alpha), s(\alpha)) = (x, \lambda, s)
        + \alpha (\Delta x^{aff}, \Delta\lambda^{aff}, \Delta s^{aff})
        + \alpha^2 (\Delta x^{corr}, \Delta\lambda^{corr}, \Delta s^{corr})
        + \frac{\alpha^3}{3!} (x'''(0), \lambda'''(0), s'''(0)),
where (x'''(0), \lambda'''(0), s'''(0)) solves (47).
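One practical consequence of the expansion is that the affine-scaling step, the corrector step, and the third-order term of (47) all use the same coefficient matrix, so a single factorization can be reused for three right-hand sides. The following NumPy sketch illustrates this under assumed random data (the sizes, the feasible starting point, and the dense solve are choices made only for the illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 2, 4
    A = rng.standard_normal((m, n))
    x, s = rng.uniform(0.5, 1.5, n), rng.uniform(0.5, 1.5, n)
    lam = rng.standard_normal(m)
    b, c = A @ x, A.T @ lam + s            # make (x, lam, s) feasible so r_b = r_c = 0
    r_c, r_b = A.T @ lam + s - c, A @ x - b

    # Shared coefficient matrix of (42), (44), and (47).
    K = np.block([[np.zeros((n, n)), A.T, np.eye(n)],
                  [A, np.zeros((m, m)), np.zeros((m, n))],
                  [np.diag(s), np.zeros((n, m)), np.diag(x)]])

    def solve(r1, r2, r3):
        sol = np.linalg.solve(K, np.concatenate([r1, r2, r3]))
        return sol[:n], sol[n:n+m], sol[n+m:]

    zn, zm = np.zeros(n), np.zeros(m)
    dx_aff, dl_aff, ds_aff = solve(-r_c, -r_b, -x * s)                       # system (42)
    dx_cor, dl_cor, ds_cor = solve(zn, zm, -dx_aff * ds_aff)                 # corrector step
    dx3, dl3, ds3 = solve(zn, zm, -6 * (dx_cor * ds_aff + ds_cor * dx_aff))  # system (47)

    alpha = 0.1
    x_new = x + alpha * dx_aff + alpha**2 * dx_cor + alpha**3 / 6 * dx3      # Taylor prediction
    print(x_new)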
Problem 14.14
By introducing Lagrange multipliers \lambda for the equality constraints and s for the nonnegativity constraints, the Lagrangian function for this problem is given by
    L(x, y, \lambda, s) = c^T x + d^T y - \lambda^T (A_1 x + A_2 y - b) - s^T x.
Applying Theorem 12.1, the first-order necessary conditions state that for (x^*, y^*) to be a solution there must exist vectors \lambda and s such that
    A_1^T \lambda + s = c,                                       (48)
    A_2^T \lambda = d,                                           (49)
    A_1 x + A_2 y = b,                                           (50)
    x_i s_i = 0,   i = 1, \dots, n,                              (51)
    (x, s) \ge 0.                                                (52)
These conditions can be written compactly as
    \begin{bmatrix} A_1^T \lambda + s - c \\ A_2^T \lambda - d \\ A_1 x + A_2 y - b \\ XSe \end{bmatrix} = 0,          (53)
    (x, s) \ge 0.                                                (54)
Similar to the standard linear programming case, the central path is described by the system (48)-(52), where (51) is replaced by
    x_i s_i = \tau,   i = 1, \dots, n.
The Newton step equations for \tau = \sigma\mu are
    \begin{bmatrix} 0 & 0 & A_1^T & I \\ 0 & 0 & A_2^T & 0 \\ A_1 & A_2 & 0 & 0 \\ S & 0 & 0 & X \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta\lambda \\ \Delta s \end{bmatrix}
    =
    \begin{bmatrix} -r_c \\ -r_d \\ -r_b \\ -XSe + \sigma\mu e \end{bmatrix},          (55)
where
    r_b = A_1 x + A_2 y - b,   r_c = A_1^T \lambda + s - c,   and   r_d = A_2^T \lambda - d.
By eliminating \Delta s from (55), the augmented system is given by
    \begin{bmatrix} 0 & 0 & A_2^T \\ A_1 & A_2 & 0 \\ -D^{-2} & 0 & A_1^T \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta\lambda \end{bmatrix}
    =
    \begin{bmatrix} -r_d \\ -r_b \\ -r_c + s - \sigma\mu X^{-1} e \end{bmatrix},          (56)
    \Delta s = -s + \sigma\mu X^{-1} e - D^{-2} \Delta x,                                 (57)
where D = S^{-1/2} X^{1/2}.
We can eliminate \Delta x from (56) by noting that
    -D^{-2} \Delta x + A_1^T \Delta\lambda = -r_c + s - \sigma\mu X^{-1} e
    \quad\Longrightarrow\quad
    \Delta x = D^{2} (r_c - s + \sigma\mu X^{-1} e + A_1^T \Delta\lambda),
which yields the system
    \begin{bmatrix} 0 & A_2^T \\ A_2 & A_1 D^{2} A_1^T \end{bmatrix}
    \begin{bmatrix} \Delta y \\ \Delta\lambda \end{bmatrix}
    =
    \begin{bmatrix} -r_d \\ -r_b - A_1 D^{2} (r_c - s + \sigma\mu X^{-1} e) \end{bmatrix},          (58)
    \Delta x = D^{2} (r_c - s + \sigma\mu X^{-1} e + A_1^T \Delta\lambda),                          (59)
    \Delta s = -s + \sigma\mu X^{-1} e - D^{-2} \Delta x.                                           (60)
Unfortunately, there is no way to reduce this system any further in general: since the free variable y has no associated complementarity condition, \Delta y cannot be eliminated, so there is no way to create a system similar to the normal equations in (14.44).
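A small numerical check that the reduced sequence (58)-(60) reproduces the solution of the full system (55); the problem dimensions and random data below are assumptions made purely for the illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, m = 4, 2, 3                      # dims of x, y, and the equality constraints (assumed)
    A1, A2 = rng.standard_normal((m, n)), rng.standard_normal((m, p))
    x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
    y, lam = rng.standard_normal(p), rng.standard_normal(m)
    c, d, b = rng.standard_normal(n), rng.standard_normal(p), rng.standard_normal(m)
    sigma_mu = 0.1
    e = np.ones(n)

    r_b = A1 @ x + A2 @ y - b
    r_c = A1.T @ lam + s - c
    r_d = A2.T @ lam - d

    # Full system (55).
    K = np.block([
        [np.zeros((n, n)), np.zeros((n, p)), A1.T,             np.eye(n)],
        [np.zeros((p, n)), np.zeros((p, p)), A2.T,             np.zeros((p, n))],
        [A1,               A2,               np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, p)), np.zeros((n, m)), np.diag(x)]])
    rhs = np.concatenate([-r_c, -r_d, -r_b, -x * s + sigma_mu * e])
    sol = np.linalg.solve(K, rhs)
    dx_full, dy_full = sol[:n], sol[n:n+p]
    dl_full, ds_full = sol[n+p:n+p+m], sol[n+p+m:]

    # Reduced sequence (58)-(60) with D^2 = X S^{-1}.
    D2 = np.diag(x / s)
    g = r_c - s + sigma_mu / x             # r_c - s + sigma*mu*X^{-1} e
    K2 = np.block([[np.zeros((p, p)), A2.T],
                   [A2, A1 @ D2 @ A1.T]])
    rhs2 = np.concatenate([-r_d, -r_b - A1 @ D2 @ g])
    sol2 = np.linalg.solve(K2, rhs2)
    dy, dl = sol2[:p], sol2[p:]
    dx = D2 @ (g + A1.T @ dl)
    ds = -s + sigma_mu / x - (s / x) * dx

    print(np.allclose(dx, dx_full), np.allclose(dy, dy_full),
          np.allclose(dl, dl_full), np.allclose(ds, ds_full))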
15 Fundamentals of Algorithms for Nonlinear Constrained Optimization
Problem 15.3
(a) The formulation is
        min  x_1 + x_2
        s.t. x_1^2 + x_2^2 = 2,
             0 \le x_1 \le 1,
             0 \le x_2 \le 1.
    This problem has only one feasible point, namely x_1 = x_2 = 1. Thus it has a solution at x_1^* = x_2^* = 1, and the optimal objective value is 2.
(b) The formulation is
        min  x_1 + x_2                        (61a)
        s.t. x_1^2 + x_2^2 \le 1,             (61b)
             x_1 + x_2 = 3.                   (61c)
    Substituting equation (61c) into (61b), we get
        x_1^2 + (3 - x_1)^2 \le 1,   which implies   x_1^2 - 3x_1 + 4 \le 0.
    This inequality has no solution (its discriminant is 9 - 16 < 0); thus the feasible region of the original problem is empty, and the problem has no solution.
(c) The formulation is
        min  x_1 x_2
        s.t. x_1 + x_2 = 2.
    Since the constraint of this problem is linear, we eliminate x_2 from the objective and obtain the unconstrained problem
        min  x_1 (2 - x_1) = -(x_1 - 1)^2 + 1.
    Obviously, as |x_1 - 1| \to +\infty, we have -(x_1 - 1)^2 + 1 \to -\infty. This shows that the original problem is unbounded below, hence it has no solution.
Problem 15.4
The optimization problem is
    min_{x,y}  x^2 + y^2
    s.t.  (x - 1)^3 = y^2.
If we eliminate x by writing it in terms of y, i.e., x = (y^2)^{1/3} + 1, then the above becomes the unconstrained problem
    min  f(y) = (y^{2/3} + 1)^2 + y^2.
Notice that f(y) \ge 1 = f(0), so the optimal solution of the unconstrained problem is y^* = 0, which corresponds to the optimal solution (x^*, y^*) = (1, 0) of the original problem.
Problem 15.5
With Y and Z defined as in the text, we have
    [Y \; Z] = \begin{bmatrix} B^{-1} & -B^{-1}N \\ 0 & I \end{bmatrix}
             = \begin{bmatrix} y_1 & y_2 & \cdots & y_m & z_1 & z_2 & \cdots & z_{n-m} \\ 0 & 0 & \cdots & 0 & e_1 & e_2 & \cdots & e_{n-m} \end{bmatrix}.
In order to see the linear independence of the columns of [Y \; Z], we consider
    k_1 \begin{bmatrix} y_1 \\ 0 \end{bmatrix} + k_2 \begin{bmatrix} y_2 \\ 0 \end{bmatrix} + \cdots + k_m \begin{bmatrix} y_m \\ 0 \end{bmatrix}
    + t_1 \begin{bmatrix} z_1 \\ e_1 \end{bmatrix} + t_2 \begin{bmatrix} z_2 \\ e_2 \end{bmatrix} + \cdots + t_{n-m} \begin{bmatrix} z_{n-m} \\ e_{n-m} \end{bmatrix} = 0.          (62)
The last (n - m) equations of (62) are in fact
    t_1 e_1 + t_2 e_2 + \cdots + t_{n-m} e_{n-m} = 0,
where e_j = (0, \dots, 0, 1, 0, \dots, 0)^T has its 1 in position j. Thus t_1 = t_2 = \cdots = t_{n-m} = 0. This shows that the first m equations of (62) reduce to
    k_1 y_1 + k_2 y_2 + \cdots + k_m y_m = 0.
Since y_1, \dots, y_m are the columns of the nonsingular matrix B^{-1}, they are linearly independent, so k_1 = k_2 = \cdots = k_m = 0. It follows that all coefficients in (62) vanish, which shows that the columns of [Y \; Z] form a linearly independent set and hence a basis of R^n.
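A quick numerical confirmation of this fact (the dimensions and the random B, N are assumptions for the illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 3, 7
    B = rng.standard_normal((m, m))          # nonsingular with probability 1
    N = rng.standard_normal((m, n - m))

    Binv = np.linalg.inv(B)
    Y = np.vstack([Binv, np.zeros((n - m, m))])   # columns of the form [y_i; 0]
    Z = np.vstack([-Binv @ N, np.eye(n - m)])     # columns of the form [z_j; e_j]
    YZ = np.hstack([Y, Z])

    print(np.linalg.matrix_rank(YZ) == n)    # True: the n columns form a basis of R^n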
Problem 15.6
Recall that A^T \Pi = Y R. Since \Pi is a permutation matrix, we know \Pi^T = \Pi^{-1}. Thus A = \Pi R^T Y^T. This gives
    A A^T = \Pi R^T Y^T Y R \Pi^T.          (63)
The matrix [Y \; Z] is orthogonal, so Y^T Y = I. Then (63) gives
    A A^T = \Pi R^T R \Pi^T
    \Longrightarrow (A A^T)^{-1} = \Pi R^{-1} R^{-T} \Pi^T
    \Longrightarrow A^T (A A^T)^{-1} = (Y R \Pi^T) \Pi R^{-1} R^{-T} \Pi^T
    \Longrightarrow A^T (A A^T)^{-1} = Y R^{-T} \Pi^T
    \Longrightarrow A^T (A A^T)^{-1} b = Y R^{-T} \Pi^T b.
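The identity can be verified numerically with a column-pivoted QR factorization of A^T; the SciPy routines and the random data below are assumptions made for the illustration, not part of the original solution.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    rng = np.random.default_rng(4)
    m, n = 3, 6
    A = rng.standard_normal((m, n))          # full row rank with probability 1
    b = rng.standard_normal(m)

    # Column-pivoted QR of A^T: (A^T)[:, piv] = Y @ R, i.e. A^T Pi = Y R.
    Y, R, piv = qr(A.T, mode='economic', pivoting=True)

    # x = Y R^{-T} Pi^T b, where Pi^T b simply permutes b according to piv.
    x = Y @ solve_triangular(R.T, b[piv], lower=True)

    # Compare with the minimum-norm solution A^T (A A^T)^{-1} b.
    x_ref = A.T @ np.linalg.solve(A @ A.T, b)
    print(np.allclose(x, x_ref))             # True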
Problem 15.7
(a) We denote the i-th column of the matrix
        Y = \begin{bmatrix} I \\ (B^{-1}N)^T \end{bmatrix}
    by y_i. Then
        \|y_i\|^2 = 1 + \|(B^{-1}N)^T e_i\|^2 \ge 1,
    so the columns of Y are in general no longer of unit norm. The same argument applies to the matrix
        Z = \begin{bmatrix} -B^{-1}N \\ I \end{bmatrix}.
    Furthermore,
        Y^T Z = \begin{bmatrix} I & B^{-1}N \end{bmatrix} \begin{bmatrix} -B^{-1}N \\ I \end{bmatrix} = -B^{-1}N + B^{-1}N = 0,
        A Z = \begin{bmatrix} B & N \end{bmatrix} \begin{bmatrix} -B^{-1}N \\ I \end{bmatrix} = -B B^{-1} N + N = 0.
    These relations show that the columns of Y and Z together form a linearly independent set and that Y, Z are valid basis matrices.
(b) We have from A = [B \; N] that
        A A^T = \begin{bmatrix} B & N \end{bmatrix} \begin{bmatrix} B^T \\ N^T \end{bmatrix} = B B^T + N N^T.
    Therefore,
        A Y = \begin{bmatrix} B & N \end{bmatrix} \begin{bmatrix} I \\ (B^{-1}N)^T \end{bmatrix}
            = B + N (B^{-1}N)^T = B + N N^T B^{-T} = (B B^T + N N^T) B^{-T} = (A A^T) B^{-T}.
    It follows that
        (A Y)^{-1} = B^T (A A^T)^{-1}
        \Longrightarrow Y (A Y)^{-1} = Y B^T (A A^T)^{-1}
        \Longrightarrow Y (A Y)^{-1} (A A^T) = Y B^T (A A^T)^{-1} (A A^T) = Y B^T
            = \begin{bmatrix} I \\ (B^{-1}N)^T \end{bmatrix} B^T = \begin{bmatrix} B^T \\ N^T B^{-T} B^T \end{bmatrix} = \begin{bmatrix} B^T \\ N^T \end{bmatrix} = A^T.
    This implies Y (A Y)^{-1} = A^T (A A^T)^{-1}. Thus Y (A Y)^{-1} b = A^T (A A^T)^{-1} b, which is the minimum-norm solution of Ax = b.
Problem 15.8
The new problem is:
    min  sin(x_1 + x_2) + x_3^2 + \frac{1}{3}\left( x_4 + x_5^4 + \frac{1}{2} x_6 \right)
    s.t. 8x_1 - 6x_2 + x_3 + 9x_4 + 4x_5 = 6,
         3x_1 + 2x_2 - x_4 + 6x_5 + 4x_6 = -4,
         3x_1 + 2x_3 \ge 1.
If we eliminate variables with (15.11),
    \begin{bmatrix} x_3 \\ x_6 \end{bmatrix}
    = - \begin{bmatrix} 8 & -6 & 9 & 4 \\ \frac{3}{4} & \frac{1}{2} & -\frac{1}{4} & \frac{3}{2} \end{bmatrix}
      \begin{bmatrix} x_1 \\ x_2 \\ x_4 \\ x_5 \end{bmatrix}
    + \begin{bmatrix} 6 \\ -1 \end{bmatrix},
the objective function turns out to be (15.12). Substituting (15.11) into the inequality constraint gives
    1 \le 3x_1 + 2(-8x_1 + 6x_2 - 9x_4 - 4x_5 + 6)
       = -13x_1 + 12x_2 - 18x_4 - 8x_5 + 12,
that is,
    -13x_1 + 12x_2 - 18x_4 - 8x_5 \ge -11,
which is exactly (15.23). Thus the problem reduces to minimizing the function (15.12) subject to (15.23).
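A short numerical check of the elimination and of the reduced inequality (15.23); the random values assigned to the remaining variables are an assumption made only for the illustration.

    import numpy as np

    rng = np.random.default_rng(5)
    x1, x2, x4, x5 = rng.standard_normal(4)

    # Eliminated variables from (15.11).
    x3 = 6 - 8*x1 + 6*x2 - 9*x4 - 4*x5
    x6 = -1 - 0.75*x1 - 0.5*x2 + 0.25*x4 - 1.5*x5

    # The two equality constraints hold identically.
    print(np.isclose(8*x1 - 6*x2 + x3 + 9*x4 + 4*x5, 6))
    print(np.isclose(3*x1 + 2*x2 - x4 + 6*x5 + 4*x6, -4))

    # The original inequality 3*x1 + 2*x3 >= 1 agrees with the reduced form (15.23).
    lhs_original = 3*x1 + 2*x3 - 1
    lhs_reduced = -13*x1 + 12*x2 - 18*x4 - 8*x5 + 11
    print(np.isclose(lhs_original, lhs_reduced))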
16 Quadratic Programming
Problem 16.1
(b) The optimization problem can be written as
        min_x  \frac{1}{2} x^T G x + d^T x
        s.t.   c(x) \ge 0,
    where
        G = \begin{bmatrix} -8 & -2 \\ -2 & -2 \end{bmatrix},
        d = \begin{bmatrix} -2 \\ -3 \end{bmatrix},
        c(x) = \begin{bmatrix} x_1 - x_2 \\ 4 - x_1 - x_2 \\ 3 - x_1 \end{bmatrix}.
    Defining A as the Jacobian of c, i.e., the matrix whose rows are the constraint gradients \nabla c_i(x)^T,
        A = \begin{bmatrix} 1 & -1 \\ -1 & -1 \\ -1 & 0 \end{bmatrix},
    we have the Lagrangian
        L(x, \lambda) = \frac{1}{2} x^T G x + d^T x - \lambda^T c(x)
    and its derivatives with respect to the x variables,
        \nabla_x L(x, \lambda) = G x + d - A^T \lambda   and   \nabla_{xx} L(x, \lambda) = G.
    Consider x = (a, a) \in R^2. It is easily seen that such an x is feasible for a \le 2 and that
        q(x) = -7a^2 - 5a \to -\infty   as   a \to -\infty.
    Therefore, the problem is unbounded below. Moreover, \nabla_{xx} L = G is negative definite, so no point satisfies the second-order necessary conditions and there are no local minimizers.
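A tiny numerical illustration of the unbounded feasible direction identified above (the data are the reconstructed G, d, and constraints; the chosen values of a are arbitrary):

    import numpy as np

    G = np.array([[-8.0, -2.0], [-2.0, -2.0]])
    d = np.array([-2.0, -3.0])

    def q(x):
        return 0.5 * x @ G @ x + d @ x

    def feasible(x):
        x1, x2 = x
        return x1 - x2 >= 0 and 4 - x1 - x2 >= 0 and 3 - x1 >= 0

    for a in [-1.0, -10.0, -100.0, -1000.0]:
        x = np.array([a, a])                 # the direction x = (a, a), feasible for a <= 2
        print(feasible(x), q(x))             # feasible, and q = -7a^2 - 5a decreases without bound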
Problem 16.2
The problem is:
    min_x  \frac{1}{2} (x - x_0)^T (x - x_0)
    s.t.   Ax = b.
The KKT conditions are:
    x^* - x_0 - A^T \lambda^* = 0,               (64)
    A x^* = b.                                   (65)
Multiplying (64) on the left by A yields
    A x^* - A x_0 - A A^T \lambda^* = 0.         (66)
Substituting (65) into (66), we find
    b - A x_0 = A A^T \lambda^*,
which implies
    \lambda^* = (A A^T)^{-1} (b - A x_0).        (67)
Finally, substituting (67) into (64) yields
    x^* = x_0 + A^T (A A^T)^{-1} (b - A x_0).    (68)
Consider the case where A \in R^{1 \times n}. Equation (68) gives
    x^* - x_0 = A^T (A A^T)^{-1} (b - A x_0) = \frac{1}{\|A\|_2^2} A^T (b - A x_0),
so the optimal objective value is
    f^* = \frac{1}{2} (x^* - x_0)^T (x^* - x_0)
        = \frac{1}{2} \left( \frac{1}{\|A\|_2^2} \right)^2 (b - A x_0)^T A A^T (b - A x_0)
        = \frac{1}{2} \frac{1}{\|A\|_2^4} \|A\|_2^2 (b - A x_0)^2
        = \frac{1}{2} \frac{(b - A x_0)^2}{\|A\|_2^2},
and the shortest distance from x_0 to the solution set of Ax = b is
    \sqrt{2 f^*} = \sqrt{\frac{(b - A x_0)^2}{\|A\|_2^2}} = \frac{|b - A x_0|}{\|A\|_2}.
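A quick check of (68) and of the distance formula in the single-constraint case (the random A, b, x_0 are assumptions made for the illustration):

    import numpy as np

    rng = np.random.default_rng(6)
    n = 5
    A = rng.standard_normal((1, n))          # a single linear equality constraint
    b = rng.standard_normal(1)
    x0 = rng.standard_normal(n)

    # Closest point to x0 in {x : Ax = b}, from (68).
    x_star = x0 + A.T @ np.linalg.solve(A @ A.T, b - A @ x0)

    print(np.allclose(A @ x_star, b))        # feasibility
    resid = (b - A @ x0).item()
    print(np.isclose(np.linalg.norm(x_star - x0),
                     abs(resid) / np.linalg.norm(A)))   # distance = |b - A x0| / ||A||_2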
Problem 16.6
First, we will show that the KKT conditions for problem (16.3) are satisfied by a point satisfying (16.4). The Lagrangian function for problem (16.3) is
    L(x, \lambda) = \frac{1}{2} x^T G x + d^T x - \lambda^T (Ax - b),
so the KKT conditions are
    G x + d - A^T \lambda = 0,
    A x = b.
A point (x^*, \lambda^*) satisfying these conditions therefore satisfies
    \begin{bmatrix} G & -A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \lambda^* \end{bmatrix} = \begin{bmatrix} -d \\ b \end{bmatrix},
which is exactly the system given by (16.4).
Now assume that the reduced Hessian Z^T G Z is positive definite. The second-order conditions for (16.3) are satisfied if
    w^T \nabla_{xx} L(x^*, \lambda^*) w = w^T G w > 0
for all w \in \mathcal{C}(x^*, \lambda^*) with w \neq 0; here the critical cone \mathcal{C}(x^*, \lambda^*) is the null space of A. By definition, any such w can be written as w = Zu for some u \neq 0, so
    w^T G w = u^T Z^T G Z u > 0,
and the second-order conditions are satisfied.
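A sketch of solving an equality-constrained QP through this KKT system; the particular G, A, d, b below are arbitrary illustrative data, not taken from the exercise.

    import numpy as np

    # Illustrative data: G positive definite, A with full row rank.
    G = np.array([[6.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 4.0]])
    d = np.array([-8.0, -3.0, -3.0])
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([3.0, 0.0])

    n, m = G.shape[0], A.shape[0]
    K = np.block([[G, -A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-d, b])
    sol = np.linalg.solve(K, rhs)
    x_star, lam_star = sol[:n], sol[n:]

    print(x_star)                                        # solution of the QP
    print(np.allclose(A @ x_star, b))                    # feasibility
    print(np.allclose(G @ x_star + d, A.T @ lam_star))   # stationarity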
Problem 16.7
Let x = x^* + \alpha Z u with \alpha \neq 0. We find
    q(x) = q(x^* + \alpha Z u)
         = \frac{1}{2}(x^* + \alpha Z u)^T G (x^* + \alpha Z u) + d^T (x^* + \alpha Z u)
         = \frac{1}{2} x^{*T} G x^* + \alpha x^{*T} G Z u + \frac{1}{2}\alpha^2 u^T Z^T G Z u + d^T x^* + \alpha d^T Z u
         = q(x^*) + \frac{1}{2}\alpha^2 u^T Z^T G Z u + \alpha (x^{*T} G Z u + d^T Z u).
A point (x^*, \lambda^*) satisfying the KKT conditions gives
    0 = G x^* + d - A^T \lambda^*.
Taking the transpose and multiplying on the right by Zu, we find
    0 = x^{*T} G Z u + d^T Z u - \lambda^{*T} A Z u = x^{*T} G Z u + d^T Z u,
since AZ = 0. So in fact
    q(x) = q(x^*) + \frac{1}{2}\alpha^2 u^T Z^T G Z u.
If there exists a u such that u^T Z^T G Z u < 0, then q(x) < q(x^*) for every \alpha \neq 0, so x^* cannot be a local minimizer; it is only a stationary point of the quadratic program.
Problem 16.15
Suppose that there is a vector pair (x^*, \lambda^*) satisfying the KKT conditions, and let p = Zu be any vector in the null space of A. Then for any \alpha,
    A(x^* + \alpha p) = A x^* = b,
so that x^* + \alpha p is feasible, while
    q(x^* + \alpha p) = q(x^*) + \alpha p^T (G x^* + c) + \frac{1}{2}\alpha^2 p^T G p
                      = q(x^*) + \frac{1}{2}\alpha^2 u^T Z^T G Z u
                      \ge q(x^*),
where we have used the KKT condition G x^* + c = A^T \lambda^*, which gives
    p^T (G x^* + c) = u^T Z^T A^T \lambda^* = (A Z u)^T \lambda^* = 0,
together with the assumed positive semidefiniteness of the reduced Hessian Z^T G Z. Since every feasible point can be written as x^* + p with Ap = 0, the point x^* is a global minimizer.
For the quadratic program
    min_x  \frac{1}{2} x^T G x + d^T x
    s.t.   Ax \ge b,   \bar{A} x = \bar{b},
the KKT conditions are
    G x + d - A^T \lambda - \bar{A}^T v = 0,
    Ax - b \ge 0,
    \bar{A} x - \bar{b} = 0,
    [Ax - b]_i \lambda_i = 0,   i = 1, \dots, m,
    \lambda \ge 0,
where m is the number of inequality constraints. Introducing slack variables y = Ax - b yields
    G x + d - A^T \lambda - \bar{A}^T v = 0,
    Ax - y - b = 0,
    \bar{A} x - \bar{b} = 0,
    y_i \lambda_i = 0,   i = 1, \dots, m,
    (y, \lambda) \ge 0,
which can be expressed as
    F(x, y, \lambda, v) =
    \begin{bmatrix} G x + d - A^T \lambda - \bar{A}^T v \\ Ax - y - b \\ \bar{A} x - \bar{b} \\ \Lambda Y e \end{bmatrix} = 0,
    (y, \lambda) \ge 0.
The analog of (16.58) is
    \begin{bmatrix} G & -A^T & -\bar{A}^T & 0 \\ A & 0 & 0 & -I \\ \bar{A} & 0 & 0 & 0 \\ 0 & Y & 0 & \Lambda \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta\lambda \\ \Delta v \\ \Delta y \end{bmatrix}
    =
    \begin{bmatrix} -r_d \\ -r_b \\ -\bar{r}_b \\ -\Lambda Y e + \sigma\mu e \end{bmatrix},
where
    r_d = G x + d - A^T \lambda - \bar{A}^T v,   r_b = Ax - y - b,   and   \bar{r}_b = \bar{A} x - \bar{b}.
17 Penalty and Augmented Lagrangian Methods
Problem 17.1
The following equality-constrained problem
    min_x  x^3   s.t.   x = 0
has a (local) solution at x^* = 0, yet its quadratic penalty function
    Q(x; \mu) = x^3 + \frac{\mu}{2} x^2
is unbounded below for any value of \mu. The same difficulty occurs for the inequality-constrained problem
    min_x  x^3   s.t.   x \ge 0,
which also has a local solution at x^* = 0, but whose quadratic penalty function
    Q(x; \mu) = x^3 + \frac{\mu}{2} \bigl( \max(-x, 0) \bigr)^2
              = \begin{cases} x^3 & \text{if } x \ge 0, \\ x^3 + \frac{\mu}{2} x^2 & \text{if } x < 0, \end{cases}
is unbounded below for any value of \mu.
Problem 17.5
The penalty function and its gradient are
    Q(x; \mu) = -5x_1^2 + x_2^2 + \frac{\mu}{2}(x_1 - 1)^2
and
    \nabla Q(x; \mu) = \begin{bmatrix} (\mu - 10) x_1 - \mu \\ 2 x_2 \end{bmatrix},
respectively. For \mu = 1, the stationary point is (-1/9, 0), and the contours are shown in Figure 4.
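A short check of the stationary point for \mu = 1 (a plain NumPy evaluation; the helper function is introduced only for this illustration):

    import numpy as np

    def grad_Q(x, mu):
        """Gradient of Q(x; mu) = -5*x1^2 + x2^2 + (mu/2)*(x1 - 1)^2."""
        x1, x2 = x
        return np.array([-10.0 * x1 + mu * (x1 - 1.0), 2.0 * x2])

    mu = 1.0
    x_stat = np.array([-mu / (10.0 - mu), 0.0])   # solves (mu - 10)*x1 - mu = 0
    print(x_stat)                                  # [-1/9, 0]
    print(np.allclose(grad_Q(x_stat, mu), 0.0))    # True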
Problem 17.9
For Example 17.1, we know that x^* = (-1, -1) and \lambda^* = -\frac{1}{2}. The goal is to show that \phi_1(x; \sigma) does not have a local minimizer at (-1, -1) unless \sigma \ge |\lambda^*| = \frac{1}{2}.
We have from the definition of the directional derivative that, for any p = (p_1, p_2),
    D(\phi_1(x^*; \sigma); p) = \nabla f(x^*)^T p + \sigma \sum_{i \in \mathcal{E}} |\nabla c_i(x^*)^T p|
        = (p_1 + p_2) + \sigma |{-2}(p_1 + p_2)|
        = \begin{cases} (1 - 2\sigma)(p_1 + p_2) & \text{if } p_1 + p_2 < 0, \\ (1 + 2\sigma)(p_1 + p_2) & \text{if } p_1 + p_2 \ge 0. \end{cases}
[Figure 4: Contours of the quadratic penalty function Q(x; \mu) for \mu = 1, plotted over x_1, x_2 \in [-1.5, 1.5].]
It is easily seen that when \sigma < \frac{1}{2}, we can always choose p_1 + p_2 < 0 so that
    (1 - 2\sigma)(p_1 + p_2) < 0,
in which case p is a descent direction for \phi_1(\cdot; \sigma) at x^*; at a local minimizer, however, D(\phi_1(x^*; \sigma); p) \ge 0 must hold for all p. This shows that \phi_1(x; \sigma) does not have a local minimizer at x^* = (-1, -1) unless \sigma \ge \frac{1}{2}.
18 Sequential Quadratic Programming
Problem 18.4
When \theta_k \neq 1, we have
    \theta_k = \frac{0.8\, s_k^T B_k s_k}{s_k^T B_k s_k - s_k^T y_k},
where s_k^T y_k < 0.2\, s_k^T B_k s_k. Therefore,
    s_k^T r_k = s_k^T (\theta_k y_k + (1 - \theta_k) B_k s_k)
              = \theta_k (s_k^T y_k) + (1 - \theta_k) s_k^T B_k s_k
              = \frac{0.8\, s_k^T B_k s_k}{s_k^T B_k s_k - s_k^T y_k}\, s_k^T y_k
                + \frac{0.2\, s_k^T B_k s_k - s_k^T y_k}{s_k^T B_k s_k - s_k^T y_k}\, s_k^T B_k s_k
              = \frac{s_k^T B_k s_k}{s_k^T B_k s_k - s_k^T y_k}
                \left( 0.8\, s_k^T y_k + 0.2\, s_k^T B_k s_k - s_k^T y_k \right)
              = \frac{s_k^T B_k s_k}{s_k^T B_k s_k - s_k^T y_k}
                \left( 0.2\, s_k^T B_k s_k - 0.2\, s_k^T y_k \right)
              = 0.2\, s_k^T B_k s_k > 0.
This shows that the damped BFGS update satisfies (18.17).
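A small sketch of one damped update, confirming s_k^T r_k = 0.2 s_k^T B_k s_k when the damping is active; the random B_k, s_k, and the deliberately negative-curvature y_k are assumptions made for the illustration.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 4
    M = rng.standard_normal((n, n))
    B = M @ M.T + np.eye(n)                  # a symmetric positive definite B_k
    s = rng.standard_normal(n)
    y = -rng.uniform(0.1, 1.0) * s           # chosen so that s^T y < 0.2 s^T B s (damping active)

    sBs, sy = s @ B @ s, s @ y
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    r = theta * y + (1.0 - theta) * (B @ s)  # damped secant vector

    print(np.isclose(s @ r, 0.2 * sBs))      # True when theta != 1
    print(s @ r > 0)                         # the curvature bound (18.17) holds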
Problem 18.5
We have
    c(x) = x_1^2 + x_2^2 - 1   and   \nabla c(x) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix},
so the linearized constraint at x_k is
    0 = c(x_k) + \nabla c(x_k)^T p = x_1^2 + x_2^2 - 1 + 2x_1 p_1 + 2x_2 p_2.
(a) At x_k = (0, 0), the constraint becomes
        0 = -1,
    which is incompatible: the linearization has no solution.
(b) At x_k = (0, 1), the constraint becomes
        0 = 2p_2,
    which has solutions of the form p = (q, 0), q \in R.
(c) At x_k = (0.1, 0.02), the constraint becomes
        0 = -0.9896 + 0.2 p_1 + 0.04 p_2,
    which has solutions of the form p = (4.948, 0) + q(0.2, -1), q \in R.
(d) At x_k = (-0.1, -0.02), the constraint becomes
        0 = -0.9896 - 0.2 p_1 - 0.04 p_2,
    which has solutions of the form p = (-4.948, 0) + q(0.2, -1), q \in R.
19 Interior-Point Methods for Nonlinear Programming
Problem 19.3
Define the vector function
    c(x) = D r(x),
where D is a diagonal scaling matrix with nonzero diagonal entries. The Jacobian corresponding to c(x) is
    A(x) = \begin{bmatrix} \nabla c_1(x)^T \\ \vdots \\ \nabla c_n(x)^T \end{bmatrix}
         = \begin{bmatrix} D_{11} \nabla r_1(x)^T \\ \vdots \\ D_{nn} \nabla r_n(x)^T \end{bmatrix}
         = D J(x).
Therefore, the Newton step p is obtained from the linear system
    D J(x) p = -D r(x),
which is equivalent to
    J(x) p = -r(x),
since D is nonsingular.
Problem 19.4
Eliminating the linear equation yields x_1 = 2 - x_2. Plugging this expression into the second equation shows that the solutions satisfy
    -3x_2^2 + 2x_2 + 1 = 0.                        (69)
Thus, the solutions are
    (x_1, x_2) \in \left\{ (1, 1), \left( \tfrac{7}{3}, -\tfrac{1}{3} \right) \right\}.
Similarly, multiplying the first equation by x_2 yields the system
    \begin{bmatrix} x_1 x_2 + x_2^2 - 2x_2 \\ x_1 x_2 - 2x_2^2 + 1 \end{bmatrix} = 0.
Subtracting the first equation from the second again yields (69), and the solutions remain unchanged.
Newton's method applied to the two systems yields the linear systems
    \begin{bmatrix} 1 & 1 \\ x_2 & x_1 - 4x_2 \end{bmatrix} d
    = - \begin{bmatrix} x_1 + x_2 - 2 \\ x_1 x_2 - 2x_2^2 + 1 \end{bmatrix}
and
    \begin{bmatrix} x_2 & x_1 + 2x_2 - 2 \\ x_2 & x_1 - 4x_2 \end{bmatrix} d
    = - \begin{bmatrix} x_1 x_2 + x_2^2 - 2x_2 \\ x_1 x_2 - 2x_2^2 + 1 \end{bmatrix}.
From the point x = (1, -1), the steps are found to be d = (4/3, 2/3) and d = (1/2, 1/2), respectively.
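The two Newton steps from x = (1, -1) can be reproduced directly; the small helper below is introduced only for this check.

    import numpy as np

    def newton_step(F, J, x):
        """One Newton step d solving J(x) d = -F(x)."""
        return np.linalg.solve(J(x), -F(x))

    # Original system: x1 + x2 - 2 = 0, x1*x2 - 2*x2^2 + 1 = 0.
    F1 = lambda x: np.array([x[0] + x[1] - 2.0, x[0]*x[1] - 2.0*x[1]**2 + 1.0])
    J1 = lambda x: np.array([[1.0, 1.0], [x[1], x[0] - 4.0*x[1]]])

    # System with the first equation multiplied by x2.
    F2 = lambda x: np.array([x[0]*x[1] + x[1]**2 - 2.0*x[1], x[0]*x[1] - 2.0*x[1]**2 + 1.0])
    J2 = lambda x: np.array([[x[1], x[0] + 2.0*x[1] - 2.0], [x[1], x[0] - 4.0*x[1]]])

    x0 = np.array([1.0, -1.0])
    print(newton_step(F1, J1, x0))   # approximately [4/3, 2/3]
    print(newton_step(F2, J2, x0))   # approximately [1/2, 1/2]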
Problem 19.14
For clarity, define
    U = \begin{bmatrix} W \\ 0 \\ 0 \\ 0 \end{bmatrix},
    V = \begin{bmatrix} W M^T \\ 0 \\ 0 \\ 0 \end{bmatrix},
and
    C = \begin{bmatrix} D & A^T \\ A & 0 \end{bmatrix},
where
    D = \begin{bmatrix} \xi I & 0 \\ 0 & \Sigma \end{bmatrix}
    and
    A = \begin{bmatrix} A_E & 0 \\ A_I & -I \end{bmatrix}.
It can easily be shown that
    C^{-1} = \begin{bmatrix}
        D^{-1} - D^{-1} A^T (A D^{-1} A^T)^{-1} A D^{-1} & D^{-1} A^T (A D^{-1} A^T)^{-1} \\
        (A D^{-1} A^T)^{-1} A D^{-1} & -(A D^{-1} A^T)^{-1}
    \end{bmatrix},
so the solution r of the primal-dual system (C + U V^T) r = s can be obtained via the Sherman-Morrison-Woodbury formula as
    r = (C + U V^T)^{-1} s = \left[ C^{-1} - C^{-1} U (I + V^T C^{-1} U)^{-1} V^T C^{-1} \right] s,
which requires only solutions of the system C v = b for various right-hand sides b.
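A generic sketch of the Sherman-Morrison-Woodbury solve, using a dense LU factorization of C as a stand-in for the structured solves with C; the block sizes and random data are assumptions made only for the illustration.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(8)
    N, k = 12, 3                                   # dimension of C and rank of the update
    M0 = rng.standard_normal((N, N))
    C = M0 + N * np.eye(N)                         # a comfortably nonsingular C
    U = rng.standard_normal((N, k))
    V = rng.standard_normal((N, k))
    s = rng.standard_normal(N)

    lu = lu_factor(C)                              # factor C once; reuse it for every solve with C
    Cinv_U = lu_solve(lu, U)                       # C^{-1} U      (k solves)
    Cinv_s = lu_solve(lu, s)                       # C^{-1} s      (1 solve)
    small = np.eye(k) + V.T @ Cinv_U               # I + V^T C^{-1} U   (k x k)
    r = Cinv_s - Cinv_U @ np.linalg.solve(small, V.T @ Cinv_s)

    print(np.allclose((C + U @ V.T) @ r, s))       # True: r solves (C + U V^T) r = s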