Convex Optimization
Advanced camp
Francisco J. Silva
Université de Limoges
francisco.silva@unilim.fr
http://www.unilim.fr/pages_perso/francisco.silva/
August 2019
Contents of the course
Definition of an optimization problem
Let f : Rn → R and let K ⊆ Rn be a nonempty set. The basic optimization problem (P) consists in finding x̄ ∈ K such that
f(x̄) ≤ f(x) ∀ x ∈ K.
In the above, f is called an objective function, K is called a feasible set (or constraint
set) and any x̄ solving (P ) is called a global solution to problem (P ).
One usually also considers the weaker, but easier to characterize, notion of a local solution
to problem (P). Namely, x̄ ∈ K is a local solution to (P) if there exists δ > 0 such
that f(x̄) ≤ f(x) for all x ∈ K ∩ B(x̄, δ), where
B(x̄, δ) := {x ∈ Rn | ‖x − x̄‖ ≤ δ}.
In optimization theory one usually studies the following features of problem (P ):
1.- Does there exist a solution x̄ (global or local)?
2.- Optimality conditions, i.e. properties satisfied by the solutions (or local solutions).
3.- Algorithms for computing approximate solutions.
In this course we will mainly focus on points 1 and 2 of the previous program.
We will also consider mainly two cases for the feasible set K:
K = Rn (unconstrained case).
Equality and inequality constraints:
K = {x ∈ Rn | gi(x) = 0, i = 1, . . . , m, hj(x) ≤ 0, j = 1, . . . , ℓ}. (1)
In order to tackle point 2 we will assume that f is a smooth function. If the feasible
set (1) is considered, we will also assume that gi and hj are smooth functions.
Some mathematical tools
In what follows, we will work in the euclidean space Rn. We denote by ⟨·, ·⟩ the
standard scalar product and by ‖ · ‖ the corresponding norm. Namely,
⟨x, y⟩ = xᵀy = Σ_{i=1}^{n} xi yi ∀ x, y ∈ Rn, and ‖x‖ = √⟨x, x⟩.
[Graph] The graph of a function f : Rn → R is defined by
Gr(f) := {(x, f(x)) | x ∈ Rn}.
[Level sets] Let c ∈ R. The level set of value c is defined by
Levf(c) := {x ∈ Rn | f(x) = c}.
• When n = 2, the sets Levf (c) are useful in order to draw the graph of a function.
• These sets will also be useful for solving two-dimensional linear programming
problems graphically, i.e. problems where n = 2 and the function f and the set K are
defined by means of affine functions.
Example 1: We consider the function
R² ∋ (x, y) ↦ f(x, y) := x + y + 2 ∈ R.
Note that the optimization problem with this f and K = R2 does not have a solution.
Example 3: Consider the function
R² ∋ (x, y) ↦ f(x, y) := x² − y² ∈ R.
[Differentiability] Let f : Rn → R. We say that f is differentiable at x̄ ∈ Rn if for all
i = 1, . . . , n the partial derivatives ∂f/∂xi(x̄) exist and, setting
∇f(x̄) := (∂f/∂x1(x̄), . . . , ∂f/∂xn(x̄)) (the gradient of f at x̄), we have that
lim_{h→0} [f(x̄ + h) − f(x̄) − ∇f(x̄) · h] / ‖h‖ = 0.
Remark 1. Notice that f is differentiable at x̄ iff there exists εx̄ : Rn → R, with
lim_{h→0} εx̄(h) = 0, such that
f(x̄ + h) = f(x̄) + ∇f(x̄) · h + ‖h‖εx̄(h) ∀ h ∈ Rn.
In particular, for every h ∈ Rn,
lim_{τ→0, τ>0} [f(x̄ + τh) − f(x̄)] / τ = ∇f(x̄) · h.
Remark 2. (i) [Simple criterion to check differentiability] Suppose that A ⊆ Rn is
an open set containing x̄ and that the map
A ∋ x ↦ ∇f(x) ∈ Rn
is well defined and continuous at x̄. Then f is differentiable at x̄.
(ii) The previous definitions extend to vector valued functions
f = (f1, . . . , fm) : Rn → Rm. In this case, the differential of f at x̄ is represented
by the Jacobian matrix Df(x̄) ∈ Mm,n(R), which is given by
Df(x̄) = ( ∂fi/∂xj (x̄) )_{1≤i≤m, 1≤j≤n},
i.e. the ith row of Df(x̄) contains the partial derivatives ∂fi/∂x1(x̄), . . . , ∂fi/∂xn(x̄).
(iii) In the previous definitions, the fact that the domain of definition of f is Rn is
not important. The definitions extend naturally to functions defined on open
subsets of Rn.
Basic examples:
f2(x) = ½⟨Qx, x⟩ ∀ x ∈ Rn, where Q ∈ Mn,n(R).
f2(x + h) = ½⟨Q(x + h), x + h⟩
 = ½⟨Qx, x⟩ + ½[⟨Qx, h⟩ + ⟨Qh, x⟩] + ½⟨Qh, h⟩
 = ½⟨Qx, x⟩ + ⟨½(Q + Qᵀ)x, h⟩ + ½⟨Qh, h⟩
 = f2(x) + ⟨½(Q + Qᵀ)x, h⟩ + ‖h‖εx(h),
where lim_{h→0} εx(h) = 0. Therefore, f2 is differentiable and
∇f2(x) = ½(Q + Qᵀ)x ∀ x ∈ Rn.
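This formula is easy to test numerically. The following sketch (Python with NumPy; not part of the original notes, and the matrix and point are arbitrary) compares ∇f2(x) = ½(Q + Qᵀ)x with a central finite-difference approximation.

```python
# Sketch: finite-difference check of the gradient of f2(x) = (1/2)<Qx, x>.
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.normal(size=(n, n))   # a generic, possibly non-symmetric, matrix
x = rng.normal(size=n)

def f2(z):
    return 0.5 * z @ Q @ z

grad_formula = 0.5 * (Q + Q.T) @ x

eps = 1e-6
grad_fd = np.array([(f2(x + eps * e) - f2(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.max(np.abs(grad_formula - grad_fd)))  # ~1e-10
```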
Consider now f3(x) := ‖x‖ ∀ x ∈ Rn. For x ≠ 0, writing ‖x‖ = √(‖x‖²), the chain
rule gives
Df3(x) = D(√·)(‖x‖²) D(‖ · ‖²)(x) = (1/(2√(‖x‖²))) (2x)ᵀ = xᵀ/‖x‖,
which implies that ∇f3(x) = x/‖x‖, and, since this function is continuous at every
x ≠ 0, we have that f3 is C¹ in the set Rn \ {0}. Let us show that f3 is not
differentiable at x = 0. Indeed, if this were the case, then all the partial derivatives
∂f3/∂xi(0) would exist for i = 1, . . . , n. Taking, for instance, i = 1, we have
lim_{τ→0} (‖0 + τe1‖ − ‖0‖)/τ = lim_{τ→0} |τ|/τ,
which does not exist, because
lim_{τ→0⁻} |τ|/τ = lim_{τ→0⁻} (−τ)/τ = −1 ≠ 1 = lim_{τ→0⁺} τ/τ = lim_{τ→0⁺} |τ|/τ.
Suppose now that all the second order partial derivatives
∂²f/∂xi∂xj (x̄) := ∂/∂xj ( ∂f/∂xi ) (x̄)
exist for all i, j = 1, . . . , n. The following result, due to Clairaut and also known
as Schwarz's theorem, says that, under appropriate conditions, we can exchange the
order of differentiation.
Theorem 1. Suppose that the function f is twice differentiable in an open set
A ⊆ Rn containing x̄ and that for all i, j = 1, . . . , n the function
A ∋ x ↦ ∂²f/∂xi∂xj (x) ∈ R is continuous at x̄. Then,
∂²f/∂xi∂xj (x̄) = ∂²f/∂xj∂xi (x̄).
Under the assumptions of the previous theorem, the Jacobian of ∇f(x̄) takes the form
D²f(x̄) = ( ∂²f/∂xi∂xj (x̄) )_{1≤i,j≤n},
which is a symmetric matrix by the previous result.
If f : Rn → R is twice differentiable in an open set A ⊆ Rn and for all i,
j = 1, . . . , n the function
A ∋ x ↦ ∂²f/∂xi∂xj (x) ∈ R
is continuous, then for every x ∈ A we have the second order Taylor expansion
f(x + h) = f(x) + ∇f(x) · h + ½⟨D²f(x)h, h⟩ + ‖h‖²Rx(h),
where Rx(h) → 0 as h → 0.
Example: Consider f(x, y) := eˣ cos(y) − x. Then,
∂f/∂x(0, 0) = (eˣ cos(y))|_{(x,y)=(0,0)} − 1 = 0,
∂f/∂y(0, 0) = (−eˣ sin(y))|_{(x,y)=(0,0)} = 0,
∂²f/∂x²(0, 0) = (eˣ cos(y))|_{(x,y)=(0,0)} = 1,
∂²f/∂y²(0, 0) = (−eˣ cos(y))|_{(x,y)=(0,0)} = −1,
∂²f/∂x∂y(0, 0) = (−eˣ sin(y))|_{(x,y)=(0,0)} = 0.
Note that all the first and second order partial derivatives are continuous in R².
Therefore, we can apply the previous result and obtain that the Taylor expansion of
f at (0, 0) is given by
f(x, y) = 1 + ½(x² − y²) + (x² + y²)R(x, y), with R(x, y) → 0 as (x, y) → (0, 0).
This expansion shows that locally around (0, 0) the function f above behaves like the
function in Example 3.
Some basic existence results for problem (P )
Recall that a set K ⊆ Rn is compact iff it is closed and bounded and that, in this
case, every sequence (xk)k∈N ⊆ K admits a subsequence (xk_ℓ)ℓ∈N converging to some
x̄ ∈ K, i.e.
x̄ = lim_{ℓ→∞} xk_ℓ.
[The basic existence results] Note that, by definition, if inf_{x∈K} f(x) = −∞, then f
is not bounded from below on K and, hence, there are no solutions to (P). On the other
hand, if inf_{x∈K} f(x) is finite, the existence of a solution can also fail to hold, as
the following example shows.
Example: Consider the function R ∋ x ↦ f(x) := e^{−x} and take K := [0, +∞).
Then, inf_{x∈K} f(x) = 0 and there is no x ∈ K such that f(x) = 0.
Theorem 4. [K compact] Suppose that K is non-empty and compact and that f is
continuous in K. Then, problem (P) admits at least one global solution.
Example: Suppose that f : R³ → R is given by f(x, y, z) = x² − y³ + sin z and
K = {(x, y, z) | x⁴ + y⁴ + z⁴ ≤ 1}. Then f is continuous and K is compact. As
a consequence, problem (P) admits at least one solution.
Theorem 5. [K closed but not bounded] Suppose that K is non-empty and closed, and
that f is continuous and "coercive" or "infinity at the infinity" in K, i.e.
lim_{x∈K, ‖x‖→∞} f(x) = +∞.
Then, problem (P) admits at least one global solution.
Example: Suppose that f : Rn → R is given by
f(x) = ⟨Qx, x⟩ + cᵀx ∀ x ∈ Rn,
where Q ∈ Mn,n(R) is symmetric and positive definite and c ∈ Rn. Then, denoting by
λmin > 0 the smallest eigenvalue of Q, we have f(x) ≥ λmin‖x‖² − ‖c‖‖x‖ → +∞ as
‖x‖ → ∞, so that the coercivity condition of Theorem 5
holds for every closed set K. Since f is also continuous, problem (P) admits at least
one global solution for any given non-empty closed set K.
Example: Suppose that f : R² → R is given by
f(x, y) = x² + y³ ∀ (x, y) ∈ R²,
and
K = {(x, y) ∈ R² | y ≥ −1}.
Then, since y³ ≥ −1 on K,
lim_{x∈K, ‖x‖→∞} f(x) = +∞, (5)
and, hence, Theorem 5 yields the existence of a global solution, even though f is not
coercive on all of R² (consider x = 0 and y → −∞).
Optimality conditions for unconstrained problems
Notice that, by the second existence theorem, if f is continuous and satisfies that
lim_{‖x‖→∞} f(x) = +∞, then problem (P) with K = Rn admits at least one global
solution.
Theorem 6. [First order necessary condition] Suppose that K = Rn, that x̄ is a local
solution to problem (P), and that f is differentiable at x̄. Then, ∇f(x̄) = 0.
Proof. Fix h ∈ Rn. For all τ > 0 small enough, the local optimality of x̄ yields
0 ≤ f(x̄ + τh) − f(x̄) = τ∇f(x̄) · h + τ‖h‖εx̄(τh),
where lim_{z→0} εx̄(z) = 0. Therefore, dividing by τ and letting τ → 0,
∇f(x̄) · h ≥ 0.
Since h is arbitrary, we get that ∇f(x̄) = 0 (take for instance h = −∇f(x̄) in the
previous inequality).
We have the following second order necessary condition for local optimality:
Theorem 7. Suppose that K = Rn and that x̄ is a local solution to problem (P). If
f is C² in an open set A containing x̄, then D²f(x̄) is positive semidefinite.
In other words,
⟨D²f(x̄)h, h⟩ ≥ 0 ∀ h ∈ Rn.
Proof. Let us fix h ∈ Rn. By Taylor's theorem, for all τ > 0 small enough, we have
f(x̄ + τh) = f(x̄) + τ∇f(x̄) · h + (τ²/2)⟨D²f(x̄)h, h⟩ + τ²‖h‖²Rx̄(τh),
where Rx̄(τh) → 0 as τ → 0. Using the local optimality of x̄, the previous result
implies that ∇f(x̄) = 0 and, hence,
0 ≤ f(x̄ + τh) − f(x̄) = (τ²/2)⟨D²f(x̄)h, h⟩ + τ²‖h‖²Rx̄(τh).
Dividing by τ² and letting τ → 0, the result follows.
We have the following second order sufficient condition for local optimality.
Theorem 8. Suppose that f : Rn → R is C² in an open set A containing x̄ and
that
(i) ∇f(x̄) = 0;
(ii) the matrix D²f(x̄) is positive definite, in other words,
⟨D²f(x̄)h, h⟩ > 0 ∀ h ∈ Rn, h ≠ 0.
Then, x̄ is a local solution to problem (P).
Proof. Since D²f(x̄) is symmetric and positive definite, its smallest eigenvalue λ is
positive and
⟨D²f(x̄)h, h⟩ ≥ λ‖h‖² ∀ h ∈ Rn.
Using this inequality, the hypothesis ∇f(x̄) = 0, and the Taylor expansion, for all
h ∈ Rn such that x̄ + h ∈ A we have that
f(x̄ + h) − f(x̄) = ∇f(x̄) · h + ½⟨D²f(x̄)h, h⟩ + ‖h‖²Rx̄(h)
 ≥ (λ/2)‖h‖² + ‖h‖²Rx̄(h)
 = (λ/2 + Rx̄(h))‖h‖².
Since Rx̄(h) → 0 as h → 0, there exists δ > 0 such that |Rx̄(h)| ≤ λ/4 whenever
‖h‖ ≤ δ. Therefore,
f(x̄ + h) − f(x̄) ≥ (λ/4)‖h‖² ∀ h ∈ Rn with ‖h‖ ≤ δ,
which proves the local optimality of x̄.
Example: Let us study problem (P) with K = R² and
R² ∋ (x, y) ↦ f(x, y) = 2x³ + 3y² + 3x²y − 24y.
Since f(x, 0) = 2x³ → −∞ as x → −∞, we have inf_{(x,y)∈R²} f(x, y) = −∞ and
problem (P) does not admit global solutions. Let us look for local solutions. We know
that if (x, y) is a local solution, then it should satisfy ∇f(x, y) = (0, 0). This
equation gives
6x² + 6xy = 0,
6y + 3x² = 24.
From the first equation, we get that x = 0 or x = −y . In the first case, the second
equation yields y = 4, while in the second case we obtain that x2 − 2x − 8 = 0 which
yields the two solutions (4, −4) and (−2, 2). Therefore, we have the three candidates
(0, 4), (4, −4) and (−2, 2). Let us check what can be obtained from the Hessian at
these three points. We have that
D²f(x, y) = ( 12x + 6y   6x
                 6x        6 ).
For the first candidate,
D²f(0, 4) = ( 24  0
               0  6 ),
which is positive definite. This implies that (0, 4) is a local solution of (P ). For the
second candidate, we have
D²f(4, −4) = ( 24  24        ( 4  4
                24   6 )  = 6   4  1 ),
whose determinant is given by 36(−12) < 0, which implies that D²f(4, −4) is
indefinite (it has one positive and one negative eigenvalue). Finally,
D²f(−2, 2) = ( −12  −12
               −12    6 ),
which is also indefinite because its diagonal entries have opposite signs.
Therefore, (0, 4) is the unique local solution to (P).
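A minimal numerical counterpart of this discussion (a sketch; the labels follow Theorems 7 and 8 above) evaluates the eigenvalues of D²f at the three candidates:

```python
# Sketch: classify the critical points of f(x, y) = 2x^3 + 3y^2 + 3x^2 y - 24y
# via the eigenvalues of the Hessian.
import numpy as np

def hessian(x, y):
    return np.array([[12 * x + 6 * y, 6 * x],
                     [6 * x, 6.0]])

for (x, y) in [(0, 4), (4, -4), (-2, 2)]:
    eig = np.linalg.eigvalsh(hessian(x, y))
    if np.all(eig > 0):
        label = "local minimum"
    elif np.all(eig < 0):
        label = "local maximum"
    else:
        label = "indefinite (saddle)"
    print((x, y), eig, label)
```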
[Maximization problems] If instead of problem (P) we consider the problem
sup { f(x) | x ∈ K }, (P′)
then x̄ is a local (resp. global) solution to (P′) iff x̄ is a local (resp. global) solution
to (P) with f replaced by −f. In particular, if x̄ is a local solution to (P′) and f is
regular enough, then we have the following first order necessary condition
∇f(x̄) = 0,
and the following second order necessary condition
⟨D²f(x̄)h, h⟩ ≤ 0 ∀ h ∈ Rn.
In other words, D²f(x̄) is negative semidefinite.
Convexity
[Convex sets] A set C ⊆ Rn is said to be convex if for every x, y ∈ C and every
λ ∈ [0, 1] we have
λx + (1 − λ)y ∈ C.
[Convex functions] Given a convex set C ⊆ Rn, a function f : C → R is said to be
convex if for every x, y ∈ C and λ ∈ [0, 1] we have
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
[Relation between convex functions and convex sets] Given f : Rn → R, let us define
its epigraph epi(f ) by
epi(f) := { (x, y) ∈ Rn+1 | y ≥ f(x) }.
Proposition 1. The function f is convex iff the set epi(f ) is convex.
Proof. Indeed, suppose that f is convex and let (x1, z1), (x2, z2) ∈ epi(f). Then,
given λ ∈ [0, 1] set
Pλ := λ(x1, z1) + (1 − λ)(x2, z2) = (λx1 + (1 − λ)x2, λz1 + (1 − λ)z2).
Since, by convexity,
f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2) ≤ λz1 + (1 − λ)z2,
we have that Pλ ∈ epi(f). Conversely, assume that epi(f) is convex and let x1,
x2 ∈ Rn and λ ∈ [0, 1]. Since (x1, f(x1)), (x2, f(x2)) ∈ epi(f), we deduce that
λ(x1, f(x1)) + (1 − λ)(x2, f(x2)) = (λx1 + (1 − λ)x2, λf(x1) + (1 − λ)f(x2)) ∈ epi(f)
and, hence,
f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2),
which proves the convexity of f.
[Strict convexity of a function] A function f : C → R is said to be strictly convex if
for any λ ∈ (0, 1) and x, y ∈ C, with x ≠ y, we have that
f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y).
Example: The norm ‖ · ‖ is convex, by the triangle inequality, but not strictly convex.
Indeed, for λ ∈ (0, 1) we have ‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖, and
the equality case in the triangle inequality (‖a + b‖ = ‖a‖ + ‖b‖ iff a = 0, or b = 0,
or a = αb with α > 0) shows that the previous inequality holds with equality iff x = y = 0
or x = γy for some γ > 0. By taking x ≠ 0 and y = γx with γ ∈ (0, ∞) \ {1},
we conclude that ‖ · ‖ is not strictly convex.
Example: Let β ∈ (1, +∞). Let us show that the function Rn ∋ x ↦ ‖x‖^β ∈ R
is strictly convex. Indeed, the real function [0, +∞) ∋ t ↦ α(t) := t^β ∈ R is
increasing and strictly convex because
α′(t) = βt^{β−1} > 0 and α″(t) = β(β − 1)t^{β−2} > 0 ∀ t ∈ (0, +∞).
Then, using the triangle inequality, the monotonicity of α and the convexity of α,
for any x, y ∈ Rn and λ ∈ [0, 1] we get that
‖λx + (1 − λ)y‖^β ≤ (λ‖x‖ + (1 − λ)‖y‖)^β ≤ λ‖x‖^β + (1 − λ)‖y‖^β, (6)
which implies the convexity of ‖ · ‖^β. Now, in order to prove the strict convexity,
assume that for some λ ∈ (0, 1) we have
‖λx + (1 − λ)y‖^β = λ‖x‖^β + (1 − λ)‖y‖^β,
and let us prove that x = y. Then, all the inequalities in (6) are equalities and, hence,
‖λx + (1 − λ)y‖ = λ‖x‖ + (1 − λ)‖y‖ and (λ‖x‖ + (1 − λ)‖y‖)^β = λ‖x‖^β + (1 − λ)‖y‖^β.
The equality case in the triangle inequality and the first relation above imply that
x = y = 0 or x = γy for some γ > 0. The strict convexity of α and the second
relation above imply that ‖x‖ = ‖y‖. Therefore, either x = y = 0 or both x and
y are not zero, x = γy for some γ > 0, and ‖x‖ = ‖y‖. In the latter case, we
get that γ = 1 and, hence, x = y, from which the strict convexity follows.
[Convexity and differentiability] We have the following result:
Theorem 9. Let C ⊆ Rn be open and convex and let f : C → R be a differentiable
function. Then,
(i) f is convex in C if and only if for every x, y ∈ C we have
f(y) ≥ f(x) + ∇f(x) · (y − x). (7)
(ii) f is strictly convex in C if and only if for every x, y ∈ C, with x ≠ y, we have
f(y) > f(x) + ∇f(x) · (y − x). (8)
Proof. (i) By definition of convex function, for any x, y ∈ C and λ ∈ (0, 1), we have
f(λy + (1 − λ)x) ≤ λf(y) + (1 − λ)f(x)
and, hence,
f(λy + (1 − λ)x) − f(x) ≤ λ (f(y) − f(x)).
Since λy + (1 − λ)x = x + λ(y − x), we get
[f(x + λ(y − x)) − f(x)] / λ ≤ f(y) − f(x),
and letting λ → 0⁺ yields (7). Conversely, assume that (7) holds and let x, y ∈ C and
λ ∈ (0, 1). Setting xλ := λy + (1 − λ)x ∈ C, relation (7) gives
f(y) ≥ f(xλ) + ∇f(xλ) · (y − xλ) and f(x) ≥ f(xλ) + ∇f(xλ) · (x − xλ).
Multiplying the first inequality by λ, the second one by 1 − λ, and adding them (note
that λ(y − xλ) + (1 − λ)(x − xλ) = 0), we obtain λf(y) + (1 − λ)f(x) ≥ f(xλ),
i.e. f is convex.
(ii) Since f is convex, by (i) we have that (7) holds. Suppose that f is strictly convex
and that (8) fails, i.e. there exist x ≠ y with f(y) = f(x) + ∇f(x) · (y − x). Then,
by strict convexity and the previous equality,
f((x + y)/2) < ½f(x) + ½f(y) = f(x) + ½∇f(x) · (y − x),
while (7), applied to the pair (x, (x + y)/2), gives
f((x + y)/2) ≥ f(x) + ½∇f(x) · (y − x),
which is impossible. The proof that (8) implies that f is strictly convex is completely
analogous to the proof that (7) implies that f is convex. The result follows.
Theorem 10. Suppose that C ⊆ Rn is open and convex and that f : C → R is C² in C.
Then,
(i) f is convex in C iff, for every x ∈ C, D²f(x) is positive semidefinite;
(ii) if, for every x ∈ C, D²f(x) is positive definite, then f is strictly convex in C.
Proof. (i) Suppose that f is convex. Then, by Taylor's theorem, for every x ∈ C,
h ∈ Rn and τ > 0 small enough such that x + τh ∈ C, we have
f(x + τh) = f(x) + τ∇f(x) · h + (τ²/2)⟨D²f(x)h, h⟩ + τ²‖h‖²Rx(τh),
which implies, by the first order characterization of convexity, that
0 ≤ ½⟨D²f(x)h, h⟩ + ‖h‖²Rx(τh).
Using that lim_{τ→0} Rx(τh) = 0, and the fact that h is arbitrary, we get that
⟨D²f(x)h, h⟩ ≥ 0 ∀ h ∈ Rn.
Conversely, suppose that D²f(x) is positive semidefinite for all x ∈ C and assume, for
the time being, that for every x, y ∈ C there exists cxy ∈ {λx + (1 − λ)y | λ ∈ (0, 1)}
such that
f(y) = f(x) + ∇f(x) · (y − x) + ½⟨D²f(cxy)(y − x), y − x⟩. (10)
Then, we have that
f(y) ≥ f(x) + ∇f(x) · (y − x) ∀ x, y ∈ C,
and, hence, f is convex by Theorem 9(i). Relation (10) itself follows from the
Taylor-Lagrange formula applied to [0, 1] ∋ τ ↦ f(x + τ(y − x)).
Remark 3. Note that the positive definiteness of D²f(x), for all x ∈ C, is only
a sufficient condition for strict convexity, not a necessary one. Indeed, the function
R ∋ x ↦ f(x) = x⁴ ∈ R is strictly convex but f″(0) = 0.
Example: Let Q ∈ Mn,n(R) be symmetric, c ∈ Rn, and f(x) = ½⟨Qx, x⟩ + cᵀx for
all x ∈ Rn. Then, D²f(x) = Q and, hence, f is convex iff Q is positive semidefinite
and strictly convex if Q is positive definite.
In this case, the fact that Q is positive definite is also a necessary condition for strict
convexity. Indeed, for simplicity suppose that c = 0 and write Q = PDPᵀ, where
the set of columns of P is an orthonormal basis of eigenvectors of Q (which exists
because Q is symmetric), and D is the diagonal matrix containing the corresponding
eigenvalues {λi}_{i=1}^{n} on the diagonal. Setting y(x) = Pᵀx, we get
f(x) = ½ Σ_{i=1}^{n} λi yi(x)².
If some λi ≤ 0, then, along the line spanned by the corresponding eigenvector, f
reduces to t ↦ ½λit², which is not strictly convex; hence Q must be positive definite.
Convex optimization problems
Theorem 11. Suppose that K is convex and that f is convex and differentiable. Then,
the following assertions are equivalent:
(i) x̄ is a global solution to (P);
(ii) x̄ ∈ K and
∇f(x̄) · (x − x̄) ≥ 0 ∀ x ∈ K. (11)
Proof. Let us prove that (i) implies (ii). Indeed, by convexity of K we have that,
given x ∈ K, for any τ ∈ [0, 1] the point τx + (1 − τ)x̄ = x̄ + τ(x − x̄) belongs to K.
Therefore, by the differentiability of f, if τ > 0 is small enough we have
0 ≤ f(x̄ + τ(x − x̄)) − f(x̄) = τ∇f(x̄) · (x − x̄) + τ‖x − x̄‖εx̄(τ(x − x̄)),
and dividing by τ and letting τ → 0 yields (11). Conversely, if (ii) holds then, by
Theorem 9(i), for every x ∈ K,
f(x) ≥ f(x̄) + ∇f(x̄) · (x − x̄) ≥ f(x̄),
i.e. x̄ is a global solution to (P).
Proposition 2. Suppose that K is convex and that f is strictly convex in K. Then,
there exists at most one solution to problem (P ).
Proof. Assume, by contradiction, that x1 ≠ x2 are both solutions to (P). Then,
½x1 + ½x2 ∈ K and, by strict convexity,
f(½x1 + ½x2) < ½f(x1) + ½f(x2) = inf_{x∈K} f(x),
which is impossible.
Example [Least squares]: Let A ∈ Mm,n(R) and b ∈ Rm, and consider the problem
inf { ‖Ax − b‖² | x ∈ Rn }. (12)
Note that
f(x) := ‖Ax − b‖² = ⟨AᵀAx, x⟩ − 2⟨Aᵀb, x⟩ + ‖b‖²
and, hence, D²f(x) = 2AᵀA, which is symmetric positive semidefinite, so that
f is convex. Let us assume that the columns of A are linearly independent. Then, for
any h ∈ Rn,
⟨AᵀAh, h⟩ = ‖Ah‖² = 0 ⟺ Ah = 0 ⟺ h = 0,
i.e. for all x ∈ Rn, the matrix D²f(x) is symmetric positive definite and, hence, f
is strictly convex. Moreover, denoting by λmin > 0 the smallest eigenvalue of AᵀA,
we have
f(x) ≥ λmin‖x‖² − 2⟨Aᵀb, x⟩ + ‖b‖²
and, hence, f is infinity at the infinity. Therefore, problem (12) admits exactly one
solution x̄. By Remark 4, the solution x̄ is characterized by
AᵀAx̄ = Aᵀb, i.e. x̄ = (AᵀA)⁻¹Aᵀb.
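A minimal sketch of this computation (assuming NumPy; the data are randomly generated, so the columns of A are almost surely linearly independent): solve the normal equations AᵀAx̄ = Aᵀb and compare with a reference solver.

```python
# Sketch: least squares via the normal equations, checked against np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # x = (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution

print(np.allclose(x_normal, x_lstsq))  # True
```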
[Projection on a closed and convex set] Suppose that K is a nonempty closed and
convex set and let y ∈ Rn. Consider the projection problem
inf { ‖x − y‖ | x ∈ K }. (ProjK)
Note that, K being closed and the cost functional being coercive, we have the existence
of at least one solution x̄. In order to characterize x̄, notice that the set of solutions to
(ProjK) is the same as the set of solutions to the problem
inf { ½‖x − y‖² | x ∈ K }.
Then, since the cost functional of the problem above is strictly convex, Proposition 2
implies that x̄ is its unique solution and, hence, is also the unique solution to (ProjK).
Moreover, by Theorem 11(ii), we have that x̄ is characterized by the inequality
⟨y − x̄, x − x̄⟩ ≤ 0 ∀ x ∈ K. (13)
Example: Let A ∈ Mm,n(R) and b ∈ Rm be such that
b ∈ Im(A) := {Ax | x ∈ Rn}.
Suppose that
K = {x ∈ Rn | Ax = b}. (14)
Then, K is closed, convex and nonempty. Moreover, for any h ∈ Ker(A) we have
that x̄ + h ∈ K. As a consequence, (13) implies that
(y − x̄) · h ≤ 0 ∀ h ∈ Ker(A),
and, since h ∈ Ker(A) iff −h ∈ Ker(A), we deduce that (y − x̄) · h = 0 for all
h ∈ Ker(A), i.e. y − x̄ ∈ Ker(A)⊥, where, for any subspace V ⊆ Rn, we set
V⊥ := {z ∈ Rn | zᵀv = 0 ∀ v ∈ V}.
or, equivalently,
y = x̄ + z for some z ∈ Ker(A)⊥. (16)
[Convex problems with equality constraints] Now, we consider the same set K as in
(14) but we consider a general differentiable convex objective function f : Rn → R.
We will need the following result from Linear Algebra.
Lemma 2. Let A ∈ Mm,n(R). Then, Ker(A)⊥ = Im(Aᵀ).
Proof. Since (V⊥)⊥ = V for every subspace V ⊆ Rn, the desired relation is equivalent
to Im(Aᵀ)⊥ = Ker(A). Now, x ∈ Im(Aᵀ)⊥ iff ⟨x, Aᵀy⟩ = 0 for all y ∈ Rm, and this
holds iff ⟨Ax, y⟩ = 0 for all y ∈ Rm, i.e. iff x ∈ Ker(A).
Proposition 3. Let f : Rn → R be differentiable and convex and suppose that the
set K in (14) is nonempty. Then x̄ is a global solution to (P ) iff x̄ ∈ K and there
exists λ ∈ Rm such that
∇f(x̄) + Aᵀλ = 0. (17)
Proof. We are going to show that (17) is equivalent to (11) from which the result
follows. Indeed, exactly as in the previous example, we have that (11) is equivalent to
∇f (x̄) · h = 0 ∀ h ∈ Ker(A),
i.e.
∇f(x̄) ∈ Ker(A)⊥.
Lemma 2 implies the existence of µ ∈ Rm such that ∇f(x̄) = Aᵀµ. Setting
λ = −µ we get (17).
Example: Let Q ∈ Mn,n(R) be symmetric and positive definite, and c ∈ Rn. In the
framework of the previous proposition, suppose that f is given by
f(x) = ½⟨Qx, x⟩ + cᵀx ∀ x ∈ Rn,
and that A has m linearly independent columns. A classical linear algebra result states
that this is equivalent to the fact that the m rows of A are linearly independent. In
this case, we say that A has full rank.
Under the previous assumptions on Q, we have seen that f is strictly convex.
Moreover, the condition on the columns of A implies that Im(A) = Rm and, hence,
K ≠ ∅. Now, by Proposition 3, the point x̄ solves (P) iff x̄ ∈ K and there exists
λ ∈ Rm such that (17) holds. In other words, there exists λ ∈ Rm such that
Ax̄ = b and Qx̄ + c + Aᵀλ = 0.
The second equation above yields x̄ = −Q⁻¹(c + Aᵀλ) and, hence, by the first
equation, we get
AQ⁻¹c + AQ⁻¹Aᵀλ + b = 0. (18)
Let us show that M := AQ⁻¹Aᵀ is invertible. Indeed, since M ∈ Mm,m(R), it
suffices to show that My = 0 implies y = 0. Let y ∈ Rm be such that
My = 0. Then, ⟨My, y⟩ = 0 and, hence, ⟨Q⁻¹Aᵀy, Aᵀy⟩ = 0, which implies,
since Q⁻¹ is also positive definite, that Aᵀy = 0. Now, since the columns of Aᵀ are
also linearly independent, we deduce that y = 0, i.e. M is invertible. Using this fact,
we can solve for λ in (18), obtaining
λ = −M⁻¹(AQ⁻¹c + b).
We deduce that
x̄ = −Q⁻¹c + Q⁻¹AᵀM⁻¹(AQ⁻¹c + b). (19)
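Formula (19) can be checked numerically. The sketch below (random data; Q is made symmetric positive definite by construction) computes x̄ via (19) and verifies the optimality system Ax̄ = b, Qx̄ + c + Aᵀλ = 0.

```python
# Sketch: closed-form solution (19) of the equality constrained quadratic problem.
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
B = rng.normal(size=(n, n))
Q = B @ B.T + n * np.eye(n)   # symmetric positive definite
A = rng.normal(size=(m, n))   # full row rank (almost surely)
b = rng.normal(size=m)
c = rng.normal(size=n)

Qinv = np.linalg.inv(Q)
M = A @ Qinv @ A.T
lam = -np.linalg.solve(M, A @ Qinv @ c + b)
x = -Qinv @ (c + A.T @ lam)   # formula (19)

print(np.allclose(A @ x, b))                  # feasibility: A x = b
print(np.allclose(Q @ x + c + A.T @ lam, 0))  # stationarity (17)
```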
Example: Let us now consider the projection problem
min ½‖x − y‖²
s.t. Ax = b.
Noting that ½‖x − y‖² = ½‖x‖² − yᵀx + ½‖y‖², the previous problem has the same
solutions as
min ½‖x‖² − yᵀx
s.t. Ax = b,
which has the form considered in the previous example with Q = I and c = −y.
Therefore, by (19),
x̄ = y + Aᵀ(AAᵀ)⁻¹(b − Ay).
Note that if h ∈ Ker(A), then ⟨y − x̄, h⟩ = −⟨(AAᵀ)⁻¹(b − Ay), Ah⟩ = 0, so that
y − x̄ ∈ Ker(A)⊥ = Im(Aᵀ), confirming (16).
[Separation of a point and a closed convex set] We have the following result:
Proposition 4. Let K be a nonempty closed and convex set and let y ∉ K. Then,
there exists p ∈ Rn such that
⟨p, x⟩ < ⟨p, y⟩ ∀ x ∈ K.
Proof. Let x̄ ∈ K be the projection of y onto K. Let us define the affine function
ℓ(x) := ⟨y − x̄, x − x̄⟩ ∀ x ∈ Rn.
By (13), ℓ(x) ≤ 0 for all x ∈ K. Now, since ℓ(y) = ‖y − x̄‖² > 0
(because y ∉ K), we deduce that ℓ(x) < ℓ(y) for all x ∈ K, which yields
⟨y − x̄, x⟩ < ⟨y − x̄, y⟩ for all x ∈ K. Setting p := y − x̄, we have proven the
result.
[Cones and polar cones]
Definition 3. (i) A set C ⊆ Rn is a cone if
∀ h ∈ C, ∀ τ ≥ 0 we have τh ∈ C.
(ii) The polar cone of a cone C ⊆ Rn is defined by
C° := { z ∈ Rn | ⟨z, h⟩ ≤ 0 ∀ h ∈ C }.
Example: Let ai ∈ Rn, i = 1, . . . , m, and consider first the cone
C = { h ∈ Rn | ⟨ai, h⟩ = 0 ∀ i = 1, . . . , m },
which, denoting by A ∈ Mm,n(R) the matrix whose ith row is ai, can be written as
C = {h ∈ Rn | Ah = 0} = Ker(A),
so that, by Lemma 2, C° = Ker(A)⊥ = Im(Aᵀ).
Now, suppose that
C = { h ∈ Rn | ⟨ai, h⟩ ≤ 0 ∀ i = 1, . . . , m }.
Lemma 3. We have
C° = { Aᵀλ | λ ∈ Rm, λi ≥ 0 ∀ i = 1, . . . , m }. (20)
Proof. If λ ∈ Rm with λi ≥ 0 ∀ i = 1, . . . , m, then for every h ∈ C we have
⟨Aᵀλ, h⟩ = ⟨λ, Ah⟩ = Σ_{i=1}^{m} λi⟨ai, h⟩ ≤ 0,
and, hence, Aᵀλ ∈ C°. Now, denote by B the set on the right hand side of (20).
Clearly, B is convex and nonempty. Moreover, B can also be shown to be closed.
Suppose that u ∈ C° and u ∉ B. Then, by Proposition 4, there exists p ∈ Rn such
that
⟨p, x⟩ < ⟨p, u⟩ ∀ x ∈ B,
i.e.
⟨p, Aᵀλ⟩ < ⟨p, u⟩ ∀ λ ∈ Rm, λi ≥ 0 ∀ i = 1, . . . , m. (21)
Now, the previous inequality has the following two consequences:
(i) ⟨p, u⟩ > 0. Indeed, it suffices to take λ = 0 in (21).
(ii) ⟨p, ai⟩ ≤ 0 for all i = 1, . . . , m. Indeed, fix i ∈ {1, . . . , m} and γ > 0. By
taking
λ = (0, . . . , γ, . . . , 0), with γ in the ith position,
in (21) we get γ⟨p, ai⟩ < ⟨p, u⟩, which implies that ⟨p, ai⟩ ≤ 0 (if this is not the
case, by taking γ large enough we get a contradiction).
From (i)-(ii), we conclude that p ∈ C and ⟨p, u⟩ > 0, which contradicts the fact that
u ∈ C°.
Now, let us consider the case where C is defined by both linear equalities and
inequalities. Let ai ∈ Rn (i = 1, . . . , m) and a′j ∈ Rn (j = 1, . . . , p). Suppose
that C is given by
C = { h ∈ Rn | ⟨ai, h⟩ = 0 ∀ i = 1, . . . , m, ⟨a′j, h⟩ ≤ 0 ∀ j = 1, . . . , p }.
Lemma 4. We have
C° = { Σ_{i=1}^{m} λi ai + Σ_{j=1}^{p} µj a′j | λi ∈ R ∀ i = 1, . . . , m, µj ≥ 0 ∀ j = 1, . . . , p }.
Proof. The set C can be written as
C = { h ∈ Rn | ⟨ai, h⟩ ≤ 0 ∀ i = 1, . . . , m, ⟨−ai, h⟩ ≤ 0 ∀ i = 1, . . . , m, ⟨a′j, h⟩ ≤ 0 ∀ j = 1, . . . , p }.
Therefore, by Lemma 3, u ∈ C° iff there exist αi ≥ 0, βi ≥ 0 (i = 1, . . . , m) and
µj ≥ 0 (j = 1, . . . , p) such that
u = Σ_{i=1}^{m} αi ai + Σ_{i=1}^{m} βi (−ai) + Σ_{j=1}^{p} µj a′j = Σ_{i=1}^{m} (αi − βi) ai + Σ_{j=1}^{p} µj a′j.
The result follows by setting λi := αi − βi ∈ R.
[Application to convex problems with affine equality and inequality constraints] Let
ai ∈ Rn, bi ∈ R (i = 1, . . . , m), a′j ∈ Rn and b′j ∈ R (j = 1, . . . , p). Suppose
that the constraint set K is given by
K = { x ∈ Rn | ⟨ai, x⟩ + bi = 0 ∀ i = 1, . . . , m, ⟨a′j, x⟩ + b′j ≤ 0 ∀ j = 1, . . . , p }. (22)
Proposition 5. Let f be convex and differentiable. Then x̄ ∈ K is a global solution
to (P) iff there exist λ ∈ Rm and µ ∈ Rp such that
∇f(x̄) + Σ_{i=1}^{m} λi ai + Σ_{j=1}^{p} µj a′j = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj(⟨a′j, x̄⟩ + b′j) = 0. (23)
Proof. Let us define the set of active inequality constraints
I(x̄) := { j ∈ {1, . . . , p} | ⟨a′j, x̄⟩ + b′j = 0 },
and the cone
C := { h ∈ Rn | ⟨ai, h⟩ = 0 ∀ i = 1, . . . , m, ⟨a′j, h⟩ ≤ 0 ∀ j ∈ I(x̄) }.
For every h ∈ C and τ > 0 small enough we have x̄ + τh ∈ K, where the inactive
inequality constraints remain satisfied because ⟨a′j, x̄⟩ + b′j < 0 for all j ∈ {1, . . . , p} \ I(x̄)
and we can pick τ small enough in order to ensure that ⟨a′j, x̄⟩ + b′j + τ⟨a′j, h⟩ < 0
for all j ∈ {1, . . . , p} \ I(x̄). Indeed, it suffices to take τ > 0 such that
τ max_{j∈{1,...,p}\I(x̄)} |⟨a′j, h⟩| < min_{j∈{1,...,p}\I(x̄)} ( −⟨a′j, x̄⟩ − b′j ).
Since x̄ solves (P), we deduce that
⟨∇f(x̄), h⟩ ≥ 0 ∀ h ∈ C,
i.e. −∇f(x̄) ∈ C°. Therefore, by Lemma 4, there exist λi ∈ R (i = 1, . . . , m) and
µj ≥ 0 (j ∈ I(x̄)) such that, setting µj = 0 for j ∈ {1, . . . , p} \ I(x̄),
−∇f(x̄) = Σ_{i=1}^{m} λi ai + Σ_{j=1}^{p} µj a′j.
Condition (23) follows directly from the previous relation. Conversely, assume that
(x̄, λ, µ) satisfies (23) and define the function L(·, λ, µ) as
L(x, λ, µ) = f(x) + Σ_{i=1}^{m} λi(⟨ai, x⟩ + bi) + Σ_{j=1}^{p} µj(⟨a′j, x⟩ + b′j).
Then L(·, λ, µ) is convex and, by the first relation in (23), ∇xL(x̄, λ, µ) = 0, so
that x̄ minimizes L(·, λ, µ) over Rn. Hence, using µj ≥ 0 and the complementarity
conditions in (23), for every x ∈ K we get
f(x) ≥ L(x, λ, µ) ≥ L(x̄, λ, µ) = f(x̄),
i.e. x̄ is a global solution to (P).
Example: Let us consider the projection problem
inf_{x∈K} ½‖x − y‖²,
where
K = { x ∈ Rn | x1 ≤ x2 ≤ . . . ≤ xn }.
Clearly K is nonempty, closed and convex. Therefore, the projection x̄ of y onto K
exists and is unique. For j = 1, . . . , n − 1, let us define
a′j := (0, . . . , 0, 1, −1, 0, . . . , 0), with entries 1 and −1 in positions j and j + 1,
and b′j := 0.
Then,
K = { x ∈ Rn | ⟨a′j, x⟩ + b′j ≤ 0 ∀ j = 1, . . . , n − 1 }.
By the previous proposition, x̄ solves (P) iff x̄ ∈ K and there exists µ ∈ Rn−1 such
that (23) holds. Namely,
x̄1 − y1 + µ1 = 0,
x̄2 − y2 + µ2 − µ1 = 0,
⋮
x̄n−1 − yn−1 + µn−1 − µn−2 = 0,
x̄n − yn − µn−1 = 0,
µi ≥ 0 and µi hi(x̄) = 0 ∀ i = 1, . . . , n − 1,
where hi(x̄) := x̄i − x̄i+1,
which is equivalent to
x̄1 − y1 ≤ 0,
x̄2 + x̄1 − (y2 + y1) ≤ 0,
⋮
Σ_{k=1}^{n−1} x̄k − Σ_{k=1}^{n−1} yk ≤ 0,
Σ_{k=1}^{n} x̄k − Σ_{k=1}^{n} yk = 0,
( Σ_{k=1}^{i} x̄k − Σ_{k=1}^{i} yk ) hi(x̄) = 0 ∀ i = 1, . . . , n − 1.
Let us compute x̄ when n = 4 and y = (2, 1, 5, 4). In this case, we have
x̄1 − 2 ≤ 0
x̄2 + x̄1 − 3 ≤ 0
x̄3 + x̄2 + x̄1 − 8 ≤ 0
x̄4 + x̄3 + x̄2 + x̄1 − 12 = 0
(x̄1 − 2)(x̄1 − x̄2) = 0
(x̄2 + x̄1 − 3)(x̄2 − x̄3) = 0
(x̄3 + x̄2 + x̄1 − 8)(x̄3 − x̄4) = 0.
The first two equations and the constraint x̄1 ≤ x̄2 imply that x̄1 = x̄2 < 2. Then,
taking x̄1 = x̄2 = 3/2, the second equation is satisfied, but the third inequality cannot
be satisfied with an equality and, hence, we take x̄3 = x̄4 and the fourth relation
yields x̄3 = x̄4 = 9/2. Thus, the point (x̄1, x̄2, x̄3, x̄4) = (3/2, 3/2, 9/2, 9/2)
satisfies the previous system and, hence, solves the projection problem.
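This projection problem is exactly isotonic regression, and the complementarity structure above is what the classical pool adjacent violators algorithm (PAVA) exploits: it repeatedly merges adjacent blocks whose averages violate monotonicity and replaces them by their average. A minimal sketch (ours, not from the notes; the function name is a placeholder) reproduces the solution found above.

```python
# Sketch: projection onto {x1 <= ... <= xn} via pool adjacent violators.
def project_monotone(y):
    blocks = []  # each block stored as [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Pool adjacent violators: previous block average > current average.
        while len(blocks) > 1 and \
              blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    x = []
    for s, c in blocks:
        x.extend([s / c] * c)  # each block is replaced by its average
    return x

print(project_monotone([2, 1, 5, 4]))  # [1.5, 1.5, 4.5, 4.5]
```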
Optimization problems with equality and inequality
constraints
[The tangent cone] Given K ⊆ Rn and x̄ ∈ K, we say that h ∈ Rn is tangent to K
at x̄, and we write h ∈ TK(x̄), if there exist sequences (τn)n∈N ⊆ R and
(εn)n∈N ⊆ Rn such that
τn > 0 ∀ n ∈ N, τn → 0, εn → 0, as n → +∞, and x̄ + τnh + τnεn ∈ K ∀ n ∈ N. (25)
Remark 5. (i) By definition, h ∈ TK(x̄) iff there exist (τn)n∈N ⊆ R and
(hn)n∈N ⊆ Rn such that τn ≥ 0, τn → 0, hn → h, as n → +∞, and x̄ + τnhn ∈ K for
all n ∈ N.
(ii) It is easy to see that TK(x̄) is indeed a closed cone.
Theorem 12. Suppose that x̄ ∈ K is a local solution to (P) and that f is
differentiable at x̄. Then,
⟨∇f(x̄), h⟩ ≥ 0 ∀ h ∈ TK(x̄).
Proof. Let h ∈ TK(x̄) and let (τn)n∈N and (hn)n∈N be as in Remark 5(i). Then, for
n large enough we have
0 ≤ f(x̄ + τnhn) − f(x̄) = τn⟨∇f(x̄), hn⟩ + τn‖hn‖εx̄(τnhn),
which yields
⟨∇f(x̄), hn⟩ + ‖hn‖εx̄(τnhn) ≥ 0.
Therefore, letting n → ∞, we get
⟨∇f(x̄), h⟩ ≥ 0,
which proves the result.
[Optimization problems with equality and inequality constraints] We suppose now that
the constraint set is given by
K := { x ∈ Rn | gi(x) = 0 ∀ i = 1, . . . , m, hj(x) ≤ 0 ∀ j = 1, . . . , p },
where gi : Rn → R (i = 1, . . . , m), and hj : Rn → R (j = 1, . . . , p) are
differentiable functions. In this case, Problem (P ) is usually written as
min f (x)
s.t. gi(x) = 0, ∀ i = 1, . . . , m,
hj (x) ≤ 0, ∀ j = 1, . . . , p.
Lemma 5. The following inclusion holds:
TK(x̄) ⊆ { h ∈ Rn | ⟨∇gi(x̄), h⟩ = 0 ∀ i = 1, . . . , m, ⟨∇hj(x̄), h⟩ ≤ 0 ∀ j ∈ I(x̄) }, (26)
where I(x̄) := { j ∈ {1, . . . , p} | hj(x̄) = 0 } denotes the set of active inequality
constraints at x̄.
Proof. Let h ∈ TK(x̄) and let (τn)n∈N and (hn)n∈N be as in Remark 5(i). Then,
for every i = 1, . . . , m,
0 = gi(x̄ + τnhn) = gi(x̄) + τn⟨∇gi(x̄), hn⟩ + τn‖hn‖εx̄(τnhn).
Dividing by τn and letting n → ∞, we get
⟨∇gi(x̄), h⟩ = 0.
Similarly, for every j ∈ I(x̄),
0 ≥ hj(x̄ + τnhn) = hj(x̄) + τn⟨∇hj(x̄), hn⟩ + τn‖hn‖εx̄(τnhn)
 = τn⟨∇hj(x̄), hn⟩ + τn‖hn‖εx̄(τnhn).
Then, dividing by τn and letting n → ∞, we get
⟨∇hj(x̄), h⟩ ≤ 0.
Example: Let K = {(x, y) ∈ R² | x² − y³ = 0, −x ≤ 0} and x̄ = (0, 0), so that
g1(x, y) = x² − y³ and h1(x, y) = −x. Then, it is easy to see that
TK((0, 0)) = {(0, γ) | γ ≥ 0}, while the right hand side of (26) is given by
{ (h1, h2) ∈ R² | h1 ≥ 0 }.
In particular, the inclusion in (26) can be strict.
Definition 6. (i) We say that the constraint functions gi (i = 1, . . . , m) and hj
(j = 1, . . . , p) are qualified at x̄ if
TK(x̄) = { h ∈ Rn | ⟨∇gi(x̄), h⟩ = 0 ∀ i = 1, . . . , m, ⟨∇hj(x̄), h⟩ ≤ 0 ∀ j ∈ I(x̄) }. (27)
(ii) Any condition ensuring that the constraint functions are qualified is called a
constraint qualification condition.
For instance, the same set K may admit two different descriptions in terms of
constraint functions, and it can happen that the constraint functions are qualified at a
given point in the first formulation but not in the second one.
[The Karush-Kuhn-Tucker theorem] The main result here is the following first order
optimality condition.
Theorem 13. [Karush-Kuhn-Tucker] Let x̄ ∈ K be a local solution to (P).
Assume that f, gi (i = 1, . . . , m) and hj (j = 1, . . . , p) are C¹ and that the
constraint functions are qualified at x̄. Then, there exist (λ1, . . . , λm) ∈ Rm
and (µ1, . . . , µp) ∈ Rp such that
∇f(x̄) + Σ_{i=1}^{m} λi∇gi(x̄) + Σ_{j=1}^{p} µj∇hj(x̄) = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µjhj(x̄) = 0. (28)
Proof. Since the constraint functions are qualified at x̄, by Lemma 4 we have that the
cone NK(x̄) := TK(x̄)° is given by
NK(x̄) = { Σ_{i=1}^{m} λi∇gi(x̄) + Σ_{j∈I(x̄)} µj∇hj(x̄) | λi ∈ R ∀ i = 1, . . . , m, µj ≥ 0 ∀ j ∈ I(x̄) }.
Then, by Corollary 1, there exist λi ∈ R (i = 1, . . . , m) and µj ≥ 0 (j ∈ I(x̄))
such that
−∇f(x̄) = Σ_{i=1}^{m} λi∇gi(x̄) + Σ_{j∈I(x̄)} µj∇hj(x̄).
Relation (28) follows by setting µj = 0 for all j ∈ {1, . . . , p} \ I(x̄).
Let g(x) = (g1(x), . . . , gm(x)) and h(x) = (h1(x), . . . , hp(x)). The Lagrangian
L : Rn × Rm × Rp → R is defined by
L(x, λ, µ) := f(x) + ⟨λ, g(x)⟩ + ⟨µ, h(x)⟩,
and, at a local solution x̄, the optimality system (28) reads: there exists (λ, µ) ∈ Rm+p
such that
∇xL(x̄, λ, µ) = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µjhj(x̄) = 0. (30)
System (30) is usually called the KKT system and (λ, µ) are called Lagrange multipliers.
[The KKT system as a sufficient condition for convex problems] The KKT condition is
also sufficient for convex problems.
Proposition 6. Suppose that f is convex and C¹, that gi(x) = ⟨ai, x⟩ + bi, with
ai ∈ Rn, bi ∈ R (i = 1, . . . , m), and that the functions hj (j = 1, . . . , p) are convex
and C¹. Moreover, assume that there exists (x̄, λ, µ) ∈ Rn+m+p such that x̄ ∈ K
and the KKT system (30) holds at (x̄, λ, µ). Then x̄ is a global solution to (P).
Proof. Note that L(x̄, λ, µ) = f(x̄) and that L(·, λ, µ) is convex. Then, for any
x ∈ K,
f(x) ≥ L(x, λ, µ) ≥ L(x̄, λ, µ) = f(x̄),
where the first inequality uses that µj ≥ 0, hj(x) ≤ 0 and gi(x) = 0 on K, and the
second one follows from ∇xL(x̄, λ, µ) = 0 and the convexity of L(·, λ, µ).
Remark 7. (i) [Equality constraints only] Suppose that we only have equality
constraints. Then, if x̄ ∈ K solves (P ) and the constraint functions are qualified at
x̄, then there exists λ ∈ Rm such that
∇f(x̄) + Σ_{i=1}^{m} λi∇gi(x̄) = 0.
In this case the Lagrangian is given by L(x, λ) = f (x) + hλ, g(x)i, and the previous
relation can be written as
∇xL(x̄, λ) = 0. (31)
(ii) [Inequality constraints only] Suppose that we only have inequality constraints.
Then, if x̄ ∈ K solves (P ) and the constraint functions are qualified at x̄, then there
exists µ ∈ Rp such that
∇f(x̄) + Σ_{j=1}^{p} µj∇hj(x̄) = 0,
µj ≥ 0 and µjhj(x̄) = 0 ∀ j = 1, . . . , p. (32)
In this case the Lagrangian is given by L(x, µ) = f (x) + hµ, h(x)i, and the previous
relation can be written as
∇xL(x̄, µ) = 0,
µj ≥ 0 and µj hj (x̄) = 0 ∀ j = 1, . . . , p.
(iii) [Maximization problems] Consider the problem
max f(x)
s.t. gi(x) = 0, ∀ i = 1, . . . , m,
hj(x) ≤ 0, ∀ j = 1, . . . , p.
In this case, if x̄ is a local solution and the constraint functions are qualified at x̄, then
there exist λ ∈ Rm and µ ∈ Rp such that
∇f(x̄) + Σ_{i=1}^{m} λi∇gi(x̄) + Σ_{j=1}^{p} µj∇hj(x̄) = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≤ 0 and µjhj(x̄) = 0.
It is important to notice that, differently from the case where only equality constraints
are present, when inequality constraints are present the optimality systems for local
solutions of the minimization and the maximization problems differ (the sign of µ
changes). The coincidence of both optimality systems is generally false; it is a specific
feature of problems with equality constraints only and of unconstrained problems.
(iv) The following example shows that the assumption on the qualification of the
constraints plays an important role in the necessary condition. Consider the problem
min y
s.t. x² − y³ = 0,
−x ≤ 0.
In this case, (x̄, ȳ) = (0, 0) is a global solution and (28) reads
(0, 1) + λ(2x, −3y²)|_{(x,y)=(0,0)} + µ(−1, 0) = (0, 0),
which is impossible. Note that ∇g1(0, 0) = (0, 0) and ∇h1(0, 0) = (−1, 0).
Therefore,
{ h ∈ R² | ⟨∇g1(0, 0), h⟩ = 0, ⟨∇h1(0, 0), h⟩ ≤ 0 } = { h ∈ R² | h1 ≥ 0 },
and TK(0, 0) = { h ∈ R² | h1 = 0, h2 ≥ 0 }. Thus, g1 and h1 are not qualified at
(0, 0).
The following two conditions are more general but, at the same time, more difficult
to check. The first is the Slater condition for convex constraints. The second is the
Mangasarian-Fromovitz condition (MF): the gradients ∇gi(x̄) (i = 1, . . . , m) are
linearly independent and there exists h ∈ Rn such that ⟨∇gi(x̄), h⟩ = 0 for all
i = 1, . . . , m and ⟨∇hj(x̄), h⟩ < 0 for all j ∈ I(x̄). Under (MF) the constraint
functions are qualified at x̄ and, hence, if x̄ is a local solution,
(28) holds. In general, (MF) only implies that the set of (λ, µ) ∈ Rm+p such that
(28) holds is a compact set.
Example: Consider the problem
min xy
s.t. x² + (y + 1)² = 1.
Here g(x, y) := x² + (y + 1)² − 1 and ∇g(x, y) = (2x, 2(y + 1)), which vanishes iff
x = 0, y = −1. Thus, every (x, y) ∈ R² \ {(0, −1)} satisfies (MF). Since
(0, −1) ∉ K, we deduce that (MF) holds for every (x, y) ∈ K.
The Lagrangian L : R² × R → R of this problem is given by
L(x, y, λ) = xy + λ(x² + (y + 1)² − 1).
By Theorem 13, we have the existence of λ ∈ R such that (31) holds at (x̄, ȳ, λ).
Now,
∇_{(x,y)}L(x̄, ȳ, λ) = 0 ⟺ ȳ + 2λx̄ = 0 and x̄ + 2λ(ȳ + 1) = 0
 ⟺ ȳ = −2λx̄ and (1 − 4λ²)x̄ = −2λ. (34)
Now, 1 − 4λ² = 0 iff λ = 1/2 or λ = −1/2, and both cases contradict the last
equality above. Therefore, 1 − 4λ² ≠ 0 and, hence,
x̄ = 2λ/(4λ² − 1) and ȳ = −4λ²/(4λ² − 1).
Since ∇λL(x̄, ȳ, λ) = g(x̄, ȳ) = 0, we get
( 2λ/(4λ² − 1) )² + ( 1 − 4λ²/(4λ² − 1) )² = 1
 ⟺ 4λ² + 1 = (4λ² − 1)²
 ⟺ (4λ² − 1)² − (4λ² − 1) − 2 = 0,
which yields
4λ² − 1 = (1 + √9)/2 = 2 or 4λ² − 1 = (1 − √9)/2 = −1,
i.e. λ² = 3/4 or λ² = 0.
If λ = 0, then (34) yields x̄ = ȳ = 0. If λ = √3/2 we get x̄ = √3/2 and
ȳ = −3/2. If λ = −√3/2 we get x̄ = −√3/2 and ȳ = −3/2. Thus, the
candidates to solve the problem are
(x̄1, ȳ1) = (0, 0), (x̄2, ȳ2) = (√3/2, −3/2) and (x̄3, ȳ3) = (−√3/2, −3/2).
We have f(x̄1, ȳ1) = 0, f(x̄2, ȳ2) = −3√3/4 and f(x̄3, ȳ3) = 3√3/4.
Therefore, the global solution is (x̄2, ȳ2).
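A crude numerical confirmation (a sketch; the parametrization of the constraint circle is the obvious one): minimize f over a fine grid of the circle x² + (y + 1)² = 1.

```python
# Sketch: check the constrained minimizer of f(x, y) = x y on the circle.
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 200001)
x, y = np.cos(t), -1.0 + np.sin(t)   # points on x^2 + (y+1)^2 = 1
f = x * y
i = np.argmin(f)
print(x[i], y[i], f[i])
# ~ (0.866, -1.5, -1.299) = (sqrt(3)/2, -3/2, -3*sqrt(3)/4)
```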
In the second example, we consider a problem where K is defined by inequality
constraints only.
Example: Consider the problem
min 4x² + y² − x − 2y
s.t. 2x + y ≤ 1,
x² ≤ 1.
Note that, setting
Q = ( 8  0        c = ( −1
      0  2 )  and       −2 ),
in the notation for (P) we have
f(x, y) = ½⟨Q(x, y), (x, y)⟩ + cᵀ(x, y),
h1(x, y) = 2x + y − 1,
h2(x, y) = x² − 1.
Note that the feasible set is nonempty, convex, closed and the Slater condition is
satisfied (for instance h1(0, 0) < 0, h2(0, 0) < 0). Moreover, f is continuous,
strictly convex, differentiable and infinity at the infinity (Q is positive definite). We
deduce that there exists a unique solution (x̄, ȳ) ∈ R2 to problem (P ) and (x̄, ȳ) is
characterized by the KKT system (32). A point (x̄, ȳ, µ1, µ2) satisfies (32) iff
Q(x̄, ȳ)ᵀ + c + µ1(2, 1)ᵀ + µ2(2x̄, 0)ᵀ = (0, 0)ᵀ,
iff
8x̄ + 2µ1 + 2µ2x̄ = 1,
2ȳ + µ1 = 2,
µ1 ≥ 0, µ2 ≥ 0, µ1h1(x̄, ȳ) = 0, µ2h2(x̄, ȳ) = 0.
Case 1: µ1 = 0, µ2 = 0. Then 8x̄ = 1 and 2ȳ = 2, i.e. (x̄, ȳ) = (1/8, 1), which is
not feasible since h1(1/8, 1) = 1/4 > 0.
Case 2: µ1 > 0, µ2 > 0. Then 2x̄ + ȳ = 1 and x̄² = 1. If x̄ = 1, then ȳ = −1, so
that µ1 = 4 and 8 + 8 + 2µ2 = 1,
which is impossible, because µ1 > 0 and µ2 > 0. In the second case, x̄ = −1 and
ȳ = 3, so we should have 6 + µ1 = 2, which is also impossible.
Case 3: µ1 = 0, µ2 > 0. We obtain x̄2 = 1 and ȳ = 1, which gives (x̄, ȳ) = (1, 1)
or (x̄, ȳ) = (−1, 1). If (x̄, ȳ) = (1, 1) we should have 8 + 2µ2 = 1, which is
impossible. If (x̄, ȳ) = (−1, 1), we should have −8 − 2µ2 = 1, which is also
impossible.
Case 4: µ1 > 0, µ2 = 0. We obtain
2x̄ + ȳ = 1,
8x̄ + 2µ1 = 1,
2ȳ + µ1 = 2,
which implies
2x̄ + ȳ = 1,
8x̄ − 4ȳ = −3,
which gives (x̄, ȳ) = (1/16, 7/8) and µ1 = 1/4. This point (x̄, ȳ) belongs to
K. Therefore, we conclude that (x̄, ȳ, µ1, µ2) = (1/16, 7/8, 1/4, 0) is the unique
solution to the KKT system, and, hence, (x̄, ȳ) = (1/16, 7/8) is the unique global
87
solution to (P ).
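The KKT solution can be cross-checked with a generic solver, e.g. SciPy's SLSQP (a sketch; the starting point and solver choice are arbitrary):

```python
# Sketch: verify the KKT solution (1/16, 7/8) with scipy.optimize.minimize.
import numpy as np
from scipy.optimize import minimize

f = lambda z: 4 * z[0]**2 + z[1]**2 - z[0] - 2 * z[1]
cons = [
    {"type": "ineq", "fun": lambda z: 1 - 2 * z[0] - z[1]},  # 2x + y <= 1
    {"type": "ineq", "fun": lambda z: 1 - z[0]**2},          # x^2 <= 1
]
res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # ~ [0.0625, 0.875] = (1/16, 7/8)
```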
Dynamic programming in discrete time: the finite
horizon case
[Example: a shortest path problem] Consider the problem of finding the fastest path
from a vertex A to a vertex E in a directed graph where, for each vertex x, Γ(x)
denotes the set of its successors (the figure with the graph is not reproduced here).
The data of the problem are:
• The "travel time" (in hours) F(x, x′) of each arc (x, x′), x′ ∈ Γ(x). For instance,
F(C′, D) = 2.
• In order to compute the shortest path one could enumerate all the paths and choose
the one with the smallest travel time.
• However, it is more convenient to notice that if a path is optimal between A and E ,
and this path passes through a vertex x, then the “sub-path” between x and E will be
optimal for the shortest path problem between x and E .
• This suggests parametrizing the optimal travel time by the departure point. Let us
define V(x) as the smallest time needed to go from x to E. Then,
V(E) = 0,
V(D) = 5, V(D′) = 2,
V(C) = 6, V(C′) = min{2 + V(D), 1 + V(D′)} = 3, V(C″) = 3,
V(B) = min{2 + V(C), 1 + V(C′)} = 4, V(B′) = min{2 + V(C′), 4 + V(C″)} = 5,
V(A) = min{1 + V(B), 1 + V(B′)} = 5.
• We deduce that the shortest travel time is V(A) = 5 and the shortest path is
A B C′ D′ E.
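The backward computation above is easy to automate. In the sketch below the arc list is reconstructed from the values V(·) computed above; the travel times F(C, D) = 1 and F(C″, D′) = 1 are not given explicitly in the text and are our assumptions (any values consistent with V(C) = 6 and V(C″) = 3 would do).

```python
# Sketch: backward dynamic programming on the travel-time graph above.
arcs = {
    "A": {"B": 1, "B'": 1},
    "B": {"C": 2, "C'": 1},
    "B'": {"C'": 2, "C''": 4},
    "C": {"D": 1},            # assumed
    "C'": {"D": 2, "D'": 1},
    "C''": {"D'": 1},         # assumed
    "D": {"E": 5},
    "D'": {"E": 2},
    "E": {},
}

V, succ = {"E": 0.0}, {}
# Process vertices in reverse topological order.
for x in ["D", "D'", "C", "C'", "C''", "B", "B'", "A"]:
    succ[x] = min(arcs[x], key=lambda y: arcs[x][y] + V[y])
    V[x] = arcs[x][succ[x]] + V[succ[x]]

path, x = ["A"], "A"
while x != "E":
    x = succ[x]
    path.append(x)
print(V["A"], path)  # 5.0 ['A', 'B', "C'", "D'", 'E']
```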
[The general framework] We are interested in the problem
sup_{(xt)} { Σ_{t=0}^{T−1} Ft(xt, xt+1) + FT(xT) }, (Pfh)
where
• xt ∈ X for all t = 0, . . . , T . The set X is called the state space.
• xt+1 ∈ Γt(xt) for all t = 0, . . . , T − 1. For all x ∈ X , and t = 0, . . . , T − 1,
Γt(x) is a nonempty subset of X .
• Ft(xt, xt+1) denotes the profit at time t for the pair (xt, xt+1), and FT(xT) denotes
the final profit for the final state xT. Notice that, redefining FT−1(xT−1, xT) as
FT−1(xT−1, xT) + FT(xT), we can always assume that FT ≡ 0, which we do in what
follows.
[The value function] For k = 0, . . . , T − 1 and x ∈ X, the value
function at (k, x) is defined as
V(k, x) = sup { Σ_{t=k}^{T−1} Ft(xt, xt+1) | xk = x, xt+1 ∈ Γt(xt) ∀ t = k, . . . , T − 1 }. (35)
For k = T and x ∈ X, we set V(T, x) = 0 (recall that we have a zero final profit).
The main result here is the following dynamic programming principle.
Theorem 14. For every k = 0, . . . , T − 1 and x ∈ X we have
V(k, x) = sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) }. (36)
Proof. Let (xk, xk+1, . . . , xT) be a feasible sequence for the problem defining
V(k, x), i.e. xk = x and xt+1 ∈ Γt(xt) for all t = k, . . . , T − 1. Then,
V(k, x) ≥ Σ_{t=k}^{T−1} Ft(xt, xt+1) = Fk(x, xk+1) + Σ_{t=k+1}^{T−1} Ft(xt, xt+1).
Using that the previous inequality holds for any (xk+1, . . . , xT) such that xt+1 ∈
Γt(xt), for all t = k + 1, . . . , T − 1, by taking the supremum of the right hand side
with respect to those (xk+1, xk+2, . . . , xT), we get
V(k, x) ≥ sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) }.
Conversely, for any (x′k, x′k+1, . . . , x′T) such that x′k = x and x′t+1 ∈ Γt(x′t), for all
t = k, . . . , T − 1, we have
Σ_{t=k}^{T−1} Ft(x′t, x′t+1) = Fk(x, x′k+1) + Σ_{t=k+1}^{T−1} Ft(x′t, x′t+1)
 ≤ Fk(x, x′k+1) + V(k + 1, x′k+1)
 ≤ sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) }.
Using that (x′k, x′k+1, . . . , x′T) is an arbitrary admissible sequence, by taking the
supremum on the left hand side, we get the converse inequality, and (36) follows.
[Backward solution] By Theorem 14, we can solve backward for V using relations (36).
In particular, (36) characterizes the value function V. Now, let us assume that for all
(k, x) ∈ {0, . . . , T − 1} × X there exists s(k, x) ∈ Γk(x) such that
V(k, x) = Fk(x, s(k, x)) + V(k + 1, s(k, x)).
Then, by the very definition, we have that (xk, xk+1, . . . , xT), with
xk = x and xt+1 := s(t, xt) ∀ t = k, . . . , T − 1,
solves the problem in (35). In particular, such an s(k, x) exists if
• X ⊆ Rn,
• for all (k, x) ∈ {0, . . . , T − 1} × X the set Γk(x) is nonempty and compact,
• and Ft and V(t, ·) are continuous for all t = k, . . . , T − 1.
The continuity of V(t, ·) can, in turn, be established recursively under the following
continuity properties of the multifunction Γk:
(i) [Lower semicontinuity] For every y ∈ Γk(x) and every sequence (xn)n∈N such that
xn → x, as n → ∞, there exists a sequence (yn)n∈N such that yn ∈ Γk(xn), for all
n ∈ N, and yn → y as n → ∞.
(ii) [Upper semicontinuity] If (xn)n∈N and (yn)n∈N are two sequences such that xn → x,
as n → ∞, and yn ∈ Γk(xn), for all n ∈ N, then (yn)n∈N has a subsequence which
converges to a point y ∈ Γk(x).
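For a finite state space X, relations (36) translate directly into a backward recursion. The following generic sketch (ours; the names solve_finite_horizon, Gamma and F are placeholders, and the toy instance at the end is arbitrary) returns the value function V and an optimal successor map s(k, x).

```python
# Sketch: generic backward solution of (35)-(36) for a finite state space.
def solve_finite_horizon(X, T, Gamma, F):
    """Gamma[k][x]: feasible successors of x; F[k][(x, y)]: profit of (x, y)."""
    V = {(T, x): 0.0 for x in X}   # zero final profit
    s = {}                         # optimal successor s(k, x)
    for k in range(T - 1, -1, -1):
        for x in X:
            best = max(Gamma[k][x], key=lambda y: F[k][(x, y)] + V[(k + 1, y)])
            s[(k, x)] = best
            V[(k, x)] = F[k][(x, best)] + V[(k + 1, best)]
    return V, s

# Tiny instance: X = {0, 1}, T = 2, profit y - x at every stage.
X, T = [0, 1], 2
Gamma = [{x: X for x in X} for _ in range(T)]
F = [{(x, y): y - x for x in X for y in X} for _ in range(T)]
V, s = solve_finite_horizon(X, T, Gamma, F)
print(V[(0, 0)])  # 1.0
```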
Example: Given y ≥ 0 and N ∈ N, consider the problem
inf { Σ_{i=1}^{N} xi² | Σ_{i=1}^{N} xi = y, xi ≥ 0 ∀ i = 1, . . . , N }. (37)
• Let us solve (37) by using nonlinear programming tools. Note first that the cost
function is continuous and strictly convex, and the feasible set is nonempty, convex and
compact. Therefore, there exists a unique solution x̄ to (37). Since the set K is defined
by affine constraints, x̄ is characterized by the KKT system. Consider the Lagrangian
L : R^N × R × R^N → R defined by
L(x, λ, µ) = Σ_{i=1}^{N} xi² + λ( Σ_{i=1}^{N} xi − y ) − Σ_{i=1}^{N} µi xi.
Then, (x̄1, . . . , x̄N) is characterized by the existence of λ ∈ R and µ ∈ R^N such
that
∂xi L(x̄, λ, µ) = 2x̄i + λ − µi = 0 ∀ i = 1, . . . , N,
Σ_{i=1}^{N} x̄i = y, (38)
µi ≥ 0 and µi x̄i = 0 ∀ i = 1, . . . , N.
If y = 0, the only feasible point is xi = 0 for all i = 1, . . . , N and, hence,
x̄ = (0, . . . , 0) is the solution. If y > 0 and there exists î ∈ {1, . . . , N} such
that x̄î = 0, then the first equation in (38) yields λ = µî ≥ 0. On the other hand,
there must exist i0 ∈ {1, . . . , N} such that x̄i0 > 0 (otherwise Σ_{i} x̄i = y > 0
would not hold). The first and third equations in (38) imply
that λ = −2x̄i0 < 0, which is a contradiction. As a consequence, x̄i > 0 for all
i ∈ {1, . . . , N}. Then, the first and the second conditions in (38) yield x̄i = y/N
for all i ∈ {1, . . . , N}. Thus,
x̄ = (y/N, . . . , y/N) is the solution to the problem and the optimal cost is y²/N.
• Let us recover the same conclusion by using dynamic programming techniques. Let us
define
V(k, y) := inf { Σ_{i=k}^{N} xi² | Σ_{i=k}^{N} xi = y, xi ≥ 0 ∀ i = k, . . . , N }.
Note that we are interested in V(1, y). Notice that the problem defining V(1, y) does
not have exactly the form discussed in the previous subsection. However, arguing as in
the proof of Theorem 14, we can prove that (exercise)
V(k, y) = inf { xk² + V(k + 1, y − xk) | 0 ≤ xk ≤ y } ∀ y ≥ 0,
(39)
V(N, y) = y² ∀ y ≥ 0.
For k = N − 1, minimizing x² + (y − x)² over x ∈ [0, y] in (39) gives
s(N − 1, y) = y/2 and V(N − 1, y) = y²/2. Similarly, by backward induction,
s(k, y) = y/(N − k + 1) and V(k, y) = y²/(N − k + 1).
Thus, we recover V(1, y) = y²/N for the optimal cost and, adapting the definition of
successor according to the dynamic programming principle (39), for the solution we get
x1 = s(1, y) = y/N,
x2 = s(2, y − x1) = (y − y/N)/(N − 1) = y/N,
x3 = s(3, y − x1 − x2) = y(1 − 2/N)/(N − 2) = y/N,
⋮
xN = s(N, y − Σ_{i=1}^{N−1} xi) = y/N.
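The recursion (39) is also easy to check numerically on a grid (a sketch; the discretization and interpolation choices are ours):

```python
# Sketch: discretized version of (39) for N = 5, confirming V(1, y) ~ y^2 / N.
import numpy as np

N, y0, m = 5, 1.0, 501
grid = np.linspace(0.0, y0, m)   # admissible remaining amounts

V = [None] * (N + 1)
V[N] = grid**2                   # V(N, y) = y^2
for k in range(N - 1, 0, -1):
    Vk = np.empty(m)
    for j, y in enumerate(grid):
        xk = grid[: j + 1]       # 0 <= x_k <= y
        # Interpolate V(k+1, y - x_k) on the grid and minimize over x_k.
        Vk[j] = np.min(xk**2 + np.interp(y - xk, grid, V[k + 1]))
    V[k] = Vk

print(V[1][-1], y0**2 / N)  # both ~ 0.2
```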
• Finally, setting z0 := y and zi := y − Σ_{j=1}^{i} xj, for i = 1, . . . , N − 1, so that
xi+1 = zi − zi+1 (with zN := 0), problem (37) can be written as
inf { Σ_{i=0}^{N−2} (zi+1 − zi)² + z²_{N−1} | z0 = y, 0 ≤ zi+1 ≤ zi ∀ i = 0, . . . , N − 2 }.
By applying Theorem 14 to the problem above, we can also recover the desired solution
(exercise).
Dynamic programming in discrete time: the infinite
horizon case
Finally, the preference over consumption is supposed to have the form
Σ_{t=0}^{∞} βᵗ U(ct), for some discount factor β ∈ (0, 1)
and some utility function U. For a given initial capital k0 > 0, the utility maximization
problem is
sup Σ_{t=0}^{∞} βᵗ U(f(kt) − kt+1) s.t. 0 ≤ kt+1 ≤ f(kt) ∀ t ≥ 0.
This problem is a particular instance of
sup_{(xt)} Σ_{t=0}^{∞} βᵗ F(xt, xt+1), (Pih)
where
• xt ∈ X for all t ≥ 0, with X being a given set and x0 ∈ X being prescribed.
• xt+1 ∈ Γ(xt), where, for all x ∈ X , Γ(x) is a nonempty subset of X .
• F (xt, xt+1) denotes the profit for the pair (xt, xt+1). In this section F is assumed to
be bounded.
[The value function] For x ∈ X, define
V(x) := sup { Σ_{t=0}^{∞} βᵗ F(xt, xt+1) | x0 = x, xt+1 ∈ Γ(xt) ∀ t ≥ 0 }.
The main result here is the following.
Theorem 15. For every x ∈ X we have
V(x) = sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }. (40)
Proof. The proof is similar to the proof of Theorem 14. Indeed, for any admissible
sequence (xt)t≥0 for the problem defining V(x) we have
V(x) ≥ Σ_{t=0}^{∞} βᵗF(xt, xt+1)
 = F(x, x1) + β Σ_{t=1}^{∞} βᵗ⁻¹F(xt, xt+1) (41)
 = F(x, x1) + β Σ_{t=0}^{∞} βᵗF(xt+1, xt+2).
Now, let us fix x1 ∈ Γ(x). Note that if (x′t)t≥1 is an admissible sequence for
the problem defining V(x1), then x, x′1, x′2, x′3, . . . is an admissible sequence for the
problem defining V(x). This remark and (41) imply
V(x) ≥ F(x, x1) + βV(x1) ∀ x1 ∈ Γ(x) and, hence,
V(x) ≥ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }.
Conversely, for any admissible sequence (x′t)t≥0 for the problem defining V(x), we
have
Σ_{t=0}^{∞} βᵗF(x′t, x′t+1) = F(x, x′1) + β Σ_{t=0}^{∞} βᵗF(x′t+1, x′t+2)
 ≤ F(x, x′1) + βV(x′1)
 ≤ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }.
We conclude that
V(x) ≤ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) },
and, hence, (40) holds.
Remark 10. (i) Differently from the finite horizon case, in which the dynamic
programming relations characterize the value function, in general equation (40) can
have more than one solution. However, under our boundedness assumption on F,
which in practice is rather restrictive, it is possible to show that the functional
equation (40) admits a unique bounded solution. As a consequence, (40) characterizes
the value function V.
Moreover, this solution can be computed as the limit of the sequence of functions
defined recursively by
V^{ℓ+1}(x) = sup { F(x, x1) + βV^ℓ(x1) | x1 ∈ Γ(x) } ∀ x ∈ X,
starting from an arbitrary bounded function V⁰ : X → R.
(ii) Assume that for every x ∈ X there exists s(x) ∈ Γ(x) such that
V(x) = F(x, s(x)) + βV(s(x)).
As a consequence, the sequence defined recursively by x̄0 = x, x̄t+1 = s(x̄t) for all
t ≥ 0 solves the problem defining V(x).
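As an illustration of (i) and (ii), the following sketch runs value iteration for the growth model quoted at the beginning of this section, with U = log and f(k) = k^α on a capital grid (all numerical choices are ours). For this specification the optimal policy is known in closed form, s(k) = αβk^α, which the iteration recovers up to grid error.

```python
# Sketch: value iteration V^{l+1}(k) = max_{k'} log(k^alpha - k') + beta V^l(k').
import numpy as np

alpha, beta = 0.3, 0.9
grid = np.linspace(1e-3, 1.0, 400)   # capital grid
fk = grid**alpha                     # production at each grid point

V = np.zeros_like(grid)
for _ in range(500):
    c = fk[:, None] - grid[None, :]  # consumption for every (k, k') pair
    vals = np.where(c > 0,
                    np.log(np.maximum(c, 1e-12)) + beta * V[None, :],
                    -np.inf)
    V = vals.max(axis=1)

policy = grid[vals.argmax(axis=1)]   # approximate optimal successor s(k)
k = grid[200]
print(policy[200], alpha * beta * k**alpha)  # close to each other
```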