
Lectures on optimization

Advanced camp

Francisco J. Silva
Université de Limoges
francisco.silva@unilim.fr
http://www.unilim.fr/pages perso/francisco.silva/

August 2019
Contents of the course

1.- Some mathematical preliminaries.

2.- Unconstrained optimization.

3.- Optimization with constraints.
3.1.- Convex problems.
3.2.- Optimization problems with equality and inequality constraints.

4.- Dynamic optimization in discrete time.
4.1.- Finite horizon problems.
4.2.- Infinite horizon problems.
Definition of an optimization problem

 An optimization problem has the form

Find x̄ ∈ Rn such that f (x̄) = min {f (x) | x ∈ K} , (P )

where K ⊆ Rn is a given set. By definition, this means to find x̄ ∈ K such that

f (x̄) ≤ f (x) ∀ x ∈ K.

 In the above, f is called an objective function, K is called a feasible set (or constraint
set) and any x̄ solving (P ) is called a global solution to problem (P ).
 Usually one also considers the weaker, but easier to characterize, notion of a local
solution to problem (P ). Namely, x̄ ∈ K is a local solution to (P ) if there exists δ > 0
such that f (x̄) ≤ f (x) for all x ∈ K ∩ B(x̄, δ), where

B(x̄, δ) := {x ∈ Rn | kx − x̄k ≤ δ}.
 In optimization theory one usually studies the following features of problem (P ):
1.- Does there exist a solution x̄ (global or local)?
2.- Optimality conditions, i.e. properties satisfied by the solutions (or local solutions).
3.- Computation algorithms for finding approximate solutions.
 In this course we will mainly focus on points 1 and 2 of the previous program.
 We will also consider mainly two cases for the feasible set K:
 K = Rn (unconstrained case).
 Equality and inequality constraints:
K = {x ∈ Rn | gi(x) = 0, i = 1, . . . , m, hj(x) ≤ 0, j = 1, . . . , ℓ}. (1)

 In order to tackle point 2 we will assume that f is a smooth function. If the feasible
set (1) is considered, we will also assume that gi and hj are smooth functions.

Some mathematical tools

 In what follows, we will work in the euclidean space Rn. We denote by h·, ·i the
standard scalar product and by k · k the corresponding norm. Namely,

hx, yi = x>y = Σ_{i=1}^n xi yi ∀ x ∈ Rn, y ∈ Rn, and kxk = √hx, xi.

We will often use the alternative notation x · y for hx, yi.

 [Graph of a function] Let f : Rn → R. The graph Gr(f ) ⊆ Rn+1 is defined by

Gr(f ) := {(x, f (x)) | x ∈ Rn}.
 [Level sets] Let c ∈ R. The level set of value c is defined by
Levf (c) := {x ∈ Rn | f (x) = c}.

• When n = 2, the sets Levf (c) are useful in order to draw the graph of a function.
• These sets will also be useful to solve two-dimensional linear programming
problems graphically, i.e. n = 2 and the function f and the set K are defined
by means of affine functions.
Example 1: We consider the function
R2 3 (x, y) 7→ f (x, y) := x + y + 2 ∈ R,

whose level set is given, for all c ∈ R, by


Levf (c) := {(x, y) ∈ R2 | x + y + 2 = c}.

Note that the optimization problem with this f and K = R2 does not have a solution.

Example 2: Consider the function


R2 3 (x, y) 7→ f (x, y) := x^2 + y^2 ∈ R.

Then Levf (c) = ∅ if c < 0 and, if c ≥ 0,


Levf (c) = {(x, y) | x^2 + y^2 = c},

i.e. the circle centered at 0 with radius √c.

Example 3: Consider the function
R2 3 (x, y) 7→ f (x, y) := x^2 − y^2 ∈ R.

In this case the level sets are given, for all c ∈ R, by


Levf (c) = {(x, y) | y^2 = x^2 − c}.
 [Differentiability] Let f : Rn → R. We say that f is differentiable at x̄ ∈ Rn if for all
i = 1, . . . , n the partial derivatives

∂f/∂xi (x̄) := lim_{τ→0} [f (x̄ + τ ei) − f (x̄)] / τ (where ei := (0, . . . , 1, . . . , 0), with the 1 in the i-th position),

exist and, defining the gradient of f at x̄ by


∇f (x̄) := (∂f/∂x1 (x̄), . . . , ∂f/∂xn (x̄)) ∈ Rn,

we have that
lim_{h→0} [f (x̄ + h) − f (x̄) − ∇f (x̄) · h] / khk = 0.

If f is differentiable at every x belonging to a set A ⊆ Rn, we say that f is
differentiable in A.
Remark 1. Notice that f is differentiable at x̄ iff there exists εx̄ : Rn → R, with
limh→0 εx̄(h) = 0 and

f (x̄ + h) = f (x̄) + ∇f (x̄) · h + khkεx̄(h). (2)

In particular, f is continuous at x̄.

Lemma 1. [Directional differentiability] Assume that f is differentiable at x̄ and let


h ∈ Rn. Then,

lim_{τ→0, τ>0} [f (x̄ + τ h) − f (x̄)] / τ = ∇f (x̄) · h.

Proof. By (2), for every τ > 0, we have

f (x̄ + τ h) − f (x̄) = τ ∇f (x̄) · h + τ khkεx̄(τ h).

Dividing by τ and letting τ → 0 gives the result.

Remark 2. (i) [Simple criterion to check differentiability] Suppose that A ⊆ Rn is
an open set containing x̄ and that

A 3 x 7→ ∇f (x) ∈ Rn,

is well-defined and continuous at x̄. Then, f is differentiable at x̄.


As a consequence, if ∇f is continuous in A, then f is differentiable in A. In this case,
we say that f is C 1 in A (i.e. differentiability and continuity of ∇f in A). When f is
C 1 in Rn we simply say that f is C 1.

(ii) The notion of differentiability can be extended to a function f : Rn → Rm. In


this case, f is differentiable at x̄ if there exists L ∈ Mm×n(R) such that

lim_{khk→0} kf (x̄ + h) − f (x̄) − Lhk / khk = 0.

If f is differentiable at x̄, then L = Df (x̄), called the Jacobian matrix of f at x̄,

which is given by
Df (x̄) =
( ∂f1/∂x1 (x̄)  . . .  ∂f1/∂xn (x̄) )
( . . .          . . .  . . .        )
( ∂fm/∂x1 (x̄)  . . .  ∂fm/∂xn (x̄) )

Note that when m = 1 we have that Df (x̄) = ∇f (x̄)>.

The chain rule says that if f : Rn → Rm is differentiable at x̄ and g : Rm → Rp is


differentiable at f (x̄), then g ◦ f : Rn → Rp is differentiable at x̄ and the following
identity holds
D(g ◦ f )(x̄) = Dg(f (x̄))Df (x̄).

(iii) In the previous definitions the fact that the domain of definition of f is Rn is
not important. The definition extends naturally to functions defined on open
subsets of Rn.

Basic examples:

(i) Let c ∈ Rn and consider the function f1 : Rn → R defined by f1(x) = c · x.
Then, for every x ∈ Rn, we have ∇f1(x) = c and, hence, f1 is differentiable.
(ii) Let Q ∈ Mn×n(R) and consider the function f2 : Rn → R defined by

f2(x) = (1/2)hQx, xi ∀ x ∈ Rn.

Then, for all x ∈ Rn and h ∈ Rn, we have

f2(x + h) = (1/2)hQ(x + h), x + hi
= (1/2)hQx, xi + (1/2)[hQx, hi + hQh, xi] + (1/2)hQh, hi
= (1/2)hQx, xi + h(1/2)(Q + Q>)x, hi + (1/2)hQh, hi
= f2(x) + h(1/2)(Q + Q>)x, hi + khkεx(h),

where limh→0 εx(h) = 0. Therefore, f2 is differentiable and
∇f2(x) = (1/2)(Q + Q>)x ∀ x ∈ Rn.

In particular, if Q is symmetric, then ∇f2(x) = Qx.
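
This formula is easy to verify numerically. The following small sketch (my addition for illustration, not part of the original notes; the matrix Q and the point x are arbitrary choices) compares it with a finite-difference approximation of the gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))        # not necessarily symmetric
x = rng.standard_normal(4)

f = lambda z: 0.5 * z @ Q @ z          # f2(z) = (1/2) <Qz, z>
grad_formula = 0.5 * (Q + Q.T) @ x     # (1/2)(Q + Q^T) x

# central finite differences
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])

print(np.max(np.abs(grad_formula - grad_fd)))  # ~1e-10
```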


(iii) Consider the function f3 : Rn → R defined by f3(x) = kxk. Then, since
f3(x) = √(kxk^2), if x ≠ 0, the chain rule shows that

Df3(x) = D(√·)(kxk^2) D(k · k^2)(x) = (1 / (2√(kxk^2))) (2x)> = x>/kxk,

which implies that ∇f3(x) = x/kxk, and, since this function is continuous at every
x ≠ 0, we have that f3 is C 1 in the set Rn \ {0}. Let us show that f3 is not
differentiable at x = 0. Indeed, if this were not the case, then the partial derivatives
∂f3/∂xi (0) should exist for all i = 1, . . . , n. Taking, for instance, i = 1, we have

lim_{τ→0} [k0 + τ e1k − k0k] / τ = lim_{τ→0} |τ| / τ,

which does not exist, because

lim_{τ→0−} |τ|/τ = lim_{τ→0−} (−τ)/τ = −1 ≠ 1 = lim_{τ→0+} τ/τ = lim_{τ→0+} |τ|/τ.

 [Second order derivative and Taylor expansion] Suppose that f : Rn → R is C 1. In


particular, the function Rn 3 x 7→ ∇f (x) ∈ Rn is well defined. If this function
is differentiable at x̄, then we say that f is twice differentiable at x̄. If f is twice
differentiable at every x belonging to a set A ⊆ Rn, then we say that f is twice
differentiable in A.
If this is the case, then, by definition,

∂^2f/∂xi∂xj (x̄) := ∂/∂xj ( ∂f/∂xi ) (x̄),

exist for all i, j = 1, . . . , n. The following result, due to Clairaut and also known
as Schwarz's theorem, says that, under appropriate conditions, we can exchange the
order of differentiation.
Theorem 1. Suppose that the function f is twice differentiable in an open set
A ⊆ Rn containing x̄ and that for all i, j = 1, . . . , n the function A 3 x 7→
∂^2f/∂xi∂xj (x) ∈ R is continuous at x̄. Then,

∂^2f/∂xi∂xj (x̄) = ∂^2f/∂xj∂xi (x̄).

Under the assumptions of the previous theorem, the Jacobian of ∇f (x̄) takes the form

D^2f (x̄) =
( ∂^2f/∂x1^2 (x̄)    . . .  ∂^2f/∂x1∂xn (x̄) )
( . . .              . . .  . . .            )
( ∂^2f/∂xn∂x1 (x̄)  . . .  ∂^2f/∂xn^2 (x̄)  )

This matrix, called the Hessian matrix of f at x̄, belongs to Mn×n(R) and is a
symmetric matrix by the previous result.
If f : Rn → R is twice differentiable in an open set A ⊆ Rn and for all i,
j = 1, . . . , n the function

A 3 x 7→ ∂^2f/∂xi∂xj (x) ∈ R

is continuous, we say that f is C 2 in A.


 [Taylor’s theorem] We admit the following important result:

Theorem 2. Suppose that f : Rn → R is C 2 in an open set A ⊆ Rn. Then, for all


x ∈ A and h such that x + h ∈ A, we have the following expansion
2 2
f (x + h) = f (x) + ∇f (x) · h + 12 hD f (x)h, hi + khk Rx(h),

where Rx(h) → 0 as h → 0.

Example: Consider the function f : R2 → R defined by f (x, y) = e^x cos(y) − x − 1.
Then,
∂f/∂x (0, 0) = [e^x cos(y)]|_{(x,y)=(0,0)} − 1 = 0,
∂f/∂y (0, 0) = [−e^x sin(y)]|_{(x,y)=(0,0)} = 0,
∂^2f/∂x^2 (0, 0) = [e^x cos(y)]|_{(x,y)=(0,0)} = 1,
∂^2f/∂y^2 (0, 0) = [−e^x cos(y)]|_{(x,y)=(0,0)} = −1,
∂^2f/∂x∂y (0, 0) = [−e^x sin(y)]|_{(x,y)=(0,0)} = 0.
Note that all the first and second order partial derivatives are continuous in R2.
Therefore, we can apply the previous result and obtain that the Taylor expansion of
f at (0, 0) is given by

f ((0, 0) + h) = f (0, 0) + ∇f (0, 0) · h + (1/2)hD^2f (0, 0)h, hi + khk^2 R_{(0,0)}(h)
= 0 + 0 + (1/2)h1^2 − (1/2)h2^2 + khk^2 R_{(0,0)}(h)
= (1/2)h1^2 − (1/2)h2^2 + khk^2 R_{(0,0)}(h).

This expansion shows that locally around (0, 0) the function f above is similar to the
function in Example 3.
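
One can also observe this numerically. A short sketch (added for illustration, not part of the notes; the test direction is an arbitrary choice) shows the remainder decaying faster than khk^2:

```python
import numpy as np

f = lambda x, y: np.exp(x) * np.cos(y) - x - 1
quad = lambda h1, h2: 0.5 * h1**2 - 0.5 * h2**2   # Taylor model at (0, 0)

for t in [1e-1, 1e-2, 1e-3]:
    h = (t, 2 * t)                                 # arbitrary direction
    err = abs(f(*h) - quad(*h))
    print(t, err / (h[0]**2 + h[1]**2))            # ratio -> 0 as t -> 0
```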

Some good reading for the previous part

 Chapters 2 and 3 in “Vector calculus”, sixth edition, by J. E. Marsden and A. Tromba.

 Chapter 14 in “Calculus: Early transcendentals”, eighth edition, by J. Stewart.

Some basic existence results for problem (P )

 [Compactness] Recall that A ⊆ Rn is called compact if A is closed and bounded (i.e.


A is closed and there exists R > 0 such that kxk ≤ R for all x ∈ A).
Let us recall an important result concerning the compactness of a set A.

Theorem 3. [Bolzano-Weierstrass theorem] A set A ⊆ Rn is compact if and only
if every sequence (xk)k∈N of elements of A has a convergent subsequence. This
means that there exist x̄ ∈ A and a subsequence (xkℓ)ℓ∈N of (xk)k∈N such that

x̄ = lim_{ℓ→∞} xkℓ.

 [The basic existence results] Note that, by definition, if inf_{x∈K} f (x) = −∞, then f
is not bounded below on K and, hence, there are no solutions to (P ). On the other
hand, if inf_{x∈K} f (x) is finite, then the existence of a solution can also fail to hold, as
the following example shows.
Example: Consider the function R 3 x 7→ f (x) := e^{−x} and take K := [0, +∞[.
Then, inf_{x∈K} f (x) = 0 and there is no x ∈ K such that f (x) = 0.

Definition 1. We say that (xk )k∈N ⊆ K is a minimizing sequence for (P ) if

inf_{x∈K} f (x) = lim_{k→∞} f (xk).

By definition, a minimizing sequence always exists if K is non-empty.

Theorem 4. [Weierstrass theorem, K compact] Suppose that f : Rn → R is


continuous and that K is non-empty and compact. Then, problem (P ) admits at
least one global solution.

Proof. Let (xk)k∈N ⊆ K be a minimizing sequence. By compactness, there exist
x̄ ∈ K and a subsequence (xkℓ)ℓ∈N of (xk)k∈N such that x̄ = lim_{ℓ→∞} xkℓ. Then,
by continuity,
f (x̄) = lim_{ℓ→∞} f (xkℓ) = inf_{x∈K} f (x).
Example: Suppose that f : R3 → R is given by f (x, y, z) = x^2 − y^3 + sin z and
K = {(x, y, z) | x^4 + y^4 + z^4 ≤ 1}. Then f is continuous and K is compact. As
a consequence, problem (P ) admits at least one solution.

Theorem 5. [K closed but not bounded] Suppose that K is non-empty, closed, and
that f is continuous and “coercive” or “infinity at the infinity” in K, i.e.

lim_{x∈K, kxk→∞} f (x) = +∞. (3)

Then, problem (P ) admits at least one global solution.


Proof. Let (xk)k∈N ⊆ K be a minimizing sequence. Since lim_{k→∞} f (xk) =
inf_{x∈K} f (x) < +∞, there exists R > 0 such that (xk)k∈N ⊆ KR := {x′ ∈
K | f (x′) ≤ R} ⊆ K. By continuity of f , the set KR is closed, and it is bounded
because f is coercive. Arguing as in the previous proof, the compactness of KR
implies the existence of x̄ ∈ KR such that a subsequence of (xk)k∈N converges to x̄,
which, by continuity of f , implies that x̄ solves (P ).

Example: Suppose that f : Rn → R is given by
f (x) = hQx, xi + c>x ∀ x ∈ Rn,

where Q ∈ Mn,n(R) is symmetric and positive definite, and c ∈ Rn. Since


hQx, xi ≥ λmin(Q)kxk^2 ∀ x ∈ Rn

(where λmin(Q) > 0 is the smallest eigenvalue of Q), we have that


f (x) ≥ λmin(Q)kxk^2 − kck kxk ∀ x ∈ Rn.

This implies that f (x) → ∞ as kxk → ∞. Therefore,

lim_{x∈K, kxk→∞} f (x) = +∞, (4)

holds for every closed set K. Since f is also continuous, problem (P ) admits at least
one global solution for any given non-empty closed set K.

Example: Suppose that f : R2 → R is given by
f (x, y) = x^2 + y^3 ∀ (x, y) ∈ R2,

and
K = {(x, y) ∈ R2 | y ≥ −1}.
Then,
lim_{x∈K, kxk→∞} f (x) = +∞ (5)

holds (exercise) and, hence, (P ) admits at least one global solution.

Optimality conditions for unconstrained problems

 Notice that, by the second existence theorem, if f is continuous and satisfies that

lim_{kxk→∞} f (x) = +∞,

then, if K = Rn, problem (P ) admits at least one global solution.

 [First order optimality conditions for unconstrained problems]


We have the following result
Theorem 6. [Fermat’s rule] Suppose that K = Rn and that x̄ is a local solution to
problem (P ). If f is differentiable at x̄, then ∇f (x̄) = 0.
Proof. For every h ∈ Rn and τ > 0, the local optimality of x̄ yields

f (x̄) ≤ f (x̄ + τ h) = f (x̄) + τ ∇f (x̄) · h + τ khkεx̄(τ h),

where limz→0 εx̄(z) = 0. Therefore,

0 ≤ τ ∇f (x̄) · h + τ khkεx̄(τ h).

Dividing by τ and letting τ → 0, we get

∇f (x̄) · h ≥ 0.

Since h is arbitrary, we get that ∇f (x̄) = 0 (take for instance h = −∇f (x̄) in the
previous inequality).

 [Second order optimality conditions for unconstrained problems]

We have the following second order necessary condition for local optimality:
Theorem 7. Suppose that K = Rn and that x̄ is a local solution to problem (P ). If
f is C 2 in an open set A containing x̄, then D 2f (x̄) is positive semidefinite.
In other words,
hD^2f (x̄)h, hi ≥ 0 ∀ h ∈ Rn.
Proof. Let us fix h ∈ Rn. By Taylor’s theorem, for all τ > 0 small enough, we have

f (x̄ + τ h) = f (x̄) + ∇f (x̄) · (τ h) + (1/2)hD^2f (x̄)(τ h), τ hi + kτ hk^2 Rx̄(τ h),

where Rx̄(τ h) → 0 as τ → 0. Using the local optimality of x̄, the previous result
implies that ∇f (x̄) = 0 and, hence,

0 ≤ f (x̄ + τ h) − f (x̄) = (τ^2/2)hD^2f (x̄)h, hi + τ^2 khk^2 Rx̄(τ h).

Dividing by τ 2 and passing to the limit with τ → 0, we get the result.

We have the following second order sufficient condition for local optimality.
Theorem 8. Suppose that f : Rn → R is C 2 in an open set A containing x̄ and
that
(i) ∇f (x̄) = 0.
(ii) The matrix D 2f (x̄) is positive definite. In other words,

hD^2f (x̄)h, hi > 0 ∀ h ∈ Rn, h ≠ 0.

Then, x̄ is a local solution to (P ).

Proof. Let λ > 0 be the smallest eigenvalue of D 2f (x̄), then

hD^2f (x̄)h, hi ≥ λkhk^2 ∀ h ∈ Rn.

Using this inequality, the hypothesis ∇f (x̄) = 0, and Taylor's expansion, for all
h ∈ Rn such that x̄ + h ∈ A we have that
f (x̄ + h) − f (x̄) = ∇f (x̄) · h + (1/2)hD^2f (x̄)h, hi + khk^2 Rx̄(h)
≥ (λ/2)khk^2 + khk^2 Rx̄(h)
= (λ/2 + Rx̄(h)) khk^2.

Since Rx̄(h) → 0 as h → 0, we can choose δ > 0 such that, whenever khk ≤ δ,
we have x̄ + h ∈ A and |Rx̄(h)| ≤ λ/4. As a consequence,

f (x̄ + h) − f (x̄) ≥ (λ/4)khk^2 ∀ h ∈ Rn with khk ≤ δ,
which proves the local optimality of x̄.

Example: Let us study problem (P ) with K = R2 and
R2 3 (x, y) 7→ f (x, y) = 2x^3 + 3y^2 + 3x^2 y − 24y.

First, consider the sequence (xk , yk ) = (−k, 0) for k ∈ N. Then,


f (xk, yk) = −2k^3 → −∞ as k → ∞.

Therefore, inf (x,y)∈R2 f (x, y) = −∞ and problem (P ) does not admit global
solutions. Let us look for local solutions. We know that if (x, y) is a local solution,
then it should satisfy ∇f (x, y) = (0, 0). This equation gives

6x^2 + 6xy = 0,
6y + 3x^2 = 24.

From the first equation, we get that x = 0 or x = −y. In the first case, the second
equation yields y = 4, while in the second case we obtain x^2 − 2x − 8 = 0, which
yields the two solutions (4, −4) and (−2, 2). Therefore, we have the three candidates
(0, 4), (4, −4) and (−2, 2). Let us check what can be obtained from the Hessian at

these three points. We have that

D^2f (x, y) = ( 12x + 6y  6x )
              ( 6x         6  ).

For the first candidate, we have

D^2f (0, 4) = ( 24  0 )
              ( 0   6 ),

which is positive definite. This implies that (0, 4) is a local solution of (P ). For the
second candidate, we have

D^2f (4, −4) = ( 24  24 ) = 6 ( 4  4 )
               ( 24   6 )     ( 4  1 ),

whose determinant is 36 · (−12) < 0, which implies that D^2f (4, −4) is
indefinite (it has one positive and one negative eigenvalue). Finally,

D^2f (−2, 2) = ( −12  −12 )
               ( −12    6 ),

which is also indefinite, because its diagonal entries have opposite signs.
Therefore, (0, 4) is the unique local solution to (P ).
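
A numerical sketch of this classification (my own illustration, not from the notes): inspect the Hessian eigenvalues at each candidate found above.

```python
import numpy as np

# f(x, y) = 2x^3 + 3y^2 + 3x^2 y - 24 y
hess = lambda x, y: np.array([[12 * x + 6 * y, 6 * x],
                              [6 * x, 6.0]])

# the three critical points found above
for (x, y) in [(0, 4), (4, -4), (-2, 2)]:
    eigs = np.linalg.eigvalsh(hess(x, y))
    kind = "local min" if eigs.min() > 0 else "indefinite (saddle)"
    print((x, y), eigs.round(2), kind)
```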
 [Maximization problems] If instead of problem (P ) we consider the problem

Find x̄ ∈ Rn such that f (x̄) = max {f (x) | x ∈ K} , (P′)

then x̄ is a local (resp. global) solution to (P′) iff x̄ is a local (resp. global) solution
to (P ) with f replaced by −f . In particular, if x̄ is a local solution to (P′) and f is
regular enough, then we have the following first order necessary condition

∇f (x̄) = 0,

as well as the following second order necessary condition


hD^2f (x̄)h, hi ≤ 0 ∀ h ∈ Rn.
In other words, D 2f (x̄) is negative semidefinite.

Conversely, if x̄ ∈ Rn is such that ∇f (x̄) = 0 and


hD^2f (x̄)h, hi < 0 ∀ h ∈ Rn, h ≠ 0.

Then, x̄ is a local solution to (P′).

Convexity

 [Convexity of a set] A non-empty set C ⊆ Rn is called convex if for any λ ∈ [0, 1]


and x, y ∈ C , we have that

λx + (1 − λ)y ∈ C.

Let us fix a convex set C ⊆ Rn.


 [Convexity of a function] A function f : C → R is said to be convex if for any
λ ∈ [0, 1] and x, y ∈ C , we have that

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).

 [Relation between convex functions and convex sets] Given f : Rn → R, let us define
its epigraph epi(f ) by
epi(f ) := {(x, y) ∈ R^{n+1} | y ≥ f (x)}.
Proposition 1. The function f is convex iff the set epi(f ) is convex.
Proof. Indeed, suppose that f is convex and let (x1, z1), (x2, z2) ∈ epi(f ). Then,
given λ ∈ [0, 1] set

Pλ := λ(x1, z1) + (1 − λ)(x2, z2) = (λx1 + (1 − λ)x2, λz1 + (1 − λ)z2)

Since, by convexity,

f (λx1 + (1 − λ)x2) ≤ λf (x1) + (1 − λ)f (x2) ≤ λz1 + (1 − λ)z2,

we have that Pλ ∈ epi(f ). Conversely, assume that epi(f ) is convex and let x1,
x2 ∈ Rn and λ ∈ [0, 1]. Since (x1, f (x1)), (x2, f (x2)) ∈ epi(f ), we deduce that

(λx1 + (1 − λ)x2, λf (x1) + (1 − λ)f (x2)) ∈ epi(f ),

and, hence,
f (λx1 + (1 − λ)x2) ≤ λf (x1) + (1 − λ)f (x2),
which proves the convexity of f .

 [Strict convexity of a function] A function f : C → R is said to be strictly convex if
for any λ ∈ (0, 1) and x, y ∈ C , with x 6= y , we have that

f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y).

 [Concavity and strict concavity of a function] A function f : C → R is said to be


concave if −f is convex. Similarly, the function f is strictly concave if −f is strictly
convex.
Example: Let us show that the function Rn 3 x 7→ kxk ∈ R is convex but not
strictly convex. Indeed, the convexity follows from the triangle inequality

kλx + (1 − λ)yk ≤ kλxk + k(1 − λ)yk = λkxk + (1 − λ)kyk.

Now, if we have that for some λ ∈ (0, 1)

kλx + (1 − λ)yk = λkxk + (1 − λ)kyk

the equality case in the triangle inequality (ka + bk = kak + kbk iff a = 0, or b = 0,
or a = αb with α > 0) shows that the previous equality holds iff x = 0, or y = 0,
or x = γy for some γ > 0. By taking x ≠ 0 and y = γx with γ ∈ (0, ∞) \ {1}
we conclude that k · k is not strictly convex.

Example: Let β ∈ (1, +∞). Let us show that the function Rn 3 x 7→ kxk^β ∈ R
is strictly convex. Indeed, the real function [0, +∞) 3 t 7→ α(t) := t^β ∈ R is
increasing and strictly convex because

α′(t) = βt^{β−1} > 0 and α″(t) = β(β − 1)t^{β−2} > 0 ∀ t ∈ (0, +∞).

As a consequence, for any λ ∈ [0, 1], using that

kλx + (1 − λ)yk ≤ λkxk + (1 − λ)kyk,

we get that
kλx + (1 − λ)yk^β ≤ (λkxk + (1 − λ)kyk)^β ≤ λkxk^β + (1 − λ)kyk^β, (6)

which implies the convexity of k · k^β. Now, in order to prove the strict convexity,

assume that for some λ ∈ (0, 1) we have
kλx + (1 − λ)yk^β = λkxk^β + (1 − λ)kyk^β,

and let us prove that x = y . Then, all the inequalities in (6) are equalities and, hence,

kλx + (1 − λ)yk = λkxk + (1 − λ)kyk,

and (λkxk + (1 − λ)kyk)^β = λkxk^β + (1 − λ)kyk^β.

The equality case in the triangle inequality and the first relation above imply that
x = 0, or y = 0, or x = γy for some γ > 0. The strict convexity of α and the second
equality above imply that kxk = kyk. Therefore, either one of x, y is zero (and then
both are, since kxk = kyk) or x = γy for some γ > 0 with kxk = kyk. In the latter
case, we get that γ = 1 and, hence, x = y, from which the strict convexity follows.

 [Convexity and differentiability] We have the following result:
Theorem 9. Let f : C → R be a differentiable function. Then,
(i) f is convex if and only if for every x ∈ C we have

f (y) ≥ f (x) + ∇f (x) · (y − x), ∀ y ∈ C. (7)

(ii) f is strictly convex if and only if for every x ∈ C we have

f (y) > f (x) + ∇f (x) · (y − x), ∀ y ∈ C, y 6= x. (8)

Proof. (i) By definition of convex function, for any x, y ∈ C and λ ∈ (0, 1), we have

f (λy + (1 − λ)x) ≤ λf (y) + (1 − λ)f (x),

and, hence,
f (λy + (1 − λ)x) − f (x) ≤ λ (f (y) − f (x))

Since, λy + (1 − λ)x = x + λ(y − x), we get

[f (x + λ(y − x)) − f (x)] / λ ≤ f (y) − f (x).
Letting λ → 0, Lemma 1 yields

∇f (x) · (y − x) ≤ f (y) − f (x).

Conversely, take x1 and x2 in C , λ ∈]0, 1[ and define xλ := λx1 + (1 − λ)x2. By


assumption,

∀i ∈ {1, 2} , f (xi) ≥ f (xλ) + ∇f (xλ) · (xi − xλ),

and we obtain, by taking the convex combination, that

λf (x1) + (1 − λ)f (x2) ≥ f (xλ) + ∇f (xλ) · (λx1 + (1 − λ)x2 − xλ) = f (xλ),

which shows that f is convex.

(ii) Since f is convex, by (i) we have that

f (y) ≥ f (x) + ∇f (x) · (y − x), ∀ y ∈ C. (9)

Suppose that there exists z ∈ C such that z 6= x and

f (z) = f (x) + ∇f (x) · (z − x).

Let y = (1/2)x + (1/2)z. Then, by (9) and strict convexity, we get

f (x) + ∇f (x) · ((1/2)z − (1/2)x) ≤ f (y) < (1/2)f (x) + (1/2)f (z) = f (x) + ∇f (x) · ((1/2)z − (1/2)x),

which is impossible. The proof that (8) implies that f is strictly convex is completely
analogous to the proof that (7) implies that f is convex. The result follows.

Theorem 10. Let f : C → R be C 2 in C and suppose that C is open (besides being


convex). Then
(i) f is convex if and only if D 2f (x) is positive semidefinite for all x ∈ C .
(ii) f is strictly convex if D 2f (x) is positive definite for all x ∈ C

Proof. (i) Suppose that f is convex. Then, by Taylor’s theorem for every x ∈ C ,
h ∈ Rn and τ > 0 small enough such that x + τ h ∈ C we have

f (x + τ h) = f (x) + τ∇f (x) · h + (τ^2/2)hD^2f (x)h, hi + τ^2 khk^2 Rx(τ h),
which implies, by the first order characterization of convexity, that
0 ≤ (1/2)hD^2f (x)h, hi + khk^2 Rx(τ h).

Using that limτ →0 Rx(τ h) = 0, and the fact that h is arbitrary, we get that
hD^2f (x)h, hi ≥ 0 ∀ h ∈ Rn.

Suppose that D 2f (x) is positive semidefinite for all x ∈ C and assume, for the time
being, that for every x, y ∈ C there exists cxy ∈ {λx + (1 − λ)y | λ ∈ (0, 1)}
such that
f (y) = f (x) + ∇f (x) · (y − x) + (1/2)hD^2f (cxy)(y − x), y − xi. (10)

Then, we have that

f (y) ≥ f (x) + ∇f (x) · (y − x) ∀ x, y ∈ C,

and, hence, f is convex. It remains to prove (10). Defining g(τ ) := f (x + τ (y − x))


for all τ ∈ [0, 1], formula (10) follows from the equality
g(1) = g(0) + g′(0) + (1/2)g″(τ̂)

for some τ̂ ∈ (0, 1).


(ii) The assertion follows directly from (10), with y ≠ x, and Theorem 9(ii).

Remark 3. Note that the positive definiteness of D^2f (x), for all x ∈ C , is only
a sufficient condition for strict convexity but not necessary. Indeed, the function
R 3 x 7→ f (x) = x^4 ∈ R is strictly convex but f″(0) = 0.

Example: Let Q ∈ Mn,n(R) be symmetric and let f : Rn → R be defined by


f (x) = (1/2)x>Qx + c>x.

Then, D^2f (x) = Q and, hence, f is convex if Q is positive semidefinite and strictly
convex if Q is positive definite.
In this case, the fact that Q is positive definite is also a necessary condition for strict
convexity. Indeed, for simplicity suppose that c = 0 and write Q = P DP >, where
the set of columns of P is an orthonormal basis of eigenvectors of Q (which exists
because Q is symmetric), and D is the diagonal matrix containing the corresponding
eigenvalues {λi}_{i=1}^n on the diagonal. Set y(x) := P>x. Then,

f (x) = (1/2) Σ_{i=1}^n λi yi(x)^2.

If Q is not positive definite, then there exists j ∈ {1, . . . , n} such that
λj ≤ 0. Then, it is easy to see that f is not strictly convex on the set
{x ∈ Rn | yi(x) = 0 for all i ∈ {1, . . . , n} \ {j}}.

Convex optimization problems

 [Optimality conditions for convex problems] Let us begin with a definition.


Definition 2. Problem (P ) is called convex if f is convex and K is a non-empty
closed and convex set.
We have the following result.
Theorem 11. [Characterization of solutions for convex problems] Suppose that
problem (P ) is convex and that f : Rn → R is differentiable in K. Then, the
following statements are equivalent:
(i) x̄ is a local solution to (P ).
(ii) The following inequality holds:

h∇f (x̄), x − x̄i ≥ 0 ∀ x ∈ K. (11)

(iii) x̄ is a global solution to (P ).

Proof. Let us prove that (i) implies (ii). Indeed, by convexity of K we have that
given x ∈ K for any τ ∈ [0, 1] the point τ x + (1 − τ )x̄ = x̄ + τ (x − x̄) ∈ K.
Therefore, by the differentiability of f , if τ is small enough, we have

0 ≤ f (x̄ + τ (x − x̄)) − f (x̄) = τ∇f (x̄) · (x − x̄) + τ kx − x̄k εx̄(τ (x − x̄)),

where lim_{h→0} εx̄(h) = 0. Dividing by τ and letting τ → 0, we get (ii).


The proof that (ii) implies (iii) follows directly from the inequalities

f (x) ≥ f (x̄) + ∇f (x̄) · (x − x̄) ≥ f (x̄) ∀ x ∈ K.

Finally, (iii) implies (i) is trivial. The result follows.

Remark 4. In particular, if f : Rn → R is convex and differentiable and K = Rn, the


relation
∇f (x̄) = 0,
is a necessary and sufficient condition for x̄ to be a global solution to (P ).

Proposition 2. Suppose that K is convex and that f is strictly convex in K. Then,
there exists at most one solution to problem (P ).
Proof. Assume, by contradiction, that x1 ≠ x2 are both solutions to (P ). Then
(1/2)x1 + (1/2)x2 ∈ K and

f ((1/2)x1 + (1/2)x2) < (1/2)f (x1) + (1/2)f (x2) = (1/2) min_{x∈K} f (x) + (1/2) min_{x∈K} f (x) = min_{x∈K} f (x),

which is impossible.

 [Least squares] Let A ∈ Mm,n(R), b ∈ Rm and consider the system Ax = b.


Suppose that m > n. This type of system appears, for instance, in data fitting
problems, and it is often ill-posed in the sense that there is no x satisfying the equation.
In this case, one usually considers the optimization problem

min_{x∈K:=Rn} f (x) := kAx − bk^2. (12)

Note that
f (x) = hA>Ax, xi − 2hA>b, xi + kbk^2,
and, hence, D 2f (x) = 2A>A, which is symmetric positive semidefinite, and, hence,
f is convex. Let us assume that the columns of A are linearly independent. Then, for
any h ∈ Rn,
hA>Ah, hi = 0 ⇔ Ah = 0 ⇔ h = 0,
i.e. for all x ∈ Rn, the matrix D 2f (x) is symmetric positive definite and, hence, f
is strictly convex. Moreover, denoting by λmin > 0 the smallest eigenvalue of 2A>A,
we have
f (x) ≥ λmin kxk^2 − 2hA>b, xi + kbk^2,
and, hence, f is infinity at the infinity. Therefore, problem (12) admits only one solution
x̄. By Remark 4, the solution x̄ is characterized by
A>Ax̄ = A>b, i.e. x̄ = (A>A)^{−1}A>b.
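
As an illustration (my sketch with synthetic data, not from the notes), the normal equations reproduce what a standard least squares routine returns:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))       # m > n, full column rank (generic)
b = rng.standard_normal(10)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # library solver

print(np.allclose(x_normal, x_lstsq))  # True
```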

 [Projection on a closed and convex set] Suppose that K is a nonempty closed and
convex set and let y ∈ Rn. Consider the projection problem

inf {kx − yk | x ∈ K} . (ProjK)

Note that, K being closed and the cost functional being coercive, we have the existence
of at least one solution x̄. In order to characterize x̄, notice that the set of solutions to
(ProjK) is the same as the set of solutions to the problem

inf {(1/2)kx − yk^2 | x ∈ K}.

Then, since the cost functional of the problem above is strictly convex, Proposition 2
implies that x̄ is its unique solution and, hence, also the unique solution to (ProjK).
Moreover, by Theorem 11(ii), we have that x̄ is characterized by the inequality

(y − x̄) · (x − x̄) ≤ 0 ∀ x ∈ K. (13)

Example: Let b ∈ Rm and A ∈ Mm×n(R) be such that

b ∈ Im(A) := {Ax | x ∈ Rn}.

Suppose that

K = {x ∈ Rn | Ax = b}. (14)

Then, K is closed, convex and nonempty. Moreover, for any h ∈ Ker(A) we have
that x̄ + h ∈ K. As a consequence, (13) implies that

(y − x̄) · h ≤ 0 ∀ h ∈ Ker(A),

and, using that h ∈ Ker(A) iff −h ∈ Ker(A), we get that

(y − x̄) · h = 0 ∀ h ∈ Ker(A). (15)

Conversely, since for every x ∈ K we have x − x̄ ∈ Ker(A), relation (15) implies


(13), and, hence, (15) characterizes x̄. Note that (15) can be written as¹

y − x̄ ∈ Ker(A)^⊥ = {v ∈ Rn | v>h = 0 ∀ h ∈ Ker(A)},

¹ Recall that, given a subspace V of Rn, the orthogonal space V^⊥ is defined by

V^⊥ := {z ∈ Rn | z>v = 0 ∀ v ∈ V }.

Two important properties of the orthogonal space are V ⊕ V^⊥ = Rn and (V^⊥)^⊥ = V .

or, equivalently,

y = x̄ + z for some z ∈ Ker(A)^⊥. (16)
 [Convex problems with equality constraints] Now, we consider the same set K as in
(14) but we consider a general differentiable convex objective function f : Rn → R.
We will need the following result from Linear Algebra.
Lemma 2. Let A ∈ Mm,n(R). Then, Ker(A)⊥ = Im(A>).
Proof. By the previous footnote, the desired relation is equivalent to Im(A>)⊥ =
Ker(A). Now, x ∈ Im(A>)⊥ iff hx, A>yi = 0 for all y ∈ Rm, and this holds iff
hAx, yi = 0 for all y ∈ Rm, i.e. x ∈ Ker(A).
Proposition 3. Let f : Rn → R be differentiable and convex and suppose that the
set K in (14) is nonempty. Then x̄ is a global solution to (P ) iff x̄ ∈ K and there
exists λ ∈ Rm such that
∇f (x̄) + A>λ = 0. (17)
Proof. We are going to show that (17) is equivalent to (11) from which the result
follows. Indeed, exactly as in the previous example, we have that (11) is equivalent to

∇f (x̄) · h = 0 ∀ h ∈ Ker(A),

i.e.

∇f (x̄) ∈ Ker(A)^⊥.
Lemma 2 implies the existence of µ ∈ Rm such that ∇f (x̄) = A>µ. Setting
λ = −µ we get (17).

Example: Let Q ∈ Mn,n(R) be symmetric and positive definite, and c ∈ Rn. In the
framework of the previous proposition, suppose that f is given by
f (x) = (1/2)hQx, xi + c>x ∀ x ∈ Rn,

and that A has m linearly independent columns. A classical linear algebra result states
that this is equivalent to the fact that the m rows of A are linearly independent. In
this case, we say that A has full rank.
Under the previous assumptions on Q, we have seen that f is strictly convex.
Moreover, the condition on the columns of A implies that Im(A) = Rm and, hence,
K ≠ ∅. Now, by Proposition 3 the point x̄ solves (P ) iff x̄ ∈ K and there exists
λ ∈ Rm such that (17) holds. In other words, there exists λ ∈ Rm such that

Ax̄ = b and Qx̄ + c + A>λ = 0.

The second equation above yields x̄ = −Q−1(c + A>λ) and, hence, by the first
equation, we get
AQ^{−1}c + AQ^{−1}A>λ + b = 0. (18)
Let us show that M := AQ−1A> is invertible. Indeed, since M ∈ Mm,m(R) it
suffices to show that M y = 0 implies that y = 0. Now, let y ∈ Rm such that
M y = 0. Then, hM y, yi = 0 and, hence, hQ−1A>y, A>yi = 0, which implies,
since Q−1 is also positive definite, that A>y = 0. Now, since the columns of A> are
also linearly independent, we deduce that y = 0, i.e. M is invertible. Using this fact,
we can solve for λ in (18), obtaining
λ = −M^{−1}(AQ^{−1}c + b).

We deduce that
x̄ = −Q^{−1}(c − A>M^{−1}(AQ^{−1}c + b)), (19)

is the unique solution to this problem.
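
A small numerical sketch of formula (19) (my illustration; the data are arbitrary): it agrees with solving the KKT conditions Qx̄ + A>λ = −c, Ax̄ = b as one block linear system.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
G = rng.standard_normal((n, n))
Q = G @ G.T + n * np.eye(n)            # symmetric positive definite
A = rng.standard_normal((m, n))        # full row rank (generic)
b = rng.standard_normal(m)
c = rng.standard_normal(n)

Qi = np.linalg.inv(Q)
M = A @ Qi @ A.T
x_formula = -Qi @ (c - A.T @ np.linalg.solve(M, A @ Qi @ c + b))

# block KKT system: [[Q, A^T], [A, 0]] [x; lam] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))

print(np.allclose(x_formula, sol[:n]), np.allclose(A @ x_formula, b))
```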

Example: Let us now consider the projection problem

min (1/2)kx − yk^2
s.t. Ax = b.

Noting that (1/2)kx − yk^2 = (1/2)kxk^2 − y>x + (1/2)kyk^2, the previous problem has the same
solution as

min (1/2)kxk^2 − y>x
s.t. Ax = b,

which corresponds to Q = In×n (the n × n identity matrix) and c = −y . Then,


(19) implies that the solution of this problem is given by

x̄ = (I − A>(AA>)^{−1}A) y + A>(AA>)^{−1} b.

Note that if h ∈ Ker(A),

hy − x̄, hi = hA>(AA>)^{−1}Ay − A>(AA>)^{−1}b, hi
= h(AA>)^{−1}Ay − (AA>)^{−1}b, Ahi
= 0,

confirming (16).
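
Numerically (again my own sketch with arbitrary data), this projection formula can be checked against the characterization (16):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

P = A.T @ np.linalg.solve(A @ A.T, A)          # A^T (A A^T)^{-1} A
x_bar = (np.eye(n) - P) @ y + A.T @ np.linalg.solve(A @ A.T, b)

print(np.allclose(A @ x_bar, b))               # x_bar is feasible
# y - x_bar lies in Ker(A)^perp: its component on Ker(A) vanishes
print(np.allclose((np.eye(n) - P) @ (y - x_bar), 0))
```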

 [Separation of a point and a closed convex set] We have the following result:
Proposition 4. Let K be a nonempty closed and convex set and let y ∉ K. Then,
there exists p ∈ Rn such that
there exists p ∈ Rn such that

hp, xi < hp, yi ∀ x ∈ K.

Proof. Let x̄ ∈ K be the projection of y onto K. Let us define the affine function

ℓ(x) := hy − x̄, xi − hy − x̄, x̄i.

Then, the hyperplane Πx̄ := {x ∈ Rn | ℓ(x) = 0} is tangent to K at x̄. Indeed,
x̄ ∈ Πx̄ and, by (13), ℓ(x) ≤ 0 for all x ∈ K. Now, since ℓ(y) = ky − x̄k^2 > 0
(because y ∉ K), we deduce that ℓ(x) < ℓ(y) for all x ∈ K, which yields
hy − x̄, xi < hy − x̄, yi for all x ∈ K. Setting p := y − x̄, we have proven the
result.
 [Cones and polar cones]
Definition 3. (i) A set C ⊆ Rn is a cone if

∀ h ∈ C, ∀ τ ≥ 0 we have τ h ∈ C.

(ii) The set


C◦ := {u ∈ Rn | hu, hi ≤ 0 ∀ h ∈ C},
is called the polar cone of C .
The simplest example of a cone is any subspace V of Rn. In this case, V is a convex
cone and we have V ◦ = V ⊥, the orthogonal space to V . In particular, if ai ∈ Rn
(i = 1, . . . , m), then, the set
C = {h ∈ Rn | hai, hi = 0 ∀ i = 1, . . . , m},

which, denoting by A ∈ Mm×n(R) the matrix whose i-th row is ai, can be written as

C = {h ∈ Rn | Ah = 0} = Ker(A),

is a convex cone and, as a consequence of Lemma 2, we have


C◦ = Ker(A)^⊥ = Im(A>) = {A>λ | λ ∈ Rm}.

Now, suppose that C is given by


C = {h ∈ Rn | hai, hi ≤ 0 ∀ i = 1, . . . , m}.

Our purpose now is to compute C ◦.


Lemma 3. Denote, as before, by A ∈ Mm×n(R) the matrix whose i-th row is given
by ai. Then

C◦ = {A>λ | λ ∈ Rm, λi ≥ 0 ∀ i = 1, . . . , m}
   = {Σ_{i=1}^m λi ai | λi ≥ 0 ∀ i = 1, . . . , m}. (20)

Proof. If λ ∈ Rm with λi ≥ 0 ∀ i = 1, . . . , m, then for every h ∈ C we have

hA>λ, hi = hλ, Ahi = Σ_{i=1}^m λi hai, hi ≤ 0,

and, hence, A>λ ∈ C ◦. Now, denote by B the set on the right hand side of (20).
Clearly, B is convex and nonempty. Moreover, B can also be shown to be closed.
Suppose that u ∈ C◦ and u ∉ B. Then, by Proposition 4, there exists p ∈ Rn such
that
hp, xi < hp, ui ∀ x ∈ B,
i.e.
hp, A>λi < hp, ui ∀ λ ∈ Rm, λi ≥ 0 ∀ i = 1, . . . , m. (21)
Now, the previous inequality has the following two consequences:
(i) hp, ui > 0. Indeed, it suffices to take λ = 0 in (21).
(ii) hp, aii ≤ 0 for all i = 1, . . . , m. Indeed, fix i ∈ {1, . . . , m} and γ > 0. By
taking

λ = (0, . . . , γ, . . . , 0), with γ in the i-th position,

in (21) we get γhp, aii < hp, ui which implies that hp, aii ≤ 0 (if this is not the
case, by taking γ large enough we get a contradiction).

From (i)-(ii), we conclude that p ∈ C and hp, ui > 0 which contradicts the fact that
u ∈ C ◦.

Now, let us consider the case where C is defined by both linear equalities and
inequalities. Let ai ∈ Rn (i = 1, . . . , m) and a′j ∈ Rn (j = 1, . . . , p). Suppose
that C is given by

C = {h ∈ Rn | hai, hi = 0 ∀ i = 1, . . . , m, ha′j, hi ≤ 0 ∀ j = 1, . . . , p}.

Lemma 4. We have

C◦ = {Σ_{i=1}^m λi ai + Σ_{j=1}^p µj a′j | λ ∈ Rm, µj ≥ 0 ∀ j = 1, . . . , p}.

Proof. The set C can be written as

C = {h ∈ Rn | hai, hi ≤ 0 ∀ i = 1, . . . , m, h−ai, hi ≤ 0 ∀ i = 1, . . . , m, ha′j, hi ≤ 0 ∀ j = 1, . . . , p}.

Lemma 3 implies that u ∈ C◦ iff there exist α1 ≥ 0, β1 ≥ 0, . . . , αm ≥ 0, βm ≥ 0
and µ1 ≥ 0, . . . , µp ≥ 0 such that

u = Σ_{i=1}^m αi ai + Σ_{i=1}^m βi (−ai) + Σ_{j=1}^p µj a′j
  = Σ_{i=1}^m (αi − βi) ai + Σ_{j=1}^p µj a′j.

The result follows by setting λi = αi − βi for all i = 1, . . . , m.

 [Application to convex problems with affine equality and inequality constraints] Let
ai ∈ Rn, bi ∈ R (i = 1, . . . , m), a′j ∈ Rn and b′j ∈ R (j = 1, . . . , p). Suppose
that the constraint set K is given by
K = {x ∈ Rn | hai, xi + bi = 0 ∀ i = 1, . . . , m, ha′j, xi + b′j ≤ 0 ∀ j = 1, . . . , p}. (22)

The main result in this section is the following.

Proposition 5. Suppose that f : Rn → R is convex and C 1. Moreover, assume that
the constraint set K is nonempty and given by (22). Then, x̄ is a global solution to
(P ) iff x̄ ∈ K and there exist λ ∈ Rm, µ ∈ Rp such that

∇f (x̄) + Σ_{i=1}^m λi ai + Σ_{j=1}^p µj a′j = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj (ha′j, x̄i + b′j) = 0. (23)

Proof. Suppose that x̄ solves (P ). Then, by Theorem 11(ii), we have

h∇f (x̄), x − x̄i ≥ 0 ∀ x ∈ K. (24)

Now, let us define the set of active inequality constraints
I(x̄) := {j ∈ {1, . . . , p} | ha′j, x̄i + b′j = 0},

and consider the set


C := {h ∈ Rn | hai, hi = 0 ∀ i = 1, . . . , m, ha′j, hi ≤ 0 ∀ j ∈ I(x̄)}.

Let τ > 0 and h ∈ C . Then, we claim that x̄ + τ h ∈ K if τ is small enough.


Indeed,

hai, x̄ + τ hi + bi = hai, x̄i + bi = 0 ∀ i = 1, . . . , m,
ha′j, x̄ + τ hi + b′j ≤ ha′j, x̄i + b′j ≤ 0 ∀ j ∈ I(x̄),
ha′j, x̄ + τ hi + b′j = ha′j, x̄i + b′j + τ ha′j, hi < 0 ∀ j ∈ {1, . . . , p} \ I(x̄),

where the last inequality holds because ha′j, x̄i + b′j < 0 for all j ∈ {1, . . . , p} \ I(x̄)
and we can pick τ small enough to ensure that ha′j, x̄i + b′j + τ ha′j, hi < 0
for all j ∈ {1, . . . , p} \ I(x̄). Indeed, it suffices to take τ > 0 such that

τ max_{j∈{1,...,p}\I(x̄)} |ha′j, hi| < min_{j∈{1,...,p}\I(x̄)} (−ha′j, x̄i − b′j).

Thus, by (24) we deduce that

h∇f (x̄), hi ≥ 0 ∀ h ∈ C,

which means that

−∇f (x̄) ∈ C◦.
Thus, from Lemma 4 we get the existence of λ ∈ Rm, µ ∈ Rp, such that µj ≥ 0 for
all j ∈ {1, . . . , p}, µj = 0 for all j ∈ {1, . . . , p} \ I(x̄), and

−∇f (x̄) = Σ_{i=1}^m λi ai + Σ_{j=1}^p µj a′j.

Condition (23) follows directly from the previous relation. Conversely, assume that

(x̄, λ, µ) satisfies (23) and define the function L(·, λ, µ) as
L(x, λ, µ) = f (x) + Σ_{i=1}^m λi (hai, xi + bi) + Σ_{j=1}^p µj (ha′j, xi + b′j).

Then, L(·, λ, µ) is convex, C 1, and satisfies

L(x̄, λ, µ) = f (x̄) and ∇xL(x̄, λ, µ) = 0.

Thus, using that for all x ∈ K we have L(x, λ, µ) ≤ f (x), we get

f (x̄) = L(x̄, λ, µ) + h∇xL(x̄, λ, µ), x − x̄i ≤ L(x, λ, µ) ≤ f (x),

for all x ∈ K, which implies that x̄ is a global solution to (P ).

Example: Let us consider the projection problem

inf_{x∈K} (1/2)kx − yk^2,

where
K = {x ∈ Rn | x1 ≤ x2 ≤ . . . ≤ xn}.
Clearly K is nonempty, closed and convex. Therefore, the projection x̄ of y onto K
exists and it is unique. Let us define

a′j := (0, . . . , 1, −1, . . . , 0), with 1 in position j and −1 in position j + 1, and b′j := 0.

Then,
K = {x ∈ Rn | ha′j, xi + b′j ≤ 0 ∀ j = 1, . . . , n − 1}.

By the previous proposition, x̄ solves (P ) iff x̄ ∈ K and there exists µ ∈ Rn−1 such
that (23) holds. Namely,

x̄1 − y1 + µ1 = 0,
x̄2 − y2 + µ2 − µ1 = 0,
. . .
x̄n−1 − yn−1 + µn−1 − µn−2 = 0,
x̄n − yn − µn−1 = 0,
µi ≥ 0 and µi hi(x̄) = 0 ∀ i = 1, . . . , n − 1,

where hi(x̄) := ha′i, x̄i = x̄i − x̄i+1,

which is equivalent to

x̄1 − y1 ≤ 0,
x̄1 + x̄2 − (y1 + y2) ≤ 0,
. . .
Σ_{k=1}^{n−1} x̄k − Σ_{k=1}^{n−1} yk ≤ 0,
Σ_{k=1}^{n} x̄k − Σ_{k=1}^{n} yk = 0,
(Σ_{k=1}^{i} x̄k − Σ_{k=1}^{i} yk) hi(x̄) = 0 ∀ i = 1, . . . , n − 1.

Let us compute x̄ when n = 4 and y = (2, 1, 5, 4). In this case, we have
x̄1 − 2 ≤ 0,
x̄2 + x̄1 − 3 ≤ 0,
x̄3 + x̄2 + x̄1 − 8 ≤ 0,
x̄4 + x̄3 + x̄2 + x̄1 − 12 = 0,
(x̄1 − 2)(x̄1 − x̄2) = 0,
(x̄2 + x̄1 − 3)(x̄2 − x̄3) = 0,
(x̄3 + x̄2 + x̄1 − 8)(x̄3 − x̄4) = 0.

The first two relations and the constraint x̄1 ≤ x̄2 suggest taking x̄1 = x̄2 < 2.
With x̄1 = x̄2 = 3/2 the second inequality holds with equality, while the third one
cannot be active; hence, its complementarity condition forces x̄3 = x̄4, and the fourth
relation then yields x̄3 = x̄4 = 9/2. Thus, the point (x̄1, x̄2, x̄3, x̄4) = (3/2, 3/2, 9/2, 9/2)
satisfies the previous system and, hence, solves the projection problem.
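
This projection is the isotonic regression of y, computable in general by the classical pool adjacent violators scheme. The sketch below (my addition, not part of the notes) reproduces the hand computation:

```python
def project_isotonic(y):
    """Euclidean projection of y onto {x : x1 <= x2 <= ... <= xn}
    via the pool adjacent violators algorithm."""
    merged = []  # list of [block mean, block size]
    for v in y:
        merged.append([float(v), 1])
        # pooling adjacent blocks restores monotonicity of the means
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, s2 = merged.pop()
            m1, s1 = merged.pop()
            merged.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return [m for m, s in merged for _ in range(s)]

print(project_isotonic([2, 1, 5, 4]))  # [1.5, 1.5, 4.5, 4.5]
```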

Optimization problems with equality and inequality
constraints

 [Abstract optimality condition] In this section we establish an abstract optimality


condition for the general problem (P ) with K being a nonempty closed set and
f : Rn → R being differentiable. We need first the following definition.
Definition 4. Let x̄ ∈ K. We say that h ∈ Rn is a tangent vector to K at x̄ if
there exist (τn)n∈N ⊆ R and (εn)n∈N ⊆ Rn such that

τn > 0 ∀ n ∈ N, τn → 0 and εn → 0 as n → +∞,
and x̄ + τnh + τnεn ∈ K ∀ n ∈ N. (25)

The set of tangent vectors to K at x̄ is called the tangent cone to K at x̄ and it is


denoted by TK(x̄).

Remark 5. (i) By definition, h ∈ TK(x̄) iff there exist (τn)n∈N ⊆ R and
(hn)n∈N ⊆ Rn such that τn ≥ 0, τn → 0 and hn → h as n → +∞, and
x̄ + τnhn ∈ K for all n ∈ N.
(ii) It is easy to see that TK(x̄) is indeed a closed cone.

Using this notion, we can prove the following result.

Theorem 12. [Abstract optimality condition] Suppose that x̄ ∈ K is a local solution


to (P ) and that f is differentiable at x̄. Then,

h∇f (x̄), hi ≥ 0 ∀ h ∈ TK(x̄).

Proof. Let h ∈ TK(x̄) and let (τn)n∈N and (hn)n∈N be as in Remark 5(i). Then, for
n large enough we have

f (x̄) ≤ f (x̄ + τnhn) = f (x̄) + τnh∇f (x̄), hni + τnkhnkεx̄(τnhn),

which yields
h∇f (x̄), hni + khnkεx̄(τnhn) ≥ 0.

Therefore, letting n → ∞, we get

h∇f (x̄), hi ≥ 0,

from which the result follows.


Now, we need the following definition
Definition 5. The normal cone NK(x̄) to K at x̄ ∈ K is defined by

NK(x̄) := (TK(x̄))◦.

Corollary 1. Suppose that x̄ ∈ K is a local solution to (P ) and that f is differentiable


at x̄. Then,
−∇f (x̄) ∈ NK(x̄).

 [Optimization problems with equality and inequality constraints] We suppose now that
the constraint system is given by
K := {x ∈ Rn | gi(x) = 0 ∀ i = 1, . . . , m, hj(x) ≤ 0 ∀ j = 1, . . . , p},

where gi : Rn → R (i = 1, . . . , m), and hj : Rn → R (j = 1, . . . , p) are
differentiable functions. In this case, Problem (P ) is usually written as

min f (x)
s.t. gi(x) = 0, ∀ i = 1, . . . , m,
hj (x) ≤ 0, ∀ j = 1, . . . , p.

Let x̄ ∈ K and set

I(x̄) := {j ∈ {1, . . . , p} | hj (x̄) = 0} ,

for the set of indexes of active inequality constraints at x̄.

Let us study the tangent cone TK(x̄).

Lemma 5. The following inclusion holds
TK(x̄) ⊆ {h ∈ Rn | h∇gi(x̄), hi = 0 ∀ i = 1, . . . , m, h∇hj(x̄), hi ≤ 0 ∀ j ∈ I(x̄)}. (26)

Proof. Let h ∈ TK (x̄) and let (τn)n∈N and (hn)n∈N be as in Remark 5(i). Then,
for every i = 1, . . . , m, we have

0 = gi(x̄ + τnhn) = gi(x̄) + τnh∇gi(x̄), hni + τnkhnk εgi,x̄(τnhn)
= τnh∇gi(x̄), hni + τnkhnk εgi,x̄(τnhn).

Then, dividing by τn and letting n → ∞, we get

h∇gi(x̄), hi = 0.

Similarly, for every j ∈ I(x̄),

0 ≥ hj(x̄ + τnhn) = hj(x̄) + τnh∇hj(x̄), hni + τnkhnk εhj,x̄(τnhn)
= τnh∇hj(x̄), hni + τnkhnk εhj,x̄(τnhn).

Then, dividing by τn and letting n → ∞, we get

h∇hj (x̄), hi ≤ 0.

The result follows.


Unfortunately, the converse inclusion does not always hold.
Example: Consider the set
K := {(x, y) ∈ R2 | x^2 − y^3 = 0, −x ≤ 0}.

Then, it is easy to see that TK((0, 0)) = {(0, γ) | γ ≥ 0} and the right hand side
of (26) is given by

{(h1, h2) ∈ R2 | h1 ≥ 0}.
Definition 6. (i) We say that the constraint functions gi (i = 1, . . . , m) and hj
(j = 1, . . . , p) are qualified at x̄ if

TK(x̄) = {h ∈ Rn | h∇gi(x̄), hi = 0 ∀ i = 1, . . . , m, h∇hj(x̄), hi ≤ 0 ∀ j ∈ I(x̄)}. (27)

(ii) Any condition ensuring that the constraint functions are qualified is called a
constraint qualification condition.

Remark 6. In general, the qualified character of the constraints is not a geometrical


property of the set K. Indeed, consider the set
K = {(x, y) ∈ R2 | y − x^2 = 0},

which can also be written as

K = {(x, y) ∈ R2 | y^2 − x^4 = 0, −y ≤ 0}.

Then, it is easy to check that the constraint functions are qualified at (0, 0) in the
first formulation but they are not qualified at (0, 0) in the second one.

 [The Karush-Kuhn-Tucker theorem] The main result here is the following first order
optimality condition.

Theorem 13. [Karush-Kuhn-Tucker] Let x̄ ∈ K be a local solution to (P ).
Assume that f , gi (i = 1, . . . , m), hj (j = 1, . . . , p) are C 1 and that the
constraint functions are qualified at x̄. Then, there exist (λ1, . . . , λm) ∈ Rm
and (µ1, . . . , µp) ∈ Rp such that
∇f (x̄) + Σ_{i=1}^m λi ∇gi(x̄) + Σ_{j=1}^p µj ∇hj(x̄) = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj hj(x̄) = 0. (28)

Equivalently, setting g(x) = (g1(x), . . . , gm(x)) and h(x) = (h1(x), . . . , hp(x))
for all x ∈ Rn, there exist λ ∈ Rm and µ ∈ Rp such that

∇f (x̄) + Dg(x̄)>λ + Dh(x̄)>µ = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj hj(x̄) = 0. (29)

Proof. Since the constraint functions are qualified at x̄, by Lemma 4 we have that

NK(x̄) = {Σ_{i=1}^m λi ∇gi(x̄) + Σ_{j∈I(x̄)} µj ∇hj(x̄) | λi ∈ R ∀ i = 1, . . . , m, µj ≥ 0 ∀ j ∈ I(x̄)}.

Then, by Corollary 1, there exist λi ∈ R (i = 1, . . . , m) and µj ≥ 0 (j ∈ I(x̄))
such that
−∇f (x̄) = Σ_{i=1}^m λi ∇gi(x̄) + Σ_{j∈I(x̄)} µj ∇hj(x̄).
Relation (28) follows by setting µj = 0 for all j ∈ {1, . . . , p} \ I(x̄).

Let g(x) = (g1(x), . . . , gm(x)) and h(x) = (h1(x), . . . , hp(x)). The Lagrangian
L : Rn × Rm × Rp → R is defined by

L(x, λ, µ) = f (x) + hλ, g(x)i + hµ, h(x)i,

and, at a local solution x̄, the optimality system (29) reads: there exists (λ, µ) ∈ Rm+p
such that
∇xL(x̄, λ, µ) = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj hj(x̄) = 0. (30)

System (30) is usually called the KKT system and (λ, µ) are called Lagrange multipliers.

[The KKT system as a sufficient condition for convex problems] The KKT condition is
also sufficient for convex problems.
Proposition 6. Suppose that f is convex and C 1, gi(x) = hai, xi + bi, with ai ∈ Rn,
bi ∈ R (i = 1, . . . , m), and that each hj (j = 1, . . . , p) is convex and C 1. Moreover,
assume that there exists (x̄, λ, µ) ∈ R^{n+m+p} such that x̄ ∈ K and the KKT
system (30) holds at (x̄, λ, µ). Then x̄ is a global solution to (P ).

Proof. Note that L(x̄, λ, µ) = f (x̄) and that L(·, λ, µ) is convex. Then, for any
x ∈ K,

f (x̄) = L(x̄, λ, µ) + h∇xL(x̄, λ, µ), x − x̄i ≤ L(x, λ, µ)
= f (x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^p µj hj(x)
≤ f (x),

which implies that x̄ is a global solution to (P ).

Note that no qualification condition is needed in Proposition 6.

Remark 7. (i) [Equality constraints only] Suppose that we only have equality
constraints. Then, if x̄ ∈ K solves (P ) and the constraint functions are qualified at
x̄, then there exists λ ∈ Rm such that
∇f (x̄) + Σ_{i=1}^m λi ∇gi(x̄) = 0.

In this case the Lagrangian is given by L(x, λ) = f (x) + hλ, g(x)i, and the previous
relation can be written as
∇xL(x̄, λ) = 0. (31)
(ii) [Inequality constraints only] Suppose that we only have inequality constraints.
Then, if x̄ ∈ K solves (P ) and the constraint functions are qualified at x̄, then there
exists µ ∈ Rp such that
∇f (x̄) + Σ_{j=1}^p µj ∇hj(x̄) = 0,
µj ≥ 0 and µj hj(x̄) = 0 ∀ j = 1, . . . , p. (32)

In this case the Lagrangian is given by L(x, µ) = f (x) + hµ, h(x)i, and the previous

relation can be written as
∇xL(x̄, µ) = 0,
µj ≥ 0 and µj hj (x̄) = 0 ∀ j = 1, . . . , p.

(iii) [Maximization problems] Consider the maximization problem

max f (x)
s.t. gi(x) = 0, ∀ i = 1, . . . , m,
hj (x) ≤ 0, ∀ j = 1, . . . , p.

In this case, if x̄ is a local solution and the constraint functions are qualified at x̄, then
there exists λ ∈ Rm and µ ∈ Rp such that

−∇f (x̄) + Dg(x̄)>λ + Dh(x̄)>µ = 0,
and ∀ j ∈ {1, . . . , p} we have µj ≥ 0 and µj hj(x̄) = 0. (33)

It is important to notice that, unlike the case where only equality constraints are
present, when inequality constraints are present the optimality systems for local
solutions of the minimization and maximization problems differ. The coincidence of
both optimality systems is a specific feature of problems with equality constraints
only and of unconstrained problems.
(iv) The following example shows that the assumption on the qualification of the
constraints plays an important role in the necessary condition. Consider the problem

min y
s.t. x^2 − y^3 = 0,
−x ≤ 0.

In this case, (x̄, ȳ) = (0, 0) is a global solution and (29) reads

(0, 1)> + λ (2x, −3y^2)>|_{(x,y)=(0,0)} + µ (−1, 0)> = (0, 0)>,

which is impossible. Note that ∇g1(0, 0) = (0, 0) and ∇h1(0, 0) = (−1, 0).
Therefore,
{h ∈ R2 | h∇g1(0, 0), hi = 0, h∇h1(0, 0), hi ≤ 0} = {h ∈ R2 | h1 ≥ 0},

and TK(0, 0) = {h ∈ R2 |h1 = 0, h2 ≥ 0}. Thus, g1 and h1 are not qualified at
(0, 0).

 [On constraint qualifications] Let us now comment on some well-known constraint


qualifications. The first two conditions are easy to check but they can be applied only
when K is convex.

• [Affine constraints] If gi(x) = hai, xi + bi for all i = 1, . . . , m, and hj(x) =
ha′j, xi + b′j for all j = 1, . . . , p, where ai ∈ Rn, a′j ∈ Rn, bi ∈ R and b′j ∈ R,
then the constraint functions are qualified at every x ∈ K.

• [Slater condition] If gi(x) = hai, xi + bi for all i = 1, . . . , m, with ai ∈ Rn and


bi ∈ R, and, for all j = 1, . . . , p, the function hj is convex, then the constraint
functions are qualified at every x ∈ K if the following condition holds:

There exists x0 ∈ Rn such that gi(x0) = 0 ∀ i = 1, . . . , m,
and hj(x0) < 0 ∀ j = 1, . . . , p. (SLC)

The following two conditions are more general but, at the same time, they are more

difficult to check.

• [Mangasarian-Fromovitz] The constraint functions are qualified at x̄ ∈ K if the


following conditions hold:

(i) the vectors {∇g1(x̄), . . . , ∇gm(x̄)} are linearly independent;
(ii) there exists d̄ ∈ Ker(Dg(x̄)) such that ∇hj(x̄) · d̄ < 0 ∀ j ∈ I(x̄). (MF)

• [Linear independence constraint qualification] The constraint functions are qualified at


x̄ ∈ K if the following condition holds: the vectors
(∇gi(x̄))_{i=1}^m, (∇hj(x̄))_{j∈I(x̄)} are linearly independent. (LICQ)

Remark 8. [On (LICQ) and the uniqueness of Lagrange multipliers] It is easy to
check that (LICQ) implies (MF), but the converse is false. Moreover, it is easy to
check that (LICQ) implies that there exists at most one (λ, µ) ∈ R^{m+p} such that
(28) holds. In general, (MF) only implies that the set of (λ, µ) ∈ R^{m+p} such that
(28) holds is a compact set.

 [Some examples] In the first example, we consider a problem where K is defined by


equality constraints only.
Example: Let us consider the problem

min xy
s.t. x^2 + (y + 1)^2 = 1.

In this case f : R2 → R is given by f (x, y) = xy, and K = {(x, y) ∈
R2 | g(x, y) = 0}, with g : R2 → R given by g(x, y) = x^2 + (y + 1)^2 − 1.
Note that K is the circle centered at (0, −1) with radius 1. Hence, K is
a compact subset of R2. The function f being continuous, the Weierstrass theorem
implies that the optimization problem has at least one solution (x̄, ȳ) ∈ K.
Let us study the qualification condition (MF) (when only equality constraints are
present). We have ∇g(x, y) = (2x, 2(y + 1)) and, hence, ∇g(x, y) = 0 iff

x = 0, y = −1. Thus, every (x, y) ∈ R2 \ {(0, −1)} satisfies (MF). Since
(0, −1) ∉ K, we deduce that (MF) holds for every (x, y) ∈ K.
The Lagrangian L : R2 × R → R of this problem is given by

L(x, y, λ) = xy + λ(x^2 + (y + 1)^2 − 1).

By Theorem 13, we have the existence of λ ∈ R such that (31) holds at (x̄, ȳ, λ).
Now,

∇(x,y)L(x̄, ȳ, λ) = 0 ⇔ { ȳ + 2λx̄ = 0, x̄ + 2λ(ȳ + 1) = 0 }
⇔ { ȳ = −2λx̄, (1 − 4λ^2)x̄ = −2λ }. (34)

Now, 1 − 4λ^2 = 0 iff λ = 1/2 or λ = −1/2, and both cases contradict the last
equality above. Therefore, 1 − 4λ^2 ≠ 0 and, hence,

x̄ = 2λ/(4λ^2 − 1) and ȳ = −4λ^2/(4λ^2 − 1).

Since ∇λL(x̄, ȳ, λ) = g(x̄, ȳ) = 0, we get

(2λ/(4λ^2 − 1))^2 + (1 − 4λ^2/(4λ^2 − 1))^2 = 1
⇔ 4λ^2 + 1 = (4λ^2 − 1)^2
⇔ (4λ^2 − 1)^2 − (4λ^2 − 1) − 2 = 0,

which yields

4λ^2 − 1 = (1 + √9)/2 = 2 or 4λ^2 − 1 = (1 − √9)/2 = −1,

i.e. λ^2 = 3/4 or λ^2 = 0.

If λ = 0, then (34) yields x̄ = ȳ = 0. If λ = √3/2 we get x̄ = √3/2 and
ȳ = −3/2. If λ = −√3/2 we get x̄ = −√3/2 and ȳ = −3/2. Thus, the
candidates to solve the problem are

(x̄1, ȳ1) = (0, 0), (x̄2, ȳ2) = (√3/2, −3/2) and (x̄3, ȳ3) = (−√3/2, −3/2).

We have f (x̄1, ȳ1) = 0, f (x̄2, ȳ2) = −3√3/4 and f (x̄3, ȳ3) = 3√3/4.
Therefore, the global solution is (x̄2, ȳ2). 
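
The computation is easy to confirm numerically (my sketch, not from the notes): evaluate f on the three stationary points and check feasibility.

```python
import math

f = lambda x, y: x * y
g = lambda x, y: x**2 + (y + 1)**2 - 1   # constraint, should be 0

s = math.sqrt(3) / 2
for (x, y) in [(0.0, 0.0), (s, -1.5), (-s, -1.5)]:
    print((round(x, 4), y), "g =", round(g(x, y), 12), "f =", round(f(x, y), 4))
# the minimum value -3*sqrt(3)/4 is attained at (sqrt(3)/2, -3/2)
```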

In the second example, we consider a problem where K is defined by inequality
constraints only.
Example: Consider the problem

min 4x^2 + y^2 − x − 2y
s.t. 2x + y ≤ 1,
x^2 ≤ 1.

Note that, setting Q = ( 8 0 ; 0 2 ) and c = (−1, −2)>, in the notation for (P ) we
have
f (x, y) = (1/2)hQ(x, y), (x, y)i + c>(x, y),
h1(x, y) = 2x + y − 1,
h2(x, y) = x^2 − 1.
Note that the feasible set is nonempty, convex, closed and the Slater condition is
satisfied (for instance h1(0, 0) < 0, h2(0, 0) < 0). Moreover, f is continuous,
strictly convex, differentiable and infinity at the infinity (Q is positive definite). We

deduce that there exists a unique solution (x̄, ȳ) ∈ R2 to problem (P ) and (x̄, ȳ) is
characterized by the KKT system (32). A point (x̄, ȳ, µ1, µ2) satisfies (32) iff

Q (x̄, ȳ)> + c + µ1 (2, 1)> + µ2 (2x̄, 0)> = (0, 0)>,
µ1 ≥ 0, µ2 ≥ 0, µ1 h1(x̄, ȳ) = 0, µ2 h2(x̄, ȳ) = 0,

iff

8x̄ + 2µ1 + 2µ2 x̄ = 1,
2ȳ + µ1 = 2,
µ1 ≥ 0, µ2 ≥ 0, µ1 h1(x̄, ȳ) = 0, µ2 h2(x̄, ȳ) = 0.

 Case 1: µ1 = µ2 = 0. We obtain (x̄, ȳ) = (1/8, 1) ∉ K.


Case 2: µ1 > 0, µ2 > 0. In this case, we obtain 2x̄ + ȳ = 1, x̄^2 = 1, which gives
(x̄, ȳ) = (1, −1) or (x̄, ȳ) = (−1, 3). In the first case, we should have

2µ1 + 2µ2 = −7,

which is impossible, because µ1 > 0 and µ2 > 0. In the second case, we should have
6 + µ1 = 2, which is also impossible.
Case 3: µ1 = 0, µ2 > 0. We obtain x̄^2 = 1 and ȳ = 1, which gives (x̄, ȳ) = (1, 1)
or (x̄, ȳ) = (−1, 1). If (x̄, ȳ) = (1, 1) we should have 8 + 2µ2 = 1, which is
impossible. If (x̄, ȳ) = (−1, 1), we should have −8 − 2µ2 = 1, which is also
impossible.
Case 4: µ1 > 0, µ2 = 0. We obtain

2x̄ + ȳ = 1,
8x̄ + 2µ1 = 1,
2ȳ + µ1 = 2,

which implies

2x̄ + ȳ = 1,
8x̄ − 4ȳ = −3,
which gives (x̄, ȳ) = (1/16, 7/8) and µ1 = 1/4. This point (x̄, ȳ) belongs to
K. Therefore, we conclude that (x̄, ȳ, µ1, µ2) = (1/16, 7/8, 1/4, 0) is the unique
solution to the KKT system, and, hence, (x̄, ȳ) = (1/16, 7/8) is the unique global

solution to (P ).
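
A direct numerical check of this KKT point (my sketch, not part of the notes; plain NumPy suffices here):

```python
import numpy as np

x, y, mu1, mu2 = 1/16, 7/8, 1/4, 0.0

grad_f = np.array([8 * x - 1, 2 * y - 2])
grad_h1 = np.array([2.0, 1.0])
grad_h2 = np.array([2 * x, 0.0])
h1 = 2 * x + y - 1
h2 = x**2 - 1

print(np.allclose(grad_f + mu1 * grad_h1 + mu2 * grad_h2, 0))  # stationarity
print(h1 <= 0, h2 <= 0, mu1 * h1 == 0, mu2 * h2 == 0)          # feasibility, complementarity
```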

Dynamic programming in discrete time: the finite
horizon case

 [Introduction: Shortest path between two vertices A and E on a graph] Consider a


salesman who has to go from city A to city E according to a graph G (the figure is
not reproduced in this extraction; its vertices are A, B, B′, C, C′, C″, D, D′, E).

The data in the graph G are


• For every vertex x we denote by Γ(x) the set of successors. For instance,
Γ(B) = {C, C′}.

• The “travel time” (in hours) F (x, x′) of each x′ ∈ Γ(x). For instance,
F (C′, D) = 2.

• In order to compute the shortest path one could enumerate all the paths and choose
the one with the smallest travel time.
• However, it is more convenient to notice that if a path is optimal between A and E ,
and this path passes through a vertex x, then the “sub-path” between x and E will be
optimal for the shortest path problem between x and E .
• This suggests parametrizing the optimal travel time by the departure point. Let us
define V (x) as the smallest time needed to go from x to E. Then,
V (E) = 0,
V (D) = 5, V (D′) = 2,
V (C) = 6, V (C′) = min{2 + V (D), 1 + V (D′)} = 3, V (C″) = 3,
V (B) = min{2 + V (C), 1 + V (C′)} = 4, V (B′) = min{2 + V (C′), 4 + V (C″)} = 5,
V (A) = min{1 + V (B), 1 + V (B′)} = 5.

• We deduce that the shortest travel time is V(A) = 5 and the shortest path is
ABC′D′E.
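The backward computation above is easy to automate. The sketch below is one possible
encoding of the graph: the travel times are read off from the minima displayed above,
except for the edges C → D and C″ → D′, whose weights (both taken equal to 1) are
assumptions consistent with V(C) = 6 and V(C″) = 3, since the figure is unavailable.

    # Travel times F(x, x') read off from the computation of V above;
    # the weights of C -> D and C'' -> D' are assumptions consistent
    # with V(C) = 6 and V(C'') = 3 (the original figure is unavailable).
    F = {
        ("A", "B"): 1, ("A", "B'"): 1,
        ("B", "C"): 2, ("B", "C'"): 1,
        ("B'", "C'"): 2, ("B'", "C''"): 4,
        ("C", "D"): 1,                      # assumed
        ("C'", "D"): 2, ("C'", "D'"): 1,
        ("C''", "D'"): 1,                   # assumed
        ("D", "E"): 5, ("D'", "E"): 2,
    }

    # Successor sets Gamma(x), derived from the edge list.
    Gamma = {}
    for x, y in F:
        Gamma.setdefault(x, []).append(y)

    # Backward induction: treat vertices stage by stage, from E backwards.
    V, succ = {"E": 0}, {}
    for stage in [["D", "D'"], ["C", "C'", "C''"], ["B", "B'"], ["A"]]:
        for x in stage:
            succ[x] = min(Gamma[x], key=lambda y: F[x, y] + V[y])
            V[x] = F[x, succ[x]] + V[succ[x]]

    # Recover the optimal path by following the optimal successors.
    path, x = ["A"], "A"
    while x != "E":
        x = succ[x]
        path.append(x)
    print(V["A"], "".join(path))  # 5 ABC'D'E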

90
 [The general framework] We are interested in the problem
sup_{(xt)} { Σ_{t=0}^{T−1} Ft(xt, xt+1) + FT(xT) },                          (Pfh)

where
• xt ∈ X for all t = 0, . . . , T . The set X is called the state space.
• xt+1 ∈ Γt(xt) for all t = 0, . . . , T − 1. For all x ∈ X , and t = 0, . . . , T − 1,
Γt(x) is a nonempty subset of X .
• Ft(xt, xt+1) denotes the profit at time t for the pair (xt, xt+1), and FT (xT ) denotes
the final profit for the final state xT . Notice that redefining FT −1(xT −1, xT ) as

F̃T −1(xT −1, xT ) = FT −1(xT −1, xT ) + FT (xT ),

we can assume, without loss of generality, that FT ≡ 0.

 [Dynamic Programming relation] As in the shortest path problem, it is a good idea


to parametrize problem (Pfh). Given (k, x) ∈ {0, . . . , T − 1} × X, the value

91
function at (k, x) is defined as

V(k, x) = sup { Σ_{t=k}^{T−1} Ft(xt, xt+1) | xk = x, xt+1 ∈ Γt(xt) ∀ t = k, . . . , T − 1 }.   (35)
For k = T and x ∈ X , we set V (T, x) = 0 (recall that we have a zero final cost).
The main result here is the following

Theorem 14. [Dynamic Programming] The following relations hold:

V(k, x) = sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) },
                                 ∀ k = 0, . . . , T − 1, x ∈ X,              (36)
V(T, x) = 0   ∀ x ∈ X.

Proof. Let (xk , xk+1, . . . , xT ) be a feasible sequence for the problem defining

92
V (k, x), i.e. xk = x and xt+1 ∈ Γt(xt), for all t = k, . . . , T − 1. Then,

V(k, x) ≥ Σ_{t=k}^{T−1} Ft(xt, xt+1) = Fk(x, xk+1) + Σ_{t=k+1}^{T−1} Ft(xt, xt+1).

Using that the previous inequality holds for any (xk+1, . . . , xT) such that xt+1 ∈
Γt(xt), for all t = k + 1, . . . , T − 1, by fixing xk+1 ∈ Γk(x) and taking the
supremum of the right hand side with respect to (xk+2, . . . , xT), we get

V(k, x) ≥ Fk(x, xk+1) + V(k + 1, xk+1).

Therefore, by taking the supremum with respect to xk+1 ∈ Γk(x), we get

V(k, x) ≥ sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) }.

Conversely, for any (x′k, x′k+1, . . . , x′T) such that x′k = x and x′t+1 ∈ Γt(x′t), for all

93
t = k, . . . , T − 1, we have
Σ_{t=k}^{T−1} Ft(x′t, x′t+1) = Fk(x, x′k+1) + Σ_{t=k+1}^{T−1} Ft(x′t, x′t+1)
                             ≤ Fk(x, x′k+1) + V(k + 1, x′k+1)
                             ≤ sup { Fk(x, xk+1) + V(k + 1, xk+1) | xk+1 ∈ Γk(x) }.

Using that (x′k, x′k+1, . . . , x′T) is an arbitrary admissible sequence, by taking the
supremum on the left hand side, we get

V (k, x) ≤ sup {Fk (x, xk+1) + V (k + 1, xk+1) | xk+1 ∈ Γk (x)} .

The result follows.

 [Backward solution] By Theorem 14, we can solve backward for V using relations (36).
In particular, (36) characterizes the value function V. Now, let us assume that for all
(k, x) ∈ {0, . . . , T − 1} × X there exists s(k, x) ∈ Γk (x) such that

V (k, x) = Fk (x, s(k, x)) + V (k + 1, s(k, x)).

94
Then, by the very definition we have that (xk , xk+1, . . . , xT ), with

xk = x and xt+1 := s(t, xt) ∀ t = k, . . . , T − 1,

solves the problem defining V(k, x) in (35). In particular, this problem admits a
solution if
• X ⊆ Rn .
• For all (k, x) ∈ {0, . . . , T − 1} × X the set Γk(x) is nonempty and compact,
• and Fk and V(k + 1, ·) are continuous for all k = 0, . . . , T − 1.
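When X is finite, this backward procedure takes only a few lines. The following is a
sketch under illustrative conventions (the names Gamma and F and their indexing are
hypothetical: Gamma[k][x] enumerates Γk(x) and F[k] implements Fk):

    def solve_finite_horizon(X, T, Gamma, F):
        """Backward induction for (36): returns V and a maximizing policy s.

        X     : finite collection of (hashable) states,
        Gamma : Gamma[k][x] = list of admissible successors of x at time k,
        F     : F[k](x, y)  = profit of the pair (x, y) at time k
                (the final profit FT is taken to be 0, as in the text).
        """
        V = {(T, x): 0.0 for x in X}
        s = {}
        for k in range(T - 1, -1, -1):
            for x in X:
                # Dynamic programming relation (36): maximize over Gamma[k][x].
                best = max(Gamma[k][x], key=lambda y: F[k](x, y) + V[k + 1, y])
                s[k, x] = best
                V[k, x] = F[k](x, best) + V[k + 1, best]
        return V, s

An optimal sequence from (k, x) is then recovered forward, exactly as described above:
xk = x and xt+1 = s[t, xt].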

Remark 9. Under the previous assumptions, V (t, ·) (t = 0, . . . , T − 1) is continuous


if Fk and Γk are continuous for all k = t, . . . , T − 1. Concerning the latter continuity,
we recall that the correspondence Γk is called continuous at x ∈ X if the following
conditions hold:

(i) [Lower semicontinuity] For every y ∈ Γk(x) and every sequence (xn)n∈N such that
xn → x, as n → ∞, there exists a sequence (yn)n∈N such that yn ∈ Γk (xn), for all

95
n ∈ N, and yn → y as n → ∞.
(ii) [Upper semicontinuity] If (xn)n∈N and (yn)n∈N are two sequences such that xn → x,
as n → ∞, and yn ∈ Γk (xn), for all n ∈ N, then (yn)n∈N has a subsequence which
converges to a point y ∈ Γk (x).

Example: Let y ≥ 0 and consider the problem

inf { Σ_{i=1}^{N} xi² | Σ_{i=1}^{N} xi = y, xi ≥ 0 ∀ i = 1, . . . , N }.     (37)

• Let us solve (37) by using nonlinear programming tools. Note first that the cost
function is continuous, strictly convex and the feasible set is nonempty, convex and
compact. Therefore, there exists a unique solution x̄ to (37). Since the set K is defined
by affine constraints, x̄ is characterized by the KKT system. Consider the Lagrangian
L : RN × R × RN → R defined by

L(x, λ, µ) = Σ_{i=1}^{N} xi² + λ ( Σ_{i=1}^{N} xi − y ) − Σ_{i=1}^{N} µi xi.

96
Then, (x̄1, . . . , x̄N) is characterized by the existence of λ ∈ R and µ ∈ RN such
that

∂xi L(x̄, λ, µ) = 2x̄i + λ − µi = 0   ∀ i = 1, . . . , N,
Σ_{i=1}^{N} x̄i = y,                                                          (38)
µi ≥ 0 and µi x̄i = 0   ∀ i = 1, . . . , N.
If y = 0, the only feasible point is xi = 0 for all i = 1, . . . , N and, hence,
x̄ = (0, . . . , 0) is the solution. If y > 0 and there exists î ∈ {1, . . . , N} such
that x̄î = 0, then the first equation in (38) yields λ = µî ≥ 0. On the other hand,
there must exist i0 ∈ {1, . . . , N} such that x̄i0 > 0 (otherwise Σi x̄i = y, with
x̄i ≥ 0 for all i ∈ {1, . . . , N}, would not hold). The first and third equations in (38)
imply that λ = −2x̄i0 < 0, which is a contradiction. As a consequence, x̄i > 0 for all
i ∈ {1, . . . , N}. Then, the first and the second conditions in (38) yield x̄i = y/N
for all i ∈ {1, . . . , N}. Thus,

x̄ = (y/N, . . . , y/N) is the solution to the problem and the optimal cost is y²/N.
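As a quick numerical sanity check (a sketch assuming SciPy; the instance N = 4,
y = 2 and the starting point are arbitrary choices):

    import numpy as np
    from scipy.optimize import minimize

    N, y = 4, 2.0
    res = minimize(
        lambda x: np.sum(x**2),                                   # cost
        x0=np.linspace(0.0, y, N),                                # arbitrary start
        method="SLSQP",
        constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - y}],
        bounds=[(0.0, None)] * N,
    )
    print(res.x, res.fun)  # approximately [0.5 0.5 0.5 0.5] and y**2 / N = 1.0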

• Let us find the same conclusion by using dynamic programming techniques. Let us

97
define

V(k, y) := inf { Σ_{i=k}^{N} xi² | Σ_{i=k}^{N} xi = y, xi ≥ 0 ∀ i = k, . . . , N }.

Note that we are interested in V(1, y). The problem defining V(1, y) does not have
the form discussed in the previous subsection. However, arguing as in the proof of
Theorem 14, we can prove that (exercise)

V(k, y) = inf { xk² + V(k + 1, y − xk) | 0 ≤ xk ≤ y }   ∀ y ≥ 0,
                                                                             (39)
V(N, y) = y²   ∀ y ≥ 0.

Solving backwards, we get

V(N − 1, y) = inf { x_{N−1}² + V(N, y − x_{N−1}) | 0 ≤ x_{N−1} ≤ y }
            = inf { x_{N−1}² + (y − x_{N−1})² | 0 ≤ x_{N−1} ≤ y }.

From the last expression we get s(N − 1, y) = y/2 and V(N − 1, y) = y²/2.

98
Similarly,

V(N − 2, y) = inf { x_{N−2}² + V(N − 1, y − x_{N−2}) | 0 ≤ x_{N−2} ≤ y }
            = inf { x_{N−2}² + (y − x_{N−2})²/2 | 0 ≤ x_{N−2} ≤ y },

from which we get s(N − 2, y) = y/3, and V(N − 2, y) = y²/3. Recursively, for
all k = 1, . . . , N we get

s(k, y) = y/(N − k + 1) and V(k, y) = y²/(N − k + 1).

Thus, we recover V(1, y) = y²/N for the optimal cost and, adapting the definition of

99
successor according to the dynamic programming principle (39), for the solution we get

x1 = s(1, y) = y/N,
x2 = s(2, y − x1) = (y − y/N)/(N − 1) = y/N,
x3 = s(3, y − x1 − x2) = y(1 − 2/N)/(N − 2) = y/N,
...
xN = s(N, y − Σ_{i=1}^{N−1} xi) = y/N,

recovering our previous result.
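The recursion (39) can also be checked by brute force, nesting scalar minimizations;
a rough sketch (assuming SciPy, with no memoization, so only small N are practical):

    from scipy.optimize import minimize_scalar

    def V(k, y, N):
        # Dynamic programming relation (39), with V(N, y) = y^2.
        if k == N:
            return y**2
        res = minimize_scalar(lambda x: x**2 + V(k + 1, y - x, N),
                              bounds=(0.0, y), method="bounded")
        return res.fun

    print(V(1, 2.0, 4))  # approximately y**2 / N = 1.0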


Another way to tackle the problem is to perform a change of variables in order to fit
the framework of the previous subsection.
Let us define

zi := Σ_{j=1}^{N−i} xj   ∀ i = 0, . . . , N − 1.

Then z0 = y, z_{N−1} = x1 and zi − zi+1 = x_{N−i} for all i = 0, . . . , N − 2, and

100
problem (37) can be written as
inf { Σ_{i=0}^{N−2} (zi+1 − zi)² + z_{N−1}² | z0 = y, 0 ≤ zi+1 ≤ zi ∀ i = 0, . . . , N − 2 }.

By applying Theorem 14 to the problem above, we can also recover the desired solution
(exercise).

101
Dynamic programming in discrete time: the infinite
horizon case

 [Introduction: A model of optimal growth] We consider an economy where at each


period t a single good is produced. This good can be used for consumption or for
investment. At period t we denote by ct the consumption, by it the investment, by
kt the capital and by yt the production of the good. By assuming that yt = F (kt),
for some production function F , and that the capital depreciates at constant rate
δ ∈ (0, 1), we get the following relations

ct + it = yt = F (kt) and kt+1 = (1 − δ)kt + it.

Thus, setting f (kt) = F (kt) + (1 − δ)kt, we obtain

ct = F (kt) + (1 − δ)kt − kt+1 = f (kt) − kt+1.

Naturally, we impose ct ≥ 0 and kt ≥ 0, conditions which imply 0 ≤ kt+1 ≤ f (kt).

102
Finally, the preference over consumption is supposed to have the form

Σ_{t=0}^{∞} βᵗ U(ct)   for some discount factor β ∈ (0, 1),

and some utility function U. For a given initial capital k0 > 0, the utility maximization
problem is

sup Σ_{t=0}^{∞} βᵗ U(f(kt) − kt+1)
s.t. 0 ≤ kt+1 ≤ f(kt) ∀ t ≥ 0.

 [The mathematical framework] For β ∈ (0, 1), we consider the problem


sup_{(xt)} Σ_{t=0}^{∞} βᵗ F(xt, xt+1),                                       (Pih)

where
• xt ∈ X for all t ≥ 0, with X being a given set and x0 ∈ X being prescribed.

103
• xt+1 ∈ Γ(xt), where, for all x ∈ X , Γ(x) is a nonempty subset of X .
• F (xt, xt+1) denotes the profit for the pair (xt, xt+1). In this section F is assumed to
be bounded.

 [Dynamic Programming equation] Given x ∈ X , we define the value function


V(x) := sup { Σ_{t=0}^{∞} βᵗ F(xt, xt+1) | x0 = x, xt+1 ∈ Γ(xt) ∀ t ≥ 0 }.

A first important consequence of our assumptions is that V : X → R is well-defined:
since F is bounded and β ∈ (0, 1), the series above converges and the supremum is finite.


The main result here is the following

Theorem 15. [Dynamic Programming] For all x ∈ X we have

V (x) = sup {F (x, x1) + βV (x1) | x1 ∈ Γ(x)} . (40)

Proof. The proof is similar to the proof of Theorem 14. Indeed, for any admissible

104
sequence (xt)t≥0 we have

V(x) ≥ Σ_{t=0}^{∞} βᵗ F(xt, xt+1)
     = F(x, x1) + β Σ_{t=1}^{∞} βᵗ⁻¹ F(xt, xt+1)                             (41)
     = F(x, x1) + β Σ_{t=0}^{∞} βᵗ F(xt+1, xt+2).

Now, let us fix x1 ∈ Γ(x). Note that if (x′t)t≥1 is an admissible sequence for
the problem defining V(x1), then x, x′1, x′2, x′3, . . . is an admissible sequence for the
problem defining V (x). This remark and (41) imply

V (x) ≥ F (x, x1) + βV (x1),

and, hence, since x1 ∈ Γ(x) is arbitrary, we get



V(x) ≥ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }.

Conversely, for any admissible sequence (x′t)t≥0 for the problem defining V(x), we

105
have

Σ_{t=0}^{∞} βᵗ F(x′t, x′t+1) = F(x, x′1) + β Σ_{t=0}^{∞} βᵗ F(x′t+1, x′t+2)
                             ≤ F(x, x′1) + βV(x′1)
                             ≤ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }.

We conclude that

V(x) ≤ sup { F(x, x1) + βV(x1) | x1 ∈ Γ(x) }.

Relation (40) follows.

Remark 10. (i) Unlike in the finite horizon case, in which the dynamic
programming relations characterize the value function, in general equation (40) can
have more than one solution. However, under our boundedness assumption on F,
which in practice is rather restrictive, it is possible to show that the functional
equation (40) admits a unique bounded solution. As a consequence, (40) characterizes
the value function V.

106
Moreover, this solution can be computed as the limit of the following sequence of
functions:

V^{ℓ+1}(x) = sup { F(x, x1) + βV^ℓ(x1) | x1 ∈ Γ(x) },   ∀ x ∈ X,

with V^0 : X → R being an arbitrary bounded function.


(ii) If in addition we assume that X is a compact subset of Rn, that Γ(x) is closed for
all x ∈ X, and that F and the correspondence Γ are continuous, then the value function
V can also be shown to be continuous. In particular, for every x ∈ X , there exists
s(x) ∈ Γ(x) such that

V (x) = F (x, s(x)) + βV (s(x)).

As a consequence, the sequence defined recursively by x̄0 = x, x̄t+1 = s(x̄t) for all
t ≥ 0 solves the problem defining V (x).
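To close, here is how the iteration in Remark 10 (i) can be implemented for the
optimal growth model of the introduction, on a discretized state space. All parameters
(β, δ, the production exponent, the square-root utility and the capital grid) are
illustrative assumptions for this sketch, not part of the original model:

    import numpy as np

    # Illustrative parameters (assumptions for this sketch).
    beta, delta, alpha = 0.95, 0.10, 0.30
    f = lambda k: k**alpha + (1.0 - delta) * k     # f(k) = F(k) + (1 - delta) k
    U = lambda c: np.sqrt(c)                       # utility, bounded on the grid

    K = np.linspace(0.05, 10.0, 400)               # capital grid (state space X)
    V = np.zeros_like(K)                           # V^0 = 0

    for _ in range(2000):
        # Consumption c = f(k) - k' for every pair (k, k'); infeasible pairs
        # (k' >= f(k), i.e. c <= 0) are excluded by assigning value -inf.
        C = f(K)[:, None] - K[None, :]
        Q = np.where(C > 0.0, U(np.maximum(C, 0.0)) + beta * V[None, :], -np.inf)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new

    policy = K[Q.argmax(axis=1)]                   # s(k): optimal next capital k'
    print(policy[:5])

Since the right-hand side of (40) is a contraction of modulus β in the supremum norm,
the iterates V^ℓ converge geometrically, which justifies the simple stopping test.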

107
