
Optimization

Nonlinear programming:
Multi-dimensional minimization methods
Optimality Criteria
Given an objective function f(x), where x = [x1, x2, …, xN]^T,
let x* be the optimum point.
A Taylor series approximation of f(x) about x* yields

f(x) = f(x*) + ∇f(x*)^T (x − x*) + (1/2) (x − x*)^T ∇²f(x*) (x − x*) + O(‖Δx‖³)

• Stationary Condition: ∇f(x*) = 0

• Sufficient minimum criterion: ∇²f(x*) positive definite

• Sufficient maximum criterion: ∇²f(x*) negative definite
At a stationary point,
Δf = f(x) − f(x*) ≅ (1/2) (x − x*)^T ∇²f(x*) (x − x*) = (1/2) X^T H X,
where X = x − x* and H = ∇²f(x*).
Nature of stationary points

• Hessian H positive definite:


• Quadratic form X^T H X > 0
• Eigenvalues λi > 0

● Local nature: minimum


Nature of stationary points (2)
• Hessian H negative definite:
• Quadratic form X^T H X < 0
• Eigenvalues λi < 0

● Local nature: maximum


Nature of stationary points (3)
• Hessian H indefinite:
• Quadratic form X^T H X > 0 for some X and < 0 for others
• Eigenvalues of mixed signs: λi < 0 for i = 1, 2, …, m and λi > 0 for i = m+1, …, n

● Local nature: saddle point


Nature of stationary points (4)
• Hessian H positive semi-definite:
• Quadratic form X^T H X ≥ 0
• Eigenvalues λi ≥ 0

● Local nature: valley


Nature of stationary points (5)
• Hessian H negative semi-definite:
• Quadratic form X^T H X ≤ 0
• Eigenvalues λi ≤ 0

● Local nature: ridge


Stationary point nature summary

X^T H X             λi              Definiteness of H       Nature of x*
> 0                 > 0             Positive definite       Minimum
≥ 0                 ≥ 0             Positive semi-definite  Valley
≠ 0 (both signs)    mixed signs     Indefinite              Saddle point
≤ 0                 ≤ 0             Negative semi-definite  Ridge
< 0                 < 0             Negative definite       Maximum
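As a minimal sketch, the eigenvalue test above can be applied numerically (assuming NumPy and a symmetric Hessian supplied as an array; the function name and tolerance are illustrative, not part of these notes):

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its Hessian H (symmetric)."""
    eigvals = np.linalg.eigvalsh(H)
    if np.all(eigvals > tol):
        return "minimum (positive definite)"
    if np.all(eigvals < -tol):
        return "maximum (negative definite)"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point (indefinite)"
    if np.all(eigvals >= -tol):
        return "valley (positive semi-definite)"
    return "ridge (negative semi-definite)"

# Example: f(x, y) = x^2 - y^2 has a saddle point at the origin
H = np.array([[2.0, 0.0], [0.0, -2.0]])
print(classify_stationary_point(H))   # saddle point (indefinite)
```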


Stopping Conditions: g(x^k) = 0 and, for
minimization: H(x^k) is positive definite
maximization: H(x^k) is negative definite

Practical Stopping Criteria:

1. ‖g(x^k)‖ ≤ ε

2. ‖g(x^k)‖ ≤ ε (1 + |f(x^k)|)

3. |f(x^k) − f(x^(k+1))| / |f(x^k)| ≤ ε
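A minimal sketch of checking the three practical criteria above (assuming NumPy; the function name, arguments, and default tolerance are illustrative):

```python
import numpy as np

def should_stop(g_k, f_k, f_k1, eps=1e-6):
    """Return True if any of the three practical stopping criteria is satisfied."""
    crit1 = np.linalg.norm(g_k) <= eps                        # 1. small gradient norm
    crit2 = np.linalg.norm(g_k) <= eps * (1.0 + abs(f_k))     # 2. gradient small relative to f
    crit3 = abs(f_k - f_k1) <= eps * abs(f_k)                 # 3. small relative progress (rearranged)
    return crit1 or crit2 or crit3

# Hypothetical iteration data: gradient at x^k, f(x^k), f(x^(k+1))
print(should_stop(np.array([1e-8, -2e-8]), 3.5, 3.4999999))   # True
```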
Speed of Convergence
Assume that an optimization algorithm generates a sequence
{x^(k−2), x^(k−1), x^k, x^(k+1), …} that converges to x*.

How fast does this sequence converge to x*?

Definition
The sequence {x^(k−2), x^(k−1), x^k, x^(k+1), …} converges to x* with order p if

lim_(k→∞) ‖x^(k+1) − x*‖ / ‖x^k − x*‖^p = β, where β < ∞

Asymptotically, ‖x^(k+1) − x*‖ = β ‖x^k − x*‖^p

The higher the value of p, the faster the convergence.


Case 1: p = 1, 0 < β < 1: Linear Convergence
Some examples: β = 0.1, ‖x^0 − x*‖ = 0.1
Norms ‖x^k − x*‖: 10^(−1), 10^(−2), 10^(−3), 10^(−4), …
β = 0.9, ‖x^0 − x*‖ = 0.1
Norms ‖x^k − x*‖: 10^(−1), 0.09, 0.081, 0.0729, …

Case 2: p = 2, β > 0: Quadratic Convergence

Example: β = 1, ‖x^0 − x*‖ = 0.1
Norms ‖x^k − x*‖: 10^(−1), 10^(−2), 10^(−4), 10^(−8), …

Case 3: Suppose an algorithm generates a convergent sequence such that

lim_(k→∞) ‖x^(k+1) − x*‖ / ‖x^k − x*‖ = 0 (p = 1), and
lim_(k→∞) ‖x^(k+1) − x*‖ / ‖x^k − x*‖² = ∞ (p = 2),

which means β > 0 for some 1 < p < 2.

Then the convergence is called Super-linear Convergence.
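The order p of an observed error sequence can also be estimated numerically. A minimal sketch (assuming NumPy; the function name is illustrative) uses p ≈ log(e_(k+1)/e_k) / log(e_k/e_(k−1)) on the last few error norms:

```python
import numpy as np

def estimate_order(errors):
    """Estimate the convergence order p from successive error norms ||x^k - x*||."""
    e = np.asarray(errors, dtype=float)
    # p ~ log(e_{k+1}/e_k) / log(e_k/e_{k-1}) using the last available triple
    return np.log(e[-1] / e[-2]) / np.log(e[-2] / e[-3])

print(estimate_order([1e-1, 1e-2, 1e-3, 1e-4]))   # ~1 : linear
print(estimate_order([1e-1, 1e-2, 1e-4, 1e-8]))   # ~2 : quadratic
```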
Two strategies to move
from x^k to x^(k+1)
1. Line search
2. Trust region

Line Search Strategy


f(x^(k+1)) = f(x^k + α^k d^k) ≈ f(x^k) + α^k g(x^k)^T d^k

Since α^k is a positive scalar, requiring f(x^(k+1)) − f(x^k) < 0 (to first order) means g(x^k)^T d^k < 0, i.e. d^k must be a descent direction.
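A minimal sketch of this descent-direction test (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def is_descent_direction(g_k, d_k):
    """d_k is a descent direction at x^k if g(x^k)^T d_k < 0."""
    return float(np.dot(g_k, d_k)) < 0.0

# Example: the steepest-descent direction d = -g always satisfies the test
g = np.array([3.0, -4.0])
print(is_descent_direction(g, -g))   # True
```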


Step Length determination
Exact Line search:
Given a descent direction d^k, calculate α^k by solving the following
single-variable optimization problem:

α^k = arg min_(α>0) φ(α), where φ(α) = f(x^k + α d^k)

• Analytical as well as single-variable numerical minimization
methods can be used.
• Exact line search is expensive: it requires many evaluations of the
objective function and its derivative.
• An exact step length is not needed when the current iterate is far
from the optimum solution.

As long as the step length can be computed so that the objective
function decreases sufficiently, exact line search can be avoided and
an In-exact Line Search may be employed instead.
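For a quadratic objective the exact minimizer of φ(α) is available in closed form, which makes a compact illustration. The sketch below (assuming NumPy; the function name and test matrix are illustrative) minimizes φ(α) = f(x^k + α d^k) for f(x) = ½ x^T A x − b^T x, where setting φ′(α) = 0 gives α = −(g^T d)/(d^T A d):

```python
import numpy as np

def exact_step_quadratic(A, b, x, d):
    """Exact line search for f(x) = 0.5 x^T A x - b^T x along direction d:
    phi'(alpha) = d^T (A(x + alpha d) - b) = 0  =>  alpha = -(g^T d) / (d^T A d)."""
    g = A @ x - b                                  # gradient at x
    return -float(g @ d) / float(d @ (A @ d))

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
d = -(A @ x - b)                                   # steepest-descent direction
alpha = exact_step_quadratic(A, b, x, d)
print(alpha, x + alpha * d)                        # 0.25 [0.25 0.5]
```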
In-exact line search
1. Choose an arbitrary α^k such that f(x^(k+1)) = f(x^k + α^k d^k) < f(x^k)

This requirement is not sufficient as shown in the example below:

Consider the problem: min x²

Local and global minimum at x* = 0


Let x^k = (−1)^k (1 + 2^(−k)) and d^k = (−1)^(k+1)

{x^k} : {2, −3/2, 5/4, −9/8, …}

{f(x^k)} : {4, 9/4, 25/16, 81/64, …}

So f(x^(k+1)) < f(x^k) for k = 0, 1, 2, …,

yet the sequence {x^k} does not converge (it oscillates between points near +1 and −1).

Problem: the decrease in function values is too small relative to the step lengths.


In-exact line search
1. Choose an arbitrary α^k such that f(x^(k+1)) = f(x^k + α^k d^k) < f(x^k)

This requirement is not sufficient, as shown in the second example below:

Consider the problem: min x²

Local and global minimum at x* = 0


Let x^k = 1 + 2^(−k) and d^k = −1

{x^k} : {2, 3/2, 5/4, 9/8, …}

{f(x^k)} : {4, 9/4, 25/16, 81/64, …}

So f(x^(k+1)) < f(x^k) for k = 0, 1, 2, …,

yet the sequence {x^k} converges to 1, not to the minimizer x* = 0.

Problem: the step lengths are too small relative to the decrease in function values that is still possible.
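Both counterexamples are easy to reproduce numerically. The short sketch below (plain Python, purely illustrative) prints the two sequences and their function values, showing that f decreases monotonically while neither sequence approaches x* = 0:

```python
f = lambda x: x ** 2

# Example 1: x^k = (-1)^k (1 + 2^-k) oscillates between points near +1 and -1
xs1 = [(-1) ** k * (1 + 2.0 ** -k) for k in range(6)]
# Example 2: x^k = 1 + 2^-k converges to 1, not to the minimizer 0
xs2 = [1 + 2.0 ** -k for k in range(6)]

for xs in (xs1, xs2):
    print(xs)                   # iterates
    print([f(x) for x in xs])   # f values decrease monotonically toward 1, not 0
```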


Wolfe Conditions
A popular inexact line search condition stipulates that α^k should first of all
give sufficient decrease in the objective function f, as measured by the
inequality:

f(x^k + α^k d^k) ≤ f(x^k) + c1 α^k (g^k)^T d^k for c1 ∈ (0, 1)

In other words, the reduction in f should be proportional to both the step
length α^k and the directional derivative (g^k)^T d^k. This inequality is known as
the Armijo condition.

Define
φ1(α) = f(x^k) + c1 α (g^k)^T d^k

Choose α^k such that

f(x^k + α^k d^k) ≤ φ1(α^k)

In practice, c1 is chosen to be quite small, e.g. 10^(−4).
The sufficient decrease condition is not enough by itself to ensure that the
algorithm makes reasonable progress because it is satisfied for all sufficiently
small values of α.
To rule out unacceptably short steps we introduce a second requirement, called
the curvature condition, which requires α^k to satisfy

g(x^k + α^k d^k)^T d^k ≥ c2 (g^k)^T d^k for some constant c2 ∈ (c1, 1)


The sufficient decrease (Armijo condition) and curvature conditions are
known collectively as the Wolfe conditions.

f(x^k + α^k d^k) ≤ f(x^k) + c1 α^k (g^k)^T d^k

g(x^k + α^k d^k)^T d^k ≥ c2 (g^k)^T d^k, with 0 < c1 < c2 < 1
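A minimal sketch of checking both Wolfe conditions for a trial step (assuming NumPy; the function names, the test problem, and the default constants c1 = 10^(−4), c2 = 0.9 are illustrative choices, not prescribed by these notes):

```python
import numpy as np

def wolfe_ok(f, grad, x_k, d_k, alpha, c1=1e-4, c2=0.9):
    """Check sufficient decrease (Armijo) and curvature for a trial step alpha."""
    slope = float(np.dot(grad(x_k), d_k))                 # (g^k)^T d^k, < 0 for descent
    armijo = f(x_k + alpha * d_k) <= f(x_k) + c1 * alpha * slope
    curvature = float(np.dot(grad(x_k + alpha * d_k), d_k)) >= c2 * slope
    return armijo and curvature

# Hypothetical test problem: f(x) = x^T x with gradient 2x
f = lambda x: float(x @ x)
grad = lambda x: 2.0 * x
x = np.array([1.0, -2.0]); d = -grad(x)
print([a for a in (1.0, 0.5, 0.25, 0.1) if wolfe_ok(f, grad, x, d, a)])   # [0.5, 0.25, 0.1]
```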


Goldstein’s condition ensures that step lengths are not too small.
Define
φ2(α) = f(x^k) + c2 α (g^k)^T d^k for some constant c2 ∈ (c1, 1)

Choose α^k such that f(x^k + α^k d^k) ≥ φ2(α^k)


Armijo–Goldstein condition:

Choose α^k such that φ2(α^k) ≤ f(x^k + α^k d^k) ≤ φ1(α^k)
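A minimal sketch of checking the Armijo–Goldstein bracket for a trial step, following the φ1/φ2 definitions above (assuming NumPy; the function name and constants are illustrative):

```python
import numpy as np

def armijo_goldstein_ok(f, g_k, x_k, d_k, alpha, c1=1e-4, c2=0.9):
    """Check phi2(alpha) <= f(x_k + alpha d_k) <= phi1(alpha)."""
    slope = float(np.dot(g_k, d_k))              # (g^k)^T d^k, < 0 for descent
    phi = f(x_k + alpha * d_k)
    phi1 = f(x_k) + c1 * alpha * slope           # sufficient-decrease (upper) line
    phi2 = f(x_k) + c2 * alpha * slope           # Goldstein (lower) line
    return phi2 <= phi <= phi1

# Same hypothetical test problem as before: f(x) = x^T x
f = lambda x: float(x @ x)
x = np.array([1.0, -2.0]); g = 2.0 * x; d = -g
print([a for a in (1.0, 0.5, 0.25, 0.125) if armijo_goldstein_ok(f, g, x, d, a)])   # [0.5, 0.25, 0.125]
```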
