
EE236C (Spring 2011-12)

2. Quasi-Newton methods

• variable metric methods

• quasi-Newton methods

• BFGS update

• limited-memory quasi-Newton methods

2-1
Newton method for unconstrained minimization

minimize f (x)

f convex, twice continuously differentiable

Newton method

x^+ = x − t ∇²f(x)^{-1} ∇f(x)

• advantages: fast convergence, affine invariance


• disadvantages: requires second derivatives and the solution of a linear equation; can be too expensive for large-scale applications

Quasi-Newton methods 2-2
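
As a concrete illustration, a minimal NumPy sketch of one Newton step with step size t; the quadratic test function and the value t = 1 are illustrative choices, not part of the lecture:

    import numpy as np

    def newton_step(grad, hess, x, t=1.0):
        # x+ = x - t * (∇²f(x))^{-1} ∇f(x); solve the linear system
        # ∇²f(x) dx = -∇f(x) rather than forming the inverse
        g = grad(x)
        H = hess(x)
        dx = np.linalg.solve(H, -g)
        return x + t * dx

    # strictly convex quadratic f(x) = 1/2 x^T A x - b^T x: one step is exact
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, -1.0])
    x = newton_step(lambda z: A @ z - b, lambda z: A, np.zeros(2))
    print(np.allclose(x, np.linalg.solve(A, b)))   # True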


Variable metric methods

x^+ = x − t H^{-1} ∇f(x)

H ≻ 0 is an approximation of the Hessian at x, chosen to:

• avoid calculation of second derivatives


• simplify computation of search direction

‘variable metric’ interpretation (EE236B, lecture 10, page 11)

∆x = −H^{-1} ∇f(x)

is steepest descent direction at x for quadratic norm

‖z‖_H = (z^T H z)^{1/2}

Quasi-Newton methods 2-3


Quasi-Newton methods

given starting point x^{(0)} ∈ dom f, H_0 ≻ 0


for k = 1, 2, . . ., until a stopping criterion is satisfied
1. compute quasi-Newton direction ∆x = −H_{k-1}^{-1} ∇f(x^{(k-1)})
2. determine step size t (e.g., by backtracking line search)
3. compute x^{(k)} = x^{(k-1)} + t∆x
4. compute H_k

• different methods use different rules for updating H in step 4


• can also propagate H_k^{-1} to simplify calculation of ∆x

Quasi-Newton methods 2-4
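
A minimal Python/NumPy sketch of this generic scheme, with step 4 left as a pluggable update of the inverse approximation; the name update_inv_hessian and the backtracking parameters are illustrative assumptions, not from the slides:

    import numpy as np

    def backtracking(f, x, dx, g, alpha=0.3, beta=0.8):
        # shrink t until the sufficient-decrease condition holds
        t = 1.0
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        return t

    def quasi_newton(f, grad, x0, update_inv_hessian, tol=1e-8, max_iter=200):
        x = x0.astype(float).copy()
        Hinv = np.eye(len(x))                 # H_0^{-1} = I (any H_0 ≻ 0 works)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:       # stopping criterion
                break
            dx = -Hinv @ g                    # 1. quasi-Newton direction
            t = backtracking(f, x, dx, g)     # 2. step size
            x_new = x + t * dx                # 3. update the iterate
            Hinv = update_inv_hessian(Hinv, x_new - x, grad(x_new) - g)   # 4.
            x = x_new
        return x

One concrete choice for update_inv_hessian is the BFGS inverse update sketched after slide 2-5.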


Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

BFGS update

H_k = H_{k-1} + y y^T/(y^T s) − H_{k-1} s s^T H_{k-1}/(s^T H_{k-1} s)

where
s = x^{(k)} − x^{(k-1)},    y = ∇f(x^{(k)}) − ∇f(x^{(k-1)})

inverse update
H_k^{-1} = ( I − s y^T/(y^T s) ) H_{k-1}^{-1} ( I − y s^T/(y^T s) ) + s s^T/(y^T s)

• note that y^T s > 0 for strictly convex f; see page 1-11


• cost of update or inverse update is O(n²) operations

Quasi-Newton methods 2-5
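
A sketch of the inverse update in NumPy, written in the expanded rank-two form so that one update costs O(n²) operations; the skip when y^T s is not safely positive is a common safeguard added here, not part of the slide:

    import numpy as np

    def bfgs_inverse_update(Hinv, s, y):
        # expanding the inverse update gives
        # H_k^{-1} = H^{-1} - (H^{-1} y s^T + s y^T H^{-1})/rho
        #            + (1 + y^T H^{-1} y / rho) s s^T / rho,   rho = y^T s
        rho = y @ s
        if rho <= 1e-12:            # safeguard: skip the update if y^T s <= 0
            return Hinv
        Hy = Hinv @ y
        return (Hinv
                - (np.outer(Hy, s) + np.outer(s, Hy)) / rho
                + (1.0 + (y @ Hy) / rho) * np.outer(s, s) / rho)

This function can be passed as update_inv_hessian to the loop sketched after slide 2-4.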


Positive definiteness

if y^T s > 0, the BFGS update preserves positive definiteness of H_k

proof: from inverse update formula,

v^T H_k^{-1} v = ( v − (s^T v/(s^T y)) y )^T H_{k-1}^{-1} ( v − (s^T v/(s^T y)) y ) + (s^T v)²/(y^T s)

• if H_{k-1} ≻ 0, both terms are nonnegative for all v

• second term is zero only if s^T v = 0; then first term is zero only if v = 0

this ensures that ∆x = −H_k^{-1} ∇f(x^{(k)}) is a descent direction

Quasi-Newton methods 2-6
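
A quick numerical check of the identity used in the proof, on random data (a sketch added here, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    M = rng.standard_normal((n, n))
    Hinv_prev = M @ M.T + np.eye(n)            # H_{k-1}^{-1} ≻ 0
    s, y, v = rng.standard_normal((3, n))
    if y @ s <= 0:
        y = -y                                  # ensure y^T s > 0
    rho = y @ s
    V = np.eye(n) - np.outer(s, y) / rho
    Hinv = V @ Hinv_prev @ V.T + np.outer(s, s) / rho   # inverse BFGS update
    w = v - (s @ v) / (s @ y) * y
    lhs = v @ Hinv @ v
    rhs = w @ Hinv_prev @ w + (s @ v) ** 2 / rho
    print(np.isclose(lhs, rhs))                 # True: both sides agree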


Secant condition

BFGS update satisfies the secant condition H_k s = y, i.e.,

H_k (x^{(k)} − x^{(k-1)}) = ∇f(x^{(k)}) − ∇f(x^{(k-1)})

interpretation: define the second-order approximation at x^{(k)}

fquad(z) = f(x^{(k)}) + ∇f(x^{(k)})^T (z − x^{(k)}) + (1/2)(z − x^{(k)})^T H_k (z − x^{(k)})

the secant condition implies that the gradient of fquad agrees with the gradient of f at x^{(k-1)}:

∇fquad(x^{(k-1)}) = ∇f(x^{(k)}) + H_k (x^{(k-1)} − x^{(k)}) = ∇f(x^{(k-1)})

Quasi-Newton methods 2-7
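
The secant condition is easy to verify numerically with the (direct) BFGS update of slide 2-5; the random data below is purely illustrative:

    import numpy as np

    def bfgs_update(H, s, y):
        # H_k = H + y y^T/(y^T s) - H s s^T H/(s^T H s)
        return (H + np.outer(y, y) / (y @ s)
                  - np.outer(H @ s, H @ s) / (s @ (H @ s)))

    rng = np.random.default_rng(1)
    n = 4
    M = rng.standard_normal((n, n))
    H_prev = M @ M.T + np.eye(n)                 # H_{k-1} ≻ 0
    s, y = rng.standard_normal((2, n))
    if y @ s <= 0:
        y = -y                                    # y^T s > 0
    H = bfgs_update(H_prev, s, y)
    print(np.allclose(H @ s, y))                  # secant condition H_k s = y
    print(np.all(np.linalg.eigvalsh(H) > 0))      # H_k remains positive definite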


secant method
for f : R → R, BFGS with unit step size gives the secant method

x^{(k+1)} = x^{(k)} − f′(x^{(k)})/H_k,    H_k = ( f′(x^{(k)}) − f′(x^{(k-1)}) ) / ( x^{(k)} − x^{(k-1)} )

[figure: one secant step, showing x^{(k-1)}, x^{(k)}, x^{(k+1)} together with fquad(z) and f′(z)]

Quasi-Newton methods 2-8
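
For a one-dimensional example, a short sketch of the secant iteration applied to f(x) = x⁴/4 − x, i.e. solving f′(x) = x³ − 1 = 0; the test function and starting points are illustrative:

    def secant_method(fprime, x_prev, x, tol=1e-10, max_iter=50):
        # x+ = x - f'(x)/H  with  H = (f'(x) - f'(x_prev)) / (x - x_prev)
        for _ in range(max_iter):
            H = (fprime(x) - fprime(x_prev)) / (x - x_prev)
            x_prev, x = x, x - fprime(x) / H
            if abs(x - x_prev) < tol:
                break
        return x

    print(secant_method(lambda x: x**3 - 1.0, 0.5, 2.0))   # ≈ 1.0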


Convergence

global result

if f is strongly convex, BFGS with backtracking line search (EE236B, lecture 10-6) converges from any x^{(0)} and any H_0 ≻ 0

local convergence

if f is strongly convex and ∇²f(x) is Lipschitz continuous, local convergence is superlinear: for sufficiently large k,

‖x^{(k+1)} − x⋆‖_2 ≤ c_k ‖x^{(k)} − x⋆‖_2 → 0

where c_k → 0 (cf. quadratic local convergence of Newton's method)

Quasi-Newton methods 2-9


Example
minimize   c^T x − ∑_{i=1}^m log(b_i − a_i^T x)
n = 100, m = 500
[figure: f(x^{(k)}) − f⋆ versus k for Newton (left) and BFGS (right), on a logarithmic scale from 10^2 down to 10^{-12}; the Newton axis runs to about k = 9, the BFGS axis to about k = 140]

cost per Newton iteration: O(n³) plus the cost of computing ∇²f(x)

cost per BFGS iteration: O(n²)

Quasi-Newton methods 2-10
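
A sketch of how such an experiment can be set up with the BFGS inverse update and a domain-aware backtracking search; the random problem data, tolerances, and line-search constants below are illustrative choices, not those used for the plots:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 100, 500
    A = rng.standard_normal((m, n))
    b = rng.uniform(1.0, 2.0, m)          # b > 0, so x = 0 is strictly feasible
    c = rng.standard_normal(n)

    def f(x):
        r = b - A @ x
        return np.inf if np.any(r <= 0) else c @ x - np.log(r).sum()

    def grad(x):
        return c + A.T @ (1.0 / (b - A @ x))

    x, Hinv = np.zeros(n), np.eye(n)
    for k in range(300):
        g = grad(x)
        if np.linalg.norm(g) < 1e-6:
            break
        dx = -Hinv @ g
        t = 1.0
        while f(x + t * dx) > f(x) + 0.01 * t * (g @ dx):   # f = inf outside dom f
            t *= 0.5
        s = t * dx
        y = grad(x + s) - g
        rho = y @ s                                  # > 0 by strict convexity
        V = np.eye(n) - np.outer(s, y) / rho
        Hinv = V @ Hinv @ V.T + np.outer(s, s) / rho  # BFGS inverse update
        x = x + s
    print(k, np.linalg.norm(grad(x)))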


Square root BFGS update

to improve numerical stability, can propagate H_k in factored form

if H_{k-1} = L_{k-1} L_{k-1}^T, then H_k = L_k L_k^T with

L_k = L_{k-1} ( I + (α ỹ − s̃) s̃^T / (s̃^T s̃) )

where

ỹ = L_{k-1}^{-1} y,    s̃ = L_{k-1}^T s,    α = ( s̃^T s̃ / (y^T s) )^{1/2}

if L_{k-1} is triangular, the cost of reducing L_k to triangular form is O(n²)

Quasi-Newton methods 2-11
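
A numerical check that the factored update reproduces the BFGS update of slide 2-5; the random data and the check itself are additions, not from the slides:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    M = rng.standard_normal((n, n))
    H_prev = M @ M.T + np.eye(n)
    L_prev = np.linalg.cholesky(H_prev)           # H_{k-1} = L_{k-1} L_{k-1}^T
    s, y = rng.standard_normal((2, n))
    if y @ s <= 0:
        y = -y                                     # y^T s > 0

    y_t = np.linalg.solve(L_prev, y)               # ỹ = L_{k-1}^{-1} y
    s_t = L_prev.T @ s                             # s̃ = L_{k-1}^T s
    alpha = np.sqrt((s_t @ s_t) / (y @ s))
    L = L_prev @ (np.eye(n) + np.outer(alpha * y_t - s_t, s_t) / (s_t @ s_t))

    H = (H_prev + np.outer(y, y) / (y @ s)         # ordinary BFGS update
           - np.outer(H_prev @ s, H_prev @ s) / (s @ (H_prev @ s)))
    print(np.allclose(L @ L.T, H))                 # True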


Optimality of BFGS update

X = H_k solves the convex optimization problem

minimize    tr(H_{k-1}^{-1} X) − log det(H_{k-1}^{-1} X) − n
subject to  Xs = y

• cost function is nonnegative, equal to zero only if X = H_{k-1}

• also known as the relative entropy between the densities N(0, X) and N(0, H_{k-1})

the optimality result follows from the KKT conditions: X = H_k satisfies


X^{-1} = H_{k-1}^{-1} − (1/2)(s ν^T + ν s^T),    Xs = y,    X ≻ 0

with

ν = (1/(s^T y)) ( 2 H_{k-1}^{-1} y − ( 1 + y^T H_{k-1}^{-1} y/(y^T s) ) s )

Quasi-Newton methods 2-12


Davidon-Fletcher-Powell (DFP) update

switch H_{k-1} and X in the objective on the previous page

minimize    tr(H_{k-1} X^{-1}) − log det(H_{k-1} X^{-1}) − n
subject to  Xs = y

• minimize relative entropy between N(0, H_{k-1}) and N(0, X)


• problem is convex in X^{-1} (with the constraint written as s = X^{-1} y)
• solution is the ‘dual’ of the BFGS formula

H_k = ( I − y s^T/(s^T y) ) H_{k-1} ( I − s y^T/(s^T y) ) + y y^T/(s^T y)

(known as DFP update)

pre-dates BFGS update, but is less often used

Quasi-Newton methods 2-13
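
The DFP formula transcribes just as directly; a small sketch checking that it, too, satisfies the secant condition (the check and the random data are additions):

    import numpy as np

    def dfp_update(H, s, y):
        # H_k = (I - y s^T/(s^T y)) H (I - s y^T/(s^T y)) + y y^T/(s^T y)
        rho = s @ y
        W = np.eye(len(s)) - np.outer(y, s) / rho
        return W @ H @ W.T + np.outer(y, y) / rho

    rng = np.random.default_rng(3)
    n = 5
    M = rng.standard_normal((n, n))
    H_prev = M @ M.T + np.eye(n)
    s, y = rng.standard_normal((2, n))
    if y @ s <= 0:
        y = -y
    print(np.allclose(dfp_update(H_prev, s, y) @ s, y))   # H_k s = y holds for DFP too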


Limited memory quasi-Newton methods

main disadvantage of quasi-Newton methods is the need to store H_k or H_k^{-1}

limited-memory BFGS (L-BFGS): do not store H_k^{-1} explicitly

• instead we store the m (e.g., m = 30) most recent values of

s_j = x^{(j)} − x^{(j-1)},    y_j = ∇f(x^{(j)}) − ∇f(x^{(j-1)})

• we evaluate ∆x = −H_k^{-1} ∇f(x^{(k)}) recursively, using

H_j^{-1} = ( I − s_j y_j^T/(y_j^T s_j) ) H_{j-1}^{-1} ( I − y_j s_j^T/(y_j^T s_j) ) + s_j s_j^T/(y_j^T s_j)

for j = k, k − 1, . . . , k − m + 1, assuming, for example, H_{k-m}^{-1} = I
• cost per iteration is O(nm); storage is O(nm)

Quasi-Newton methods 2-14
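
The standard O(nm) way to carry out this recursive evaluation without forming any n × n matrices is the two-loop recursion (Nocedal and Wright, chapter 7). A sketch, assuming the pairs (s_j, y_j) are kept in Python lists ordered from oldest to newest:

    import numpy as np

    def lbfgs_direction(g, s_list, y_list):
        # returns r = H_k^{-1} g, built from the stored (s_j, y_j) pairs with the
        # recursion started from H_{k-m}^{-1} = I; the search direction is -r
        q = g.astype(float).copy()
        alphas = []
        for s, y in zip(reversed(s_list), reversed(y_list)):   # j = k, ..., k-m+1
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        r = q                                                  # apply H_{k-m}^{-1} = I
        for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
            beta = (y @ r) / (y @ s)
            r += (a - beta) * s
        return r

In the loop of slide 2-4, step 1 then becomes ∆x = -lbfgs_direction(grad(x), s_list, y_list), with the oldest pair dropped once more than m are stored.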


References

• J. Nocedal and S. J. Wright, Numerical Optimization (2006), chapters 6 and 7

• J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (1983)

Quasi-Newton methods 2-15
