10. Unconstrained minimization
Outline
Terminology and assumptions
Gradient descent method
Steepest descent method
Newton’s method
Self-concordant functions
Implementation
Terminology and assumptions

unconstrained minimization problem
minimize f(x)
▶ we assume
– f convex, twice continuously differentiable (hence dom f open)
– optimal value p★ = inf_x f(x) is attained at x★ (not necessarily unique)
▶ e.g., for a quadratic f(x) = (1/2)x^T Px + q^T x + r, setting the gradient to zero gives the optimality condition ∇f(x) = Px + q = 0
▶ algorithms require a starting point x(0) with x(0) ∈ dom f and sublevel set S = {x | f(x) ≤ f(x(0))} closed
▶ 2nd condition is hard to verify, except when all sublevel sets are closed
– equivalent to condition that epi f is closed
– true if dom f = R^n
– true if f(x) → ∞ as x → bd dom f
▶ if f is strongly convex on S with constant m > 0, then

f(x) − p★ ≤ (1/(2m)) ∥∇f(x)∥₂²
▶ useful as stopping criterion (if you know m, which usually you do not)
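Where m is available, the bound yields a computable stopping test. A minimal sketch (illustrative, not from the slides), assuming a toy quadratic where m is known exactly as the smallest eigenvalue of P:

```python
import numpy as np

# toy strongly convex problem: f(x) = 0.5 x^T P x + q^T x, P positive definite;
# here the strong convexity constant m = lambda_min(P) is known exactly,
# which (as noted above) is rarely true in practice
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([1.0, -1.0])
m = np.linalg.eigvalsh(P).min()   # strong convexity constant
M = np.linalg.eigvalsh(P).max()   # used for a safe fixed step size 1/M

x = np.zeros(2)
eps = 1e-8
while True:
    g = P @ x + q                  # gradient of f
    if g @ g / (2 * m) <= eps:     # then f(x) - p_star <= eps is guaranteed
        break
    x -= g / M                     # gradient descent step
```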
Gradient descent method
▶ descent step x⁺ = x + tΔx with Δx = −∇f(x); step size t chosen by exact or backtracking line search on f(x + tΔx)

example (quadratic problem in R²): f(x) = (1/2)(x₁² + γx₂²) with γ > 0, exact line search
– very slow if γ ≫ 1 or γ ≪ 1
– example for γ = 10 at right
– successive iterates alternate across the x₁-axis; called zig-zagging

[figure: contour lines of f in the (x₁, x₂)-plane with zig-zagging iterates x(0), x(1), x(2), …]
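The zig-zag iterates are easy to reproduce numerically. A short sketch (illustrative), using the closed-form exact line search step t = gᵀg / gᵀAg for a quadratic:

```python
import numpy as np

# gradient descent with exact line search on f(x) = 0.5*(x1^2 + gamma*x2^2)
gamma = 10.0
A = np.diag([1.0, gamma])          # f(x) = 0.5 x^T A x

x = np.array([gamma, 1.0])         # one convenient starting point x(0)
trace = [x.copy()]
for _ in range(10):
    g = A @ x                      # gradient
    t = (g @ g) / (g @ (A @ g))    # exact minimizer of f(x - t*g) over t
    x = x - t * g
    trace.append(x.copy())

# the x2-coordinates alternate in sign at each step: the zig-zag
print(np.array(trace)[:, 1])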
▶ example: f(x) = c^T x − Σ_{i=1}^{500} log(b_i − a_i^T x)

[figure: f(x(k)) − p★ versus iteration k on a semilog scale, decreasing from about 10⁴ to 10⁻⁴ over roughly 200 iterations, for exact and backtracking line search; both show linear convergence (straight lines on the semilog plot)]
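For reference, a sketch of a standard backtracking line search as used in the plot above (the convention that f returns +∞ outside dom f handles the log terms; the parameter values are typical choices, with α ∈ (0, 0.5) and β ∈ (0, 1)):

```python
import numpy as np

def backtracking(f, grad_fx, x, dx, alpha=0.1, beta=0.7):
    """Shrink t until f(x + t*dx) <= f(x) + alpha * t * grad_fx^T dx.
    Assumes f returns np.inf outside dom f, so infeasible trial
    points are rejected automatically."""
    t = 1.0
    fx = f(x)
    slope = grad_fx @ dx   # directional derivative; negative for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```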
Steepest descent method
[figure: two runs of steepest descent; at each iterate the negative gradient −∇f(x) and the steepest descent direction Δx_nsd are drawn, with iterates x(0), x(1), x(2) shown for each choice of norm]
▶ steepest descent with backtracking line search for two quadratic norms
▶ ellipses show {x | ∥x − x(k)∥_P = 1}
▶ interpretation of steepest descent with quadratic norm ∥·∥_P: gradient descent after change of variables x̄ = P^{1/2} x
▶ shows choice of P has strong effect on speed of convergence
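The change-of-variables interpretation can be checked numerically. A small sketch (P and the gradient vector are made up for illustration):

```python
import numpy as np
from scipy.linalg import sqrtm

# check: the steepest descent direction for ||.||_P equals the plain
# gradient step mapped back through the change of variables xbar = P^{1/2} x
P = np.array([[4.0, 1.0],
              [1.0, 2.0]])                  # symmetric positive definite
g = np.array([1.0, -3.0])                   # stands in for grad f(x)

dx_sd = -np.linalg.solve(P, g)              # (unnormalized) steepest descent direction

Ph = sqrtm(P).real                          # P^{1/2}
# gradient step in xbar-coordinates is -P^{-1/2} g; map back via x = P^{-1/2} xbar
dx_cov = -np.linalg.solve(Ph, np.linalg.solve(Ph, g))

assert np.allclose(dx_sd, dx_cov)
```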
Newton’s method
Newton step
Δx_nt = −∇²f(x)⁻¹ ∇f(x)

interpretations
▶ x + Δx_nt minimizes the second-order approximation

f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v

[figure: f and its quadratic model f̂ through (x, f(x)); the model is minimized at (x + Δx_nt, f̂(x + Δx_nt))]

▶ x + Δx_nt solves the linearized optimality condition

∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x) v = 0

[figure: f′ and its linearization through (x, f′(x)); the linearization crosses zero at (x + Δx_nt, f′(x + Δx_nt))]

▶ Δx_nt is steepest descent direction at x in local Hessian norm ∥u∥_{∇²f(x)} = (u^T ∇²f(x) u)^{1/2}

[figure: the ellipse {x + v | ∥v∥_{∇²f(x)} = 1}, with x + Δx_nsd and x + Δx_nt marked]
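A concrete sketch (illustrative) for the log-barrier objective from the earlier example, where gradient and Hessian have simple closed forms:

```python
import numpy as np

def newton_step(A, b, x):
    """Newton step for f(x) = -sum_i log(b_i - a_i^T x), assuming x is
    strictly feasible (b - A x > 0).  Rows of A are the a_i^T."""
    d = 1.0 / (b - A @ x)
    g = A.T @ d                        # gradient: sum_i a_i / (b_i - a_i^T x)
    H = A.T @ (d[:, None] ** 2 * A)    # Hessian:  sum_i a_i a_i^T / (b_i - a_i^T x)^2
    dx_nt = np.linalg.solve(H, -g)     # solve the Newton system H dx = -g
    return dx_nt
```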
classical convergence analysis

assumptions
▶ f strongly convex on S with constant m
▶ ∇²f is Lipschitz continuous on S, with constant L > 0:

∥∇²f(x) − ∇²f(y)∥₂ ≤ L ∥x − y∥₂ for all x, y ∈ S

conclusion: the number of iterations until f(x(k)) − p★ ≤ ε is bounded above by

(f(x(0)) − p★)/γ + log₂ log₂(ε₀/ε)

where γ and ε₀ are constants that depend on m, L, and the line search parameters
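The second term is effectively a small constant. As a worked check (taking the very demanding accuracy ε = 10⁻²⁰ ε₀):

$$\log_2 \log_2(\epsilon_0/\epsilon) = \log_2 \log_2 10^{20} = \log_2\!\left(20 \log_2 10\right) \approx \log_2 66.4 \approx 6,$$

so the quadratically convergent final phase contributes only about six iterations.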
[figure: example in R²; iterates x(0), x(1) on the contour lines of f, and f(x(k)) − p★ versus k dropping from about 10⁵ to below 10⁻¹⁰ within 5 iterations, illustrating quadratic convergence]
[figure: a larger example; left, f(x(k)) − p★ versus k for exact line search and backtracking, both reaching about 10⁻¹⁵ within 10 iterations; right, step size t(k) versus k for the two line searches, with values between 0 and 1]
[figure: f(x(k)) − p★ versus k for a larger problem instance, decreasing from about 10⁵ to below 10⁻⁵ within 20 iterations]
Self-concordant functions
definition
▶ convex f : R → R is self-concordant if |f′′′(x)| ≤ 2 f′′(x)^{3/2} for all x ∈ dom f
▶ f : R^n → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ R^n
examples on R
▶ linear and quadratic functions
▶ negative logarithm f (x) = − log x
▶ negative entropy plus negative logarithm: f (x) = x log x − log x
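For f(x) = −log x the defining inequality holds with equality: f″(x) = 1/x² and f‴(x) = −2/x³ give |f‴(x)| = 2/x³ = 2 f″(x)^{3/2}. A quick numerical confirmation (illustrative):

```python
import numpy as np

# self-concordance check for f(x) = -log x:
# |f'''(x)| = 2/x^3  equals  2 * f''(x)^{3/2} = 2 * (1/x^2)^{3/2}
x = np.linspace(0.1, 10.0, 100)
fpp = 1.0 / x**2
fppp = -2.0 / x**3
assert np.allclose(np.abs(fppp), 2.0 * fpp**1.5)
```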
properties
▶ preserved under positive scaling 𝛼 ≥ 1, and sum
▶ preserved under composition with affine function
▶ if g is convex with dom g = R₊₊ and |g′′′(x)| ≤ 3g′′(x)/x, then f(x) = −log(−g(x)) − log x is self-concordant on {x | x > 0, g(x) < 0}
examples: properties can be used to show that the following are s.c.
▶ f(x) = −Σ_{i=1}^m log(b_i − a_i^T x) on {x | a_i^T x < b_i, i = 1, …, m}
▶ f(X) = −log det X on S^n_{++}
▶ f(x) = −log(y² − x^T x) on {(x, y) | ∥x∥₂ < y}
▶ for self-concordant f, the number of Newton iterations until f(x(k)) − p★ ≤ ε is bounded above by

(f(x(0)) − p★)/γ + log₂ log₂(1/ε)

where γ depends only on the line search parameters α, β
[figure: number of Newton iterations versus f(x(0)) − p★ for randomly generated problem instances (◦: m = 100, n = 50; □: m = 1000, n = 500; ♦: m = 1000, n = 50); every instance needs at most about 20 iterations, even for f(x(0)) − p★ ≈ 35]
▶ number of iterations much smaller than 375(f (x (0) ) − p★) + 6
▶ a bound of the form c(f(x(0)) − p★) + 6 with smaller c is (empirically) valid
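Where 375 and 6 come from (a worked check, assuming line search parameters α = 0.1, β = 0.8 behind the stated constant): the self-concordance analysis bounds the first-phase constant by (20 − 8α)/(αβ(1 − 2α)²), and

$$\frac{20 - 8\alpha}{\alpha\beta(1 - 2\alpha)^2} = \frac{20 - 0.8}{0.1 \cdot 0.8 \cdot 0.64} = \frac{19.2}{0.0512} = 375,$$

while log₂ log₂(1/ε) ≈ 6 already corresponds to ε ≈ 10⁻²⁰, as in the worked check earlier.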
Implementation
main effort in each iteration: evaluate derivatives and solve Newton system

H Δx = −g

where H = ∇²f(x) and g = ∇f(x)
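A minimal sketch of the dense case (a standard approach, assuming H is positive definite, as it is for strongly convex f):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_direction(H, g):
    """Solve the Newton system H dx = -g via Cholesky (H = L L^T).
    The factorization costs about (1/3) n^3 flops for dense H and
    raises LinAlgError if H is not positive definite."""
    L = cho_factor(H)
    return cho_solve(L, -g)
```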