Amit Chattopadhyay
IIIT-Bangalore
5. Optimization Algorithms
Overview
Common iteration template: x_{k+1} = x_k + s_k v_k, where v_k is the search direction and s_k > 0 is the stepsize (a minimal sketch of this template appears after the lists below).
First-order methods:
• Gradient descent method
• Subgradient method
• Proximal gradient descent
• Stochastic gradient descent
Second-order methods:
• Newton’s method
• Barrier method
• Primal-dual interior-point methods
• Quasi-Newton methods
• Proximal Newton method
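All of the methods above instantiate the same template with different choices of direction v_k and stepsize s_k. A minimal sketch of that template (assuming a differentiable f_0 and hypothetical helper callables choose_direction and choose_stepsize supplied by the specific method):

    import numpy as np

    def descent(f0, grad_f0, x0, choose_direction, choose_stepsize,
                tol=1e-6, max_iter=1000):
        """Generic iteration x_{k+1} = x_k + s_k * v_k.

        choose_direction(x, g) and choose_stepsize(x, v) are placeholders for the
        method-specific rules (e.g. v = -g for gradient descent, a Newton direction
        for second-order methods; a fixed step or a line search for s).
        """
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad_f0(x)
            if np.linalg.norm(g) <= tol:    # stationarity-based stopping criterion
                break
            v = choose_direction(x, g)
            s = choose_stepsize(x, v)
            x = x + s * v
        return x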
First-order: Gradient Descent Method
v_k = -\frac{\nabla f_0(x_k)}{\|\nabla f_0(x_k)\|}
Stepsize:
Restriction of f_0 along v_k: φ(s) = f_0(x_k + s v_k)
Goal: find s > 0 such that φ(s) < φ(0).
Exact line search (computationally expensive):

s^* = \arg\min_{s \ge 0} \phi(s)
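As an illustration only, such an exact line search could be approximated numerically, e.g. with SciPy's bounded scalar minimizer; every trial value of s costs a full evaluation of f_0, which is why this is rarely done in practice (s_max below is an assumed upper bound on the search interval):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def exact_line_search(f0, x, v, s_max=10.0):
        """Approximate s* = argmin_{0 <= s <= s_max} phi(s), phi(s) = f0(x + s*v)."""
        phi = lambda s: f0(x + s * v)
        res = minimize_scalar(phi, bounds=(0.0, s_max), method="bounded")
        return res.x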
Stepsize: Practical Approach
Armijo Condition
Valid step sizes must satisfy: φ(s) ≤ φ(0) + s·α·δ_k, where δ_k = ∇f_0(x_k)^T v_k.
More explicitly, f_0(x_k + s v_k) ≤ f_0(x_k) + s·α·∇f_0(x_k)^T v_k for a chosen α ∈ (0, 1).
Note: s̄ is the smallest s > 0 at which φ(s) and the line l̄(s) = φ(0) + s·α·δ_k cross; the Armijo condition is satisfied for all s ∈ (0, s̄).
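A standard way to find such a step is backtracking: start from an initial guess and shrink it until the Armijo condition holds. A minimal sketch, with illustrative defaults α = 0.3, shrink factor β = 0.5, and initial step s0 = 1:

    import numpy as np

    def backtracking_armijo(f0, grad_f0, x, v, alpha=0.3, beta=0.5, s0=1.0):
        """Shrink s until f0(x + s*v) <= f0(x) + s*alpha*delta, delta = grad_f0(x)^T v."""
        fx = f0(x)
        delta = grad_f0(x) @ v      # delta_k; negative for a descent direction v
        s = s0
        while f0(x + s * v) > fx + s * alpha * delta:
            s *= beta               # backtrack: reduce the stepsize geometrically
        return s

With v_k = -∇f_0(x_k)/∥∇f_0(x_k)∥ as above, the returned s can be plugged directly into the generic descent template.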
Stepsize: Lower Bound
Assume the chosen stepsizes are bounded below: s_k ≥ s_lb, ∀ k = 0, 1, . . .
Convergence
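The bound below follows by summing the per-iteration Armijo decrease. A sketch of that step, assuming the unnormalized direction v_i = -∇f_0(x_i) (so that ∇f_0(x_i)^T v_i = -∥∇f_0(x_i)∥_2^2) and s_i ≥ s_lb:

f_0(x_i) - f_0(x_{i+1}) \;\ge\; -s_i\,\alpha\,\nabla f_0(x_i)^T v_i \;=\; s_i\,\alpha\,\|\nabla f_0(x_i)\|_2^2 \;\ge\; s_{lb}\,\alpha\,\|\nabla f_0(x_i)\|_2^2

and the left-hand side telescopes to f_0(x_0) - f_0(x_{k+1}) when summed over i = 0, …, k.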
Thus,

s_{lb}\,\alpha \sum_{i=0}^{k} \|\nabla f_0(x_i)\|_2^2 \;\le\; f_0(x_0) - f_0(x_{k+1}) \;\le\; f_0(x_0) - f_0^*

\implies \lim_{k \to \infty} \|\nabla f_0(x_k)\|_2 = 0 (the algorithm converges to a stationary point of f_0).
Moreover,

\sum_{i=0}^{k} \|\nabla f_0(x_i)\|_2^2 \;\ge\; (k+1) \min_{i=0,\dots,k} \|\nabla f_0(x_i)\|_2^2

\implies g_k^* = \min_{i=0,\dots,k} \|\nabla f_0(x_i)\|_2 \;\le\; \frac{1}{\sqrt{1+k}} \sqrt{\frac{f_0(x_0) - f_0^*}{s_{lb}\,\alpha}}

\implies g_k^* \propto \frac{1}{\sqrt{1+k}}
The stopping criterion is set as ∥∇f_0(x_i)∥_2 ≤ ε.
This exit condition is achieved in at most

k_{max} = \frac{1}{\varepsilon^2}\,\frac{f_0(x_0) - f_0^*}{s_{lb}\,\alpha}

iterations.
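For example, with illustrative values ε = 10^{-2}, s_lb·α = 0.1 and f_0(x_0) - f_0^* = 1, this gives k_max = 10^4 · 10 = 10^5 iterations, showing the 1/ε² dependence typical of this gradient-norm guarantee.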
Convergence: Convex Function
If f_0 is convex, an accuracy of f_0(x_k) - f_0^* ≤ ε in the objective is reached within

k_{max} = \frac{\|x_0 - x^*\|_2^2}{2\,\varepsilon\, s_{lb}}

iterations.
Second-order: Newton's Algorithm
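A minimal sketch of the Newton direction and step, assuming f_0 is twice differentiable with callables grad_f0 and hess_f0 returning its gradient and Hessian; in practice the step is damped by a line search such as the backtracking rule above:

    import numpy as np

    def newton_step(grad_f0, hess_f0, x):
        """One Newton iteration: solve H(x) v = -grad(x), then take a full step."""
        g = grad_f0(x)
        H = hess_f0(x)
        v = np.linalg.solve(H, -g)   # Newton direction v_k = -[H(x_k)]^{-1} grad f0(x_k)
        return x + v                 # s_k = 1; damped Newton would choose s_k by line search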
Stochastic Gradient Descent Method (SGD)
\min_{x \in \mathbb{R}^n} f(x) = \frac{1}{m} \sum_{i=1}^{m} f_i(x)
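A minimal SGD sketch for this finite-sum problem, assuming a hypothetical callable grad_fi(x, i) that returns ∇f_i(x) and an illustrative diminishing stepsize s_k = s_0/(1+k):

    import numpy as np

    def sgd(grad_fi, x0, m, s0=0.1, n_epochs=10, seed=0):
        """SGD on f(x) = (1/m) * sum_{i=1}^{m} f_i(x): each update uses the
        gradient of a single randomly drawn component f_i."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        k = 0
        for _ in range(n_epochs):
            for i in rng.permutation(m):   # one pass over the m components in random order
                s = s0 / (1.0 + k)         # diminishing stepsize (illustrative choice)
                x = x - s * grad_fi(x, i)  # step along -grad f_i(x)
                k += 1
        return x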