25 Optimization
courses.d2l.ai/berkeley-stat-157
Optimization Problems
• General form:
minimize $f(x)$ subject to $x \in C$
• Cost function $f : \mathbb{R}^n \to \mathbb{R}$
• Constraint set example
$C = \{x \mid h_1(x) = 0, \dots, h_m(x) = 0,\; g_1(x) \le 0, \dots, g_r(x) \le 0\}$
• Unconstrained if $C = \mathbb{R}^n$
Local Minima and Global Minima
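• Local minimum $x^*$: there exists $\varepsilon > 0$ such that $f(x^*) \le f(x)$ for all $x \in C$ with $\|x - x^*\| < \varepsilon$
• Global minimum $x^*$: $f(x^*) \le f(x) \quad \forall x \in C$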
Convex Set
• A subset $C$ of $\mathbb{R}^n$ is called convex if
$\alpha x + (1 - \alpha) y \in C \quad \forall \alpha \in [0,1],\ \forall x, y \in C$
Convex Function
• $f : C \to \mathbb{R}$ is called convex if
$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) \quad \forall \alpha \in [0,1],\ \forall x, y \in C$
• If the inequality is strict whenever $\alpha \in (0,1)$ and $x \ne y$, then $f$ is called strictly convex
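The definition can be probed numerically. The sketch below is only an illustration: the helper names (`violates_convexity`, `looks_convex`), the test interval, and the two example functions are assumptions, not part of the slides. A randomized check like this can disprove convexity by finding one violation, but passing it proves nothing.

```python
import math
import random

def violates_convexity(f, x, y, alpha, tol=1e-9):
    """Return True if f breaks f(ax + (1-a)y) <= a f(x) + (1-a) f(y)."""
    lhs = f(alpha * x + (1 - alpha) * y)
    rhs = alpha * f(x) + (1 - alpha) * f(y)
    return lhs > rhs + tol  # tol absorbs floating-point rounding

def looks_convex(f, trials=10_000, low=-5.0, high=5.0, seed=0):
    """Randomized check of the definition on the interval [low, high].

    A single violation disproves convexity; passing is only evidence.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        alpha = rng.random()
        if violates_convexity(f, x, y, alpha):
            return False
    return True

def square(x):
    return x * x          # convex on the whole real line

def sine(x):
    return math.sin(x)    # not convex on [-5, 5]

square_ok = looks_convex(square)
sine_ok = looks_convex(sine)
```

For instance, `violates_convexity(math.sin, 0.0, math.pi, 0.5)` finds a violation because the midpoint value sin(π/2) = 1 lies above the chord through the two endpoints.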
First-order condition
Second-order conditions
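For twice-differentiable $f$:
• Convex if and only if $\nabla^2 f(x) \succeq 0 \quad \forall x \in C$
• Strictly convex if $\nabla^2 f(x) \succ 0 \quad \forall x \in C$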
Convex and Non-convex Examples
• Convex
• Linear regression $f(x) = \|Wx - b\|_2^2$
$\nabla f(x) = 2W^\top (Wx - b), \quad \nabla^2 f(x) = 2W^\top W$
• Softmax regression
• Non-convex
• Multi-layer perceptron
• Convolutional neural networks
• Recurrent neural networks
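The closed forms for the linear regression example can be sanity-checked on random data. The following is a minimal sketch in plain Python: the helper names, the 5×3 random `W`, and the finite-difference cross-check are assumptions chosen for illustration. It compares $2W^\top(Wx - b)$ against a numerical gradient and samples $z^\top (2W^\top W) z$, which should never be negative if the Hessian is positive semidefinite.

```python
import random

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def loss(W, b, x):
    """f(x) = ||Wx - b||_2^2"""
    return sum((p - q) ** 2 for p, q in zip(matvec(W, x), b))

def gradient(W, b, x):
    """Closed form from the slide: 2 W^T (Wx - b)."""
    residual = [p - q for p, q in zip(matvec(W, x), b)]
    return [2.0 * g for g in matvec(transpose(W), residual)]

def hessian(W):
    """Closed form from the slide: 2 W^T W (does not depend on x)."""
    cols = transpose(W)  # rows of W^T are the columns of W
    return [[2.0 * sum(p * q for p, q in zip(ci, cj)) for cj in cols]
            for ci in cols]

def numerical_gradient(W, b, x, eps=1e-6):
    """Central finite differences, used only to cross-check the closed form."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((loss(W, b, x_plus) - loss(W, b, x_minus)) / (2 * eps))
    return grad

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
b = [rng.gauss(0, 1) for _ in range(5)]
x = [rng.gauss(0, 1) for _ in range(3)]

grad_error = max(abs(g - h) for g, h in
                 zip(gradient(W, b, x), numerical_gradient(W, b, x)))

# z^T H z = 2 ||Wz||^2 >= 0 for every z, so the sampled minimum is non-negative.
H = hessian(W)
min_quadratic_form = min(
    sum(zi * hz for zi, hz in zip(z, matvec(H, z)))
    for z in ([rng.gauss(0, 1) for _ in range(3)] for _ in range(1000))
)
```

The identity $z^\top (2W^\top W) z = 2\|Wz\|_2^2$ is the reason the Hessian is positive semidefinite for any `W`, which is what makes this objective convex.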
Convex Optimization
Global minima
Proof
Gradient Descent
Algorithm
• Choose initial $x_0$
• At time $t = 1, \dots, T$
$x_t = x_{t-1} - \eta \nabla f(x_{t-1})$
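The update rule above can be sketched in a few lines of plain Python. The function name `gradient_descent`, the test objective $f(x) = \|x - c\|^2$, and the values of `eta` and `T` are assumptions picked for illustration, not taken from the slides.

```python
def gradient_descent(grad_f, x0, eta, T):
    """Run x_t = x_{t-1} - eta * grad_f(x_{t-1}) for t = 1, ..., T."""
    x = list(x0)
    history = [list(x)]            # keep x_0, x_1, ..., x_T for inspection
    for _ in range(T):
        g = grad_f(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        history.append(list(x))
    return x, history

# Hypothetical test objective: f(x) = ||x - c||^2, minimized at x = c.
c = [3.0, -1.0]

def grad_f(x):
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]

x_final, history = gradient_descent(grad_f, x0=[0.0, 0.0], eta=0.1, T=100)

# Same problem with a step size that is too large for this objective.
x_diverged, _ = gradient_descent(grad_f, x0=[0.0, 0.0], eta=1.1, T=50)
```

On this particular objective each step multiplies the distance to `c` by $|1 - 2\eta|$, so `eta=0.1` contracts toward the minimizer while any `eta` above 1 moves the iterates further away at every step.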
The Choice of Learning Rate
Convergence Rate
Proof
$f(y) \le f(x) - \left(1 - \frac{L\eta}{2}\right) \eta \, \|\nabla f(x)\|^2$
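The bound can be checked numerically under the usual reading of its symbols, which the extracted slide does not spell out and which is therefore an assumption here: $y = x - \eta \nabla f(x)$ is one gradient step and $L$ is a Lipschitz constant of $\nabla f$. The quadratic test function and all names below are hypothetical.

```python
import random

A = [0.5, 2.0, 4.0]   # curvatures a_i of a hypothetical quadratic
L = max(A)            # Lipschitz constant of the gradient for this f

def f(x):
    """f(x) = 1/2 * sum_i a_i x_i^2"""
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))

def grad_f(x):
    return [a * xi for a, xi in zip(A, x)]

def descent_gap(x, eta):
    """Bound minus f(y) for y = x - eta * grad f(x); should be >= 0."""
    g = grad_f(x)
    y = [xi - eta * gi for xi, gi in zip(x, g)]
    sq_norm = sum(gi * gi for gi in g)
    bound = f(x) - (1.0 - L * eta / 2.0) * eta * sq_norm
    return bound - f(y)

rng = random.Random(0)
gaps = [
    descent_gap([rng.uniform(-3, 3) for _ in A], eta)
    for eta in (0.01, 0.1, 1.0 / L, 0.4)
    for _ in range(200)
]
smallest_gap = min(gaps)
```

The factor $1 - L\eta/2$ is positive only when $\eta < 2/L$, so that is the range in which the bound guarantees a decrease of $f$ whenever the gradient is nonzero.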
Proof III
$\sum_{t=1}^{T} \bigl( f(x_t) - f(x^*) \bigr) \le \sum_{t=1}^{T} \frac{\|x_{t-1} - x^*\|^2 - \|x_t - x^*\|^2}{2\eta}$
Apply to Deep Learning
Stochastic Gradient Descent
Singapore Dollar (SGD) 1000
~740 USD
Algorithm
$x_t = x_{t-1} - \eta \nabla f(x_{t-1})$
$f(x) = \frac{1}{n} \sum_{i=1}^{n} \ell_i(x)$
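A stochastic version of this update can be sketched as follows. The sketch assumes that each step replaces $\nabla f$ with the gradient of a single loss $\ell_i$ whose index is sampled uniformly at random; the function name `sgd`, the per-example losses $\ell_i(x) = \frac{1}{2}(x - a_i)^2$, and the step size are all hypothetical choices for illustration.

```python
import random

def sgd(grad_li, n, x0, eta, T, seed=0):
    """Run T steps; each step uses the gradient of one sampled loss l_i."""
    rng = random.Random(seed)
    x = x0
    for _ in range(T):
        i = rng.randrange(n)           # uniform index (0-based in Python)
        x = x - eta * grad_li(i, x)    # step along -grad l_i instead of -grad f
    return x

# Hypothetical losses l_i(x) = 1/2 (x - a_i)^2, so that
# f(x) = (1/n) sum_i l_i(x) is minimized at the mean of the a_i.
a = [1.0, 2.0, 3.0, 4.0, 5.0]

def grad_li(i, x):
    return x - a[i]

x_sgd = sgd(grad_li, n=len(a), x0=0.0, eta=0.01, T=5000)
target = sum(a) / len(a)
```

With a constant step size the iterate does not settle exactly on `target`; it keeps fluctuating in a neighborhood whose size shrinks as `eta` decreases. Each step costs one per-example gradient rather than all `n`.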
Sample Example
$\mathbb{E}[\nabla \ell_{t_i}(x)] = \mathbb{E}[\nabla f(x)]$
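This identity says the sampled gradient is an unbiased estimate of the full gradient. A small check, assuming the index $t_i$ is drawn uniformly from the $n$ examples and using hypothetical losses $\ell_i(x) = \frac{1}{2}(x - a_i)^2$:

```python
import random

a = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
n = len(a)

def grad_li(i, x):
    """Gradient of l_i(x) = 1/2 (x - a_i)^2."""
    return x - a[i]

def grad_f(x):
    """Gradient of f(x) = (1/n) sum_i l_i(x), in closed form: x - mean(a)."""
    return x - sum(a) / n

x = 0.7

# Exact expectation over a uniformly sampled index: each i has probability 1/n.
exact_expectation = sum((1.0 / n) * grad_li(i, x) for i in range(n))

# Monte Carlo estimate of the same expectation from random draws.
rng = random.Random(0)
samples = 100_000
mc_estimate = sum(grad_li(rng.randrange(n), x) for _ in range(samples)) / samples
```

The exact expectation matches `grad_f(x)` up to rounding, while the Monte Carlo estimate only approaches it as the number of samples grows. An individual sampled gradient can still be far from the full gradient.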
Convergence Rate
In Practice
Code…
Mini-batch SGD
Algorithm
Code…