Lecture 9
Math Foundations
Team
Introduction
Definition
Working rule
Example 1
Example 2
Example 3
Motivation
Unconstrained Optimization
Figure: Left: with a learning rate of 0.01, the local minimum is reached within a couple of steps. Right: when the learning rate is reduced to 0.001, relatively more steps are needed to reach the local minimum.
Example
Figure: Left: with a learning rate of 0.01, the minimum is reached. Right: when the learning rate is reduced to 0.001, relatively more steps are needed to reach the minimum.
Batch gradient descent
Mini-batch stochastic gradient descent
Stochastic Gradient Descent
Figure: Left: although the loss update is done for every sample in SGD, this plot shows the loss averaged over 100 such updates. Right: a summary of measured accuracy for the various methods.
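The three update schemes differ only in how many samples feed each gradient step. The sketch below contrasts them on a synthetic least-squares problem; the data, learning rate, batch size, and epoch count are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

def grad(w, Xb, yb):
    # Gradient of the mean squared error (1/2n) * ||Xb w - yb||^2.
    return Xb.T @ (Xb @ w - yb) / len(yb)

def run(batch_size, lr=0.1, epochs=20):
    w = np.zeros(5)
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])   # one update per (mini-)batch
    return w

for name, bs in [("batch", len(y)), ("mini-batch", 32), ("SGD", 1)]:
    w = run(batch_size=bs)
    print(f"{name:>10}: ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")

Batch gradient descent makes one update per epoch, mini-batch SGD one per batch of 32, and SGD one per sample, which is why the SGD loss curve is typically reported as an average over many updates.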
Learning rate Algorithm 1: Decay
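The slide only names the decay algorithm, so the rule below (inverse-time decay) and its constants are assumptions chosen to illustrate the idea of shrinking the learning rate over time.

def decayed_lr(lr0, t, decay=0.01):
    # Inverse-time decay: lr_t = lr0 / (1 + decay * t); rule and constant are assumed.
    return lr0 / (1.0 + decay * t)

for t in (0, 100, 1000, 10000):
    print(t, round(decayed_lr(0.1, t), 5))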
► The new bounds on the search interval [a, b] are reset based on
the exclusions mentioned in the previous slide.
► At the end of the process we are left with an interval
containing 0 or 1 evaluated point.
► If we have an interval containing no evaluated point, we select a
random point α = p in the reset interval [a, b], and then another
point q in the larger of the intervals [a, p] and [p, b].
► On the other hand, if we are left with an interval [a, b]
containing a single evaluated point α = p, then we select α =
q in the larger of the intervals [a, p] and [p, b].
► This yields another four points on which to continue the
golden-section search. We continue until we achieve the
desired accuracy; a sketch of the procedure is given below.
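A minimal sketch of golden-section search on a one-dimensional objective. It follows the standard formulation in which the two interior points sit at golden-ratio positions (rather than the randomized choice described above), and the test function and tolerance are assumptions.

import math

def golden_section_search(f, a, b, tol=1e-5):
    # Golden-section search for a minimum of a unimodal f on [a, b].
    # Interior points sit at golden-ratio positions so one evaluation is reused each step.
    inv_phi = (math.sqrt(5) - 1) / 2        # 1/phi, about 0.618
    p = b - inv_phi * (b - a)
    q = a + inv_phi * (b - a)
    fp, fq = f(p), f(q)
    while b - a > tol:
        if fp < fq:                         # minimum lies in [a, q]; reuse p as the new q
            b, q, fq = q, p, fp
            p = b - inv_phi * (b - a)
            fp = f(p)
        else:                               # minimum lies in [p, b]; reuse q as the new p
            a, p, fp = p, q, fq
            q = a + inv_phi * (b - a)
            fq = f(q)
    return 0.5 * (a + b)

# Illustrative 1-D objective along a search direction (an assumed example).
phi = lambda alpha: (alpha - 0.3) ** 2 + 1.0
print(golden_section_search(phi, 0.0, 1.0))   # approximately 0.3

Each iteration shrinks the interval by the constant factor 1/phi while reusing one of the two interior evaluations, which is what makes the method efficient for one-dimensional line searches.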
When do we use line search?