
Optimizing neural networks

Usman Roshan
How do we find the min value of a function?
• Given f(x), find the x that minimizes f(x). This is a fundamental problem with broad applications across many areas.
• Let us start with an f(x) that is non-differentiable. For example, the objective of the traveling salesman problem is non-differentiable (a small illustration follows).
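As a concrete illustration (my example, not from the slides): the TSP objective takes a discrete permutation of cities as input, so there is no gradient to compute.

# Illustration only: a non-differentiable objective. The input is a discrete
# permutation of cities, so f(x) has no gradient.
import numpy as np

rng = np.random.default_rng(0)
cities = rng.uniform(size=(6, 2))          # 6 random city coordinates

def tour_length(order, cities):
    """f(x): total length of the tour that visits cities in the given order."""
    pts = cities[list(order)]
    return float(np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1)))

print(tour_length(range(6), cities))       # length of the identity tour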
Local search
• Local search is a fundamental search method in machine learning and AI.
• Given a non-differentiable objective, we perform local search to find its minimum (see the sketch below).
• If the objective is differentiable, the gradient gives us the optimal search direction.
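A minimal local-search sketch (my example, assuming a simple swap neighborhood over TSP tours; not the slides' algorithm): hill climbing that accepts a move only when it improves the objective.

# Hill-climbing local search over TSP tours with a swap neighborhood.
import numpy as np

rng = np.random.default_rng(0)
cities = rng.uniform(size=(10, 2))

def tour_length(order, cities):
    pts = cities[list(order)]
    return float(np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1)))

def local_search(cities, iters=2000):
    order = list(rng.permutation(len(cities)))       # random starting tour
    best = tour_length(order, cities)
    for _ in range(iters):
        i, j = rng.choice(len(order), size=2, replace=False)
        order[i], order[j] = order[j], order[i]      # move to a neighboring tour
        new = tour_length(order, cities)
        if new < best:
            best = new                               # accept an improving move
        else:
            order[i], order[j] = order[j], order[i]  # undo a worsening move
    return order, best

print(local_search(cities))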
Neural network objective
• Non-linear objective with multiple local minima.
• As a result, optimization is much harder than for a convex objective.
• Standard approach: gradient descent:
– Calculate the first derivative with respect to each hidden variable.
– For the inner layers we apply the chain rule (see google sheet for derivations); a small worked sketch follows.
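The google sheet mentioned above is not reproduced here. As a stand-in, here is a minimal sketch of the chain rule for one hidden layer, assuming sigmoid activations and a squared-error loss (my assumptions, not the slides' derivation).

# Manual backpropagation for a 1-hidden-layer network (sigmoid, squared error).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 points, 3 features (toy data)
y = (X[:, 0] > 0).astype(float)        # toy binary target

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output weights

# Forward pass
H = sigmoid(X @ W1)                    # hidden activations
out = sigmoid(H @ W2)[:, 0]            # network output

# Backward pass via the chain rule
err = out - y                          # d(loss)/d(out) for 0.5*(out - y)^2
d_out = err * out * (1 - out)          # through the output sigmoid
grad_W2 = H.T @ d_out[:, None]         # gradient for the outer layer
d_hidden = (d_out[:, None] @ W2.T) * H * (1 - H)   # chain rule into the hidden layer
grad_W1 = X.T @ d_hidden               # gradient for the inner layer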
Gradient descent
• So we run gradient descent until convergence; then what is the problem?
• It may converge to a local minimum and require random restarts.
• Overfitting: a big problem for many years.
• How can we prevent overfitting?
• How can we explore the search space better without getting stuck in local minima? (A sketch with random restarts follows.)
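A minimal sketch of gradient descent with random restarts (my example on a simple non-convex function, not the slides' code): run from several random starting points and keep the best result.

# Gradient descent with random restarts on f(x) = x^4 - 3x^2 + x.
import numpy as np

def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)            # step against the gradient
    return x

rng = np.random.default_rng(0)
# Random restarts: different starts can land in different local minima.
candidates = [gradient_descent(x0) for x0 in rng.uniform(-3, 3, size=10)]
best = min(candidates, key=f)
print(best, f(best))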
Stochastic gradient descent
• A simple but powerful idea, popularized for large-scale learning by Léon Bottou around 2000 (the underlying stochastic approximation method goes back to Robbins and Monro, 1951).
• Original SGD:
– While not converged:
• Select a single datapoint in order from the data
• Compute the gradient with just that one point
• Update the parameters
• Pros: broader exploration of the search space.
• Cons: the final solution may be poor and convergence may be difficult. (A sketch follows.)
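A minimal sketch of the original per-datapoint SGD (the least-squares model and learning rate are my assumptions, not the slides').

# Single-datapoint SGD for least-squares linear regression, updating in order.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(5)
lr = 0.01
for epoch in range(20):                      # "while not converged" in practice
    for i in range(len(X)):                  # single datapoint, in order
        pred = X[i] @ w
        grad = (pred - y[i]) * X[i]          # gradient of 0.5*(pred - y_i)^2
        w -= lr * grad                       # update the parameters
print(np.linalg.norm(w - true_w))            # should be small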
Stochastic gradient descent
• Mini-batch SGD:
– While not converged:
• Select a random batch of datapoints
• Compute the gradient on the batch
• Update the parameters
• Mini-batch pros: generally a better solution with better convergence than single-datapoint updates.
• Batch sizes are usually small. (A sketch follows.)
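A minimal mini-batch SGD sketch on the same kind of least-squares problem (the batch size and learning rate are my assumptions): each update averages the gradient over a random batch.

# Mini-batch SGD: sample a random batch, average its gradient, update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(5)
lr, batch_size = 0.05, 16                        # batch size is an assumption
for step in range(500):                          # "while not converged" in practice
    idx = rng.choice(len(X), size=batch_size, replace=False)   # random batch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch_size     # average gradient over the batch
    w -= lr * grad
print(np.linalg.norm(w - true_w))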
Learning rate
• Key to the search is the step size.
• Ideally we start with a somewhat large step size (0.1 or 0.01) and reduce it by a power of 10 every few epochs.
• An adaptive step size works best but may slow the search. (A step-decay sketch follows.)
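A minimal sketch of the step-decay schedule described above: start at 0.1 and divide the learning rate by 10 every few epochs (the exact epoch interval is my assumption).

# Step-decay learning rate schedule.
def learning_rate(epoch, base_lr=0.1, decay_every=5):
    return base_lr * (0.1 ** (epoch // decay_every))

for epoch in range(15):
    lr = learning_rate(epoch)
    print(epoch, lr)          # 0.1 for epochs 0-4, 0.01 for 5-9, 0.001 for 10-14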
Dropout
• A simple method introduced in 2014 to prevent overfitting.
• Procedure:
– During training, with probability p we drop a node (its output is zeroed for that pass, so its weights are not updated); otherwise it is kept and updated as usual.
– p is typically set to 0.5.
• Highly effective in deep learning:
– Decreases overfitting
– Increases training time
• Can be loosely interpreted as an ensemble of networks. (A masking sketch follows.)
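A minimal sketch of applying a dropout mask to a layer's activations during training (not the original 2014 implementation; the inverted-dropout scaling is a common choice, assumed here).

# Dropout mask with drop probability p = 0.5, applied only at training time.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                   # drop probability from the slide

def dropout(activations, p, training=True):
    if not training:
        return activations                # no dropout at test time
    mask = rng.random(activations.shape) >= p      # keep each unit with prob 1 - p
    return activations * mask / (1.0 - p)          # inverted-dropout scaling

H = rng.normal(size=(4, 8))               # toy hidden-layer activations
print(dropout(H, p))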
