Optimization
Usman Roshan
How do we find the min value of a
function?
• Given f(x), find the x that minimizes f(x). This is a
fundamental problem with broad applications
across many areas
• Let us start with an f(x) that is non-differentiable.
For example, the objective of the traveling
salesman problem is non-differentiable.
Local search
• Local search is a fundamental search method
in machine learning and AI
• Given a non-differentiable objective we perform
local search to find its minimum (see the sketch below)
• If the objective is differentiable, the gradient
gives us the optimal search direction
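A minimal local-search (hill-climbing) sketch in Python; the example objective (an absolute-value function with a kink), the step size, and the iteration count are illustrative assumptions, not from the slides:

import random

def f(x):
    # Non-differentiable example objective: absolute value with a kink at 3
    return abs(x - 3)

def local_search(f, x0, step=0.5, iters=1000):
    x, best = x0, f(x0)
    for _ in range(iters):
        # Propose a random neighbor of the current solution
        candidate = x + random.uniform(-step, step)
        value = f(candidate)
        # Move only if the neighbor improves the objective
        if value < best:
            x, best = candidate, value
    return x, best

print(local_search(f, x0=0.0))   # converges near x = 3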
Neural network objective
• Non-linear objective, multiple local minima
• As a result, optimization is much harder than
for a convex objective
• Standard approach: gradient descent:
– Calculate the first derivatives with respect to each
hidden variable and weight
– For inner layers we use the chain rule (see the Google
sheet for derivations); a small sketch follows below
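A minimal chain-rule sketch for a one-hidden-layer network with a squared-error objective, written in NumPy; the network size, sigmoid activation, and single data point are illustrative assumptions, not from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)         # one input example with 3 features
y = 1.0                        # its target value
W1 = rng.normal(size=(4, 3))   # hidden-layer weights (4 hidden nodes)
w2 = rng.normal(size=4)        # output-layer weights

# Forward pass
h = sigmoid(W1 @ x)            # hidden activations
y_hat = w2 @ h                 # network output
loss = 0.5 * (y_hat - y) ** 2  # squared-error objective

# Backward pass (chain rule)
dL_dyhat = y_hat - y                         # d loss / d output
dL_dw2 = dL_dyhat * h                        # output-layer gradient
dL_dh = dL_dyhat * w2                        # pushed back to the hidden layer
dL_dW1 = np.outer(dL_dh * h * (1 - h), x)    # through the sigmoid to W1

print(loss, dL_dw2, dL_dW1)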
Gradient descent
• So we run gradient descent until convergence,
then what is the problem?
• May converge to a local minimum and require
random restarts (see the sketch below)
• Overfitting: a big problem for many years
• How can we prevent overfitting?
• How can we explore the search space better
without getting stuck in local minima?
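A minimal sketch of gradient descent with random restarts, the remedy mentioned above for local minima; the non-convex objective f(x) = x^4 - 3x^2 + x, the learning rate, and the number of restarts are illustrative assumptions:

import random

def f(x):
    return x**4 - 3 * x**2 + x      # non-convex, two local minima

def grad_f(x):
    return 4 * x**3 - 6 * x + 1     # first derivative of f

def gradient_descent(x0, lr=0.01, iters=500):
    x = x0
    for _ in range(iters):
        x -= lr * grad_f(x)         # step in the negative gradient direction
    return x

best_x, best_val = None, float("inf")
for _ in range(10):                 # random restarts from different points
    x = gradient_descent(random.uniform(-2, 2))
    if f(x) < best_val:
        best_x, best_val = x, f(x)
print(best_x, best_val)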
Stochastic gradient descent
• A simple but beautifully powerful idea, popularized
for large-scale machine learning by Léon Bottou in the 2000s
• Original SGD (a sketch follows below):
– While not converged:
• Select a single datapoint in order from the data
• Compute gradient with just one point
• Update parameters
• Pros: broader search
• Cons: the final solution may be poor and it may be
hard to converge
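A minimal sketch of the original single-datapoint SGD, applied here to a linear least-squares objective; the synthetic data, learning rate, and number of epochs are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 datapoints, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    for i in range(len(X)):                # one datapoint at a time, in order
        grad = (X[i] @ w - y[i]) * X[i]    # gradient of 0.5 * (x_i . w - y_i)^2
        w -= lr * grad                     # update with the single-point gradient
print(w)                                   # should be close to true_w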
Stochastic gradient descent
• Mini-batch SGD (a sketch follows below):
– While not converged:
• Select a random batch of datapoints
• Compute gradient with the batch
• Update parameters
• Mini-batch pros: generally a better solution with
better convergence than single-datapoint SGD
• Batch sizes are usually small (typically tens to a
few hundred datapoints)
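A minimal mini-batch SGD sketch on the same kind of linear least-squares setup; the batch size, learning rate, and number of epochs are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 datapoints, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr, batch_size = 0.05, 10
for epoch in range(50):
    order = rng.permutation(len(X))        # shuffle datapoints each epoch
    for start in range(0, len(X), batch_size):
        b = order[start:start + batch_size]            # a random batch
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)     # averaged batch gradient
        w -= lr * grad
print(w)                                   # should be close to true_w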
Learning rate
• Key to the search is the step size
• Ideally we start with a somewhat large step size
(0.1 or 0.01) and reduce it by a power of 10 every
few epochs (see the schedule sketch below)
• An adaptive step size is best but may slow the
search
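A minimal sketch of the step schedule described above, starting near 0.1 and dividing by 10 every few epochs; the exact drop interval is an illustrative assumption:

def step_learning_rate(epoch, initial_lr=0.1, drop_every=5):
    # Reduce the learning rate by a power of 10 every `drop_every` epochs
    return initial_lr * (0.1 ** (epoch // drop_every))

for epoch in range(15):
    print(epoch, step_learning_rate(epoch))   # 0.1, then 0.01, then 0.001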
Dropout
• A simple method introduced in 2014 to prevent
overfitting
• Procedure (a sketch follows below):
– During training each node is kept with probability p;
a dropped node's output is set to zero and its weights
are not updated in that step
– We typically set p to 0.5
• Highly effective in deep learning:
– Decreases overfitting
– But increases training time
• Can be loosely interpreted as an ensemble of networks,
since each training step samples a different sub-network
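A minimal sketch of dropout applied to a layer's activations during training, with keep probability p = 0.5 as on the slide; the layer size and the inverted-dropout scaling (dividing by p so no rescaling is needed at test time) are illustrative, common-practice assumptions:

import numpy as np

def dropout(h, p=0.5, training=True):
    if not training:
        return h                          # use all nodes at test time
    mask = np.random.rand(*h.shape) < p   # keep each node with probability p
    return h * mask / p                   # rescale so the expected activation is unchanged

h = np.ones(8)                            # example hidden activations
print(dropout(h))                         # roughly half the nodes are zeroed out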