Deep Learning: Course Code: Unit 1

This document discusses gradient descent, an algorithm used to find the minimum of a loss function. It begins by defining partial derivatives and providing examples. It then introduces gradient descent, noting that it finds local optima by taking steps in the direction of the negative gradient. The gradient descent algorithm is outlined as initializing weights, computing the gradient at each step, and updating the weights based on the learning rate until convergence. Challenges like learning rates, local minima, and computation costs are also mentioned.


Deep Learning

• Course Code:
• Unit 1: Introduction to Deep Learning
• Lecture 5: Loss Optimization (Gradient Descent)
Partial differentiation
• We write f’x to mean “the partial derivative of f with respect to x”.
• The symbol ∂ is also called “del”, “dee”, or “curly dee”.
• Example with explanation
• Take a function of one variable x:
• f(x) = x²
• Its derivative, using the power rule:
• First-order derivative: f’(x) = 2x
• Now take a function of two variables x and y:
• f(x, y) = x² + y³
• To find its partial derivative with respect to x, we treat y as a constant:
• Partial derivative wrt x: f’x = 2x + 0 = 2x
• Now, to find the partial derivative with respect to y, we treat x as a constant:
• Partial derivative wrt y: f’y = 0 + 3y² = 3y²
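The two partial derivatives above can be checked numerically with central finite differences. A minimal sketch (the evaluation point and step size h are illustrative choices, not from the slides):

```python
def f(x, y):
    # f(x, y) = x^2 + y^3, the two-variable example from the slides
    return x**2 + y**3

def partial_x(f, x, y, h=1e-6):
    # central-difference approximation of df/dx, holding y constant
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # central-difference approximation of df/dy, holding x constant
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# At (x, y) = (3, 2): f'x = 2x = 6 and f'y = 3y^2 = 12
print(partial_x(f, 3.0, 2.0))  # ≈ 6.0
print(partial_y(f, 3.0, 2.0))  # ≈ 12.0
```

Holding the other variable fixed in the difference quotient is exactly the "treat y as a constant" step from the slide.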
Gradient Descent Algorithm

https://www.kdnuggets.com/2020/05/5-concepts-gradient-descent-cost-function.html
Gradient Descent
• Method to find local optima of a differentiable function
• Intuition: the gradient tells us the direction of greatest increase; the negative gradient gives us the direction of greatest decrease
• Take steps in directions that reduce the function value
• The definition of the derivative guarantees that if we take a small enough step in the direction of the negative gradient, the function value will decrease
• How small is small enough?
Gradient Descent

Gradient Descent Algorithm:

• Pick an initial point x^(0)

• Iterate until convergence: x^(t+1) = x^(t) − α ∇f(x^(t))

where α is the step size (sometimes called the learning rate)

When do we stop?
Gradient Descent

Possible stopping criterion: iterate until ‖∇f(x^(t))‖ ≤ ε for some small ε > 0

How small should ε be?
Gradient Descent

Worked example: f(x) = x², step size α = 0.8, starting point x^(0) = −4

x^(1) = −4 − 0.8 · 2 · (−4) = 2.4
x^(2) = 2.4 − 0.8 · 2 · 2.4 = −1.44
x^(3) = 0.864
x^(4) = −0.5184
x^(5) = 0.31104
…
x^(30) = −8.84296e−07
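The sequence of iterates above can be reproduced with a few lines of Python, a direct transcription of the update rule x ← x − α · f'(x):

```python
def grad_descent(grad, x0, alpha, steps):
    """Plain gradient descent: repeatedly apply x <- x - alpha * grad(x)."""
    x = x0
    trace = [x]
    for _ in range(steps):
        x = x - alpha * grad(x)
        trace.append(x)
    return trace

# f(x) = x^2, so f'(x) = 2x; step size 0.8, starting at x = -4
trace = grad_descent(lambda x: 2 * x, x0=-4.0, alpha=0.8, steps=30)
print(trace[1])   # 2.4
print(trace[30])  # about -8.84e-07, matching the slide
```

Each step multiplies x by (1 − 2α) = −0.6, which is why the iterates alternate sign while shrinking toward the minimum at 0.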
Gradient Descent

Step size 0.9: the iterates overshoot and oscillate around the minimum.

Step size 0.2: the iterates converge smoothly but more slowly per step.

Step size matters!
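The effect of the step size is easy to see numerically on the same f(x) = x² example (0.9 and 0.2 are the step sizes from the plots; the divergent 1.1 case is an added illustration):

```python
def run(alpha, x0=-4.0, steps=30):
    # gradient descent on f(x) = x^2, whose gradient is 2x
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

# Each step multiplies x by (1 - 2*alpha):
print(run(0.2))   # multiplier  0.6: smooth, monotone decay toward 0
print(run(0.9))   # multiplier -0.8: oscillates, shrinks more slowly
print(run(1.1))   # multiplier -1.2: diverges, |x| grows every step
```

A step size above 1 (for this function) makes each step overshoot by more than the distance to the minimum, so the iterates blow up instead of converging.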
Overview
• Gradient descent is the standard algorithm
Variants of gradient descent
• Stochastic gradient descent (SGD)
• Mini-batch SGD
Challenges
• Learning rates
• Local minima
• We will look at methods to deal with the above issues
• https://towardsdatascience.com/batch-mini-batch-stochastic-gradient-descent-7a62ecba642a
Loss Optimization
W* = argmin_W J(W)

• The weights are on the x and y axes, whereas the loss is marked on the z axis.
• For any value of W, we can see the loss at that point.
• We need to find the point on this landscape with minimum loss.

Amity Centre for Artificial Intelligence, Amity University, Noida, India


Loss Optimization
W* = argmin_W J(W)
• Randomly pick a place on this landscape to start searching for the minimum-loss weights.
• From this random place, we find how the landscape is changing (how its slope changes) using the gradient of the loss with respect to each of the weights.
• The gradient is a vector that gives us the direction in which the loss function has the steepest ascent.


Loss Optimization
W* = argmin_W J(W)
• The gradient tells us which way the landscape rises most steeply.
• Here the landscape is higher around the selected point, so we need to take a step in a direction that is lower than the selected point.


Loss Optimization
W* = argmin_W J(W)

• Take a small step in the opposite direction of the gradient.
• On reaching the lower point, the process is repeated over and over again until we converge to a local minimum.


Gradient Descent
• Repeat until convergence: W ← W − η · ∂J(W)/∂W


Gradient Descent

Algorithm for gradient descent:

1. Initialize the weights randomly ~N(0, σ²)        weights = tf.random_normal( )
2. Loop until convergence:
3.     Compute gradient, ∂J(W)/∂W                   grads = tf.gradients(loss, weights)
4.     Update weight, W ← W − η · ∂J(W)/∂W          weights_new = weights.assign(weights – lr * grads)
5. Return weights


Gradient Descent
• The amount by which the weights are updated during training is referred to as the step size or the learning rate.
• The learning rate is a configurable hyperparameter used in the training of neural networks; it has a small positive value, often in the range between 0.0 and 1.0.
• The learning rate controls how quickly the model adapts to the problem.
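The tf.random_normal / tf.gradients calls shown in the slides are TensorFlow 1.x API. As a framework-free illustration, the same five steps can be sketched with NumPy on a toy least-squares loss; the data, dimensions, learning rate, and iteration budget here are invented for the sketch, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: J(w) = mean squared error of a linear model (illustrative only)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = rng.normal(size=3)   # 1. initialize the weights randomly ~ N(0, 1)
lr = 0.1

for _ in range(200):     # 2. loop (fixed budget instead of a convergence test)
    # 3. compute gradient dJ/dw of the MSE loss, here derived analytically
    grads = 2 * X.T @ (X @ w - y) / len(y)
    w = w - lr * grads   # 4. update weights: w <- w - lr * grads
# 5. return / use the learned weights
print(np.round(w, 3))
```

Because this toy loss is convex, the loop recovers true_w; on a real network the gradient in step 3 comes from backpropagation rather than a hand-derived formula.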
Gradient Descent

In the algorithm above, the gradient ∂J(W)/∂W is computed over the entire dataset at every iteration. This can be very computationally intensive to compute!


Stochastic Gradient Descent

Algorithm for stochastic gradient descent:

1. Initialize the weights randomly ~N(0, σ²)
2. Loop until convergence:
3.     Pick a single data point i
4.     Compute gradient, ∂J_i(W)/∂W
5.     Update weight, W ← W − η · ∂J_i(W)/∂W
6. Return weights

Easy to compute but very noisy (stochastic)!


Stochastic Gradient Descent with Momentum
• SGD is noisy and requires more iterations to reach the minimum. Adding a momentum term to regular SGD gives faster convergence of the loss function.
• SGD oscillates in either direction of the gradient and updates the weights accordingly. Adding a fraction of the previous update to the current update makes the process a bit faster.
• Velocity update: Vt = β Vt−1 + η · ∂J(W)/∂W
• Updated weight: Wt+1 = Wt − Vt
• The velocity V accumulates past gradients and denotes the change applied to the weights on the way to the minimum.
• The learning rate should be decreased when a momentum term is used.
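The two momentum equations can be sketched on the same quadratic toy problem used earlier; the β and η values below are illustrative defaults, not prescribed by the slides:

```python
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=200):
    """Momentum update: v <- beta*v + lr*grad(w); w <- w - v."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + lr * grad(w)  # velocity: fraction of previous update + new gradient step
        w = w - v                    # weight update uses the velocity, not the raw gradient
    return w

# Minimize f(w) = w^2 (gradient 2w), starting from w = -4
w = sgd_momentum(lambda w: 2 * w, w0=-4.0)
print(w)  # approaches the minimum at 0
```

With β = 0 this reduces exactly to plain gradient descent, which is one way to see that momentum only adds a decaying memory of past updates.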


Mini-batch Gradient Descent

Algorithm for mini-batch gradient descent:

1. Initialize the weights randomly ~N(0, σ²)
2. Loop until convergence:
3.     Pick a batch of B data points
4.     Compute gradient, ∂J(W)/∂W = (1/B) Σₖ ∂Jₖ(W)/∂W
5.     Update weight, W ← W − η · ∂J(W)/∂W
6. Return weights

Fast to compute and a much better estimate of the true gradient!
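Steps 3 and 4 are the only parts that differ from full-batch gradient descent: sample B indices and average the per-example gradients. A NumPy sketch on a toy least-squares problem (batch size, learning rate, and data are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: J_k(w) = (x_k . w - y_k)^2 for each example k (illustrative only)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = rng.normal(size=3)   # 1. random initialization
lr, B = 0.1, 32

for _ in range(500):                       # 2. fixed iteration budget
    idx = rng.integers(0, len(y), size=B)  # 3. pick a batch of B data points
    Xb, yb = X[idx], y[idx]
    grads = 2 * Xb.T @ (Xb @ w - yb) / B   # 4. average the per-example gradients
    w = w - lr * grads                     # 5. update weights
print(np.round(w, 2))
```

Setting B = 1 recovers SGD and B = len(y) recovers batch gradient descent, which is why mini-batching sits between the two extremes described in the summary.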


Mini-batches while training
• Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate model error and update model coefficients.
• Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
• More accurate estimation of the gradient
• Smoother convergence
• Allows for larger learning rates


Mini-batches while training

• More accurate estimation of the gradient
• Smoother convergence
• Allows for larger learning rates
• Mini-batches lead to fast training: computation over a batch can be parallelized, achieving increased speed on GPUs


Summary
• Batch Gradient Descent (BGD):
It uses the entire dataset at every step, making it slow and computationally expensive for large datasets. However, it produces a stable error gradient and stable convergence.
• Stochastic Gradient Descent (SGD):
At the other extreme, it uses a single example (a batch of 1) per learning step. Much faster, but it may return noisy gradients, which can cause the error rate to jump around.
• Mini-batch Gradient Descent:
Computes the gradients on small random sets of instances called mini-batches. Reduces the noise of SGD while still being more efficient than BGD.
