
Gradient Descent

Clearly not good, but how bad is it?

Finding the Loss
Finding a better way of traversing the error surface, so that we can reach the minimum value quickly without resorting to a brute-force search, is the goal of gradient descent.
Gradient Based Optimization
The goal of optimization in machine learning is to find the parameters of the model
that minimize (or maximize) a loss function. The loss function quantifies how well
the model's predictions match the actual data.
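As a concrete illustration (not from the slides), one common loss function is the mean squared error between predictions and targets. The short sketch below is only meant to make "quantifying how well predictions match the data" tangible; the values are made up.

def mse_loss(y_pred, y_true):
    # Mean squared error: average squared difference between predictions and targets
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

print(mse_loss([0.4, 0.8], [0.2, 0.9]))   # (0.04 + 0.01) / 2 = 0.025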
Gradient Based Optimization
Gradient descent is a widely used optimization algorithm for minimizing the loss
function. It involves the following steps:

1. Compute the Gradient:

○ The gradient of the loss function with respect to the model parameters is computed. This gradient tells us how to adjust the parameters to reduce the loss.
○ Mathematically, if θ represents the model parameters and L(θ) is the loss function, the gradient is ∇_θ L(θ).
Gradient Based Optimization
2. Update the Parameters:

● The model parameters are updated in the direction that reduces the loss. This is done using the gradient information:

θ ← θ − η ∇_θ L(θ)

Here, η (eta) is the learning rate, a hyperparameter that controls the size of the step taken in the direction of the negative gradient.
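To make the update rule concrete, here is a minimal sketch (not from the slides) of gradient descent on a one-parameter quadratic loss L(θ) = (θ − 3)², whose gradient is 2(θ − 3); the learning rate and starting point are illustrative.

def loss(theta):
    return (theta - 3.0) ** 2          # L(theta) = (theta - 3)^2

def grad(theta):
    return 2.0 * (theta - 3.0)         # dL/dtheta

theta = 0.0                            # initial parameter
eta = 0.1                              # learning rate
for step in range(50):
    theta = theta - eta * grad(theta)  # theta <- theta - eta * grad
print(theta)                           # approaches the minimizer theta = 3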
Gradient Descent
Gradient Descent is a standard optimization algorithm. It is frequently the first optimization algorithm introduced for training machine learning models. A gradient is a measurement that quantifies the steepness of a line or curve; mathematically, it gives the direction of steepest ascent or descent.

Descent is the action of going downwards. Therefore, the gradient descent algorithm moves downhill along the loss surface.
import numpy as np

# Toy dataset: two (x, y) pairs
X = [0.5, 2.5]
Y = [0.2, 0.9]

def f(w, b, x):
    # Sigmoid output of a single neuron with parameters w, b
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def error(w, b):
    # Squared-error loss summed over the dataset
    err = 0.0
    for x, y in zip(X, Y):
        fx = f(w, b, x)
        err += 0.5 * (fx - y) ** 2
    return err

def grad_b(w, b, x, y):
    # Partial derivative of the loss with respect to b
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

def grad_w(w, b, x, y):
    # Partial derivative of the loss with respect to w
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def gradient_descent():
    w, b, eta, epochs = -0.1, -2.0, 1.0, 1000
    for i in range(epochs):
        dw, db = 0, 0
        for x, y in zip(X, Y):          # accumulate gradients over all points
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        w = w - eta * dw                # update: theta <- theta - eta * gradient
        b = b - eta * db
    print(error(w, b))
    print(w, b)

gradient_descent()
Types of Gradient Descent
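The title above refers to the standard variants of gradient descent, which differ only in how much data is used per update: batch (all points), stochastic (one point at a time), and mini-batch (small groups of points). The sketch below is an illustrative outline (not from the slides) reusing the grad_w and grad_b helpers defined earlier.

import random

def batch_gd_epoch(w, b, eta):
    # Batch GD: one update per epoch, using the gradient over all points
    dw = sum(grad_w(w, b, x, y) for x, y in zip(X, Y))
    db = sum(grad_b(w, b, x, y) for x, y in zip(X, Y))
    return w - eta * dw, b - eta * db

def stochastic_gd_epoch(w, b, eta):
    # Stochastic GD: one update per training point
    for x, y in zip(X, Y):
        w, b = w - eta * grad_w(w, b, x, y), b - eta * grad_b(w, b, x, y)
    return w, b

def minibatch_gd_epoch(w, b, eta, batch_size=2):
    # Mini-batch GD: one update per small, randomly shuffled batch of points
    data = list(zip(X, Y))
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        dw = sum(grad_w(w, b, x, y) for x, y in batch)
        db = sum(grad_b(w, b, x, y) for x, y in batch)
        w, b = w - eta * dw, b - eta * db
    return w, b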
Challenges with gradient descent
While gradient descent is the most common approach for optimization problems, it does come with its own set of challenges. Some of them include:

● Local minima and saddle points
● Vanishing and Exploding Gradients
Local Minimum
A local minimum is a concept from optimization theory that refers to a point in the parameter space of a function where the function's value is lower than or equal to the values of nearby points, but not necessarily the lowest value the function can achieve globally.

Local Minima and Convergence: Gradient descent can converge to a local minimum if the initial point or the optimization path leads it to one. This is because, in the vicinity of a local minimum, the gradient becomes close to zero, which signals that further updates are minimal. As a result, gradient descent may prematurely stop at a local minimum without exploring the rest of the parameter space.
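As a small illustration (not from the slides), the sketch below runs plain gradient descent on the non-convex function f(x) = x⁴ − 3x² + x, which has a local minimum near x ≈ 1.1 and a global minimum near x ≈ −1.3; starting on the right-hand slope, the iterates settle in the local minimum.

def f(x):
    # Non-convex 1-D function with two minima
    return x**4 - 3*x**2 + x

def df(x):
    # Derivative of f
    return 4*x**3 - 6*x + 1

x, eta = 2.0, 0.01          # start on the right-hand slope
for step in range(2000):
    x = x - eta * df(x)     # plain gradient descent update
print(x, f(x))              # ends near the local minimum x ~ 1.13, f ~ -1.07
# Starting from x = -2.0 instead would reach the global minimum near x ~ -1.30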
Saddle points
A saddle point is a critical point of a function where the gradient is zero, but it is neither a minimum nor a maximum. In other words, it is a point where the function curves upward in some directions and downward in others. Saddle points can be thought of as higher-dimensional analogues of inflection points, where the function transitions from being convex (curving upward) to concave (curving downward) or vice versa.

Slowing Down Convergence: Gradient descent can get stuck at a saddle point because the gradient is zero, so the algorithm does not know in which direction to move to make progress. This can slow down convergence and make the optimization process more challenging, especially in high-dimensional spaces.
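A standard textbook example (added here for illustration) is f(x, y) = x² − y², whose gradient vanishes at the origin even though the origin is neither a minimum nor a maximum; the short sketch below checks this numerically.

def f(x, y):
    # Classic saddle surface: curves up along x, down along y
    return x**2 - y**2

def grad(x, y):
    # Analytic gradient of f
    return (2*x, -2*y)

print(grad(0.0, 0.0))            # (0.0, -0.0): the origin is a critical point
print(f(0.1, 0.0), f(0.0, 0.1))  # ~0.01 and ~-0.01: higher in one direction, lower in another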
Saddle points
Difficulty arises from saddle points, i.e., points where one dimension slopes up and another slopes down.

These saddle points are usually surrounded by a plateau of roughly the same error, which makes it difficult for gradient descent to escape, as the gradient is close to zero in all dimensions.

A plateau, in the context of optimization and gradient-based algorithms, refers to a region of a function's landscape where the gradient is close to zero.
The red and green curves intersect at a generic saddle point in two dimensions. Along the green curve the saddle point looks like a local minimum, while it looks like a local maximum along the red curve.
Escaping Local Minima and Saddle points
Momentum: Momentum in optimization algorithms introduces a "velocity" term
that helps the optimization process continue past shallow local minima and saddle
points.

Adaptive Learning Rate Methods: Algorithms like Adam and RMSprop adjust the
learning rate based on past gradients to handle flat regions and local minima
better.
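As a rough sketch of the momentum idea (not from the slides), the update below keeps a velocity v that accumulates an exponentially decaying average of past gradients, so the parameters can coast through flat regions where the instantaneous gradient is near zero; the beta and eta values, and the quadratic loss, are illustrative.

def momentum_update(theta, v, grad, eta=0.1, beta=0.9):
    # v accumulates past gradients; beta controls how much history is kept
    v = beta * v + eta * grad
    theta = theta - v
    return theta, v

# Example: one-parameter loss L(theta) = (theta - 3)^2
theta, v = 0.0, 0.0
for step in range(200):
    g = 2.0 * (theta - 3.0)             # gradient of the loss
    theta, v = momentum_update(theta, v, g)
print(theta)                            # approaches the minimizer theta = 3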
