
Gradient Descent

Clearly not good, but how bad is it?

Finding the Loss
Finding a better way of traversing the error surface, so that we can reach the minimum value quickly without resorting to a brute-force search, is the goal of gradient descent.
Gradient Based Optimization
The goal of optimization in machine learning is to find the parameters of the model
that minimize (or maximize) a loss function. The loss function quantifies how well
the model's predictions match the actual data.
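As a concrete illustration (not from the slides), one common loss function is the mean squared error between predictions and targets. The short sketch below is only meant to make "quantifying how well predictions match the data" tangible; the values are made up.

def mse_loss(y_pred, y_true):
    # Mean squared error: average squared difference between predictions and targets
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

print(mse_loss([0.4, 0.8], [0.2, 0.9]))   # (0.04 + 0.01) / 2 = 0.025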
Gradient Based Optimization
Gradient descent is a widely used optimization algorithm for minimizing the loss
function. It involves the following steps:

1. Compute the Gradient:

○ The gradient of the loss function with respect to the model parameters is computed. This gradient tells us how to adjust the parameters to reduce the loss.
○ Mathematically, if θ represents the model parameters and L(θ) is the loss function, the gradient is ∇_θ L(θ).
Gradient Based Optimization
2. Update the Parameters:

● The model parameters are updated in the direction that reduces the loss. This is done using the gradient information:

θ ← θ − η ∇_θ L(θ)

Here, η (eta) is the learning rate, a hyperparameter that controls the size of the step taken in the direction of the negative gradient.
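To make the update rule concrete, here is a minimal sketch (not from the slides) of gradient descent on a one-parameter quadratic loss L(θ) = (θ − 3)², whose gradient is 2(θ − 3); the learning rate and starting point are illustrative.

def loss(theta):
    return (theta - 3.0) ** 2          # L(theta) = (theta - 3)^2

def grad(theta):
    return 2.0 * (theta - 3.0)         # dL/dtheta

theta = 0.0                            # initial parameter
eta = 0.1                              # learning rate
for step in range(50):
    theta = theta - eta * grad(theta)  # theta <- theta - eta * grad
print(theta)                           # approaches the minimizer theta = 3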
Gradient Descent
Gradient Descent is a standard optimization algorithm. It is frequently the first optimization algorithm introduced for training machine learning models. A gradient is a measurement that quantifies the steepness of a line or curve; mathematically, it gives the direction of steepest ascent or descent.

Descent is the action of going downwards. Therefore, the gradient descent algorithm moves downhill along the loss surface.
import numpy as np

# Toy dataset: two (x, y) pairs
X = [0.5, 2.5]
Y = [0.2, 0.9]

def f(w, b, x):
    # Sigmoid output of a single neuron with parameters w, b
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def error(w, b):
    # Squared-error loss summed over the dataset
    err = 0.0
    for x, y in zip(X, Y):
        fx = f(w, b, x)
        err += 0.5 * (fx - y) ** 2
    return err

def grad_b(w, b, x, y):
    # Partial derivative of the loss with respect to b
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

def grad_w(w, b, x, y):
    # Partial derivative of the loss with respect to w
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def gradient_descent():
    w, b, eta, epochs = -0.1, -2.0, 1.0, 1000
    for i in range(epochs):
        dw, db = 0, 0
        for x, y in zip(X, Y):          # accumulate gradients over all points
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        w = w - eta * dw                # update: theta <- theta - eta * gradient
        b = b - eta * db
    print(error(w, b))
    print(w, b)

gradient_descent()
Types of Gradient Descent
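The title above refers to the standard variants of gradient descent, which differ only in how much data is used per update: batch (all points), stochastic (one point at a time), and mini-batch (small groups of points). The sketch below is an illustrative outline (not from the slides) reusing the grad_w and grad_b helpers defined earlier.

import random

def batch_gd_epoch(w, b, eta):
    # Batch GD: one update per epoch, using the gradient over all points
    dw = sum(grad_w(w, b, x, y) for x, y in zip(X, Y))
    db = sum(grad_b(w, b, x, y) for x, y in zip(X, Y))
    return w - eta * dw, b - eta * db

def stochastic_gd_epoch(w, b, eta):
    # Stochastic GD: one update per training point
    for x, y in zip(X, Y):
        w, b = w - eta * grad_w(w, b, x, y), b - eta * grad_b(w, b, x, y)
    return w, b

def minibatch_gd_epoch(w, b, eta, batch_size=2):
    # Mini-batch GD: one update per small, randomly shuffled batch of points
    data = list(zip(X, Y))
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        dw = sum(grad_w(w, b, x, y) for x, y in batch)
        db = sum(grad_b(w, b, x, y) for x, y in batch)
        w, b = w - eta * dw, b - eta * db
    return w, b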
Challenges with gradient descent
While gradient descent is the most common approach for optimization problems, it does come with its own set of challenges. Some of them include:

● Local minima and saddle points
● Vanishing and Exploding Gradients
Local Minimum
A local minimum is a concept from optimization theory that refers to a point in the parameter space of a function where the function's value is lower than or equal to the values of nearby points, but not necessarily the lowest value the function can achieve globally.

Local Minima and Convergence: Gradient descent can converge to a local minimum if the initial point or the optimization path leads it to one. This is because, in the vicinity of a local minimum, the gradient becomes close to zero, which signals that further updates are minimal. As a result, gradient descent may prematurely stop at a local minimum without exploring the rest of the parameter space.
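As a small illustration (not from the slides), the sketch below runs plain gradient descent on the non-convex function f(x) = x⁴ − 3x² + x, which has a local minimum near x ≈ 1.1 and a global minimum near x ≈ −1.3; starting on the right-hand slope, the iterates settle in the local minimum.

def f(x):
    # Non-convex 1-D function with two minima
    return x**4 - 3*x**2 + x

def df(x):
    # Derivative of f
    return 4*x**3 - 6*x + 1

x, eta = 2.0, 0.01          # start on the right-hand slope
for step in range(2000):
    x = x - eta * df(x)     # plain gradient descent update
print(x, f(x))              # ends near the local minimum x ~ 1.13, f ~ -1.07
# Starting from x = -2.0 instead would reach the global minimum near x ~ -1.30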
Saddle points
A saddle point is a critical point of a function where the gradient is zero, but it is neither a minimum nor a maximum. In other words, it is a point where the function curves upward in some directions and downward in others. Saddle points can be thought of as higher-dimensional analogues of inflection points, where the function transitions from being convex (curving upward) to concave (curving downward) or vice versa.

Slowing Down Convergence: Gradient descent can get stuck at a saddle point because the gradient is zero, so the algorithm does not know in which direction to move to make progress. This can slow down convergence and make the optimization process more challenging, especially in high-dimensional spaces.
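A standard textbook example (added here for illustration) is f(x, y) = x² − y², whose gradient vanishes at the origin even though the origin is neither a minimum nor a maximum; the short sketch below checks this numerically.

def f(x, y):
    # Classic saddle surface: curves up along x, down along y
    return x**2 - y**2

def grad(x, y):
    # Analytic gradient of f
    return (2*x, -2*y)

print(grad(0.0, 0.0))            # (0.0, -0.0): the origin is a critical point
print(f(0.1, 0.0), f(0.0, 0.1))  # ~0.01 and ~-0.01: higher in one direction, lower in another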
Saddle points
Difficulty arises from saddle points, i.e., points where one dimension slopes up and another slopes down.

These saddle points are usually surrounded by a plateau of roughly the same error, which makes it difficult for gradient descent to escape, as the gradient is close to zero in all dimensions.

A plateau, in the context of optimization and gradient-based algorithms, refers to a region of a function's landscape where the gradient is close to zero.
The red and green curves intersect at a generic saddle point in two dimensions. Along the green curve the saddle point looks like a local minimum, while it looks like a local maximum along the red curve.
Escaping Local Minima and Saddle points
Momentum: Momentum in optimization algorithms introduces a "velocity" term
that helps the optimization process continue past shallow local minima and saddle
points.

Adaptive Learning Rate Methods: Algorithms like Adam and RMSprop adjust the
learning rate based on past gradients to handle flat regions and local minima
better.
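As a rough sketch of the momentum idea (not from the slides), the update below keeps a velocity v that accumulates an exponentially decaying average of past gradients, so the parameters can coast through flat regions where the instantaneous gradient is near zero; the beta and eta values, and the quadratic loss, are illustrative.

def momentum_update(theta, v, grad, eta=0.1, beta=0.9):
    # v accumulates past gradients; beta controls how much history is kept
    v = beta * v + eta * grad
    theta = theta - v
    return theta, v

# Example: one-parameter loss L(theta) = (theta - 3)^2
theta, v = 0.0, 0.0
for step in range(200):
    g = 2.0 * (theta - 3.0)             # gradient of the loss
    theta, v = momentum_update(theta, v, g)
print(theta)                            # approaches the minimizer theta = 3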
