Gradient Descent Algorithm
Overview:
Gradient Descent is a fundamental optimization algorithm used extensively in
machine learning to minimize a function. This function typically represents a cost
or loss function, quantifying the error between a model's predictions and the actual
data. By iteratively adjusting the model's parameters in the direction opposite to the
gradient of this function, Gradient Descent aims to find the parameter values that
minimize the error, thereby improving the model's performance. It's a cornerstone
of many machine learning algorithms due to its effectiveness and relative
simplicity.
Technical Background:
Gradient Descent operates on the principle of iteratively moving towards the
minimum of a function. The gradient of a function is a vector that contains the
partial derivatives of the function with respect to each of its input variables.
For a function J(θ1, θ2, ..., θn), the gradient is defined as:
∇J(θ) = [∂J/∂θ1, ∂J/∂θ2, ..., ∂J/∂θn]ᵀ
The gradient vector points in the direction of the steepest ascent of the function. To
find the minimum, Gradient Descent takes steps in the opposite direction of the
gradient. The size of these steps is determined by the learning rate, denoted by α.
The iterative update rule for the parameters θ is:
θ_{k+1} = θ_k - α∇J(θ_k)
Where:
● θ_k represents the parameter vector at the k-th iteration.
● α is the learning rate, controlling the step size.
● ∇J(θ_k) is the gradient of the cost function evaluated at θ_k.
The algorithm continues updating the parameters until a stopping criterion is met.
Common stopping criteria include (both appear in the sketch after this list):
● The change in the cost function or parameter values falls below a specified
threshold.
● A maximum number of iterations is reached.
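As a minimal illustration of the update rule and these stopping criteria, the sketch below runs Gradient Descent on the simple quadratic J(θ) = θ², whose gradient is dJ/dθ = 2θ. The starting point, learning rate, and tolerance are illustrative choices rather than prescribed values.

# Minimal Gradient Descent on J(theta) = theta^2, with gradient dJ/dtheta = 2*theta.
def gradient_descent_quadratic(theta0=5.0, alpha=0.1, tol=1e-6, max_iters=1000):
    theta = theta0
    for k in range(max_iters):
        grad = 2 * theta                  # compute gradient at the current theta
        new_theta = theta - alpha * grad  # update rule: theta_{k+1} = theta_k - alpha * grad
        if abs(new_theta - theta) < tol:  # stop when the parameter change falls below the threshold
            return new_theta, k + 1
        theta = new_theta
    return theta, max_iters               # otherwise stop after max_iters iterations

theta_min, iterations = gradient_descent_quadratic()
print(theta_min, iterations)              # theta approaches 0, the true minimum, well before max_iters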
The choice of learning rate (α) significantly impacts the algorithm's performance.
A small learning rate may lead to slow convergence, while a large learning rate can
cause oscillations or divergence.
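To make this trade-off concrete, the sketch below repeats the same quadratic example with three illustrative learning rates; the specific values are assumptions chosen only for demonstration.

# Effect of the learning rate on Gradient Descent for J(theta) = theta^2.
def run(alpha, theta0=5.0, steps=20):
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # update rule with gradient 2*theta
    return theta

print(run(alpha=0.01))  # ~3.3 after 20 steps: step size too small, slow convergence
print(run(alpha=0.1))   # ~0.06: steady convergence towards the minimum at 0
print(run(alpha=1.1))   # magnitude ~190: overshoots and oscillates with growing amplitude (divergence)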
Insights:
● Direction of Steepest Descent: Gradient Descent leverages the fact that the
negative gradient provides the direction of the steepest decrease of the
function.
● Iterative Process: It's an iterative process that progressively refines parameter
estimates to approach the minimum.
● Sensitivity to Cost Function: The shape of the cost function's landscape
affects how efficiently Gradient Descent finds the minimum.
● Learning Rate Importance: Selecting an appropriate learning rate is essential
for both the speed and stability of convergence.
Methodology:
The Gradient Descent algorithm follows these general steps (see the sketch after this list):
1. Initialization: Initialize the parameter vector θ with some starting values.
2. Compute Gradient: Calculate the gradient of the cost function J(θ) at the
current parameter values.
3. Update Parameters: Update the parameters using the update rule: θ_new = θ_old - α∇J(θ_old).
4. Iteration: Repeat steps 2 and 3 until a stopping criterion is met.
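These four steps map directly onto a generic routine. The sketch below accepts any gradient function, so the same loop can serve different cost functions; the function name, example objective, and default hyperparameters are illustrative assumptions.

import numpy as np

# Generic Gradient Descent: steps 1-4 of the methodology above.
def gradient_descent(grad_fn, theta_init, alpha=0.01, tol=1e-6, max_iters=10_000):
    theta = np.asarray(theta_init, dtype=float)      # step 1: initialization
    for _ in range(max_iters):
        grad = grad_fn(theta)                        # step 2: compute gradient at current theta
        theta_new = theta - alpha * grad             # step 3: parameter update
        if np.linalg.norm(theta_new - theta) < tol:  # step 4: stop when the change is below threshold
            return theta_new
        theta = theta_new
    return theta                                     # step 4: or stop after max_iters iterations

# Example: minimize J(theta) = (theta1 - 3)^2 + (theta2 + 1)^2 using its gradient.
grad = lambda t: np.array([2 * (t[0] - 3), 2 * (t[1] + 1)])
print(gradient_descent(grad, theta_init=[0.0, 0.0]))  # approaches [3, -1]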
Different variations of Gradient Descent exist, primarily differing in how the gradient is computed (see the sketch after this list):
● Batch Gradient Descent: Computes the gradient using the entire training
dataset. This provides an accurate gradient estimate but can be computationally
expensive for large datasets.
● Stochastic Gradient Descent (SGD): Computes the gradient using a single,
randomly selected data point. SGD is faster per iteration but introduces more
noise, leading to a potentially less stable path towards the minimum.
● Mini-Batch Gradient Descent: Computes the gradient using a small,
randomly selected subset (mini-batch) of the data. This balances the
computational efficiency of SGD with the more stable gradient estimates of
Batch Gradient Descent.
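The only difference between these variants is which examples contribute to each gradient estimate. The sketch below contrasts the three choices for a mean-squared-error objective; the data shapes, random seed, and batch size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # illustrative design matrix: 1000 examples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

def mse_gradient(theta, X_sub, y_sub):
    # Gradient of the mean squared error over the given subset of examples.
    residual = X_sub @ theta - y_sub
    return 2 * X_sub.T @ residual / len(y_sub)

theta = np.zeros(3)

# Batch GD: every training example contributes to the gradient.
grad_batch = mse_gradient(theta, X, y)

# Stochastic GD: a single randomly selected example.
i = rng.integers(len(y))
grad_sgd = mse_gradient(theta, X[i:i + 1], y[i:i + 1])

# Mini-batch GD: a small random subset (here 32 examples).
idx = rng.choice(len(y), size=32, replace=False)
grad_minibatch = mse_gradient(theta, X[idx], y[idx])

# Each estimate then feeds the same update: theta = theta - alpha * grad.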
Architecture:
Gradient Descent, as an optimization algorithm, is applied within the architecture
of a machine learning system. It is a key component in training various models,
including:
● Linear Regression: To minimize the mean squared error and find optimal
weights and bias.
● Logistic Regression: To minimize the cost function (e.g., cross-entropy) and
determine the decision boundary.
● Neural Networks: In deep learning, Gradient Descent and its variants are used to update the weights and biases of network layers, with the required gradients computed via backpropagation.
The "architecture" here refers to the combination of the model, the cost function,
and the optimization process (Gradient Descent) used to train the model.
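As a brief illustration of how Gradient Descent fits into such an architecture, the sketch below performs a few cross-entropy gradient steps for logistic regression; the data, learning rate, and number of steps are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative binary-classification data: 5 examples, 2 features.
X = np.array([[0.5, 1.2], [1.5, 0.3], [3.0, 2.2], [2.1, 2.9], [0.2, 0.4]])
y = np.array([0, 0, 1, 1, 0])
w, b, alpha = np.zeros(2), 0.0, 0.1

for _ in range(100):
    p = sigmoid(X @ w + b)            # model predictions (the "model" part of the architecture)
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss w.r.t. the weights
    grad_b = np.mean(p - y)           # gradient w.r.t. the bias
    w, b = w - alpha * grad_w, b - alpha * grad_b  # Gradient Descent update (the optimizer)

print(w, b)                           # parameters defining the learned decision boundary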
Diagrams:
(Diagram: one-dimensional cost curve J(θ), showing an initial θ, a step of size α·dJ/dθ, and the updated θ moving towards θ_min.)
This diagram illustrates how Gradient Descent iteratively moves towards the
minimum of a function in one dimension.
(Diagram: level curves of J(θ1, θ2), with successive steps from an initial θ towards θ_min.)
This 2D contour plot visualizes Gradient Descent's iterative steps towards the
minimum of a function with two parameters.
(A conceptual diagram showing the different paths taken by Batch GD (smooth),
SGD (erratic), and Mini-Batch GD (intermediate) towards the minimum.)
Application Areas:
Gradient Descent is a fundamental algorithm across various domains, particularly
in machine learning:
● Machine Learning Model Training: Used to train a wide variety of models
like linear regression, logistic regression, and support vector machines.
● Deep Learning: Essential for training neural networks via backpropagation, typically using Gradient Descent variants such as SGD, Adam, and RMSprop.
● General Optimization: Applied to solve optimization problems in fields like
engineering, finance, and operations research.
● Image Processing and Computer Vision: Used in image registration and
training Convolutional Neural Networks for image tasks.
● Natural Language Processing (NLP): Employed in training word
embeddings and language models.
Additional Information:
Code Snippet (Python - Linear Regression with Gradient Descent):
Consider predicting housing prices based on factors like square footage and the
number of bedrooms. Linear regression with Gradient Descent can be used to
determine the optimal weights for these features and the bias that minimize the
error between predicted and actual prices.
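A minimal sketch of such a snippet follows. The square-footage, bedroom, and price values are synthetic, illustrative numbers rather than real housing data, and the learning rate and iteration count are likewise assumptions.

import numpy as np

# Illustrative synthetic data: [square footage, number of bedrooms] -> price in $1000s.
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]], dtype=float)
y = np.array([245, 312, 279, 308, 450], dtype=float)

# Standardize the features so a single learning rate suits both.
X = (X - X.mean(axis=0)) / X.std(axis=0)

w = np.zeros(X.shape[1])   # one weight per feature
b = 0.0                    # bias term
alpha = 0.1                # learning rate

for _ in range(1000):
    pred = X @ w + b
    error = pred - y
    grad_w = 2 * X.T @ error / len(y)   # gradient of the mean squared error w.r.t. the weights
    grad_b = 2 * error.mean()           # gradient w.r.t. the bias
    w -= alpha * grad_w                 # Gradient Descent update
    b -= alpha * grad_b

print("weights:", w, "bias:", b)        # parameters minimizing the mean squared error

Standardizing the features keeps the weights on comparable scales, which is why one learning rate works for both; with raw square footage and bedroom counts, the cost surface would be poorly conditioned and convergence much slower.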