
Linear Regression
(Gradient Descent Optimization)

Dr. JASMEET SINGH
ASSISTANT PROFESSOR, CSED
TIET, PATIALA
Simple Linear Regression- Cost Function Overview

• We know, the linear function that binds the input variable x with the corresponding predicted value ŷ is given by:

  $\hat{y} = \beta_0 + \beta_1 x$

• The cost function (mean square error function) is given by:

  $J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

  (a small computational sketch of this cost is given after Table 1 below)

• The cost function is a function of β₀ and β₁. Let us plot the cost function as a function of β₁, considering β₀ = 0 (for the sake of simplicity, i.e., a 2D view). Consider the dataset shown in Table 1.

Table 1: Dataset for SLR containing 3 instances

Independent Variable (xᵢ) | Dependent Variable (yᵢ)
1 | 1
2 | 2
3 | 3
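To make the cost function concrete, here is a small Python sketch (not part of the original slides) that evaluates J(β₁) on the Table 1 dataset with β₀ fixed to 0; the printed values are the ones tabulated in Table 2 on the next slide.

```python
import numpy as np

# Table 1 dataset
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost(beta1, beta0=0.0):
    """Mean square error J(beta0, beta1) = (1/n) * sum (y_i - beta0 - beta1*x_i)^2."""
    return np.mean((y - beta0 - beta1 * x) ** 2)

for b1 in [1.0, 0.5, 0.0, 1.5, 2.0]:
    print(f"J(beta1={b1}) = {cost(b1):.2f}")   # 0.00, 1.17, 4.67, 1.17, 4.67
```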
Plot of Cost Function of SLR

Table 2: Value of the cost function J(β₁) for different values of β₁ (using the dataset shown in Table 1), where $J(\beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2$

S.No | β₁  | J(β₁)
1    | 1   | (1/3)[(1 − 1)² + (2 − 2)² + (3 − 3)²] = 0
2    | 0.5 | (1/3)[(1 − 0.5)² + (2 − 1)² + (3 − 1.5)²] = 1.17
3    | 0   | (1/3)[(1 − 0)² + (2 − 0)² + (3 − 0)²] = 4.67
4    | 1.5 | (1/3)[(1 − 1.5)² + (2 − 3)² + (3 − 4.5)²] = 1.17
5    | 2   | (1/3)[(1 − 2)² + (2 − 4)² + (3 − 6)²] = 4.67

[Figure: Plot of J(β₁) vs. β₁ for the values in Table 2]

It is clear from the above that the cost function is parabolic in shape (bowl-shaped) with one point of minimum, where the mean square error is zero.
Plot of Cost Function of SLR
(Cost Function as a function of β₀ and β₁)

SURFACE PLOT: a bowl-shaped curve with only one point of minimum.

CONTOUR PLOT: lines of the same color mean the same value of the cost function at different points of β₀ and β₁.

A contour plot represents a 3-dimensional surface by plotting constant z slices, called contours, on a 2-dimensional format.
Gradient Descent Optimization- Introduction
• Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function.
• Gradient descent is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible, i.e.,

  $\arg\min_{\beta_0, \beta_1, \dots, \beta_k} J(\beta_0, \beta_1, \beta_2, \dots, \beta_k)$

• It is based on minimizing a convex cost function and tweaks its parameters iteratively to drive the given function to its local minimum.
• It uses the gradient of the cost function to tune the parameters.
• The gradient can be considered as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. But if the slope is zero, the model stops learning.
• In mathematical terms, the gradient is the vector of partial derivatives of the function with respect to its inputs (here, the parameters).
Gradient Descent Optimization- Introduction
• The image illustrates the cost function from a top-down view; the black arrows are the steps of the gradient descent algorithm.
• The algorithm may reach a different local or global minimum depending upon the initial values of β₀ and β₁.
• The gradient in this context is a vector that contains the direction of the steepest step the algorithm can take and also how long that step should be.
Steps of Gradient Descent Optimization

• In order to minimize any differentiable cost function $J(\beta_0, \beta_1, \beta_2, \dots, \beta_k)$ containing parameters β₀, β₁, β₂, ..., βₖ, the following steps are followed in gradient descent optimization:

1. Initialize the parameters β₀, β₁, β₂, ..., βₖ to arbitrary values. Usually, they are all set to 0 as the initial value.

2. Update the values of the parameters β₀, β₁, β₂, ..., βₖ using the following equation (until convergence or for a fixed number of iterations):

  $\beta_j := \beta_j - \alpha \frac{\partial J(\beta_0, \beta_1, \dots, \beta_k)}{\partial \beta_j} \quad \text{for } j = 0, 1, 2, \dots, k$

• This update must be simultaneous, i.e., the RHS of the above equation must be stored in a temporary variable for each value of j and only then assigned to all parameters at once (a minimal sketch of this is given below).

• Here α is called the learning rate, a fixed step size that controls how large each update step is, and $\frac{\partial J(\beta_0, \dots, \beta_k)}{\partial \beta_j}$ is called the gradient of the cost function.

• Convergence of the βⱼ's means that there is no further change in the value of βⱼ, which happens only when $\frac{\partial J(\beta_0, \dots, \beta_k)}{\partial \beta_j} = 0$.
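To make the simultaneous-update rule concrete, here is a minimal Python sketch (not part of the original slides); it assumes a user-supplied function grad_J that returns the vector of partial derivatives ∂J/∂βⱼ at the current parameter values.

```python
import numpy as np

def gradient_descent(grad_J, beta_init, alpha=0.01, n_iters=1000, tol=1e-8):
    """Generic gradient descent with simultaneous parameter updates.

    grad_J    : callable returning the gradient vector [dJ/d_beta_j] at beta
    beta_init : initial parameter values (usually all zeros)
    alpha     : learning rate (fixed step size)
    """
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(n_iters):
        grad = grad_J(beta)               # all partial derivatives at the CURRENT beta
        new_beta = beta - alpha * grad    # simultaneous update of every beta_j
        if np.max(np.abs(new_beta - beta)) < tol:   # convergence: beta no longer changes
            return new_beta
        beta = new_beta
    return beta
```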
Gradient Descent Optimization- Intuition

• The intuition behind gradient descent optimization is that, although it may start from any arbitrary point, it converges at some local or global minimum.

• For instance, consider a cost function with only one parameter (θ₁). The shape of the cost function is shown in figures 1 and 2.

• If we start from the θ₁ shown in figure 1, then the gradient $\frac{\partial J(\theta_1)}{\partial \theta_1}$ is positive. Therefore, $\theta_1 := \theta_1 - (\text{positive quantity})$. So, it will slowly move towards the minimum point.

• If we start from the θ₁ shown in figure 2, then the gradient $\frac{\partial J(\theta_1)}{\partial \theta_1}$ is negative. Therefore, $\theta_1 := \theta_1 + (\text{positive quantity})$. So, it will again slowly move towards the minimum point.
Learning Rate in Gradient Descent

Why is the learning rate fixed?

• The gradient descent algorithm can converge to a local minimum even with a fixed learning rate.

• As we approach a local minimum, gradient descent will automatically take smaller steps, because the gradient itself becomes smaller.

• So, there is no need to decrease the learning rate over time.
Learning Rate in Gradient Descent Contd…

What if the learning rate is too small?
• If the learning rate is too small, then gradient descent will take a lot of time to converge (as shown in the figure).

What if the learning rate is too large?
• If the learning rate is too large, then gradient descent can overshoot the minimum. It may fail to converge or even diverge (as shown in the figure below; a small numerical sketch contrasting the two cases also follows).
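As an illustrative sketch (not from the slides), the snippet below runs gradient descent on the one-parameter cost J(β₁) from Table 2; the learning rates 0.01 and 0.65 are arbitrary values chosen only to show slow convergence versus overshooting and divergence.

```python
import numpy as np

# Table 1 dataset and the one-parameter cost J(beta1) = (1/n) * sum (y_i - beta1*x_i)^2
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def grad(beta1):
    # dJ/d_beta1 = -(2/n) * sum (y_i - beta1*x_i) * x_i
    return -2.0 / len(x) * np.sum((y - beta1 * x) * x)

def run(alpha, steps=5, beta1=0.0):
    trace = [beta1]
    for _ in range(steps):
        beta1 = beta1 - alpha * grad(beta1)
        trace.append(beta1)
    return [round(b, 3) for b in trace]

print("alpha = 0.01 (too small, creeps towards the minimum at 1):", run(0.01))
print("alpha = 0.65 (too large, overshoots and diverges):", run(0.65))
```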
Gradient Descent Optimization for Multiple Linear Regression (MLR)

• A multiple linear regression model with k independent predictor variables x₁, x₂, ..., xₖ predicts the output variable as:

  $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k$

• The cost function (mean square error function) is given by:

  $J = \frac{1}{2n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} - \dots - \beta_k x_{ik})^2$

• The gradient of the cost function with respect to the parameters is given by:

  $\frac{\partial J}{\partial \beta_0} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} - y_i)$
Gradient Descent Optimization for MLR

Similarly,

  $\frac{\partial J}{\partial \beta_1} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} - y_i) \times x_{i1}$

  $\frac{\partial J}{\partial \beta_2} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} - y_i) \times x_{i2}$

In general,

  $\frac{\partial J}{\partial \beta_j} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} - y_i) \times x_{ij}$
Gradient Descent Optimization for MLR

The gradient descent optimization for Multiple Linear Regression is summarized as below:

1. Initialize β₀ = 0, β₁ = 0, β₂ = 0, ..., βₖ = 0.

2. Update the parameters until convergence or for a fixed number of iterations using the following equation:

  $\beta_j := \beta_j - \frac{\alpha}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} - y_i) \times x_{ij}$

  for j = 0, 1, 2, 3, ..., k, where xᵢ₀ = 1 and k is the number of predictor variables. A minimal implementation sketch of these two steps is given below.
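The following Python/NumPy sketch implements the two steps above; the function name and argument defaults are illustrative, not from the slides.

```python
import numpy as np

def mlr_gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Gradient descent for multiple linear regression.

    X : (n, k) matrix of predictor values x_i1 ... x_ik
    y : (n,) vector of observed outputs
    Returns the fitted coefficients [beta_0, beta_1, ..., beta_k].
    """
    n, k = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend x_i0 = 1 for the intercept beta_0
    beta = np.zeros(k + 1)                 # step 1: initialize every beta_j to 0
    for _ in range(n_iters):
        errors = Xb @ beta - y             # (beta_0 + beta_1*x_i1 + ... + beta_k*x_ik - y_i)
        grad = (Xb.T @ errors) / n         # dJ/d_beta_j = (1/n) * sum(errors * x_ij)
        beta = beta - alpha * grad         # step 2: simultaneous update of all beta_j
    return beta
```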
Gradient Descent Optimization for MLR- Example

Consider the following dataset that shows the Stock Index Price as a function of Interest Rate and Unemployment Rate. For the given dataset, show the first iteration of gradient descent optimization for linear regression. Initialize β₀ = 0, β₁ = 0, β₂ = 0 and consider the learning rate as 0.01.

Interest Rate (xᵢ₁) | Unemployment Rate (xᵢ₂) | Stock Index Price (yᵢ)
2.75 | 5.3 | 1464
2.5  | 5.3 | 1394
2.25 | 5.5 | 1159
2    | 5.7 | 1130
2    | 5.9 | 1075
2    | 6   | 1047
1.75 | 5.9 | 965
1.75 | 6.1 | 719
Gradient Descent Optimization for MLR- Example

Initially, β₀ = 0, β₁ = 0, β₂ = 0 and n = 8. The sums needed for the update are tabulated below:

S.No  | xᵢ₁  | xᵢ₂  | yᵢ   | xᵢ₁xᵢ₂ | xᵢ₁yᵢ    | xᵢ₂yᵢ   | (xᵢ₁)² | (xᵢ₂)²
1     | 2.75 | 5.3  | 1464 | 14.575 | 4026     | 7759.2  | 7.5625 | 28.09
2     | 2.5  | 5.3  | 1394 | 13.25  | 3485     | 7388.2  | 6.25   | 28.09
3     | 2.25 | 5.5  | 1159 | 12.375 | 2607.75  | 6374.5  | 5.0625 | 30.25
4     | 2    | 5.7  | 1130 | 11.4   | 2260     | 6441    | 4      | 32.49
5     | 2    | 5.9  | 1075 | 11.8   | 2150     | 6342.5  | 4      | 34.81
6     | 2    | 6    | 1047 | 12     | 2094     | 6282    | 4      | 36
7     | 1.75 | 5.9  | 965  | 10.325 | 1688.75  | 5693.5  | 3.0625 | 34.81
8     | 1.75 | 6.1  | 719  | 10.675 | 1258.25  | 4385.9  | 3.0625 | 37.21
Total | 17   | 45.7 | 8953 | 96.4   | 19569.75 | 50666.8 | 37     | 261.75

Iteration 1:

$temp_0 := \beta_0 - \frac{\alpha}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} - y_i)$

$temp_0 := 0 - \frac{0.01}{8} (8 \times 0 + 17 \times 0 + 45.7 \times 0 - 8953) = 11.19$

$temp_1 := \beta_1 - \frac{\alpha}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} - y_i) \times x_{i1}$

$temp_1 := 0 - \frac{0.01}{8} (17 \times 0 + 37 \times 0 + 96.4 \times 0 - 19569.75) = 24.46$

$temp_2 := \beta_2 - \frac{\alpha}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} - y_i) \times x_{i2}$

$temp_2 := 0 - \frac{0.01}{8} (45.7 \times 0 + 96.4 \times 0 + 261.75 \times 0 - 50666.8) = 63.33$

After the first iteration: β₀ = 11.19, β₁ = 24.46, β₂ = 63.33. (A quick computational check follows below.)
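As a quick check (not part of the slides), the first-iteration values can be reproduced with a single simultaneous update in NumPy; the snippet below hard-codes the example dataset.

```python
import numpy as np

# Example dataset: Interest Rate, Unemployment Rate -> Stock Index Price
X = np.array([[2.75, 5.3], [2.5, 5.3], [2.25, 5.5], [2.0, 5.7],
              [2.0, 5.9], [2.0, 6.0], [1.75, 5.9], [1.75, 6.1]])
y = np.array([1464, 1394, 1159, 1130, 1075, 1047, 965, 719], dtype=float)

n = len(y)
Xb = np.hstack([np.ones((n, 1)), X])          # add x_i0 = 1 for beta_0
beta = np.zeros(3)                            # beta_0 = beta_1 = beta_2 = 0
alpha = 0.01

errors = Xb @ beta - y                        # with all betas 0, errors are simply -y_i
beta = beta - (alpha / n) * (Xb.T @ errors)   # one simultaneous update of all three betas
print(np.round(beta, 2))                      # expected: [11.19 24.46 63.33]
```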