Linear Regression - Gradient Descent Method
Dr. JASMEET SINGH
ASSISTANT PROFESSOR, CSED
TIET, PATIALA
Simple Linear Regression - Cost Function
Overview
We know, the linear function that binds the input variable x with the corresponding predicted value (ŷ) is given by:

ŷ = β₀ + β₁x

The cost function (mean square error function) is given by:

J(β₀, β₁) = (1/n) Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)²

The cost function is a function of β₀ and β₁. Let's plot the cost function as a function of β₁, considering β₀ = 0 (for the sake of simplicity, i.e., a 2D view). Consider the dataset shown in Table 1.

Table 1: Dataset for SLR containing 3 instances

Independent Variable (xi) | Dependent Variable (yi)
1 | 1
2 | 2
3 | 3
Plot of Cost Function of SLR
Table 2: Value of cost function J(β₁) for different values of β₁ (using the dataset shown in Table 1), where

J(β₁) = (1/n) Σᵢ₌₁ⁿ (yᵢ − β₁xᵢ)²

[Figure: plot of J(β₁) vs. β₁]
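Table 2's values can be regenerated by sweeping β₁ and evaluating J(β₁) on the Table 1 data. The β₁ grid below is illustrative (the slide's own values are not recoverable); the sweep shows the parabola with its minimum at β₁ = 1:

```python
xs = [1, 2, 3]
ys = [1, 2, 3]
n = len(xs)

def J(beta1):
    """Cost with beta0 fixed at 0: (1/n) * sum((y_i - beta1*x_i)^2)."""
    return sum((y - beta1 * x) ** 2 for x, y in zip(xs, ys)) / n

# Evaluate J(beta1) over an illustrative grid of beta1 values
for beta1 in [0, 0.5, 1, 1.5, 2]:
    print(beta1, round(J(beta1), 3))  # minimum J = 0 at beta1 = 1
```

The symmetry around β₁ = 1 (J(0.5) = J(1.5), J(0) = J(2)) is what gives the bowl shape in the plot.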
Gradient Descent Optimization for MLR
Gradient of the cost function with respect to the input parameters is given by:

∂J/∂β₀ = (1/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + … + βₖxᵢₖ − yᵢ)

Similarly,

∂J/∂β₁ = (1/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + … + βₖxᵢₖ − yᵢ) × xᵢ₁

∂J/∂β₂ = (1/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + … + βₖxᵢₖ − yᵢ) × xᵢ₂

⋮

In general,

∂J/∂βⱼ = (1/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + … + βₖxᵢₖ − yᵢ) × xᵢⱼ
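The general formula ∂J/∂βⱼ can be turned into a short routine. This is a sketch under my own naming (`gradients` is not from the slides); it prepends xᵢ₀ = 1 to each row so the intercept falls out of the same loop:

```python
def gradients(beta, X, y):
    """Return [dJ/dbeta_0, ..., dJ/dbeta_k] for MLR with the MSE cost.

    beta: k+1 coefficients [beta0, ..., betak]
    X:    n rows of k feature values [x_i1, ..., x_ik]
    y:    n target values
    """
    n = len(X)
    k = len(beta) - 1
    grads = [0.0] * (k + 1)
    for xi, yi in zip(X, y):
        row = [1.0] + list(xi)                      # prepend x_i0 = 1
        err = sum(b * x for b, x in zip(beta, row)) - yi
        for j in range(k + 1):
            grads[j] += err * row[j] / n            # (1/n) * sum(err * x_ij)
    return grads

# Tiny check with beta = 0 on two points of y = 2x
print(gradients([0, 0], [[1], [2]], [2, 4]))  # [-3.0, -5.0]
```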
Gradient Descent Optimization for MLR
The gradient descent optimization for Multiple Linear Regression is summarized as below:
1. Initialize β₀ = 0, β₁ = 0, β₂ = 0, …, βₖ = 0.
2. Update the parameters until convergence, or for a fixed number of iterations, using the following equation (all parameters updated simultaneously):

βⱼ := βⱼ − (α/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + … + βₖxᵢₖ − yᵢ) × xᵢⱼ

for j = 0, 1, 2, 3, …, k,

where xᵢ₀ = 1 and k is the total number of independent variables.
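The two steps above can be sketched as a complete loop. Function and variable names here are illustrative, and a fixed iteration count stands in for a convergence test:

```python
def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Fit MLR coefficients [beta0, ..., betak] by batch gradient descent."""
    n = len(X)
    k = len(X[0])
    beta = [0.0] * (k + 1)              # step 1: initialize all betas to 0
    for _ in range(iters):              # step 2: repeat the update
        grads = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)      # x_i0 = 1 for the intercept
            err = sum(b * x for b, x in zip(beta, row)) - yi
            for j in range(k + 1):
                grads[j] += err * row[j] / n
        # simultaneous update for all j = 0..k
        beta = [b - alpha * g for b, g in zip(beta, grads)]
    return beta

# Data lying exactly on y = 2x should recover beta close to [0, 2]
beta = gradient_descent([[1], [2], [3], [4]], [2, 4, 6, 8],
                        alpha=0.05, iters=20000)
print([round(b, 3) for b in beta])
```

Note that all βⱼ are updated from the same old parameter vector; updating them one at a time in place would be a different (coordinate-descent-like) algorithm.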
Gradient Descent Optimization for MLR - Example
Consider the following dataset, which shows the Stock Index Price as a function of Interest Rate and Unemployment Rate. For the given dataset, show the first iteration of gradient descent optimization for linear regression. Initialize β₀ = 0, β₁ = 0, β₂ = 0 and take the learning rate α = 0.01.
Interest Rate (xi1) Unemployment rate (xi2) Stock Index Price (yi)
2.75 5.3 1464
2.5 5.3 1394
2.25 5.5 1159
2 5.7 1130
2 5.9 1075
2 6 1047
1.75 5.9 965
1.75 6.1 719
Gradient Descent Optimization for MLR - Example
Initially, β₀ = 0, β₁ = 0, β₂ = 0.

The sums needed for the first iteration (n = 8):

S.No | xi1 | xi2 | yi | xi1xi2 | xi1yi | xi2yi | (xi1)² | (xi2)²
1 | 2.75 | 5.3 | 1464 | 14.575 | 4026 | 7759.2 | 7.5625 | 28.09
2 | 2.5 | 5.3 | 1394 | 13.25 | 3485 | 7388.2 | 6.25 | 28.09
3 | 2.25 | 5.5 | 1159 | 12.375 | 2607.75 | 6374.5 | 5.0625 | 30.25
4 | 2 | 5.7 | 1130 | 11.4 | 2260 | 6441 | 4 | 32.49
5 | 2 | 5.9 | 1075 | 11.8 | 2150 | 6342.5 | 4 | 34.81
6 | 2 | 6 | 1047 | 12 | 2094 | 6282 | 4 | 36
7 | 1.75 | 5.9 | 965 | 10.325 | 1688.75 | 5693.5 | 3.0625 | 34.81
8 | 1.75 | 6.1 | 719 | 10.675 | 1258.25 | 4385.9 | 3.0625 | 37.21
Total | 17 | 45.7 | 8953 | 96.4 | 19569.75 | 50666.8 | 37 | 261.75

Iteration I:

temp0 := β₀ − (α/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ − yᵢ)
temp0 := 0 − (0.01/8) × (8×0 + 17×0 + 45.7×0 − 8953) = 11.19

temp1 := β₁ − (α/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ − yᵢ) × xᵢ₁
temp1 := 0 − (0.01/8) × (17×0 + 37×0 + 96.4×0 − 19569.75) = 24.46

temp2 := β₂ − (α/n) Σᵢ₌₁ⁿ (β₀ + β₁xᵢ₁ + β₂xᵢ₂ − yᵢ) × xᵢ₂
temp2 := 0 − (0.01/8) × (45.7×0 + 96.4×0 + 261.75×0 − 50666.8) = 63.33

After the first iteration: β₀ = 11.19, β₁ = 24.46, β₂ = 63.33.
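The first iteration above can be reproduced directly from the raw dataset, without building the table of sums by hand. This is a sketch, not the lecture's own code:

```python
# Worked example: alpha = 0.01, n = 8, all betas initialized to 0
X1 = [2.75, 2.5, 2.25, 2, 2, 2, 1.75, 1.75]          # interest rate
X2 = [5.3, 5.3, 5.5, 5.7, 5.9, 6, 5.9, 6.1]          # unemployment rate
Y  = [1464, 1394, 1159, 1130, 1075, 1047, 965, 719]  # stock index price

alpha, n = 0.01, len(Y)
b0 = b1 = b2 = 0.0

# residuals beta0 + beta1*x_i1 + beta2*x_i2 - y_i (all -y_i when betas are 0)
err = [b0 + b1 * x1 + b2 * x2 - y for x1, x2, y in zip(X1, X2, Y)]

temp0 = b0 - alpha / n * sum(err)
temp1 = b1 - alpha / n * sum(e * x1 for e, x1 in zip(err, X1))
temp2 = b2 - alpha / n * sum(e * x2 for e, x2 in zip(err, X2))
print(round(temp0, 2), round(temp1, 2), round(temp2, 2))  # 11.19 24.46 63.33
```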