L3 Linear Regression and Gradient Descent
Gradient Descent
Mariette Awad
• Learn more about regression and applications of the above key concepts with linear regression:
• Learn what "regression" models are
• Learn what linear "regression" models are
• Illustrate how parameters are used to represent a linear model
• Explain what a hypothesis function is and how it is learned
• Explain what a cost function is and how it is used to learn the hypothesis
• Illustrate how the cost function changes as the parameters change
• Explain the gradient descent algorithm
Outline
• Predict house prices in Portland, OR, given prices (in $1000s of dollars) per size.
• A learning algorithm can fit a straight line or a quadratic; later we will see how to choose.
• We have given the algorithm the "right answers" (actual prices), and the task of the algorithm is to find more "right answers".
• The house-price problem is called regression: predict continuous-valued numbers.

[Figure: scatter plot of price ($1000s, 0–300) vs. size (feet², 0–3000) for the training data]

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Notations
• $m$ = number of training examples
• $x$ = input variable / feature (e.g., house size in feet²)
• $y$ = output / target variable (e.g., price in $1000s)
• $(x^{(i)}, y^{(i)})$ = the $i$-th training example
• $h$ = the hypothesis function that maps an input $x$ to a predicted $y$
[Figure: scatter plot of the housing training data, price ($1000s, 0–400) vs. size (feet², 0–3000)]
[Figure: three small panels (axes 0–3) showing different candidate hypotheses]
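To make the notation concrete, here is a minimal sketch in Python (NumPy assumed; the variable names are illustrative, not from the slides) that stores the housing training set from the table above and shows how m, x^(i), and y^(i) map onto arrays.

```python
import numpy as np

# Training set from the housing example (size in feet^2, price in $1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # inputs x
y = np.array([460.0, 232.0, 315.0, 178.0])      # targets y

m = len(x)                  # m = number of training examples
i = 1                       # Python is 0-indexed, so x[0] corresponds to x^(1)
print(m)                    # 4
print(x[i - 1], y[i - 1])   # the i-th training example (x^(i), y^(i))
```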
Learning Objective
Parameters: $\theta_0, \theta_1$ (of the linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$)
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal/Objective Function: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Behavior of Cost Function
Two Key Things to Consider
• Two key functions we want to understand:
  • the hypothesis function (h), and
  • the cost function (J)
Parameters: $\theta_0, \theta_1$, so that $h_\theta(x) = \theta_0 + \theta_1 x$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
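As a rough illustration of these two functions, here is a small Python sketch (NumPy assumed; the function names are my own) of the linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and the squared-error cost $J(\theta_0, \theta_1)$ defined above.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x, applied elementwise to an array x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Housing data from the earlier slide (size in feet^2, price in $1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(cost(0.0, 0.2, x, y))  # cost of the hypothesis h(x) = 0.2 x
```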
Illustration & Intuition
• Consider the case where the training data consists of the following (x, y) values: (1,1), (2,2), and (3,3).
• Let's check the two key parts (h and J).
• Consider different possible parameter values (for h) and see the impact on J.
Behavior of Cost Function with
one parameter
Case study: different $\theta_1$ values (with $\theta_0 = 0$, so $h_\theta(x) = \theta_1 x$):
$h(x) = x$, $h(x) = 0.5x$, and $h(x) = 0$
$\theta_1 = 1$: the hypothesis $h_\theta(x) = x$ passes through every training point, so $h_\theta(x^{(i)}) = y^{(i)}$ for all $i$.

[Figure: left panel — $h_\theta(x)$ vs. $x$ (for fixed $\theta_1$, this is a function of x); right panel — $J(\theta_1)$ (a function of the parameter $\theta_1$)]

$$J(\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_1 x^{(i)} - y^{(i)}\right)^2 = \frac{1}{2m}\left(0^2 + 0^2 + 0^2\right) = 0, \qquad J(1) = 0$$
$\theta_1 = 0.5$: the hypothesis $h_\theta(x) = 0.5x$ predicts $h_\theta(x^{(i)})$ below each $y^{(i)}$.

[Figure: left panel — $h_\theta(x)$ vs. $x$ (for fixed $\theta_1 = 0.5$, a function of x); right panel — the corresponding point on $J(\theta_1)$]

$$J(0.5) = \frac{1}{2m}\left[(0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2\right] = \frac{3.5}{2 \times 3} = \frac{3.5}{6} \approx 0.58$$
$\theta_1 = 0$: the hypothesis $h_\theta(x) = 0$ is a horizontal line through the origin.

[Figure: left panel — the hypotheses for $\theta_1 = 1$, $\theta_1 = 0.5$, and $\theta_1 = 0$; right panel — the corresponding points on the $J(\theta_1)$ curve over $-0.5 \le \theta_1 \le 2.5$]

$$J(0) = \frac{1}{2m}\left[1^2 + 2^2 + 3^2\right] = \frac{1}{6} \times 14 \approx 2.3$$

Learning amounts to finding $\theta_1$ such that we minimize $J(\theta_1)$.
Best $\theta_1$ (and best h)
• For this example, the best $\theta_1 = 1$, where $J(\theta_1)$ reaches its minimum value of 0.

[Figure: $J(\theta_1)$ plotted over $-0.5 \le \theta_1 \le 2.5$; the minimum is at $\theta_1 = 1$]
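The worked values above are easy to check numerically. Here is a small self-contained sketch (Python/NumPy assumed) evaluating $J(\theta_1)$ on the toy set (1,1), (2,2), (3,3) and scanning a grid for the minimizer.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def J(theta1):
    """Cost for h_theta(x) = theta1 * x (theta0 fixed at 0)."""
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

for t in (1.0, 0.5, 0.0):
    print(t, J(t))        # 1.0 -> 0.0,  0.5 -> ~0.583,  0.0 -> ~2.333

# Scanning a grid of theta1 values confirms the minimum sits at theta1 = 1
grid = np.linspace(-0.5, 2.5, 301)
print(grid[np.argmin([J(t) for t in grid])])   # ~1.0
```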
Behavior of Cost Function
with multiple parameters
Visualization of the cost function when considering both $\theta_0$ and $\theta_1$
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Cost Function Visualization for two parameters
• A particular set of $(\theta_0, \theta_1)$ values corresponds to a point in the 3D surface plot of $J(\theta_0, \theta_1)$ or on one of its contour plots.

[Figure: left panel — $h(x)$ for fixed $(\theta_0, \theta_1)$, a function of x; right panel — $J(\theta_0, \theta_1)$ as a function of the parameters, shown as a 3D surface / contour plot]
[Figure: left panel — $h(x) = 360 + 0 \cdot x$ (for fixed $\theta_0 = 360$, $\theta_1 = 0$, a function of x); right panel — the corresponding point on the contour plot of $J(\theta_0, \theta_1)$]
[Figures: two further examples pairing an $h(x)$ on the left with the corresponding point on the contour plot of $J(\theta_0, \theta_1)$ on the right]
Gradient Descent
Intuitions and Math Description
Overview of Gradient Descent
• We need an algorithm to efficiently find the values of the $\theta_j$'s that optimize the cost function $J$.
• Solution: the gradient descent algorithm to minimize a cost function $J$.
• Approach:
  • Start with initial (possibly random) choices of $\theta_0$ and $\theta_1$; a common choice is to set them both to 0.
  • Keep changing $\theta_0$ and $\theta_1$ to reduce $J(\theta_0, \theta_1)$,
  • until we hopefully get to a minimum.
Idea behind Gradient Descent
• Consider the figure with two hills, and imagine standing at a point on one of the hills.
• The idea of gradient descent is that you look around you 360 degrees and ask yourself: in what direction should I move to go downhill the fastest?
• Starting from a given point, you may end up in one local minimum (one valley bottom).
• If you start from another point, you may end up in another local minimum (another valley bottom).
Mathematical Description

Gradient descent algorithm:
repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$)
}

Correct: simultaneous update
    $temp_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
    $temp_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
    $\theta_0 := temp_0$
    $\theta_1 := temp_1$

• The notation ":=" means we assign the right-hand side to the left-hand side; it is similar to "=" in programming.
• $\alpha$ is called the "learning rate". It controls the size of the step we take on each iteration of steepest descent.
• The derivative term is the partial derivative of $J$ with respect to $\theta_j$ (this requires a calculus background).
• Note the subtlety in the algorithm: the update must be SIMULTANEOUS. If you do not do a simultaneous update, you would probably be implementing a different algorithm with different properties.
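To make the simultaneous-update subtlety concrete, here is a minimal Python sketch (names and the example cost are illustrative, not from the slides) of one gradient descent step, given functions that return the two partial derivatives; both temps are computed before either parameter is overwritten.

```python
def gradient_descent_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One simultaneous update of (theta0, theta1) with learning rate alpha."""
    # Evaluate BOTH partial derivatives at the current (theta0, theta1) ...
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # ... and only then assign, so the second derivative is not evaluated
    # at an already-updated theta0 (that would be a different algorithm).
    return temp0, temp1

# Example on J(theta0, theta1) = theta0^2 + theta1^2 (partials: 2*theta0, 2*theta1)
t0, t1 = 3.0, -2.0
for _ in range(50):
    t0, t1 = gradient_descent_step(t0, t1, 0.1,
                                   lambda a, b: 2 * a,
                                   lambda a, b: 2 * b)
print(t0, t1)   # both shrink towards the minimum at (0, 0)
```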
Consider the case of one variable
• To convey the intuition, we will use one parameter: minimize (over $\theta_1$) $J(\theta_1)$.
• Notice that the partial derivative is now just the ordinary derivative, and the derivative with respect to $\theta_1$ is the slope of the tangent line through that point on the curve.
• With a parabolic shape, on the right side of the minimum the slope is positive; on the left side of the minimum the slope is negative.

[Figure: $J(\theta_1)$ vs. $\theta_1$ over $-0.5 \le \theta_1 \le 2.5$, with tangent lines illustrating the sign of the slope]
Behavior
• On the right side of the minimum, we update the parameter with $-\alpha \times$ (the positive slope), which is a negative number, so we move left (backwards) towards the minimum; this is the right thing to do.
• On the left side, the slope is negative, so $-\alpha \times$ (the negative slope) is positive: we move right, again towards the minimum.
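As a quick check of this sign behavior, using the toy data (1,1), (2,2), (3,3) from earlier (this derivation is mine, not from the slides), the one-parameter cost is a parabola:

$$J(\theta_1) = \frac{1}{2 \cdot 3}\sum_{i=1}^{3}\left(\theta_1 x^{(i)} - y^{(i)}\right)^2 = \frac{14}{6}\left(\theta_1 - 1\right)^2, \qquad \frac{dJ}{d\theta_1} = \frac{14}{3}\left(\theta_1 - 1\right).$$

At $\theta_1 = 2$ the slope is $+\tfrac{14}{3}$, so the update $\theta_1 := \theta_1 - \alpha \cdot \tfrac{14}{3}$ moves left towards the minimum at $\theta_1 = 1$; at $\theta_1 = 0$ the slope is $-\tfrac{14}{3}$, so the update moves right, again towards the minimum.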
How does the cost function vary with the iterations, for different learning rates?
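A minimal numerical sketch of this question (Python/NumPy assumed; the specific learning rates are illustrative): run one-parameter gradient descent on the toy data for several values of $\alpha$ and watch $J(\theta_1)$ over the iterations. A small $\alpha$ decreases $J$ slowly, a moderate $\alpha$ converges quickly, and a too-large $\alpha$ makes $J$ blow up.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def J(t1):
    return np.sum((t1 * x - y) ** 2) / (2 * m)

def dJ(t1):
    # derivative of J with respect to theta1
    return np.sum((t1 * x - y) * x) / m

for alpha in (0.01, 0.2, 0.5):        # small, moderate, too large
    t1 = 0.0
    history = []
    for _ in range(20):
        t1 = t1 - alpha * dJ(t1)
        history.append(J(t1))
    print(alpha, [round(v, 3) for v in history[:5]], "...", round(history[-1], 3))
# alpha = 0.01: J decreases slowly; alpha = 0.2: J drops to ~0 in a few steps;
# alpha = 0.5: J grows each iteration (gradient descent diverges).
```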
Applying Gradient Descent to
Linear Regression
Applying GD to Linear Regression
• Putting the gradient descent algorithm together with the cost function for linear regression gives us our first ML algorithm.
• The cost function is $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{\text{training set}}\left(h(x^{(i)}) - y^{(i)}\right)^2$.
Applying gradient descent, we need to find the partial derivatives:

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{\partial}{\partial \theta_j}\, \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2$$

For $\theta_0$ (or $j = 0$): $\quad \dfrac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$

For $\theta_1$ (or $j = 1$): $\quad \dfrac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x^{(i)}$
Gradient descent algorithm for linear regression:
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$
    $\theta_1 := \theta_1 - \alpha \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$
}   (update $\theta_0$ and $\theta_1$ simultaneously)
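Putting the update rules above together, here is a minimal end-to-end sketch in Python (NumPy assumed; the hyperparameters are illustrative). It is shown on the small (1,1), (2,2), (3,3) dataset, where the parameters should approach $\theta_0 \approx 0$, $\theta_1 \approx 1$; for the raw housing data the feature would need scaling (or a much smaller $\alpha$) for gradient descent to behave well.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y          # h_theta(x^(i)) - y^(i)
        grad0 = np.sum(errors) / m                # dJ/dtheta0
        grad1 = np.sum(errors * x) / m            # dJ/dtheta1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
t0, t1 = gradient_descent(x, y)
print(t0, t1)   # close to (0.0, 1.0), the line that fits the data exactly
```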