The document discusses gradient descent for linear regression. It introduces linear regression as an optimization problem to minimize the mean squared error between predicted and actual output values. Gradient descent is then presented as an algorithm to optimize the parameters in linear regression. It works by iteratively updating the parameters in the direction of the negative gradient of the cost function until convergence. The document contrasts batch, mini-batch, and stochastic gradient descent, noting that stochastic gradient descent uses a single randomly selected training example to approximate the gradient, making it suitable for large datasets.


Class 22

Gradient Descent in ML
SLIDES FROM ANDREW NG
Let’s Learn a bit of ML
Linear Regression
Start with a (grossly simplified) example: predicting housing prices in Hyderabad.

We are given some data in the form of a table, with x as the size of a house in sqft and y as its price in INR Lakhs.

    Size x (sqft)    Price y (INR Lakhs)
    1100             199
    1400             245
    1425             319
    1550             240
    1600             312
    1700             279
    1700             310
    1875             380
    2350             405
    2450             540

Notation:
m = number of training examples
x = input variable
y = output variable
(x, y) = one training example
(x^(i), y^(i)) = the i-th training example

The task of ML is to learn the hidden relationship between x, y and many other hidden features, and to be able to predict correctly.

If I want to buy a property of 2000 sqft, how much is the price?
Given this data, a friend has a house of 750 sqft - how much can she expect to get?
Predicting Housing prices in Hyderabad

Assume that x and y are related in a linear fashion: y = θ₀ + θ₁x.

The ML task is to find the best θ₀ and θ₁ from the given table.

What do we mean by the best θ₀ and θ₁?

Suppose the ML system tells us that the best values are θ₀ = 1 and θ₁ = 0.2.
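With these values the hypothesis can answer the questions posed earlier. A minimal sketch, assuming the illustrative parameters θ₀ = 1 and θ₁ = 0.2 from the slide:

```python
# Hypothesis: y = theta0 + theta1 * x, with the illustrative values from the slide
theta0, theta1 = 1.0, 0.2

def predict(size_sqft):
    """Predicted price in INR Lakhs for a house of the given size."""
    return theta0 + theta1 * size_sqft

print(predict(2000))  # 401.0 Lakhs for the 2000 sqft property
print(predict(750))   # 151.0 Lakhs for the friend's 750 sqft house
```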
Let's formulate it:

$$\underset{\theta_0,\,\theta_1}{\text{Minimize}}\;\; \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \left(\theta_0 + \theta_1 x^{(i)}\right)\right)^2$$

For any given θ₀, θ₁ and x we predict ŷ = θ₀ + θ₁x, and compute the squared error (ŷ − y)² for each observation.

For θ₀ = 1 and θ₁ = 0.2:

    Size x (sqft)   Predicted y   Actual y   Sq Error
    1100            221           199        484
    1400            281           245        1296
    1425            286           319        1089
    1550            311           240        5041
    1600            321           312        81
    1700            341           279        3844
    1700            341           310        961
    1875            376           380        16
    2350            471           405        4356
    2450            491           540        2401

The Mean Square Error is taken over all observations.
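As a check on the table, a short NumPy sketch that recomputes the squared errors and the mean squared error for these illustrative parameters (the 1/(2m) scaling of the cost J is applied separately):

```python
import numpy as np

# Housing data from the table: size in sqft, price in INR Lakhs
x = np.array([1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450], dtype=float)
y = np.array([199, 245, 319, 240, 312, 279, 310, 380, 405, 540], dtype=float)

theta0, theta1 = 1.0, 0.2          # illustrative parameters from the slide
pred = theta0 + theta1 * x         # predicted prices
sq_err = (pred - y) ** 2           # per-observation squared errors

print(sq_err)                      # 484, 1296, 1089, 5041, 81, 3844, 961, 16, 4356, 2401
print(sq_err.mean())               # mean squared error = 1956.9
print(sq_err.sum() / (2 * len(x))) # cost J(theta0, theta1) = 978.45
```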
Linear Regression → Unconstrained Optimization

$$\underset{\theta_0,\,\theta_1}{\text{Minimize}}\;\; J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \left(\theta_0 + \theta_1 x^{(i)}\right)\right)^2$$

For Linear Regression, J is convex.
SLIDES FROM ANDREW NG


Contour Plots

SLIDES FROM ANDREW NG


Graphical Interpretation

[Scatter plot of the training data: Size in sqft on the x-axis (500 to 2500), Price in INR Lakhs on the y-axis (100 to 400+), with the fitted line y = θ₀ + θ₁x. For each point, the error is the vertical gap between the actual price and the predicted price on the line.]
Gradient descent for linear regression

Repeat until convergence
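The update equations on the original slide did not survive extraction; for this cost J they are the standard simultaneous updates, with learning rate α:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(\left(\theta_0 + \theta_1 x^{(i)}\right) - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(\left(\theta_0 + \theta_1 x^{(i)}\right) - y^{(i)}\right)x^{(i)}$$

Both parameters are updated simultaneously on every iteration.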

SLIDES FROM ANDREW NG
[A sequence of slides from Andrew Ng's course shows the hypothesis h(x) for fixed θ₀, θ₁ (a function of x) side by side with the cost J(θ₀, θ₁) (a function of the parameters), tracing how successive gradient descent steps move across the contour plot of J toward its minimum while the fitted line improves.]

SLIDES FROM ANDREW NG

The ML problem can be much more complicated

1. Prices do not depend only on sqft, but also on
◦ Floor,
◦ Facing: east-facing, north-facing, lake-facing, park-facing
◦ Mode of payment
◦ Layout …

2. The relationship need not be linear

3. The error measure may be different from MSE

4. and so on.
Optimization problems faced in Data Science/Machine Learning are primarily finite-sum optimization problems.

Finite Sum Optimization

The objective function is a sum of per-example loss terms, and the goal is to minimize J. What is the gradient of such an objective function, and what does the GD step look like? (See the formulas below.)
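The formulas from the slide did not survive extraction; in standard finite-sum notation (n training examples, per-example losses L_i, learning rate α) they read:

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} L_i(\theta), \qquad \nabla J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \nabla L_i(\theta)$$

GD step: $\theta := \theta - \alpha\,\nabla J(\theta)$, which requires a full pass over all n examples.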
Linear Regression is a finite-sum optimization

$$\underset{\theta_0,\,\theta_1}{\text{Minimize}}\;\; J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \left(\theta_0 + \theta_1 x^{(i)}\right)\right)^2$$

To compute the gradient ∇J, we need to enumerate all m training data points.
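A minimal sketch of that full-batch gradient computation, reusing the housing data above (the explicit loop is only to emphasize that every training example is visited):

```python
import numpy as np

x = np.array([1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450], dtype=float)
y = np.array([199, 245, 319, 240, 312, 279, 310, 380, 405, 540], dtype=float)
m = len(x)

def full_gradient(theta0, theta1):
    """Gradient of J(theta0, theta1) = (1/2m) * sum_i (y_i - (theta0 + theta1*x_i))^2."""
    g0 = g1 = 0.0
    for i in range(m):                      # one term per training example
        err = (theta0 + theta1 * x[i]) - y[i]
        g0 += err
        g1 += err * x[i]
    return g0 / m, g1 / m                   # dJ/dtheta0, dJ/dtheta1

print(full_gradient(1.0, 0.2))              # gradient at the illustrative parameters
```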


Finite Sum Optimization: the GD step uses the full training set.

Computing the gradient can be very time consuming.

Approximating the gradient is still useful as long as it points in roughly the same direction as the true gradient.
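To make "roughly the same direction" concrete, a small illustrative sketch (not from the slides) that compares the full-batch gradient with a single-example gradient via their cosine similarity, repeating the data so the snippet is self-contained:

```python
import numpy as np

x = np.array([1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450], dtype=float)
y = np.array([199, 245, 319, 240, 312, 279, 310, 380, 405, 540], dtype=float)
theta0, theta1 = 0.0, 0.0

def grad(xb, yb):
    """Gradient of the MSE cost on an arbitrary batch (xb, yb)."""
    err = (theta0 + theta1 * xb) - yb
    return np.array([err.mean(), (err * xb).mean()])

g_full = grad(x, y)                       # true (full-batch) gradient
i = np.random.default_rng(0).integers(len(x))
g_one = grad(x[i:i+1], y[i:i+1])          # noisy one-example estimate

cos = g_full @ g_one / (np.linalg.norm(g_full) * np.linalg.norm(g_one))
print(cos)                                # close to 1: the noisy gradient points the same way
```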

Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a stochastic approximation of the gradient descent method for minimizing an objective function that is written as a sum of differentiable functions.

The word "stochastic" here refers to the fact that we acknowledge that we do not know the gradient precisely, but instead only know a noisy approximation to it.
Stochastic Gradient Descent (SGD)

BATCH: traditional GD. The gradient is computed over the entire set of training data.

MINI-BATCH GD: the gradient is computed over a subset of the training data.

STOCHASTIC GRADIENT DESCENT (SGD): the gradient is computed from a single randomly selected training example.

SGD is very effective in large-scale machine learning problems such as training deep neural networks on millions of images, topic models, reinforcement learning, or training of large-scale Gaussian process models.
$$\underset{\theta_0,\,\theta_1}{\text{Minimize}}\;\; J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \left(\theta_0 + \theta_1 x^{(i)}\right)\right)^2$$

m: Number of training examples

Batch Gradient Descent
Repeat until convergence, computing the gradient on all m training examples.

Mini-Batch Gradient Descent
Choose a mini-batch size b. Repeat until convergence: randomly sample b data points and compute the gradient on them.

Stochastic Gradient Descent
Repeat until convergence, computing the gradient on a single randomly selected training example.
BATCH GD
Repeat until convergence:
• $g = \frac{1}{m}\left(\nabla L_1(\theta) + \nabla L_2(\theta) + \dots + \nabla L_m(\theta)\right)$
• $\theta = \theta - \alpha\, g$

MINIBATCH GD (with batch size b = 3)
Repeat for each minibatch until convergence:
Minibatch 1
• $g = \frac{1}{3}\left(\nabla L_1(\theta) + \nabla L_2(\theta) + \nabla L_3(\theta)\right)$
• $\theta = \theta - \alpha\, g$
Minibatch 2
• $g = \frac{1}{3}\left(\nabla L_4(\theta) + \nabla L_5(\theta) + \nabla L_6(\theta)\right)$
• $\theta = \theta - \alpha\, g$
…

Stochastic GD
Repeat until convergence:
Select i randomly
• $\theta = \theta - \alpha\, \nabla L_i(\theta)$
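A minimal NumPy sketch of all three variants on the housing data. The learning rate, epoch count, and seed are illustrative assumptions, not tuned values; with unscaled features a very small α is needed for stability:

```python
import numpy as np

x = np.array([1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450], dtype=float)
y = np.array([199, 245, 319, 240, 312, 279, 310, 380, 405, 540], dtype=float)
m = len(x)

def gradient(theta, xb, yb):
    """Gradient of the squared-error cost on the batch (xb, yb)."""
    err = (theta[0] + theta[1] * xb) - yb
    return np.array([err.mean(), (err * xb).mean()])

def run(batch_size, alpha=1e-7, epochs=200, seed=0):
    """batch_size = m gives Batch GD, 1 gives SGD, anything in between is mini-batch."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(epochs):
        order = rng.permutation(m)                 # shuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]  # current (mini-)batch
            theta -= alpha * gradient(theta, x[idx], y[idx])
    return theta

print("Batch GD:     ", run(batch_size=m))
print("Mini-batch GD:", run(batch_size=3))
print("SGD:          ", run(batch_size=1))
```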
For gradient descent to converge, we only require that the gradient we use is an unbiased estimate of the true gradient.

In fact, the finite-sum term $\frac{1}{n}\sum_{i=1}^{n}\nabla L_i(\theta)$ used in GD is itself an empirical estimate of the expected value of the gradient.

Therefore, any other unbiased empirical estimate of that expected value, for example one computed from a subsample of the data, suffices for convergence of gradient descent.
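In symbols (a standard fact, with the index i drawn uniformly at random from {1, …, n}):

$$\mathbb{E}_{i \sim \mathrm{Unif}\{1,\dots,n\}}\left[\nabla L_i(\theta)\right] = \frac{1}{n}\sum_{i=1}^{n}\nabla L_i(\theta) = \nabla J(\theta)$$

so the single-example (or mini-batch) gradient used by SGD is an unbiased estimate of the full gradient.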
Key Insight

When the learning rate decreases at an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a local minimum.

Intuition: as long as each noisy step takes us in a direction that is correct on average, over many steps we will make progress in minimizing the loss.
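"Decreases at an appropriate rate" is usually formalized by the standard Robbins-Monro conditions on the step sizes α_t (satisfied, for example, by α_t = c/t):

$$\sum_{t=1}^{\infty} \alpha_t = \infty, \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty$$

The first condition ensures the steps can reach any point; the second ensures the injected noise eventually dies out.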
How to Choose Batch Size

Use a large batch when the noise/randomness in the gradient estimate is large.

Use a small batch when the noise/randomness is small.

A large batch allows more efficient computation on GPUs.

It is often best to increase the batch size up to the GPU memory limit.
End of Class

Slides From Andrew Ng
