
CS2109S: Introduction to AI and Machine Learning

Lecture 5:
Linear Regression
13 February 2023

1
Announcement

2
Midterm (Confirmed)
• Date & Time: 5 March, 16:00 – 18:00
• Please come at 15:30. You will be allowed into the hall at 15:45
• Venue: MPSH 2A & 2B
• Format: Digital Assessment (Examplify)
• Materials: all topics covered before recess week, i.e. up to and including Lecture 4
• Cheatsheet: 1 x A4 paper, both sides
• Calculators: Not allowed. Examplify has a built-in calculator

More details will be announced later.


3
Materials

4
Recap
• Machine Learning
• What is ML? – a machine that learns from data
• Types of Feedback: supervised, unsupervised, semi-supervised, reinforcement
• Supervised Learning
• Performance Measure
• Regression: mean squared error, mean absolute error
• Classification: correctness, accuracy, confusion matrix, precision, recall, F1
• Decision Trees
• Decision Tree Learning (DTL): greedy, top-down, recursive algorithm
• Entropy and Information Gain
• Different types of attributes: many values, differing costs, missing values
• Pruning: min-sample, max-depth
• Ensemble Methods: bagging, boosting

5
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation

6
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation

7
Regression
Housing price prediction

[Scatter plot: price ($M) on the y-axis (100–500) vs. size (sqft) on the x-axis (500–2500). What price should we predict for a house of 1150 sqft?]

8
Linear Regression
Housing price prediction ($M vs. sqft)

Candidate lines:
$h_w(x) = 100 + \frac{1}{2}x$
$h_w(x) = 100 + \frac{1}{5}x$
$h_w(x) = 0 + \frac{1}{5}x$

General form: $h_w(x) = w_0 + w_1 x$
Find $w$ that "fits the data well"! What does this mean?

[Scatter plot as before, with the candidate lines and predictions queried at 0, 1150, and 2600 sqft]

9
Linear Regression: Measuring Fit

For a set of $m$ examples $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, we can compute the average (mean) squared error as follows:

$J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2$

This is our loss function; the prediction $h_w(x^{(i)})$ is $\hat{y}^{(i)}$, and the factor $\frac{1}{2}$ is for mathematical convenience. We want to minimize the loss/error!

[Scatter plot: price ($M) vs. size (sqft)]

10
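To make the loss concrete, here is a minimal NumPy sketch (not from the slides; the toy data and the helper name mse_loss are illustrative assumptions) that evaluates $J_{MSE}$ for a candidate line:

```python
import numpy as np

def mse_loss(w0, w1, x, y):
    """J_MSE(w) = 1/(2m) * sum_i (h_w(x^(i)) - y^(i))^2 with h_w(x) = w0 + w1 * x."""
    m = len(x)
    predictions = w0 + w1 * x          # h_w(x^(i)) for every example
    errors = predictions - y           # h_w(x^(i)) - y^(i)
    return np.sum(errors ** 2) / (2 * m)

# Toy data: size (sqft) and price, purely illustrative
x = np.array([500.0, 1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([100.0, 200.0, 300.0, 400.0, 500.0])

print(mse_loss(0.0, 0.2, x, y))     # 0.0: the line 0.2x fits this toy data exactly
print(mse_loss(100.0, 0.2, x, y))   # 5000.0: a worse fit gives a larger loss
```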
Linear Regression: Measuring Fit

How do we know how to position/rotate our line?
(Same setup and loss $J_{MSE}(w)$ as the previous slide.)

11
Linear Regression, Naïve Approach: Enumerate all possible lines

Hypothesis: $h_w(x) = w_0 + w_1 x$. Simplify by fixing $w_0 = 0$ for easier visualization, so $h_w(x) = w_1 x$.

Loss function: $J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( w_1 x^{(i)} - y^{(i)} \right)^2$

Try $h_w(x) = 0x$ on the three data points $(1, 1), (2, 2), (3, 3)$:

$J_{MSE}(0) = \frac{1^2 + 2^2 + 3^2}{6} = \frac{14}{6}$

[Left plot: the data points and the candidate line, $y$ vs. $x$. Right plot: $J_{MSE}(w)$ vs. $w_1$ (0.5 to 2), with the point for this line marked.]

12
Linear Regression, Naïve Approach: Enumerate all possible lines

Try $h_w(x) = 0.5x$ (same hypothesis, loss, and data as before):

$J_{MSE}(0.5) = \frac{(0.5-1)^2 + (1-2)^2 + (1.5-3)^2}{2 \times 3} = \frac{0.5^2 + 1^2 + 1.5^2}{6} = \frac{3.5}{6}$

13
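These loss values can be checked with a few lines of code. A sketch, assuming the three data points $(1, 1), (2, 2), (3, 3)$ implied by the calculations above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

for w1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    loss = np.sum((w1 * x - y) ** 2) / (2 * m)   # J_MSE with w0 fixed at 0
    print(f"w1 = {w1}: J_MSE = {loss:.4f}")

# Matches the slides: J(0) = 14/6 ≈ 2.3333, J(0.5) = 3.5/6 ≈ 0.5833, J(1) = 0
```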
Linear Regression, Naïve Approach: Enumerate all possible lines

Try $h_w(x) = 1x$ (same setup); this line passes through all three data points, so $J_{MSE}(1) = 0$.

14
Linear Regression, Naïve Approach: Enumerate all possible lines

Try $h_w(x) = 1.5x$ (same setup).

15
Linear Regression, Naïve Approach: Enumerate all possible lines

Try $h_w(x) = 2x$ (same setup).

16
Linear Regression, Naïve Approach: Enumerate all possible lines

$h_w(x) = 1x$ has the lowest loss of the candidates: select this line.

17
Linear Regression: Loss Landscape

Plotting $J_{MSE}(w)$ against $w_1$ (same hypothesis, loss, and data as before) traces out the loss landscape. The best line corresponds to the minimum of this curve: that is where we want to get!

Can we do better than enumerating all possible lines?

18
Linear Regression: Better Approach

Instead of enumerating, adjust $w_1$ step by step towards the minimum:
• From the left of the minimum (e.g. $w_1 = 0.5$): $w_1 \leftarrow w_1 + c$
• From the right of the minimum (e.g. $w_1 = 1.5$): $w_1 \leftarrow w_1 - c$

How do we get the appropriate $c$?

19
Linear Regression: Better Approach

Use the negative gradient of the loss:

$-\frac{\partial J_{MSE}(w)}{\partial w_1} = -\frac{1}{m} \sum_{i=1}^{m} \left( w_1 x^{(i)} - y^{(i)} \right) x^{(i)}$

Evaluated on the single example $(x, y) = (1, 1)$:
• At $w_1 = 0.5$: $-(0.5 \times 1 - 1) \times 1 = +0.5$, so we move in the $+c$ direction.
• At $w_1 = 1.5$: $-(1.5 \times 1 - 1) \times 1 = -0.5$, so we move in the $-c$ direction.

The sign (and size) of the gradient tells us which way, and how far, to step:
$w_1 \leftarrow w_1 - \frac{\partial J_{MSE}}{\partial w_1}$

20
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation

21
Gradient Descent: Remember Hill-Climbing?

• Start at some $w$
• Pick a nearby $w$ that reduces $J(w)$:

  $w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}$   ($\gamma$ is the learning rate)

• Repeat until a minimum is reached

22
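A generic sketch of this loop in Python (illustrative only, not the course's reference implementation; grad_J is an assumed callable that returns the gradient vector at $w$):

```python
import numpy as np

def gradient_descent(grad_J, w_init, gamma=0.1, n_iters=1000, tol=1e-8):
    """Repeatedly step against the gradient: w_j <- w_j - gamma * dJ/dw_j."""
    w = np.array(w_init, dtype=float)
    for _ in range(n_iters):
        step = gamma * grad_J(w)        # gamma is the learning rate
        w = w - step
        if np.linalg.norm(step) < tol:  # stop once the updates become negligible
            break
    return w

# Example: J(w) = (w0 - 3)^2 + (w1 + 1)^2, whose gradient is easy to write by hand
grad = lambda w: np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])
print(gradient_descent(grad, w_init=[0.0, 0.0]))   # converges near [3, -1]
```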
Gradient Descent: 1 Parameter

• Start at some $w_0$
• Pick a nearby $w_0$ that reduces $J(w_0)$:

  $w_0 \leftarrow w_0 - \gamma \frac{\partial J(w_0)}{\partial w_0}$   ($\gamma$ is the learning rate)

• Repeat until a minimum is reached

As it gets closer to a minimum:
• The gradient becomes smaller
• The steps become smaller

[Plot: $J_{MSE}(w)$ vs. $w_0$, with successive steps shrinking as they approach the minimum]

23
Gradient Descent: 1 Parameter

(Same update as before.)

$\gamma$ too large: the steps overshoot the minimum, so the iterates bounce back and forth and may even diverge.

[Plot: $J_{MSE}(w)$ vs. $w_0$, with large steps jumping across the minimum]

24
Gradient Descent: 1 Parameter

(Same update as before.)

$\gamma$ too small: the steps are tiny and convergence is very slow.

[Plot: $J_{MSE}(w)$ vs. $w_0$, with many small steps creeping towards the minimum]

25
Gradient Descent: 2 Parameters

• Start at some $w = (w_0, w_1)$
• Pick a nearby $w$ that reduces $J(w)$:

  $w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1)}{\partial w_j}$   ($\gamma$ is the learning rate)

• Repeat until a minimum is reached

[Surface plot: $J$ over $(w_0, w_1)$]

26
Image Credit: https://www.researchgate.net/figure/The-2D-Caussian-function-of-Example-2-Example-1-Consider-the-following-strongly-convex_fig2_230787652
Gradient Descent: Common Mistakes

• Start at some $w$
• Pick a nearby $w$ that reduces $J(w)$:
  $w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}$   ($\gamma$ is the learning rate)
• Repeat until a minimum is reached

Wrong (sequential update): $w_0$ has already changed by the time $w_1$ is updated!
$w_0 = w_0 - \gamma \frac{\partial J(w_0, w_1)}{\partial w_0}$
$w_1 = w_1 - \gamma \frac{\partial J(w_0, w_1)}{\partial w_1}$

Correct (simultaneous update): evaluate all partial derivatives at the current $w$ first, then update.
$a = \frac{\partial J(w_0, w_1)}{\partial w_0}$,   $b = \frac{\partial J(w_0, w_1)}{\partial w_1}$
$w_0 = w_0 - \gamma a$,   $w_1 = w_1 - \gamma b$

27
Image Credit: https://python.plainenglish.io/logistic-regression-in-machine-learning-from-scratch-872b1fedd05b
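The same point in code, a sketch assuming $w$ is a NumPy array of weights and grad_J is an assumed gradient function: compute every partial derivative at the current weights first, then update.

```python
import numpy as np

def update_wrong(w, grad_J, gamma):
    """Sequential update: w[0] has already changed when the gradient for w[1] is taken."""
    w = w.copy()
    w[0] = w[0] - gamma * grad_J(w)[0]
    w[1] = w[1] - gamma * grad_J(w)[1]   # uses the *new* w[0]: not gradient descent!
    return w

def update_correct(w, grad_J, gamma):
    """Simultaneous update: evaluate the whole gradient at the current w, then step."""
    g = grad_J(w)                        # a = dJ/dw0 and b = dJ/dw1 at the same point
    return w - gamma * g
```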

Linear Regression with Gradient Descent

Hypothesis: $h_w(x) = w_0 + w_1 x$

Loss function: $J_{MSE}(w_0, w_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( w_0 + w_1 x^{(i)} - y^{(i)} \right)^2$

$\frac{\partial J_{MSE}(w_0, w_1)}{\partial w_j} = \frac{\partial}{\partial w_j} \frac{1}{2m} \sum_{i=1}^{m} \left( w_0 + w_1 x^{(i)} - y^{(i)} \right)^2$

$\frac{\partial J_{MSE}(w_0, w_1)}{\partial w_0} = \frac{1}{m} \sum_{i=1}^{m} \left( w_0 + w_1 x^{(i)} - y^{(i)} \right)$

$\frac{\partial J_{MSE}(w_0, w_1)}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} \left( w_0 + w_1 x^{(i)} - y^{(i)} \right) \cdot x^{(i)}$

Theorem: the MSE loss function is convex for linear regression, so there is one minimum, the global minimum.

Can we use mean absolute error (MAE) for our $J$? MAE is not fully differentiable (its derivative is undefined at 0).

28
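Putting the two partial derivatives to work, a minimal batch-gradient-descent sketch for the one-feature case (the toy data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

def fit_linear_regression(x, y, gamma=0.05, n_iters=5000):
    """Fit h_w(x) = w0 + w1 * x by batch gradient descent on J_MSE."""
    m = len(x)
    w0, w1 = 0.0, 0.0
    for _ in range(n_iters):
        errors = (w0 + w1 * x) - y                  # h_w(x^(i)) - y^(i) for all i
        grad_w0 = np.sum(errors) / m                # dJ/dw0
        grad_w1 = np.sum(errors * x) / m            # dJ/dw1
        w0, w1 = w0 - gamma * grad_w0, w1 - gamma * grad_w1   # simultaneous update
    return w0, w1

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(fit_linear_regression(x, y))   # approaches (0, 1), the line through the data
```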
Linear Regression with Gradient Descent

[Slides 29–37: a sequence of figures showing, side by side, the fitted line $h_w(x)$ on the data (left) and the current point on the contour plot of $J_{MSE}(w_0, w_1)$ (right) as gradient descent steps towards the minimum.]

Credit: Andrew Ng
Variants of Gradient Descent

$w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}$, where $J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2$

[Three housing-price plots ($M vs. sqft), one per variant]

(Batch) Gradient Descent
• Consider all training examples

Mini-batch Gradient Descent
• Consider a subset of training examples at a time
• Cheaper (faster) per iteration
• Randomness, may escape local minima*

Stochastic Gradient Descent (SGD)
• Select one random data point at a time
• Cheapest (fastest) per iteration
• More randomness, may escape local minima*

38
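A sketch of the three variants in one function (illustrative assumptions: the helper grad_on_batch, the toy data, and the hyperparameters); the only difference between the variants is how many examples feed each update:

```python
import numpy as np

def grad_on_batch(w, X, y):
    """MSE gradient of h_w(x) = X @ w, computed on whatever subset it is given."""
    m = len(y)
    return X.T @ (X @ w - y) / m

def gradient_descent_variants(X, y, w, gamma=0.05, n_epochs=2000, batch_size=1):
    """batch_size = len(y): batch GD; 1 < batch_size < len(y): mini-batch; 1: SGD."""
    m = len(y)
    for _ in range(n_epochs):
        order = np.random.permutation(m)          # visit examples in a random order
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            w = w - gamma * grad_on_batch(w, X[idx], y[idx])
    return w

# Toy data: a bias column of ones plus one feature, with y = 2 + 3x exactly
X = np.c_[np.ones(5), np.arange(1.0, 6.0)]
y = 2.0 + 3.0 * np.arange(1.0, 6.0)
print(gradient_descent_variants(X, y, w=np.zeros(2), batch_size=2))  # close to [2, 3]
```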
Variants of Gradient Descent

[Figure illustrating the gradient descent variants]

Credit: analyticsvidhya.com
39
Escaping Local Minima / Plateaus in Non-Convex Optimization

[Two loss surfaces $J(w_0, w_1)$: one for Batch Gradient Descent, one for Stochastic/Mini-batch Gradient Descent. The noisier updates of stochastic/mini-batch gradient descent may escape local minima or plateaus that trap batch gradient descent.]

40
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation

41
Linear Regression with Many Attributes

HDB prices from SRX:

x0 (bias)  x1 (Year)  x2 (# bedrooms)  x3 (# bathrooms)  x4 (Size, m2)  y (Price, $)
1          2016       4                2                 113            560,000
1          1998       3                2                 102            739,000
1          1997       3                0                 100            430,000
1          2014       3                2                 84             698,000
1          2016       3                0                 112            688,888
1          1979       2                2                 68             390,000
1          1969       2                1                 53             250,000
1          1986       3                2                 122            788,000
1          1985       3                3                 150            680,000
1          2009       3                2                 90             828,000

Hypothesis: $h_w(x) = w_0 x_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$

Hypothesis (for $n$ features): $h_w(x) = \sum_{j=0}^{n} w_j x_j = w^T x$, with $w = [w_0, w_1, \ldots, w_n]^T$ and $x = [x_0, x_1, \ldots, x_n]^T$

Weight update (for $n$ features):
$w_j \leftarrow w_j - \gamma \frac{\partial J_{MSE}(w_0, w_1, \ldots, w_n)}{\partial w_j} = w_j - \gamma \frac{1}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)}$

Notation:
• $n$ = number of features
• $x^{(i)}$ = input features of the $i$-th training example
• $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example

42
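The hypothesis and weight update vectorize naturally. A sketch (the rows below are copied from the table; the single-step usage and the tiny learning rate are illustrative assumptions):

```python
import numpy as np

# First four rows of the table above: bias, year, #bedrooms, #bathrooms, size (m^2) -> price ($)
X = np.array([
    [1, 2016, 4, 2, 113],
    [1, 1998, 3, 2, 102],
    [1, 1997, 3, 0, 100],
    [1, 2014, 3, 2,  84],
], dtype=float)
y = np.array([560_000, 739_000, 430_000, 698_000], dtype=float)

def h(X, w):
    return X @ w                       # h_w(x) = w^T x for every example at once

def gradient_step(X, y, w, gamma):
    m = len(y)
    grad = X.T @ (h(X, w) - y) / m     # (1/m) * sum_i (h_w(x^(i)) - y^(i)) * x_j^(i), all j
    return w - gamma * grad            # w_j <- w_j - gamma * dJ/dw_j, all j simultaneously

w = np.zeros(X.shape[1])
w = gradient_step(X, y, w, gamma=1e-8)  # one update; in practice, repeat until convergence
# Note the tiny learning rate: the raw features have very different scales (next slide).
```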
Dealing with Features of Different Scales

HDB prices from SRX:

x1 (# bedrooms)  x2 (Size, m2)  y (Price, $1K)
4                113            560
3                102            739
3                100            430
3                84             698
3                112            688
2                68             390
2                53             250
3                122            788
3                150            680
3                90             828

Hypothesis: $h_w(x) = w_0 x_0 + w_1 x_1 + w_2 x_2$. Simplify: set $w_0 = 0$, $w_1 = w_2 = 1$, so $h_w(x) = 0 x_0 + 1 x_1 + 1 x_2$.

Weight update: $w_j \leftarrow w_j - \gamma \frac{1}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) \cdot x_j^{(i)}$

On the first example (with $\gamma = 1$, a single example):
$h_w(x) = 4 + 113 = 117$
$w_1 \leftarrow 1 - 1 \cdot (117 - 560) \cdot 4 = 1 - 1 \times (-1{,}772) = 1{,}773$
$w_2 \leftarrow 1 - 1 \cdot (117 - 560) \cdot 113 = 1 - 1 \times (-50{,}059) = 50{,}060$

The update to $w_2$ is vastly larger than the update to $w_1$, simply because the size feature has a much larger scale than the number of bedrooms. How to fix this?

Mean normalization: $x_i \leftarrow \frac{x_i - \mu_i}{\sigma_i}$   ($\sigma_i$ = standard deviation)

Other methods of standardization also exist: min-max scaling, robust scaling, etc.
Other solution: a separate learning rate $\gamma_i$ for each weight.

[Contour plot of $J$ over $(w_1, w_2)$: the step $\Delta w$ along the negative gradient is dominated by $w_2$.]

43
Dealing with Features of Different Scales

[Contour plots of $J$ over $(w_1, w_2)$ before and after feature scaling: without scaling the contours are highly elongated and gradient descent zig-zags; after scaling the contours are more circular and gradient descent heads more directly to the minimum.]

44
Image Credit: Andrew Ng
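A minimal mean-normalization sketch (an illustration, not the course's exact code); min-max scaling or robust scaling, mentioned on the previous slide, would replace the mean/standard-deviation computation:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column: x_j <- (x_j - mu_j) / sigma_j."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma   # keep mu, sigma to transform future inputs identically

# #bedrooms and size (m^2) from the first four rows of the earlier table
X = np.array([[4, 113], [3, 102], [3, 100], [3, 84]], dtype=float)
X_scaled, mu, sigma = mean_normalize(X)
print(X_scaled.mean(axis=0))   # approximately [0, 0]
print(X_scaled.std(axis=0))    # approximately [1, 1]
```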
Dealing with Non-Linear Relationship

[Plot: Exam Score ($y$) vs. Anxiety ($x$), a clearly non-linear relationship]

Which function? $h_w(x) = w_0 + w_1 x + w_2 x^2$

Generally: $h_w(x) = w_0 + w_1 f_1 + w_2 f_2 + w_3 f_3 + \cdots + w_n f_n$ with transformed features, e.g. $f_1 = x$, $f_2 = x^2$. This is Polynomial Regression.

Need to scale these transformed features!

45
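A sketch of the feature transformation on assumed toy data: once the transformed features $f_1 = x$, $f_2 = x^2$ are built, the problem is linear in $w$ and the usual linear-regression machinery applies unchanged.

```python
import numpy as np

def polynomial_features(x, degree):
    """Build columns [1, x, x^2, ..., x^degree] so plain linear regression fits a polynomial."""
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.linspace(0.0, 10.0, 20)            # anxiety (toy values)
y = 10.0 * x - x ** 2                     # exam score following a quadratic (toy values)
F = polynomial_features(x, degree=2)      # f0 = 1 (bias), f1 = x, f2 = x^2

# Linear in w once the features are transformed; a least-squares solve (equivalent in
# result to the normal equation of the next section) recovers w ≈ [0, 10, -1].
w, *_ = np.linalg.lstsq(F, y, rcond=None)
print(np.round(w, 3))
```

In practice the higher-degree columns ($x^2$, $x^3$, …) grow quickly, which is exactly why the slide warns that these transformed features need to be scaled.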
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation

46
Normal Equation

$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}$ (the first column of ones is the bias),   $w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}$,   $Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$,   $h_w(X) = Xw$

Goal: find $w$ that minimizes $J_{MSE}$. Set $\frac{\partial J_{MSE}(w)}{\partial w} = 0$. A bunch of math…

$2X^T X w - 2X^T Y = 0$
$2X^T X w = 2X^T Y$
$X^T X w = X^T Y$
$w = (X^T X)^{-1} X^T Y$   (assume $X^T X$ is invertible)

[Plot: the bowl-shaped $J$ over $(w_0, w_1)$; the gradient is zero at the minimum.]

47
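A direct transcription of the closed form (a sketch with assumed toy data; np.linalg.solve is used rather than forming the inverse explicitly, which gives the same result more stably):

```python
import numpy as np

def normal_equation(X, y):
    """Solve X^T X w = X^T Y, i.e. w = (X^T X)^{-1} X^T Y when X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data: a bias column plus one feature, generated from y = 2 + 3x
x = np.arange(1.0, 6.0)
X = np.c_[np.ones_like(x), x]
y = 2.0 + 3.0 * x
print(normal_equation(X, y))   # [2. 3.] -- no learning rate, no iterations
```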
Gradient Descent vs Normal Equation

                               Gradient Descent     Normal Equation
Need to choose γ?              Yes                  No
Iteration(s)                   Many                 None
Large number of features n?    No problem           Slow: (X^T X)^{-1} is O(n^3)
Feature scaling?               May be necessary     Not necessary
Constraints                    -                    X^T X needs to be invertible

48
Summary
• Linear Regression: fitting a line to data
• Gradient Descent
• Gradient Descent Algorithm: follow the negative gradient to reduce the error
• Linear Regression with Gradient Descent: convex optimization, one minimum
• Variants of Gradient Descent: batch, mini-batch, stochastic
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes: $h_w(x) = \sum_{j=0}^{n} w_j x_j = w^T x$
• Dealing with Features of Different Scales: normalize!
• Dealing with Non-Linear Relationship: transform features
• Normal Equation: analytically find the best parameters

49
Coming Up Next Week
• Logistic Regression
• Gradient Descent
• Multi-class classification
• Non-linear decision boundary
• (More) Performance Measure
• Receiver Operating Characteristic (ROC)
• Area under ROC (AUC)
• Model Evaluation & Selection
• Bias & Variance

50
To Do
• Lecture Training 5
• +100 Free EXP
• +50 Early bird bonus
• Problem Set 4
• Out today!

51
