Lecture 5: Linear Regression
13 February 2023
Announcement
Midterm (Confirmed)
• Date & Time: 5 March, 16:00 – 18:00
• Please come at 15:30. You will be allowed into the hall at 15:45
• Venue: MPSH 2A & 2B
• Format: Digital Assessment (Examplify)
• Materials: all topics covered before recess week (up to and including Lecture 4)
• Cheatsheet: 1 x A4 paper, both sides
• Calculators: Not allowed. Examplify has a built-in calculator
Recap
• Machine Learning
• What is ML? – machine that learns through data
• Types of Feedback: supervised, unsupervised, semi-supervised, reinforcement
• Supervised Learning
• Performance Measure
• Regression: mean squared error, mean absolute error
• Classification: correctness, accuracy, confusion matrix, precision, recall, F1
• Decision Trees
• Decision Tree Learning (DTL): greedy, top-down, recursive algorithm
• Entropy and Information Gain
• Different types of attributes: many values, differing costs, missing values
• Pruning: min-sample, max-depth
• Ensemble Methods: bagging, boosting
Outline
• Linear Regression
• Gradient Descent
• Gradient Descent Algorithm
• Linear Regression with Gradient Descent
• Variants of Gradient Descent
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes
• Dealing with Features of Different Scales
• Dealing with Non-Linear Relationship
• Normal Equation
Regression
Housing price prediction
[Figure: scatter plot of house price ($M) against size (sqft); what price should we predict for a house of 1150 sqft?]
Linear Regression
Housing price prediction
Hypothesis: $h_w(x) = w_0 + w_1 x$
Example lines: $h_w(x) = 100 + \frac{1}{2}x$, $h_w(x) = 100 + \frac{1}{5}x$, $h_w(x) = 0 + \frac{1}{5}x$
Find $w$ that "fits the data well"! What does this mean?
[Figure: house price ($M) against size (sqft) with the candidate lines; what would each predict at 0, 1150, and 2600 sqft?]
Linear Regression: Measuring Fit
For a set of $m$ examples $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, we can compute the average (mean) squared error as follows:

$$J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2$$

This is the loss function; the factor $\frac{1}{2}$ is there for mathematical convenience. We want to minimize the loss/error!
[Figure: the data points $(x^{(i)}, y^{(i)})$ and the predictions $\hat{y}^{(i)} = h_w(x^{(i)})$ on the fitted line.]
Linear Regression: Measuring Fit
For a set of $m$ examples, how do we know how to position/rotate our line? We want to pick the line that minimizes the loss/error.
Linear Regression, Naïve Approach: Enumerate all possible lines

Hypothesis: $h_w(x) = w_0 + w_1 x$. Simplify: fix $w_0 = 0$ for easier visualization, so $h_w(x) = w_1 x$.

Loss function:
$$J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( w_1 x^{(i)} - y^{(i)} \right)^2$$

Data: $(1, 1), (2, 2), (3, 3)$, so $m = 3$.

Candidate $w_1 = 0$, i.e. $h_w(x) = 0x$:
$$J_{MSE}(w) = \frac{1^2 + 2^2 + 3^2}{6} = \frac{14}{6}$$

[Figure: left, the data points and the line $h_w(x) = 0x$; right, $J_{MSE}(w)$ plotted against $w_1$.]
Linear Regression, Naïve Approach: Enumerate all possible lines (continued)

Trying other candidate lines (same hypothesis, loss function, and data):

Candidate $w_1 = 0.5$, i.e. $h_w(x) = 0.5x$:
$$J_{MSE}(w) = \frac{(0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2}{2 \times 3} = \frac{0.5^2 + 1^2 + 1.5^2}{6} = \frac{3.5}{6}$$

Candidate $w_1 = 1$, i.e. $h_w(x) = 1x$: the line passes through every data point, so $J_{MSE}(w) = 0$.

Candidate $w_1 = 1.5$, i.e. $h_w(x) = 1.5x$: $J_{MSE}(w) = \frac{3.5}{6}$.

Candidate $w_1 = 2$, i.e. $h_w(x) = 2x$: $J_{MSE}(w) = \frac{14}{6}$.

Plotting $J_{MSE}(w)$ against $w_1$ shows a bowl shape with its minimum at $w_1 = 1$, which is the best-fitting line.
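The enumeration above is easy to reproduce. Below is a minimal Python sketch (not from the lecture) that computes $J_{MSE}$ for the same candidate values of $w_1$ on the toy data $(1,1), (2,2), (3,3)$ with $w_0$ fixed to 0.

```python
# Naive approach: enumerate candidate w1 values and compute J_MSE for each.
xs = [1, 2, 3]
ys = [1, 2, 3]
m = len(xs)

def j_mse(w1):
    """J_MSE(w) = 1/(2m) * sum((w1 * x - y)^2), with w0 fixed to 0."""
    return sum((w1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for w1 in [0, 0.5, 1, 1.5, 2]:
    print(f"w1 = {w1:<4} J_MSE = {j_mse(w1):.4f}")
# w1 = 1 gives the smallest loss (0), matching the plot of J_MSE against w1.
```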
Linear Regression: Loss Landscape
[Figure: the loss surface $J_{MSE}(w_0, w_1)$ over both parameters, a convex bowl with a single minimum.]
Gradient Descent (Remember Hill-climbing?)
• Start at some $w$
• Pick a nearby $w$ that reduces $J(w)$:
$$w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}$$
where $\gamma$ is the learning rate.
• Repeat until a minimum is reached
Gradient Descent: 1 Parameter
• Start at some $w_0$
• Pick a nearby $w_0$ that reduces $J(w_0)$:
$$w_0 \leftarrow w_0 - \gamma \frac{\partial J(w_0)}{\partial w_0}$$
where $\gamma$ is the learning rate.
• Repeat until a minimum is reached
[Figure: $J_{MSE}(w)$ against $w_0$ with the descent steps marked.]
As it gets closer to a minimum:
• The gradient becomes smaller
• The steps become smaller
Gradient Descent: 1 Parameter (effect of the learning rate $\gamma$)
• $\gamma$ too large: the steps overshoot the minimum, and the updates may oscillate or diverge.
• $\gamma$ too small: the steps are tiny, and convergence is very slow.
[Figures: $J_{MSE}(w)$ against $w_0$ with the descent steps for a learning rate that is too large and for one that is too small.]
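To make the update rule and the effect of the learning rate concrete, here is a small sketch (not from the slides) that runs 1-parameter gradient descent on $J_{MSE}(w_1)$ for the toy data from earlier. For this loss, $\partial J / \partial w_1 = \frac{1}{m} \sum_i (w_1 x^{(i)} - y^{(i)}) x^{(i)}$.

```python
# Gradient descent on J_MSE(w1) for the toy data, with different learning rates.
xs = [1, 2, 3]
ys = [1, 2, 3]
m = len(xs)

def grad(w1):
    """dJ/dw1 = 1/m * sum((w1 * x - y) * x)"""
    return sum((w1 * x - y) * x for x, y in zip(xs, ys)) / m

for gamma in [0.01, 0.1, 0.5]:    # too small, reasonable, too large
    w1 = 2.0                      # arbitrary starting point
    for _ in range(20):
        w1 = w1 - gamma * grad(w1)
    print(f"gamma = {gamma:<5} w1 after 20 steps = {w1:.4f}")
# gamma = 0.1 approaches the minimum at w1 = 1; gamma = 0.01 makes slow progress;
# gamma = 0.5 overshoots on every step and w1 diverges.
```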
Gradient Descent: 2 Parameters
• Start at some $w = (w_0, w_1)$
• Pick a nearby $w$ that reduces $J(w)$:
$$w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1)}{\partial w_j}$$
where $\gamma$ is the learning rate.
• Repeat until a minimum is reached
[Figure: the surface $J(w_0, w_1)$ with the descent path.]
Image Credit: https://www.researchgate.net/figure/The-2D-Caussian-function-of-Example-2-Example-1-Consider-the-following-strongly-convex_fig2_230787652
Gradient Descent: Common Mistakes
• Start at some $w$
• Pick a nearby $w$ that reduces $J(w)$:
$$w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}$$
where $\gamma$ is the learning rate.
• Repeat until a minimum is reached

Wrong (sequential update): updating $w_0$ first changes the value used when computing $w_1$'s gradient.
$$w_0 = w_0 - \gamma \frac{\partial J(w_0, w_1)}{\partial w_0}$$
$$w_1 = w_1 - \gamma \frac{\partial J(w_0, w_1)}{\partial w_1} \quad (w_0 \text{ changed!})$$

Correct (simultaneous update): compute all gradients at the current $w$ first, then update.
$$a = \frac{\partial J(w_0, w_1)}{\partial w_0}, \qquad b = \frac{\partial J(w_0, w_1)}{\partial w_1}$$
$$w_0 = w_0 - \gamma a, \qquad w_1 = w_1 - \gamma b$$
Image Credit: https://python.plainenglish.io/logistic-regression-in-machine-learning-from-scratch-872b1fedd05b
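The difference between the sequential (wrong) and simultaneous (correct) update is easy to see in code. A minimal sketch, assuming a hypothetical helper `dJ_dw(w, j)` that returns $\partial J / \partial w_j$ evaluated at the current $w$:

```python
# Wrong: sequential update. w[0] is overwritten before w[1]'s gradient is
# computed, so w[1]'s gradient is evaluated at a partially-updated w.
def step_wrong(w, gamma, dJ_dw):
    w[0] = w[0] - gamma * dJ_dw(w, 0)
    w[1] = w[1] - gamma * dJ_dw(w, 1)   # uses the already-changed w[0]!
    return w

# Correct: simultaneous update. Compute all gradients at the current w first,
# then apply all updates.
def step_correct(w, gamma, dJ_dw):
    a = dJ_dw(w, 0)
    b = dJ_dw(w, 1)
    w[0] = w[0] - gamma * a
    w[1] = w[1] - gamma * b
    return w
```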
Linear Regression with Gradient Descent
[Figure sequence: left, the hypothesis $h_w(x)$ fitted to the housing data; right, the contour plot of $J_{MSE}(w_0, w_1)$. Each gradient descent step moves $(w_0, w_1)$ closer to the minimum of the contour plot, and the corresponding line fits the data better and better.]
Credit: Andrew Ng
Variants of Gradient Descent
$$w_j \leftarrow w_j - \gamma \frac{\partial J(w_0, w_1, \ldots)}{\partial w_j}, \qquad J_{MSE}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2$$
• (Batch) Gradient Descent: consider all training examples in each update
• Mini-batch Gradient Descent: consider a subset of training examples at a time; cheaper (faster) per iteration; randomness, may escape local minima*
• Stochastic Gradient Descent (SGD): select one random data point at a time; cheapest (fastest) per iteration; more randomness, may escape local minima*
Variants of Gradient Descent
[Figure: comparison of the three variants. Credit: analyticsvidhya.com]
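The three variants differ only in how many examples each update looks at. A rough sketch (an assumption of mine, not the lecture's reference code), using NumPy arrays `X` of shape `(m, n+1)` with a leading bias column and `y` of shape `(m,)`:

```python
import numpy as np

def gd_epoch(X, y, w, gamma, batch_size=None, rng=np.random.default_rng(0)):
    """One pass over the data.
    batch_size=None -> batch GD (a single update over all m examples),
    batch_size=1    -> stochastic GD,
    otherwise       -> mini-batch GD."""
    m = X.shape[0]
    if batch_size is None:                       # batch gradient descent
        grad = X.T @ (X @ w - y) / m
        return w - gamma * grad
    idx = rng.permutation(m)                     # shuffle for (mini-batch) SGD
    for start in range(0, m, batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
        w = w - gamma * grad
    return w
```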
Escaping Local Minima / Plateaus in Non-Convex Optimization
[Figures: two non-convex loss surfaces $J(w_0, w_1)$ illustrating local minima and plateaus.]
Linear Regression with Many Attributes

HDB prices from SRX:
x0 (Bias) | x1 Year | x2 # bedrooms | x3 # bathrooms | x4 Size (m2) | y Price ($)
1 | 2016 | 4 | 2 | 113 | 560,000
1 | 1998 | 3 | 2 | 102 | 739,000
1 | 1997 | 3 | 0 | 100 | 430,000
1 | 2014 | 3 | 2 | 84 | 698,000
1 | 2016 | 3 | 0 | 112 | 688,888
1 | 1979 | 2 | 2 | 68 | 390,000
1 | 1969 | 2 | 1 | 53 | 250,000
1 | 1986 | 3 | 2 | 122 | 788,000
1 | 1985 | 3 | 3 | 150 | 680,000
1 | 2009 | 3 | 2 | 90 | 828,000

Hypothesis: $h_w(x) = w_0 x_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$

Hypothesis (for $n$ features):
$$h_w(x) = \sum_{j=0}^{n} w_j x_j = \begin{bmatrix} w_0 & w_1 & \cdots & w_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = w^T x$$

Weight update (for $n$ features):
$$w_j \leftarrow w_j - \gamma \frac{\partial J_{MSE}(w_0, w_1, \ldots, w_n)}{\partial w_j} = w_j - \gamma \frac{1}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Notation:
• $n$ = number of features
• $x^{(i)}$ = input features of the $i$-th training example
• $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
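The per-weight update above vectorizes naturally, since all partial derivatives can be computed at once. A minimal sketch under the same assumptions as before (NumPy, bias column of ones; the function name is illustrative):

```python
import numpy as np

def fit_linear_regression(X, y, gamma=0.01, n_iters=1000):
    """Batch gradient descent for h_w(x) = w^T x.
    X: (m, n+1) array whose first column is all ones (the bias x0).
    y: (m,) array of targets."""
    m, n_plus_1 = X.shape
    w = np.zeros(n_plus_1)
    for _ in range(n_iters):
        errors = X @ w - y                  # h_w(x^(i)) - y^(i) for all i
        grad = (X.T @ errors) / m           # one partial derivative per weight
        w -= gamma * grad                   # simultaneous update of all weights
    return w
```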
Dealing with Features of Different Scales

HDB prices from SRX:
x1 # bedrooms | x2 Size (m2) | y Price ($1K)
4 | 113 | 560
3 | 102 | 739
3 | 100 | 430
3 | 84 | 698
3 | 112 | 688
2 | 68 | 390
2 | 53 | 250
3 | 122 | 788
3 | 150 | 680
3 | 90 | 828

Hypothesis: $h_w(x) = w_0 x_0 + w_1 x_1 + w_2 x_2$. Simplify: set $w_0 = 0$, $w_1 = w_2 = 1$, so $h_w(x) = 0 \cdot x_0 + 1 \cdot x_1 + 1 \cdot x_2$.

Weight update: $w_j \leftarrow w_j - \gamma \frac{1}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

Taking a single step on the first example (with $\gamma = 1$):
$$h_w(x) = 4 + 113 = 117$$
$$w_1 \leftarrow 1 - 1 \cdot (117 - 560) \cdot 4 = 1 - 1 \times (-1{,}772) = 1{,}773$$
$$w_2 \leftarrow 1 - 1 \cdot (117 - 560) \cdot 113 = 1 - 1 \times (-50{,}059) = 50{,}060$$

The weight of the large-scale feature (size) changes far more than the weight of the small-scale feature (# bedrooms), so gradient descent takes a very lopsided path across the loss surface.
[Figure: contours of $J$ over $(w_1, w_2)$ with the negative-gradient step $\Delta w$.]

How to fix this?
Mean normalization: $x_i \leftarrow \dfrac{x_i - \mu_i}{\sigma_i}$, where $\sigma_i$ is the standard deviation.
Other methods of standardization also exist: min-max scaling, robust scaling, etc.
Other solution: use a separate learning rate $\gamma_i$ for each weight.
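Mean normalization takes only a couple of lines with NumPy. A hedged sketch (column-wise; a bias column of ones, if present, should be left untouched):

```python
import numpy as np

def mean_normalize(X):
    """Rescale each feature column: x_j <- (x_j - mu_j) / sigma_j.
    Returns the scaled data plus (mu, sigma) so the same transformation
    can be applied to new examples at prediction time."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Example on the two features above (# bedrooms, size in m^2):
X = np.array([[4, 113], [3, 102], [3, 100], [3, 84], [3, 112],
              [2, 68], [2, 53], [3, 122], [3, 150], [3, 90]], dtype=float)
X_scaled, mu, sigma = mean_normalize(X)   # both columns now on a comparable scale
```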
Dealing with Features of Different Scales
[Figures: contours of $J$ over $(w_1, w_2)$ and the gradient descent path, before and after feature scaling.]
Image Credit: Andrew Ng
Dealing with Non-Linear Relationship
[Figure: Exam Score ($y$) against Anxiety ($x$), a non-linear relationship.]
Which function? $h_w(x) = w_0 + w_1 x + w_2 x^2$ (Polynomial Regression)
Generally: $h_w(x) = w_0 + w_1 f_1 + w_2 f_2 + w_3 f_3 + \cdots + w_n f_n$ for transformed features $f_1, \ldots, f_n$.
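Polynomial regression amounts to transforming the input and then running ordinary linear regression on the new features. A small sketch (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a single input x (shape (m,)) to columns [1, x, x^2, ..., x^degree],
    i.e. the transformed features f_1, ..., f_n fed to linear regression."""
    return np.column_stack([x ** d for d in range(degree + 1)])

# Example: a quadratic hypothesis h_w(x) = w0 + w1*x + w2*x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X_poly = polynomial_features(x, degree=2)   # columns: bias, x, x^2
# X_poly can now be passed to any linear-regression fitter (gradient descent
# or the normal equation); the model is still linear in the weights w.
```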
Normal Equation

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}, \qquad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}, \qquad Y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

The first column of $X$ is the bias, so $h_w(X) = Xw$.

Goal: find $w$ that minimizes $J_{MSE}$. Set $\dfrac{\partial J_{MSE}(w)}{\partial w} = 0$. After a bunch of math:
$$2X^T X w - 2X^T Y = 0$$
$$2X^T X w = 2X^T Y$$
$$X^T X w = X^T Y$$
$$w = (X^T X)^{-1} X^T Y \quad \text{(assume $X^T X$ is invertible)}$$

[Figure: the loss surface over $(w_0, w_1)$; the solution sits where the gradient is zero.]
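In code the normal equation is a single call. Solving the linear system $X^T X w = X^T Y$ is preferable to forming the inverse explicitly; a minimal sketch assuming NumPy:

```python
import numpy as np

def normal_equation(X, Y):
    """Solve X^T X w = X^T Y for w.
    X: (m, n+1) design matrix with a leading bias column of ones.
    Y: (m,) vector of targets. Assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Mathematically equivalent to w = (X^T X)^{-1} X^T Y, but np.linalg.solve
# avoids computing the inverse explicitly.
```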
Gradient Descent vs Normal Equation
                              | Gradient Descent  | Normal Equation
Need to choose γ?             | Yes               | No
Iteration(s)                  | Many              | None
Large number of features n?   | No problem        | Slow: computing (X^T X)^{-1} is O(n^3)
Feature scaling?              | May be necessary  | Not necessary
Constraints                   | -                 | X^T X needs to be invertible
Summary
• Linear Regression: fitting a line to data
• Gradient Descent
• Gradient Descent Algorithm: follow the negative gradient to reduce the error
• Linear Regression with Gradient Descent: convex optimization, one minimum
• Variants of Gradient Descent: batch, mini-batch, stochastic
• Linear Regression: Challenges and Solutions
• Linear Regression with Many Attributes: $h_w(x) = \sum_{j=0}^{n} w_j x_j = w^T x$
• Dealing with Features of Different Scales: normalize!
• Dealing with Non-Linear Relationship: transform features
• Normal Equation: analytically find the best parameters
Coming Up Next Week
• Logistic Regression
• Gradient Descent
• Multi-class classification
• Non-linear decision boundary
• (More) Performance Measure
• Receiver Operating Characteristic (ROC)
• Area under ROC (AUC)
• Model Evaluation & Selection
• Bias & Variance
To Do
• Lecture Training 5
• +100 Free EXP
• +50 Early bird bonus
• Problem Set 4
• Out today!