
IAML: Linear Regression

Nigel Goddard
School of Informatics

Semester 1

Overview

- The linear model
- Fitting the linear model to data
- Probabilistic interpretation of the error function
- Examples of regression problems
- Dealing with multiple outputs
- Generalized linear regression
- Radial basis function (RBF) models

The Regression Problem

- Classification and regression problems:
  - Classification: the target of prediction is discrete
  - Regression: the target of prediction is continuous
- Training data: a set D of pairs (x_i, y_i) for i = 1, ..., n, where x_i ∈ R^D and y_i ∈ R
- Today: linear regression, i.e., the relationship between x and y is linear.
- Although this is simple (and limited), it is:
  - More powerful than you would expect
  - The basis for more complex nonlinear methods
  - A topic that teaches a lot about regression and classification

Examples of regression problems

- Robot inverse dynamics: predicting what torques are needed to drive a robot arm along a given trajectory
- Electricity load forecasting: generating hourly forecasts two days in advance
- Predicting staffing requirements at help desks based on historical data and product and sales information
- Predicting the time to failure of equipment based on utilization and environmental conditions

The Linear Model

- Linear model:

    f(x; w) = w_0 + w_1 x_1 + ... + w_D x_D
            = φ(x) w

  where φ(x) = (1, x_1, ..., x_D) = (1, x^T)

  and w = (w_0, w_1, ..., w_D)^T                    (1)

- The maths of fitting linear models to data is easy. We use the notation φ(x) to make generalisation easy later.

Toy example: Data

[Scatter plot of the toy dataset: x from -3 to 3 on the horizontal axis, y from about -2 to 4 on the vertical axis.]

Toy example: Data

[The same toy dataset shown again in two side-by-side panels, each with x from -3 to 3 and y from about -2 to 4.]

With two features


[Figure 3.1 from Hastie, Tibshirani, and Friedman: linear least squares fitting with X ∈ R^2. We seek the linear function of X that minimizes the sum of squared residuals from Y.]

Instead of a line, a plane. With more features, a hyperplane.

Figure: Hastie, Tibshirani, and Friedman

With more features

CPU Performance Data Set


- Predict PRP: published relative performance
- MYCT: machine cycle time in nanoseconds (integer)
- MMIN: minimum main memory in kilobytes (integer)
- MMAX: maximum main memory in kilobytes (integer)
- CACH: cache memory in kilobytes (integer)
- CHMIN: minimum channels in units (integer)
- CHMAX: maximum channels in units (integer)

With more features

PRP = -56.1
      + 0.049 MYCT
      + 0.015 MMIN
      + 0.006 MMAX
      + 0.630 CACH
      - 0.270 CHMIN
      + 1.46  CHMAX

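For concreteness, here is a minimal sketch (in Python) of using this fitted model to predict PRP for one machine; the feature values below are made up purely for illustration.

    # Coefficients of the fitted linear model from the slide above.
    coef = {"MYCT": 0.049, "MMIN": 0.015, "MMAX": 0.006,
            "CACH": 0.630, "CHMIN": -0.270, "CHMAX": 1.46}
    intercept = -56.1

    # Hypothetical machine (feature values invented for illustration).
    machine = {"MYCT": 125, "MMIN": 256, "MMAX": 6000,
               "CACH": 16, "CHMIN": 4, "CHMAX": 32}

    # The prediction is the intercept plus a weighted sum of the features.
    prp = intercept + sum(coef[name] * machine[name] for name in coef)
    print(round(prp, 1))
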
In matrix notation

- The design matrix is n × (D + 1):

    Φ = [ 1  x_11  x_12  ...  x_1D
          1  x_21  x_22  ...  x_2D
          .   .     .          .
          1  x_n1  x_n2  ...  x_nD ]

- x_ij is the jth component of the training input x_i
- Let y = (y_1, ..., y_n)^T
- Then ŷ = Φw is ...?

Linear Algebra: The 1-Slide Version
What is matrix multiplication?

    A = [ a_11  a_12  a_13        b = [ b_1
          a_21  a_22  a_23              b_2
          a_31  a_32  a_33 ]            b_3 ]

First consider matrix times vector, i.e., Ab. Two answers:

1. Ab is a linear combination of the columns of A:

    Ab = b_1 (a_11, a_21, a_31)^T + b_2 (a_12, a_22, a_32)^T + b_3 (a_13, a_23, a_33)^T

2. Ab is a vector. Each element of the vector is the dot product between one row of A and b:

    Ab = [ (a_11, a_12, a_13) · b
           (a_21, a_22, a_23) · b
           (a_31, a_32, a_33) · b ]

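A tiny NumPy check of the two views of Ab (the numbers are arbitrary):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
    b = np.array([1.0, 0.5, -1.0])

    # View 1: a linear combination of the columns of A.
    combo = b[0] * A[:, 0] + b[1] * A[:, 1] + b[2] * A[:, 2]

    # View 2: one dot product per row of A.
    dots = np.array([A[i, :] @ b for i in range(3)])

    print(A @ b)      # both views agree with the built-in product
    print(combo)
    print(dots)
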
Linear model (part 2)

In matrix notation:

- The design matrix is n × (D + 1):

    Φ = [ 1  x_11  x_12  ...  x_1D
          1  x_21  x_22  ...  x_2D
          .   .     .          .
          1  x_n1  x_n2  ...  x_nD ]

- x_ij is the jth component of the training input x_i
- Let y = (y_1, ..., y_n)^T
- Then ŷ = Φw is the vector of the model's predicted values on the training inputs.

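As a quick sanity check on the shapes, here is a minimal NumPy sketch of building the design matrix and computing ŷ = Φw; the data and weights are invented for illustration.

    import numpy as np

    # Toy training inputs: n = 4 examples, D = 2 features (values invented for illustration).
    X = np.array([[0.5, 1.0],
                  [1.5, -0.5],
                  [-1.0, 2.0],
                  [2.0, 0.0]])

    # Design matrix: prepend a column of ones for the intercept term w_0.
    Phi = np.hstack([np.ones((X.shape[0], 1)), X])   # shape (n, D + 1)

    # Some weight vector w = (w_0, w_1, w_2)^T.
    w = np.array([0.2, 1.0, -0.5])

    # Predicted values on the training inputs: one entry of y_hat per row of Phi.
    y_hat = Phi @ w
    print(y_hat)                                     # shape (n,)
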
Solving for Model Parameters

This looks like what we have seen in linear algebra:

    y = Φw

We know y and Φ but not w.

So why not take w = Φ^{-1} y? (You can't, but why?)

Solving for Model Parameters

This looks like what we have seen in linear algebra:

    y = Φw

We know y and Φ but not w.

So why not take w = Φ^{-1} y? (You can't, but why?)

Three reasons:

- Φ is not square. It is n × (D + 1).
- The system is overconstrained (n equations for D + 1 parameters); in other words,
- the data has noise.

Loss function

Want a loss function O(w) that:

- we minimize with respect to w
- makes ŷ look like y at the minimum
- (recall that ŷ depends on w: ŷ = Φw)

Fitting a linear model to data

[Figure 3.1 from Hastie, Tibshirani, and Friedman: linear least squares fitting with X ∈ R^2; we seek the linear function of X that minimizes the sum of squared residuals from Y. The residuals are drawn as black sticks from the data points to the fitted plane.]

- A common choice: squared error (it makes the maths easy)

    O(w) = Σ_{i=1}^n (y_i - w^T x_i)^2

- In the picture: this is the sum of the squared lengths of the black sticks.
- (Each one is called a residual, i.e., each y_i - w^T x_i.)

Fitting a linear model to data

- Given a dataset D of pairs (x_i, y_i) for i = 1, ..., n
- Squared error makes the maths easy:

    E(w) = Σ_{i=1}^n (y_i - f(x_i; w))^2 = (y - Φw)^T (y - Φw)

- We want to minimize this with respect to w.
- The error surface is a parabolic bowl.

[Figure: Tom Mitchell. Surface plot of E[w] over (w_0, w_1): a parabolic bowl.]

How do we do this?

The Solution

- Answer: to minimize O(w) = Σ_{i=1}^n (y_i - w^T x_i)^2, set the partial derivatives to 0.
- This has an analytical solution:

    ŵ = (Φ^T Φ)^{-1} Φ^T y

- (Φ^T Φ)^{-1} Φ^T is the pseudo-inverse of Φ
- First check: does this make sense? Do the matrix dimensions line up?
- Then: why is this called a pseudo-inverse?
- Finally: what happens if there are no features?

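A minimal NumPy sketch of this least-squares solution on toy data (generated here purely for illustration). Note that in practice np.linalg.lstsq is usually preferred over forming (Φ^T Φ)^{-1} explicitly, for numerical stability.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: y = 2 + 3x plus Gaussian noise (invented for illustration).
    n = 50
    x = rng.uniform(-3, 3, size=n)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=n)

    # Design matrix with a bias column.
    Phi = np.column_stack([np.ones(n), x])

    # Normal-equations solution: w_hat = (Phi^T Phi)^{-1} Phi^T y.
    w_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

    # Equivalent, numerically safer route via least squares.
    w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    print(w_hat)      # roughly [2, 3]
    print(w_lstsq)
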
Probabilistic interpretation of O(w)

- Assume that y = w^T x + ε, where ε ~ N(0, σ^2)
- (This is an exact linear relationship plus Gaussian noise.)
- This implies that y | x_i ~ N(w^T x_i, σ^2), i.e.,

    -log p(y_i | x_i) = log √(2π) + log σ + (y_i - w^T x_i)^2 / (2σ^2)

- So minimising O(w) is equivalent to maximising the likelihood!
- We can view w^T x as E[y | x].
- The squared residuals allow estimation of σ^2:

    σ̂^2 = (1/n) Σ_{i=1}^n (y_i - w^T x_i)^2

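Continuing the same kind of toy setup (data invented for illustration), the noise variance can be estimated from the mean squared residual exactly as in the formula above.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy data with known noise standard deviation 0.5, so sigma^2 = 0.25.
    n = 200
    x = rng.uniform(-3, 3, size=n)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=n)

    Phi = np.column_stack([np.ones(n), x])
    w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    # Maximum-likelihood estimate of the noise variance: the mean squared residual.
    residuals = y - Phi @ w_hat
    sigma2_hat = np.mean(residuals ** 2)
    print(sigma2_hat)   # should come out close to 0.25
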
Fitting this into the general structure for learning algorithms:

- Define the task: regression
- Decide on the model structure: the linear regression model
- Decide on the score function: squared error (likelihood)
- Decide on the optimization/search method to optimize the score function: calculus (analytic solution)

Sensitivity to Outliers

- Linear regression is sensitive to outliers.
- Example: suppose y = 0.5x + ε, where ε ~ N(0, √0.25), and then add a point at (2.5, 3), as in the code sketch below:

[Plot: the data over x from 0 to 5 and y from 0.0 to 3.0, with the added point at (2.5, 3) marked.]

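A minimal sketch of this experiment, assuming the noise standard deviation is √0.25 = 0.5 and generating the data here purely for illustration; it compares the fitted coefficients with and without the added point.

    import numpy as np

    rng = np.random.default_rng(2)

    # Clean data: y = 0.5x + noise with standard deviation 0.5 (invented for illustration).
    x = rng.uniform(0, 5, size=20)
    y = 0.5 * x + rng.normal(0.0, 0.5, size=20)

    def fit_line(x, y):
        """Least-squares fit of y = w0 + w1*x; returns (w0, w1)."""
        Phi = np.column_stack([np.ones_like(x), x])
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return w

    print("without outlier:", fit_line(x, y))

    # Add the single point (2.5, 3) and refit; the coefficients shift towards it.
    x_out = np.append(x, 2.5)
    y_out = np.append(y, 3.0)
    print("with outlier:   ", fit_line(x_out, y_out))
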
Diagnostics

Graphical diagnostics can be useful for checking:

- Is the relationship obviously nonlinear? Look for structure in the residuals.
- Are there obvious outliers?

The goal isn't to find all problems. You can't. The goal is to find obvious, embarrassing problems.

Example: plot the residuals against the fitted values. Stats packages will do this for you.

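A minimal matplotlib sketch of such a residuals-vs-fitted plot; the toy data is invented for illustration and given a deliberate nonlinearity so that the residual plot shows visible structure.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(3)

    # Toy data with a quadratic term that the straight-line model will miss.
    x = rng.uniform(-3, 3, size=100)
    y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.5, size=100)

    Phi = np.column_stack([np.ones_like(x), x])        # straight-line model only
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    fitted = Phi @ w
    residuals = y - fitted

    # Residuals vs fitted values: curvature here suggests a missing nonlinear term.
    plt.scatter(fitted, residuals)
    plt.axhline(0.0, color="grey", linestyle="--")
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.show()
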
Dealing with multiple outputs

- Suppose there are q different targets for each input x.
- We introduce a different weight vector w_i for each target dimension and do the regression separately for each one.
- This is called multiple regression.

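A minimal sketch, assuming the q target columns are stacked into an n × q matrix Y (data invented for illustration): one least-squares solve then fits all q regressions at once, one weight column per target.

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy data: n = 100 inputs with D = 3 features and q = 2 targets.
    n, D, q = 100, 3, 2
    X = rng.normal(size=(n, D))
    W_true = rng.normal(size=(D + 1, q))              # one weight column per target
    Phi = np.hstack([np.ones((n, 1)), X])
    Y = Phi @ W_true + rng.normal(0.0, 0.1, size=(n, q))

    # Each column of W_hat is the weight vector for one target dimension.
    W_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    print(W_hat.shape)    # (D + 1, q)
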
Basis expansion

- We can easily transform the original attributes x non-linearly into φ(x) and do linear regression on them.

[Figure: three panels over x ∈ [-1, 1] showing example basis functions: polynomials, Gaussians, and sigmoids.]

Figure credit: Chris Bishop, PRML

- The design matrix is now n × m:

    Φ = [ φ_1(x_1)  φ_2(x_1)  ...  φ_m(x_1)
          φ_1(x_2)  φ_2(x_2)  ...  φ_m(x_2)
             .         .              .
          φ_1(x_n)  φ_2(x_n)  ...  φ_m(x_n) ]

- Let y = (y_1, ..., y_n)^T
- Minimize E(w) = |y - Φw|^2. As before we have an analytical solution:

    ŵ = (Φ^T Φ)^{-1} Φ^T y

- (Φ^T Φ)^{-1} Φ^T is the pseudo-inverse of Φ

Example: polynomial regression

    φ(x) = (1, x, x^2, ..., x^M)^T

[Figure: four panels showing fits of polynomial order M = 0, 1, 3, and 9 to the same data, t plotted against x on [0, 1].]

Figure credit: Chris Bishop, PRML

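A minimal sketch of polynomial regression via this basis expansion, loosely in the spirit of the figure; the noisy sinusoidal data is invented for illustration.

    import numpy as np

    rng = np.random.default_rng(5)

    # Toy data: a noisy sine wave on [0, 1].
    n = 15
    x = rng.uniform(0, 1, size=n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=n)

    def poly_design_matrix(x, M):
        """Design matrix with columns 1, x, x^2, ..., x^M."""
        return np.column_stack([x ** m for m in range(M + 1)])

    for M in (0, 1, 3, 9):
        Phi = poly_design_matrix(x, M)
        w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
        train_err = np.mean((t - Phi @ w) ** 2)
        print(f"M = {M}: training MSE = {train_err:.4f}")

The training error keeps falling as M grows, which is exactly why higher-order fits need to be checked on held-out data.
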
More about the features

- Transforming the features can be important.
- Example: suppose I want to predict CPU performance.
- Maybe one of the features is the manufacturer:

    x_1 = 1 if Intel
          2 if AMD
          3 if Apple
          4 if Motorola

- Let's use this as a feature. Will this work?

More about the features

- Transforming the features can be important.
- Example: suppose I want to predict CPU performance.
- Maybe one of the features is the manufacturer:

    x_1 = 1 if Intel
          2 if AMD
          3 if Apple
          4 if Motorola

- Let's use this as a feature. Will this work?
- Not the way you want. Do you really believe AMD is double Intel?

More about the features

- Instead, convert this into 0/1 indicator features:

    x_1 = 1 if Intel, 0 otherwise
    x_2 = 1 if AMD, 0 otherwise
    ...

- Note this is a consequence of linearity. We saw something similar with text in the first week.
- Other good transformations: log, square, square root

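A minimal sketch of this 0/1 (one-hot) encoding, assuming the manufacturer arrives as a string column; the names and values are invented for illustration.

    # Hypothetical manufacturer column for five machines.
    manufacturers = ["Intel", "AMD", "Apple", "Intel", "Motorola"]
    categories = ["Intel", "AMD", "Apple", "Motorola"]

    # One indicator feature per manufacturer: 1 if it matches, 0 otherwise.
    one_hot = [[1 if m == c else 0 for c in categories] for m in manufacturers]

    for m, row in zip(manufacturers, one_hot):
        print(m, row)
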
Radial basis function (RBF) models

- Set φ_i(x) = exp(-½ |x - c_i|^2 / α^2).
- We need to position these "basis functions" at previously chosen centres c_i with a given width α. There are many ways to do this, but choosing a subset of the datapoints as centres is one method that is quite effective.
- Finding the weights is the same as ever: the pseudo-inverse solution.

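A minimal sketch of RBF regression, assuming the centres are chosen as a random subset of the training points, the width α is fixed by hand, and a bias column is added; the 1-D data is invented for illustration.

    import numpy as np

    rng = np.random.default_rng(6)

    # Toy 1-D data on [0, 7].
    n = 80
    x = rng.uniform(0, 7, size=n)
    y = np.sin(x) + rng.normal(0.0, 0.1, size=n)

    def rbf_design_matrix(x, centres, alpha):
        """Columns are phi_i(x) = exp(-0.5 * |x - c_i|^2 / alpha^2), plus a bias column."""
        diffs = x[:, None] - centres[None, :]
        Phi = np.exp(-0.5 * diffs**2 / alpha**2)
        return np.hstack([np.ones((len(x), 1)), Phi])

    # Choose a handful of training points as centres and a fixed width.
    centres = np.sort(rng.choice(x, size=8, replace=False))
    alpha = 1.0

    Phi = rbf_design_matrix(x, centres, alpha)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # the usual pseudo-inverse solution

    print("training RMSE:", np.sqrt(np.mean((y - Phi @ w) ** 2)))
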
RBF example

[Scatter plot of the RBF example data: x from 0 to 7, y from about -0.5 to 2.0.]

RBF example

[The same RBF example data shown again: x from 0 to 7, y from about -0.5 to 2.0.]

An RBF feature

[Two panels. Left ("Original data"): y against x, with x from 0 to 7. Right ("RBF feature, c_1 = 3, α_1 = 1"): y against the value of the RBF feature centred at 3, which ranges from about 0.0 to 0.4.]

Another RBF feature

Notice how the feature functions "specialize" in input space.

[Two panels. Left ("Original data"): y against x, with x from 0 to 7. Right ("RBF feature, c_2 = 6, α_2 = 1"): y against the value of the RBF feature centred at 6, which ranges from about 0.0 to 0.4.]

RBF example

Run the RBF model with both basis functions above and plot the residuals

    y_i - φ(x_i)^T w

[Two panels. Left ("Original data"): y against x. Right ("Residuals"): the residuals of the RBF model against x, both over x from 0 to 7.]

RBF: Ay, there’s the rub

- So why not use RBFs for everything?
- Short answer: you might need too many basis functions.
- This is especially true in high dimensions (we'll say more later).
- Too many basis functions means you probably overfit.
- Extreme example: centre one on each training point.
- Also: notice that we haven't yet seen in the course how to learn the RBF parameters, i.e., the mean and standard deviation of each kernel.
- Main point of presenting RBFs now: to set up later methods like support vector machines.

Summary

- Linear regression is often useful out of the box.
- It is more useful than it would seem because "linear" means linear in the parameters. You can do a nonlinear transform of the data first, e.g., polynomial or RBF features. This point will come up again.
- The maximum likelihood solution is computationally efficient (pseudo-inverse).
- There is a danger of overfitting, especially with many features or basis functions.
