Chapter 7 Handout: Machine Learning


Machine Learning

Session 17-18

Prof. Dr. Ijaz Hussain

Statistics, QAU

May 27, 2024

Moving Beyond Linearity: Background

In Chapter 6 we saw that we can improve upon least squares using ridge regression, the lasso, principal components regression, and other techniques.
In that setting, the improvement is obtained by reducing the complexity of the linear model, and hence the variance of the estimates.
But we are still using a linear model, which can only be improved so far!
In this chapter we relax the linearity assumption while still attempting to maintain as much interpretability as possible.
We do this by examining very simple extensions of linear models like polynomial regression and step functions, as well as more sophisticated approaches such as splines, local regression, and generalized additive models.

Possible Non-Linear Methods
Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power.
Step functions cut the range of a variable into K distinct regions in order to produce a qualitative variable. This has the effect of fitting a piecewise constant function.
Regression splines are more flexible than polynomials and step functions, and in fact are an extension of the two. They involve dividing the range of X into K distinct regions. Within each region, a polynomial function is fitted to the data.
Smoothing splines are similar to regression splines, but arise in a slightly different situation. Smoothing splines result from minimizing a residual sum of squares criterion subject to a smoothness penalty.
Local regression is similar to splines, but differs in an important way. The regions are allowed to overlap, and indeed they do so in a very smooth way.
Generalized additive models allow us to extend the above methods to deal with multiple predictors.
Polynomial Regression

The standard way to extend linear regression to a non-linear setting is to replace the standard linear model

y_i = β_0 + β_1 x_i + ϵ_i

with a polynomial function

y_i = β_0 + β_1 x_i + β_2 x_i^2 + β_3 x_i^3 + ⋯ + β_d x_i^d + ϵ_i

For a large enough degree d, a polynomial regression allows us to produce an extremely non-linear curve.
Notice that the coefficients in the above equation can easily be estimated using OLS, because this is just a standard linear model with predictors x_i, x_i^2, x_i^3, ..., x_i^d.
Generally speaking, it is unusual to use d greater than 3 or 4, because for large values of d the polynomial curve can become overly flexible and can take on some very strange shapes.
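
Since the polynomial model is linear in its coefficients, it can be fitted by ordinary least squares on the powers of x. Below is a minimal sketch in Python (NumPy only); the simulated data and the choice d = 3 are illustrative assumptions, not part of the slides.

# A minimal sketch of degree-d polynomial regression fitted by ordinary
# least squares. The data (x, y) are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = np.sin(2 * x) + rng.normal(scale=0.3, size=x.shape)   # unknown "true" curve

d = 3                                                     # polynomial degree (assumed)
X = np.column_stack([x**j for j in range(d + 1)])         # columns 1, x, x^2, ..., x^d
beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # OLS estimates beta_0, ..., beta_d

x_grid = np.linspace(-2, 2, 100)
y_hat = np.column_stack([x_grid**j for j in range(d + 1)]) @ beta   # fitted curve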
Step Functions
Using polynomial functions of the features as predictors in a linear model imposes a global structure on the non-linear function of X.
We can instead use step functions in order to avoid imposing such a global structure.
Here we break the range of X into bins, and fit a different constant in each bin.
This amounts to converting a continuous variable into an ordered categorical variable.
In greater detail, we create cutpoints c_1, c_2, ..., c_K in the range of X and then construct K + 1 new variables

C_0(X) = I(X < c_1), C_1(X) = I(c_1 ≤ X < c_2), C_2(X) = I(c_2 ≤ X < c_3), ..., C_{K−1}(X) = I(c_{K−1} ≤ X < c_K), C_K(X) = I(c_K ≤ X),

where I(·) is an indicator function that returns 1 if the condition is true and 0 otherwise. These are sometimes called dummy variables.
We then use least squares to fit a linear model using C_1(X), C_2(X), ..., C_K(X) as predictors.
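
As a rough illustration (not from the slides), the dummy variables C_1(X), ..., C_K(X) can be built with np.digitize and passed to ordinary least squares; the cutpoints and simulated data below are arbitrary assumptions for the sketch.

# A minimal sketch of piecewise-constant (step function) regression.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
y = np.where(x < 5, 1.0, 3.0) + rng.normal(scale=0.5, size=x.shape)

cuts = np.array([2.5, 5.0, 7.5])              # cutpoints c_1, ..., c_K (K = 3, assumed)
bins = np.digitize(x, cuts)                   # region index 0, 1, ..., K for each x_i

# Intercept plus the K dummy columns C_1(x), ..., C_K(x); C_0 is absorbed by the intercept.
X = np.column_stack([np.ones_like(x)] +
                    [(bins == k).astype(float) for k in range(1, len(cuts) + 1)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta_0 estimates the mean of y where x < c_1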
Step Functions...

For a given value of X, at most one of C_1, C_2, ..., C_K can be non-zero. The resulting model becomes

y_i = β_0 + β_1 C_1(x_i) + β_2 C_2(x_i) + ⋯ + β_K C_K(x_i) + ϵ_i

Note that when X < c_1, all of the predictors in the model are zero, so β_0 can be interpreted as the mean value of Y for X < c_1.
By comparison, the above equation predicts a response of β_0 + β_j for c_j ≤ X < c_{j+1}, so β_j represents the average increase in the response for X in c_j ≤ X < c_{j+1} relative to X < c_1.
Unfortunately, unless there are natural breakpoints in the predictors, piecewise-constant functions can miss the action.

Basis Functions
Polynomial and piecewise-constant regression models are in fact special cases of a basis function approach.
The idea is to have at hand a family of functions or transformations that can be applied to a variable X, i.e., b_1(X), b_2(X), ..., b_K(X).
Instead of fitting a linear model in X, we fit the model

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + ⋯ + β_K b_K(x_i) + ϵ_i.

Note that the basis functions b_1(·), b_2(·), ..., b_K(·) are fixed and known. (In other words, we choose the functions ahead of time.)
For polynomial regression, the basis functions are b_j(x_i) = x_i^j, and for piecewise constant functions they are b_j(x_i) = I(c_j ≤ x_i < c_{j+1}).
We can think of the above model as a standard linear model with predictors b_1(x_i), b_2(x_i), ..., b_K(x_i).
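
To make the point concrete, here is a small sketch (an illustration, not from the slides) in which the basis functions are simply a list of fixed, known Python functions; swapping the list between powers and indicators reproduces polynomial and piecewise-constant regression from the same least-squares fit. The data and cutpoints are assumptions.

# A minimal sketch of the basis function approach: choose fixed functions
# b_1, ..., b_K ahead of time, build the design matrix, and fit by OLS.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
y = 0.5 * x + np.sin(x) + rng.normal(scale=0.3, size=x.shape)

# Two possible choices of basis: cubic polynomial, or piecewise-constant.
poly_basis = [lambda t, j=j: t**j for j in (1, 2, 3)]
cuts = [2.5, 5.0, 7.5]
step_basis = [lambda t, c0=c0, c1=c1: ((t >= c0) & (t < c1)).astype(float)
              for c0, c1 in zip(cuts, cuts[1:] + [np.inf])]

def fit_ols(basis, x, y):
    """Build the design matrix [1, b_1(x), ..., b_K(x)] and return the OLS coefficients."""
    X = np.column_stack([np.ones_like(x)] + [b(x) for b in basis])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_poly = fit_ols(poly_basis, x, y)   # polynomial regression as a special case
beta_step = fit_ols(step_basis, x, y)   # piecewise-constant regression as a special case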
Basis Functions...

Hence, we can use least squares to estimate the unknown regression coefficients in the above model.
Importantly, this means that all of the inference tools for linear models, such as standard errors for the coefficient estimates and F-statistics for the model's overall significance, are available in this setting.
Thus far we have considered the use of polynomial functions and piecewise constant functions for our basis functions; however, many alternatives are possible.
For instance, we can use wavelets or Fourier series to construct basis functions.
In the next section, we investigate a very common choice for a basis function: regression splines.
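
Because the fit is an ordinary linear model, the usual inference output comes for free. A brief sketch using statsmodels (a tooling assumption; any OLS routine that reports standard errors and an F-statistic would serve), applied to a cubic polynomial basis on simulated data:

# A minimal sketch: standard linear-model inference (standard errors,
# F-statistic) applies directly to a basis-expanded design matrix.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.4, size=x.shape)

X = sm.add_constant(np.column_stack([x, x**2, x**3]))  # basis: x, x^2, x^3
res = sm.OLS(y, X).fit()

print(res.bse)        # standard errors of the coefficient estimates
print(res.fvalue)     # F-statistic for the model's overall significance
print(res.summary())  # full table, exactly as for any linear model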

Piecewise Polynomials
Instead of fitting a high-degree polynomial over the entire range of X, piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of X.
For example, a piecewise cubic polynomial works by fitting a cubic regression model of the form

y_i = β_0 + β_1 x_i + β_2 x_i^2 + β_3 x_i^3 + ϵ_i

where the coefficients β_0, β_1, β_2, and β_3 differ in different parts of the range of X.
The points where the coefficients change are called knots.
For example, a piecewise cubic with no knots is just a standard cubic polynomial.
A piecewise cubic polynomial with a single knot at a point c takes the form

y_i = β_{01} + β_{11} x_i + β_{21} x_i^2 + β_{31} x_i^3 + ϵ_i   if x_i < c
y_i = β_{02} + β_{12} x_i + β_{22} x_i^2 + β_{32} x_i^3 + ϵ_i   if x_i ≥ c.
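
The unconstrained version of this fit is simply two separate cubic regressions, one on each side of the knot. A minimal sketch (the knot location and simulated data are illustrative assumptions):

# A minimal sketch of an (unconstrained) piecewise cubic polynomial with a
# single knot at c: fit a separate cubic by least squares on each side.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(20, 80, size=400)                 # an age-like predictor (assumed)
y = 0.002 * (x - 50)**3 + rng.normal(scale=5.0, size=x.shape)

c = 50.0                                          # the knot (assumed)
left, right = x < c, x >= c

# np.polyfit returns coefficients from highest degree to lowest.
coef_left = np.polyfit(x[left], y[left], deg=3)   # beta_31, beta_21, beta_11, beta_01
coef_right = np.polyfit(x[right], y[right], deg=3)

def predict(x_new):
    """Evaluate whichever cubic applies to each point."""
    x_new = np.asarray(x_new, dtype=float)
    return np.where(x_new < c, np.polyval(coef_left, x_new), np.polyval(coef_right, x_new))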
Piecewise Polynomials

In other words, we fit two different polynomial functions to the data, one on the subset of the observations with x_i < c, and one on the subset of the observations with x_i ≥ c.
The first polynomial function has coefficients β_{01}, β_{11}, β_{21}, and β_{31}, and the second has coefficients β_{02}, β_{12}, β_{22}, and β_{32}.
Each of these polynomial functions can be fit using least squares applied to simple functions of the original predictor.
Using more knots leads to a more flexible piecewise polynomial.
In general, if we place K different knots throughout the range of X, then we will end up fitting K + 1 different cubic polynomials.
Note that we do not need to use a cubic polynomial. For example, we can instead fit piecewise linear functions.
In fact, our piecewise constant functions of Section 7.2 are piecewise polynomials of degree 0!

Various piecewise polynomials

[Figure 7.3: four panels fitted to the same data with a single knot at age=50, comparing an unconstrained piecewise cubic (top left), a continuous piecewise cubic (top right), a cubic spline with continuous first and second derivatives (bottom left), and a linear spline (bottom right).]
Constraints and Splines
The top left panel of the above figure looks wrong because the fitted curve is just too flexible.
To remedy this problem, we can fit a piecewise polynomial under the constraint that the fitted curve must be continuous.
In other words, there cannot be a jump when age=50.
The top right plot in the figure shows the resulting fit. This looks better than the top left plot, but the V-shaped join looks unnatural.
In the lower left plot, we have added two additional constraints: now both the first and second derivatives of the piecewise polynomials are continuous at age=50.
In other words, we are requiring that the piecewise polynomial be not only continuous when age=50, but also very smooth.
Each constraint that we impose on the piecewise cubic polynomials effectively frees up one degree of freedom, by reducing the complexity of the resulting piecewise polynomial fit.
Constraints and Splines...
So in the top left plot, we are using eight degrees of freedom, but in the bottom left plot we imposed three constraints (continuity, continuity of the first derivative, and continuity of the second derivative) and so are left with five degrees of freedom.
The curve in the bottom left plot is called a cubic spline.
In general, a cubic spline with K knots uses a total of 4 + K degrees of freedom (a short count is given below).
In Figure 7.3, the lower right plot is a linear spline, which is continuous at age=50.
The general definition of a degree-d spline is that it is a piecewise degree-d polynomial, with continuity in derivatives up to degree d − 1 at each knot.
Therefore, a linear spline is obtained by fitting a line in each region of the predictor space defined by the knots, requiring continuity at each knot.
In Figure 7.3, there is a single knot at age=50. Of course, we could add more knots, and impose continuity at each.
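
As a quick check of this count (a sketch of the arithmetic implied by the constraint argument above, not text from the slides): an unconstrained piecewise cubic with K knots has 4 coefficients on each of the K + 1 regions, and each knot imposes 3 constraints (continuity of the function, its first derivative, and its second derivative), so

4(K + 1) − 3K = K + 4

free parameters remain, i.e. 4 + K degrees of freedom.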
The Spline Basis Representation
The regression splines that we just saw in the previous section may have seemed somewhat complex: how can we fit a piecewise degree-d polynomial under the constraint that it (and possibly its first d − 1 derivatives) be continuous?
It turns out that we can use the basis model to represent a regression spline.
A cubic spline with K knots can be modeled as

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + ⋯ + β_{K+3} b_{K+3}(x_i) + ϵ_i

for an appropriate choice of basis functions b_1(·), b_2(·), ..., b_{K+3}(·). This model can then be fitted using least squares.
Just as there were several ways to represent polynomials, there are also many equivalent ways to represent cubic splines using different choices of basis functions in the above model.
The most direct way to represent a cubic spline using the above model is to start off with a basis for a cubic polynomial, namely x, x^2, and x^3, and then add one truncated power basis function per knot.
The Spline Basis Representation...
A truncated power basis function is defined as

h(x, ξ) = (x − ξ)_+^3 = (x − ξ)^3 if x > ξ, and 0 otherwise,

where ξ is the knot.
One can show that adding a term of the form β_4 h(x, ξ) to the model for a cubic polynomial will lead to a discontinuity in only the third derivative at ξ; the function will remain continuous, with continuous first and second derivatives, at each of the knots.
In other words, in order to fit a cubic spline to a data set with K knots, we perform least squares regression with an intercept and 3 + K predictors, of the form X, X^2, X^3, h(X, ξ_1), h(X, ξ_2), ..., h(X, ξ_K), where ξ_1, ..., ξ_K are the knots.
This amounts to estimating a total of K + 4 regression coefficients; for this reason, fitting a cubic spline with K knots uses K + 4 degrees of freedom.
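
Putting the pieces together, a cubic spline can be fitted by ordinary least squares on the truncated power basis. A minimal sketch (knot locations and simulated data are illustrative assumptions; in practice a library spline basis such as B-splines is usually preferred for numerical stability):

# A minimal sketch of fitting a cubic spline with K knots via the truncated
# power basis: intercept plus X, X^2, X^3, h(X, xi_1), ..., h(X, xi_K),
# i.e. K + 4 coefficients in total.
import numpy as np

def truncated_power_basis(x, knots):
    """Design matrix [1, x, x^2, x^3, (x - xi_1)_+^3, ..., (x - xi_K)_+^3]."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - xi, 0.0, None) ** 3 for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(5)
x = rng.uniform(20, 80, size=400)
y = np.sin(x / 8.0) + rng.normal(scale=0.3, size=x.shape)

knots = [35.0, 50.0, 65.0]                            # K = 3 knots (assumed) -> 3 + 4 = 7 coefficients
X = truncated_power_basis(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

x_grid = np.linspace(20, 80, 200)
y_hat = truncated_power_basis(x_grid, knots) @ beta   # fit is continuous up to the second derivative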
References

The material used in these slides is borrowed from the following books. These slides may be used only for academic purposes.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). New York: Springer.

