Chapter 7 - Handouts: Machine Learning
Session 17-18
Prof. Dr. Ijaz Hussain (Statistics, QAU)
Moving Beyond Linearity: Background
Possible Non-Linear Methods
Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power.
Step functions cut the range of a variable into K distinct regions in order to produce a qualitative variable. This has the effect of fitting a piecewise constant function.
Regression splines are more flexible than polynomials and step functions, and in fact are an extension of the two. They involve dividing the range of X into K distinct regions. Within each region, a polynomial function is fitted to the data.
Smoothing splines are similar to regression splines, but arise in a slightly different situation. Smoothing splines result from minimizing a residual sum of squares criterion subject to a smoothness penalty.
Local regression is similar to splines, but differs in an important way. The regions are allowed to overlap, and indeed they do so in a very smooth way.
Generalized additive models allow us to extend the above methods to deal with multiple predictors.
Step Functions
Instead of fitting the linear model
yi = β0 + β1 xi + ϵi ,
step functions fit the piecewise constant model
yi = β0 + β1 C1(xi) + β2 C2(xi) + · · · + βK CK(xi) + ϵi ,
where Cj(X) = I(cj ≤ X < cj+1) indicates the region containing X.
Note that when X < c1, all of the predictors in the above equation are zero, so β0 can be interpreted as the mean value of Y for X < c1.
By comparison, the above equation predicts a response of β0 + βj for cj ≤ X < cj+1, so βj represents the average increase in the response for X in cj ≤ X < cj+1 relative to X < c1.
Unfortunately, unless there are natural breakpoints in the predictors, piecewise-constant functions can miss the action.
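As a concrete illustration, here is a minimal sketch of step-function regression in Python with NumPy, assuming synthetic data and hypothetical cutpoints c1, c2, c3 (none of these values come from the slides):

```python
# A sketch of step-function regression: regress y on the bin indicators
# C_j(x) = I(c_j <= x < c_{j+1}). Data and cutpoints are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(20, 80, 200)               # an age-like predictor
y = np.sin(x / 10) + rng.normal(0, 0.2, 200)

cuts = np.array([35.0, 50.0, 65.0])        # hypothetical cutpoints c1 < c2 < c3
edges = np.concatenate([cuts, [np.inf]])
# Indicators for [c_j, c_{j+1}); the region x < c1 is absorbed by the
# intercept, so beta_0 estimates the mean of Y there.
C = np.column_stack([(x >= edges[j]) & (x < edges[j + 1])
                     for j in range(len(cuts))]).astype(float)
X = np.column_stack([np.ones_like(x), C])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta_0 (mean of y for x < c1):", beta[0])
print("beta_j (region means relative to x < c1):", beta[1:])
```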
Basis Functions
Polynomial and piecewise-constant regression models are in fact special
cases of a basis function approach.
The idea is to have at hand a family of functions or transformations that can be applied to a variable X, i.e. b1(X), b2(X), ..., bK(X).
Instead of fitting a linear model in X, we fit the model
yi = β0 + β1 b1(xi) + β2 b2(xi) + · · · + βK bK(xi) + ϵi .
Note that the basis functions b1(·), b2(·), ..., bK(·) are fixed and known. (In other words, we choose the functions ahead of time.)
For polynomial regression, the basis functions are bj(xi) = xi^j, and for piecewise-constant functions they are bj(xi) = I(cj ≤ xi < cj+1).
We can think of the above model as a standard linear model with predictors b1(xi), b2(xi), ..., bK(xi).
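The basis-function view means the fitting step never changes: build the design matrix of b1(xi), ..., bK(xi) and solve least squares. Below is a minimal sketch with a polynomial basis bj(x) = x^j on synthetic data (all names are illustrative):

```python
# Generic basis-function regression; here b_j(x) = x**j, but swapping in
# indicator or spline bases only changes basis_matrix().
import numpy as np

def basis_matrix(x, K):
    """Design matrix [1, b_1(x), ..., b_K(x)] with b_j(x) = x**j."""
    return np.column_stack([x**j for j in range(K + 1)])

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 150)
y = 0.5 * x**3 - x + rng.normal(0, 0.3, 150)

X = basis_matrix(x, K=3)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted coefficients beta_0..beta_3:", beta)
```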
Piecewise Polynomials
Instead of fitting a high-degree polynomial over the entire range of X, piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of X.
For example, a piecewise cubic polynomial works by fitting a cubic regression model of the form
yi = β0 + β1 xi + β2 xi² + β3 xi³ + ϵi ,
where the coefficients β0, β1, β2, and β3 differ in different parts of the range of X.
The points where the coefficients change are called knots.
For example, a piecewise cubic with no knots is just a standard cubic
polynomial.
A piecewise cubic polynomial with a single knot at a point c takes the form
yi = β01 + β11 xi + β21 xi² + β31 xi³ + ϵi   if xi < c;
yi = β02 + β12 xi + β22 xi² + β32 xi³ + ϵi   if xi ≥ c.
In other words, we fit two different cubic polynomials: one on the subset of observations with xi < c, and one on the subset with xi ≥ c.
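To make the two-regime structure concrete, here is a minimal sketch (synthetic data, illustrative knot) that fits the two cubics separately, with nothing tying them together at the knot:

```python
# An *unconstrained* piecewise cubic with one knot at c: a separate cubic
# least-squares fit on each side, so the curve may jump at x = c.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(20, 80, 300)
y = np.cos(x / 12) + rng.normal(0, 0.2, 300)
c = 50.0                                   # single knot, e.g. age = 50

def cubic_fit(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs, xs**2, xs**3])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return beta

beta_left = cubic_fit(x[x < c], y[x < c])     # beta_01, ..., beta_31
beta_right = cubic_fit(x[x >= c], y[x >= c])  # beta_02, ..., beta_32
# The continuity constraints discussed next remove the jump at x = c.
```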
Various Piecewise Polynomials
[Figure 7.3: piecewise cubic fits with a single knot at age=50 — unconstrained piecewise cubic (top left), continuous piecewise cubic (top right), cubic spline (bottom left), and linear spline (bottom right).]
Constraints and Splines
The top left panel of the above figure looks wrong because the fitted curve is just too flexible.
To remedy this problem, we can fit a piecewise polynomial under the constraint that the fitted curve must be continuous. In other words, there cannot be a jump when age=50.
The top right plot in the figure shows the resulting fit. This looks better than the top left plot, but the V-shaped join looks unnatural.
In the lower left plot, we have added two additional constraints: now both the first and second derivatives of the piecewise polynomials are continuous at age=50.
In other words, we are requiring that the piecewise polynomial be not only continuous at age=50, but also very smooth.
Each constraint that we impose on the piecewise cubic polynomials effectively removes one degree of freedom, by reducing the complexity of the resulting piecewise polynomial fit.
Constraints and Splines...
So in the top left plot, we are using eight degrees of freedom, but in the
bottom left plot we imposed three constraints (continuity, continuity
of the first derivative, and continuity of the second derivative) and so
are left with five degrees of freedom.
The curve in the bottom left plot is called a cubic spline.
In general, a cubic spline with K knots uses a total of 4 + K degrees of freedom.
In Figure 7.3, the lower right plot is a linear spline, which is continuous at age=50.
The general definition of a degree-d spline is that it is a piecewise
degree d polynomial, with continuity in derivatives up to degree d − 1
at each knot.
Therefore, a linear spline is obtained by fitting a line in each region of
the predictor space defined by the knots, requiring continuity at each
knot.
In Figure 7.3, there is a single knot at age=50. Of course, we could
add more knots, and impose continuity at each.
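As a sketch of this definition with d = 1: a linear spline with one knot at c can be fit with the basis 1, x, (x − c)₊, since the positive-part term is continuous and changes slope only at the knot (the data and knot below are illustrative):

```python
# Linear spline with a single knot at c via the basis 1, x, (x - c)_+.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(20, 80, 300)
# Piecewise linear truth that is continuous at x = 50, plus noise.
y = np.where(x < 50, 0.05 * x, 2.5 - 0.02 * (x - 50)) + rng.normal(0, 0.2, 300)
c = 50.0

X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# Slope is beta_1 for x < c and beta_1 + beta_2 for x >= c; the fitted
# value itself does not jump at the knot.
print("slopes:", beta[1], beta[1] + beta[2])
```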
The Spline Basis Representation
The regression splines that we just saw in the previous section may
have seemed somewhat complex: how can we fit a piecewise degree-d
polynomial under the constraint that it (and possibly its first d − 1
derivatives) be continuous?
It turns out that we can use the basis model to represent a regression
spline.
A cubic spline with K knots can be modeled as
yi = β0 + β1 b1(xi) + β2 b2(xi) + · · · + βK+3 bK+3(xi) + ϵi ,
for an appropriate choice of basis functions b1, b2, ..., bK+3. The most direct choice is the truncated power basis: start with x, x², x³ and add one truncated power function h(x, ξ) = (x − ξ)³ for x > ξ and 0 otherwise, per knot ξ.
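Here is a minimal sketch of this representation (synthetic data, illustrative knots): build the truncated power basis and fit by least squares; the K + 4 fitted coefficients (intercept included) match the degrees-of-freedom count above.

```python
# Cubic spline via the truncated power basis: 1, x, x^2, x^3 plus
# (x - knot)_+^3 for each knot, fit by ordinary least squares.
import numpy as np

def cubic_spline_basis(x, knots):
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
x = rng.uniform(20, 80, 400)
y = np.sin(x / 8) + rng.normal(0, 0.2, 400)

knots = [35.0, 50.0, 65.0]                 # K = 3 illustrative knots
X = cubic_spline_basis(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("number of fitted coefficients (K + 4):", len(beta))  # 7
```

In practice, software typically uses the numerically better-behaved B-spline basis instead, but the resulting fit is equivalent.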
The material used in these slides is borrowed from the following books. These slides can be used for academic purposes only.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). New York: Springer.