Module 08: Polynomial Regression, Splines, and GAMs
Figure: high VIF / structural multicollinearity (after centering).
Fitting Smoothing Splines
Figure: cubic spline and smoothing spline fits with df = 16; smoothing spline with df = 6.79 selected by LOOCV.
Local regression
• Local regression is a different approach for fitting flexible non-linear functions: the fit at a target point $x_0$ is computed using only the nearby training observations.
• Local regression is sometimes referred to as a memory-based procedure, because
like nearest-neighbours, we need all the training data each time we wish to
compute a prediction.
• Choices to be made (see the loess() sketch below):
• How to define the weighting function $K$
• The form of regression fitted locally (constant, linear, or quadratic)
• The span $s$, which plays the role of a tuning parameter (smaller $s$: more local fit, higher variance)
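A minimal sketch of local regression with base R's loess() on simulated data; the span values and degree below are illustrative choices, not taken from the slides.

# Illustrative local regression fits with loess() on simulated data
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

fit_wide   <- loess(y ~ x, span = 0.75, degree = 2)  # larger span: smoother, lower variance
fit_narrow <- loess(y ~ x, span = 0.20, degree = 2)  # smaller span: more local, higher variance

plot(x, y, col = "grey", main = "Local regression with two spans")
lines(x, predict(fit_wide),   col = "red",  lwd = 2)
lines(x, predict(fit_narrow), col = "blue", lwd = 2)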
Generalized additive models (GAMs)
$y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \epsilon_i$
• This is an example of a GAM, where the linear component $\beta_j x_{ij}$ of the multiple linear regression model is replaced by a smooth non-linear function $f_j(x_{ij})$.
• It is called an additive model because we calculate a separate $f_j$ for each $X_j$ and then add together all of their contributions.
Generalized additive models (GAMs)
• In the regression setting GAM has the form
$E[Y \mid X_1, \ldots, X_p] = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$
• The 𝑓𝑗 ()s are unspecified, smooth, non-parametric functions
• Each function is fitted using a scatter-plot smoother (e.g. a cubic smoothing spline or kernel smoother), and all $p$ functions are then estimated simultaneously using an algorithm such as backfitting.
• An additive logistic regression model is represented by
$\log\left(\dfrac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)}\right) = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p)$
• The above model can also be extended to other generalized linear models, including the linear, logit, probit, gamma, negative-binomial, and log-linear models.
• Linear and other parametric forms can be mixed with the nonlinear terms, a
necessity when some of the inputs are qualitative variables (factors).
Generalized additive models (GAMs)
• Additive models can replace linear models, e.g. the additive decomposition of a time series (see the sketch below):
$Y_t = S_t + T_t + \epsilon_t$
where $S_t$ is the seasonal component, $T_t$ is the trend, and $\epsilon_t$ is the error term.
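A minimal sketch of an additive seasonal-trend decomposition in base R using decompose(); the built-in AirPassengers series is used purely for illustration (it is often modelled multiplicatively, but the additive form matches the equation above).

# Additive decomposition Y_t = S_t + T_t + e_t of a monthly time series
y   <- AirPassengers                    # built-in monthly series, illustrative only
dec <- decompose(y, type = "additive")  # estimates seasonal, trend, and random components
plot(dec)                               # panels: observed, trend, seasonal, random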
Model fitting with GAMs
• The additive model has the form
$Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \epsilon, \qquad E[\epsilon] = 0$
• A penalized residual sum of squares, analogous to the smoothing-spline criterion, can be used to fit this model:
$\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) = \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij}) \Big)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j$
• The minimizer of the PRSS is an additive cubic spline model in the $X_j$, with knots at each of the unique values of $x_{ij}$, $i = 1, \ldots, N$.
Backfitting Algorithm
1. Initialize: $\hat{\alpha} = \frac{1}{N}\sum_{i=1}^{N} y_i$, $\hat{f}_j \equiv 0$ for all $i, j$.
2. Cycle over $j = 1, \ldots, p$:
$\hat{f}_j \leftarrow \mathcal{S}_j\Big[\big\{ y_i - \hat{\alpha} - \sum_{k \neq j} \hat{f}_k(x_{ik}) \big\}_{1}^{N}\Big]$
$\hat{f}_j \leftarrow \hat{f}_j - \frac{1}{N} \sum_{i=1}^{N} \hat{f}_j(x_{ij})$
until the change in the $\hat{f}_j$ is smaller than some threshold.
• $\mathcal{S}_j$ is a cubic smoothing spline applied to the targets $\big\{ y_i - \hat{\alpha} - \sum_{k \neq j} \hat{f}_k(x_{ik}) \big\}_{1}^{N}$ to obtain a new estimate of $\hat{f}_j$.
Backfitting Algorithm
• The operation of the smoother $\mathcal{S}_j$ at the training points can be represented by an $N \times N$ operator matrix $\mathbf{S}_j$.
• The degrees of freedom for the $j$th term are then (approximately) computed as $df_j = \mathrm{trace}[\mathbf{S}_j] - 1$.
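A minimal sketch of the backfitting algorithm in R, using smooth.spline() as the smoother $\mathcal{S}_j$; the simulated data, df = 5, and the convergence tolerance are illustrative assumptions.

# Backfitting with cubic smoothing splines (illustrative data and settings)
set.seed(1)
n  <- 300
x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
X  <- cbind(x1, x2)

alpha <- mean(y)                  # 1. initialize alpha-hat; all f_j start at 0
f     <- matrix(0, n, ncol(X))    # columns hold f_j evaluated at the training points

for (iter in 1:50) {              # 2. cycle over j until the f_j stop changing
  f_old <- f
  for (j in 1:ncol(X)) {
    partial <- y - alpha - rowSums(f[, -j, drop = FALSE])  # partial residuals
    sj      <- smooth.spline(X[, j], partial, df = 5)      # apply smoother S_j
    f[, j]  <- predict(sj, X[, j])$y
    f[, j]  <- f[, j] - mean(f[, j])                       # re-center so f_j has mean zero
  }
  if (max(abs(f - f_old)) < 1e-6) break
}

fitted_values <- alpha + rowSums(f)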
Advantages and Disadvantages
• Advantages:
1) GAMs allow us to fit a non-linear $f_j$ for each $X_j$, so that we can automatically model non-linear relationships that standard linear regression will miss.
2) The non-linear fits can potentially make more accurate predictions for the response $Y$.
3) Because the model is additive, we can examine the effect of each $X_j$ on $Y$ individually while holding all of the other variables fixed.
4) The smoothness of the function $f_j$ for the variable $X_j$ can be summarized via degrees of freedom.
• Disadvantages:
1) The model is restricted to be additive, so with many variables important interactions can be missed. However, as with linear regression, we can manually add interaction terms, or include low-dimensional interaction functions fitted with two-dimensional smoothers such as local regression (see the sketch below).
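Following the ISLR lab, a two-dimensional local regression surface can be added to a GAM to capture an interaction between two predictors; the span value here is an illustrative choice.

library(ISLR)   # Wage data
library(gam)
# Interaction between year and age fitted by a two-dimensional local regression surface
gam.lo.i <- gam(wage ~ lo(year, age, span = 0.5) + education, data = Wage)
# plotting the bivariate surface requires the akima package: library(akima); plot(gam.lo.i)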
Regression with GAM
library(ISLR)   # provides the Wage data set
library(gam)
# Additive model: smoothing splines in age and year, plus education as a qualitative term
gam1 <- gam(wage ~ s(age, df = 4) + s(year, df = 4) + education, data = Wage)
par(mfrow = c(1, 3))
plot(gam1, se = TRUE)
Logit with GAM
# Additive logistic regression: model P(wage > 250) with smooth terms in age and year
gam2 <- gam(I(wage > 250) ~ s(age, df = 4) + s(year, df = 4) + education,
            data = Wage, family = binomial)
par(mfrow = c(1, 3))
plot(gam2)
Kernel Density Estimation
• Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a random variable, using kernels as weights.
• Let $(x_1, x_2, \ldots, x_n)$ be independent and identically distributed samples drawn from some univariate distribution with unknown density $f$. The kernel density estimate at any given point $x$ is
$\hat{f}_\lambda(x) = \frac{1}{n} \sum_{i=1}^{n} K_\lambda(x - x_i)$
𝐾 is the kernel — a non-negative function — and 𝜆 > 0 is a smoothing parameter called
the bandwidth.
• A range of kernel functions are commonly used: uniform, triangular, biweight, triweight,
Epanechnikov, normal, and others.
• KDE works by placing a kernel at each observation and summing the contributions into a smooth curve of the distribution (see the sketch below).
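A minimal sketch of kernel density estimation with base R's density(), assuming a standard-normal sample; the bandwidths mirror those in the figure on the bandwidth-selection slide.

# Kernel density estimates of a standard-normal sample with three bandwidths
set.seed(1)
x <- rnorm(100)
plot(density(x, kernel = "gaussian", bw = 0.337), main = "Gaussian KDE")  # moderate bandwidth
lines(density(x, bw = 0.05), col = "red")     # small bandwidth: noisy, under-smoothed
lines(density(x, bw = 2),    col = "green")   # large bandwidth: over-smoothed
curve(dnorm(x), add = TRUE, col = "grey", lty = 2)  # true N(0, 1) density for reference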
Bandwidth Selection
• The most common optimality criterion used to select the bandwidth is the mean integrated squared error (MISE).
• In each of the kernels $K_\lambda$, $\lambda$ is a parameter that controls the width:
• For the Epanechnikov or tri-cube kernel with metric width, $\lambda$ is the radius of the support region.
• For the Gaussian kernel, $\lambda$ is the standard deviation.
• For $k$-nearest neighbourhoods, $\lambda$ is the number $k$ of nearest neighbours, often expressed as a fraction or span $k/N$ of the total training sample.
Figure: kernel density estimates with different bandwidths. Red: KDE with $\lambda = 0.05$; black: KDE with $\lambda = 0.337$; green: KDE with $\lambda = 2$; grey curve: normal density with mean 0 and variance 1 (source: Wikipedia).
Kernel Smoothing
KNN and Kernel Smoothing
• The KNN average is computed as
$\hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$
• Here $N_k(x)$ is the set of $k$ points nearest to $x$ in squared distance.
• Moving $x_0$ from left to right, the $k$-nearest neighbourhood remains constant until a point $x_i$ to the right of $x_0$ becomes closer than the furthest point $x_{i'}$ in the neighbourhood to the left of $x_0$, at which time $x_i$ replaces $x_{i'}$.
• This leads to a discontinuous $\hat{f}(x)$.
• Alternatively, assign weights that die off smoothly with distance from
the target point
Kernel Smoothing
• Nadaraya-Watson kernel-weighted average:
$\hat{f}(x_0) = \dfrac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$
• Epanechnikov quadratic kernel:
$K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right)$
where
$D(t) = \begin{cases} \tfrac{3}{4}(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$
Kernel Smoothing
$K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right)$
• $\lambda$ represents the width of the kernel: the larger the value of $\lambda$, the wider the kernel and the smoother the resulting fit (see the sketch below).
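A minimal sketch of the Nadaraya-Watson kernel-weighted average with the Epanechnikov kernel, assuming simulated data; the value $\lambda = 0.2$ is an illustrative choice.

# Nadaraya-Watson kernel-weighted average with the Epanechnikov kernel
epanechnikov <- function(t) ifelse(abs(t) <= 1, 0.75 * (1 - t^2), 0)

nw_fit <- function(x0, x, y, lambda) {
  w <- epanechnikov(abs(x - x0) / lambda)  # weights die off smoothly with distance from x0
  sum(w * y) / sum(w)                      # weighted average of the responses
}

set.seed(1)
x <- sort(runif(200))
y <- sin(4 * x) + rnorm(200, sd = 0.3)

x_grid <- seq(0, 1, length.out = 100)
f_hat  <- sapply(x_grid, function(x0) nw_fit(x0, x, y, lambda = 0.2))

plot(x, y, col = "grey", main = "Nadaraya-Watson kernel smoothing")
lines(x_grid, f_hat, col = "red", lwd = 2)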