
Spline Regression

MAST90083 Computational Statistics and Data Mining

Karim Seghouane
School of Mathematics & Statistics
The University of Melbourne


Outline

§3.1 Introduction

§3.2 Motivation

§3.3 Spline

§3.4 Penalized Spline Regression

§3.5 Linear Smoothers

§3.6 Other Bases


Introduction

- Some data sets are hard or impossible to model using traditional parametric techniques.
- Many data sets also involve nonlinear effects that are difficult to model parametrically.
- There is a need for flexible techniques to handle complicated nonlinear relationships.
- Here we look at some ways of freeing oneself of the restrictions of parametric regression models.


The interest is in discovering the underlying trend in the observed data, which are treated as a collection of points in the plane.


- Alternatively, we could think of the vertical axis as a realization of a random variable y conditional on the variable x.
- The underlying trend would then be the function f(x) = E(y | x).
- This can also be written as

      yᵢ = f(xᵢ) + εᵢ,  E(εᵢ) = 0,

- and the problem is referred to as nonparametric regression.


- Aim: estimate the unspecified smooth function f from the pairs (xᵢ, yᵢ), i = 1, …, n.
- Here x will be considered univariate.
- There are several available methods; here we focus first on penalized splines,
- which are an extension of linear regression modeling.


Motivation

- Let's start with the straight-line regression model

      yᵢ = β₀ + β₁xᵢ + εᵢ


- The corresponding basis for this model consists of the functions 1 and x.
- The model is a linear combination of these functions, which is the reason for the use of the word "basis".


- The basis functions correspond to the columns of X used in fitting the regression:

      X = [ 1  x₁
            ⋮   ⋮
            1  xₙ ]

- The vector of fitted values is

      ŷ = X(XᵀX)⁻¹Xᵀy = Hy
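To make this concrete, here is a minimal Python sketch (using numpy; the simulated data and all names are illustrative choices, not from the lecture) of building X from the basis functions and computing ŷ = Hy:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x = np.sort(rng.uniform(0, 1, n))
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, n)   # simulated straight-line data

    X = np.column_stack([np.ones(n), x])          # columns = basis functions 1 and x
    H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix H = X (X'X)^{-1} X'
    y_hat = H @ y                                 # fitted values  ŷ = Hy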


- The quadratic model is a simple extension of the linear model:

      yᵢ = β₀ + β₁xᵢ + β₂xᵢ² + εᵢ


- There is an extra basis function, x², corresponding to the addition of the β₂xᵢ² term to the model.
- The quadratic model is an example of how the simple linear model might be extended to handle nonlinear structure.


- The matrix X whose columns are the basis functions for fitting the quadratic model is

      X = [ 1  x₁  x₁²
            ⋮   ⋮    ⋮
            1  xₙ  xₙ² ]

- The vector of fitted values is again

      ŷ = X(XᵀX)⁻¹Xᵀy = Hy


Spline basis function

- We now look at how the model can be extended to accommodate a different type of nonlinear structure.
- Broken-line model: it consists of two differently sloped lines that join together.


- Broken line: a linear combination of the three basis functions 1, x and (x − 0.6)₊,
- where the positive-part function is defined by

      u₊ = { u,  u > 0
             0,  u ≤ 0


- The broken-line model is

      yᵢ = β₀ + β₁xᵢ + β₁₁(xᵢ − 0.6)₊ + εᵢ,

- which can be fit using the least-squares estimator with

      X = [ 1  x₁  (x₁ − 0.6)₊
            ⋮   ⋮        ⋮
            1  xₙ  (xₙ − 0.6)₊ ]
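A sketch of this broken-line fit in Python (the simulated data, the true coefficients and the knot at 0.6 are assumptions for illustration):

    import numpy as np

    def pos(u):
        # positive part u_+ = max(u, 0)
        return np.maximum(u, 0.0)

    rng = np.random.default_rng(1)
    n = 200
    x = np.sort(rng.uniform(0, 1, n))
    y = 0.5 + 1.0 * x + 3.0 * pos(x - 0.6) + rng.normal(0.0, 0.2, n)

    # design matrix with columns 1, x, (x - 0.6)_+
    X = np.column_stack([np.ones(n), x, pos(x - 0.6)])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
    y_hat = X @ beta_hat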


- Now assume a more complicated structure:
- a straight-line structure in the left-hand half, while the right-hand half is prone to a high amount of detailed structure (the "whip" model).


- If we have good reason to believe that the underlying structure is of this basic form, we could change the basis
- by adding the functions (x − 0.5)₊, (x − 0.55)₊, …, (x − 0.95)₊.

- This basis can do a reasonable job, with a linear portion between x = 0 and x = 0.5.
- We can use least squares to fit such a model with

      X = [ 1  x₁  (x₁ − 0.5)₊  (x₁ − 0.55)₊  …  (x₁ − 0.95)₊
            ⋮   ⋮        ⋮             ⋮               ⋮
            1  xₙ  (xₙ − 0.5)₊  (xₙ − 0.55)₊  …  (xₙ − 0.95)₊ ]


- It is possible to handle any complex type of structure by simply adding functions of the form (x − k)₊ to the basis.
- This is equivalent to adding a column of values to the X matrix.
- The value k corresponding to the function (x − k)₊ is referred to as a knot,
- because the function is made up of two lines that are tied together at x = k.


- The function (x − 0.6)₊ is called a linear spline basis function.
- A set of such functions is called a linear spline basis.
- Any linear combination of the linear spline basis functions 1, x, (x − k₁)₊, …, (x − k_K)₊ is a piecewise linear function with knots k₁, k₂, …, k_K, and is called a spline.


- Rather than referring to the spline basis function (x − k)₊, it is common to refer simply to its knot k.
- We say the model has a knot at 0.35 if the function (x − 0.35)₊ is in the basis.
- The spline model for a function f is

      f(x) = β₀ + β₁x + Σ_{k=1}^K β₁ₖ(x − kₖ)₊
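The linear spline basis translates directly into code; a hedged sketch (the function name and knot placement are my choices, not the lecture's):

    import numpy as np

    def linear_spline_design(x, knots):
        # columns 1, x, (x - k_1)_+, ..., (x - k_K)_+
        cols = [np.ones_like(x), x]
        cols += [np.maximum(x - k, 0.0) for k in knots]
        return np.column_stack(cols)

    # e.g. K = 4 knots placed at quantiles of x
    x = np.linspace(0.0, 1.0, 101)
    knots = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
    X = linear_spline_design(x, knots)   # shape (101, 2 + 4)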


Illustration
- The selection of a good basis is usually challenging.
- Start by trying to choose knots by trial (say, at range 575 and 600).


- The fit is lacking in quality for low values of range.
- An obvious remedy is to use more knots (at range 500, 550, 600 and 650).


- With a larger set of knots (one every 12.5), the fitting procedure has much more flexibility.
- The resulting fit is heavily overfitted.


- Pruning the knots (to 612.5, 650, 662.5 and 687.5) overcomes the overfitting issue.
- This fits the data well without overfitting,
- but it was obtained only after a lot of time-consuming trial and error.


Knot selection

- A natural attempt at automatic selection of the knots is to use a model selection criterion.
- If there are K candidate knots, then there are 2ᴷ possible models, assuming the overall intercept and linear term are always present.
- This is highly computationally intensive.


Penalized spline regression

- Too many knots in the model induce roughness of the fit.
- An alternative approach: retain all the knots but constrain their influence.
- The hope is that this will result in a less variable fit.
- Consider a general spline model with K knots, K large.


- The ordinary least-squares fit is written as

      ŷ = Xβ̂,  where β̂ minimizes ‖y − Xβ‖²

- and β = [β₀, β₁, β₁₁, …, β₁K]ᵀ, with β₁ₖ the coefficient of the k-th knot.
- Unconstrained estimation of β leads to a wiggly fit.


Constraints on the β₁ₖ that might help avoid this situation are

- max |β₁ₖ| < C
- Σ |β₁ₖ| < C
- Σ β₁ₖ² < C

With an appropriate choice of C, each of these will lead to a smoother fit; however, the last constraint is much simpler to implement.


Define the matrix D of size (K + 2) × (K + 2):

      D = [ 0_{2×2}  0_{2×K}
            0_{K×2}  I_{K×K} ]

i.e., the diagonal matrix whose first two diagonal entries (for the intercept and linear term) are 0 and whose remaining K diagonal entries are 1.


- The third constraint is easier to implement than the first two.
- The minimization problem is:

      minimize ‖y − Xβ‖²  subject to  βᵀDβ ≤ C

- This is equivalent to choosing β to minimize

      ‖y − Xβ‖² + λβᵀDβ

- for some λ ≥ 0, and has solution

      β̂_λ = (XᵀX + λD)⁻¹Xᵀy
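A direct transcription of this solution in Python (a sketch only; in practice one would use a numerically safer solver or the B-spline parameterization discussed later):

    import numpy as np

    def penalized_spline_fit(X, y, lam, n_knots):
        # solve (X'X + lam * D) beta = X'y with D = diag(0, 0, 1, ..., 1)
        p_fixed = X.shape[1] - n_knots               # unpenalized columns (here: 1 and x)
        D = np.diag(np.r_[np.zeros(p_fixed), np.ones(n_knots)])
        beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
        return beta, X @ beta                        # coefficients and fitted values

With lam = 0 this reduces to ordinary least squares; a large lam shrinks the knot coefficients toward zero, flattening the fit toward a straight line.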


- The term λβᵀDβ is called a roughness penalty: it penalizes fits that are too rough, thus yielding a smoother result.
- The amount of smoothness is controlled by λ, which is therefore referred to as a smoothing parameter.
- The fitted values for penalized spline regression are

      ŷ = X(XᵀX + λD)⁻¹Xᵀy


Illustration
- Linear penalized spline regression fits change with the value of the smoothing parameter λ.


Quadratic spline bases

- We have discussed linear splines, that is, continuous piecewise linear functions.
- Why are these functions piecewise linear?
- Because they are linear combinations of piecewise linear functions of the form (x − k)₊.
- A simple way of escaping from piecewise linearity
- is to add x² to the basis and to replace each (x − k)₊ by its square, (x − k)₊².


Illustration of a quadratic spline basis function

- Illustration of the function (x − 0.6)₊².


- This function doesn't have a sharp corner the way (x − 0.6)₊ does:
- (x − 0.6)₊² has a continuous first derivative.
- Any linear combination of the functions

      1, x, x², (x − k₁)₊², …, (x − k_K)₊²

- also has a continuous first derivative and no sharp corners.
- This results in a better fit.
- Such a set of functions is called a quadratic spline basis with knots at k₁, …, k_K.


Illustration of quadratic spline basis functions

- Quadratic splines do a better job of fitting peaks and valleys.


Other spline bases


I We discussed linear and quadratic spline models
I One reason for considering other models is to achieve
smoother fits → important if one plans to differentiate the fit
to estimate derivative of the regression function
I In principle a change of basis does not change the fit but
some bases are more stable and allow computation of a fit
with better accuracy
I Besides numerical stability: ease of implementation is another
reason for selecting one basis over another
I An obvious generalization is given by

1, x, ..., x p , (x − k1 )p+ , ..., (x − kK )p+


I know as the truncated power basis of degree p

- Since the function (x − k)₊ᵖ has p − 1 continuous derivatives, higher values of p lead to smoother spline functions.
- The p-th degree spline model is

      f(x) = β₀ + β₁x + … + βₚxᵖ + Σ_{k=1}^K β₁ₖ(x − kₖ)₊ᵖ

- The expression for the fitted values is again

      ŷ = X(XᵀX + λD)⁻¹Xᵀy,  with  D = diag(0ₚ₊₁, 1_K)
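The degree-p basis and its penalty matrix generalize the earlier sketches (again an illustrative transcription, not library code):

    import numpy as np

    def truncated_power_design(x, knots, p):
        # columns 1, x, ..., x^p, (x - k_1)_+^p, ..., (x - k_K)_+^p
        cols = [x**j for j in range(p + 1)]
        cols += [np.maximum(x - k, 0.0)**p for k in knots]
        return np.column_stack(cols)

    def penalized_fitted_values(x, y, knots, p, lam):
        X = truncated_power_design(x, knots, p)
        D = np.diag(np.r_[np.zeros(p + 1), np.ones(len(knots))])  # diag(0_{p+1}, 1_K)
        return X @ np.linalg.solve(X.T @ X + lam * D, X.T @ y)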


B-Spline bases

Truncated power bases can be used in practice
- if the knots are selected carefully, or
- if a penalized fit is used.

However, truncated power bases have the practical disadvantage that they are far from orthogonal:
- this leads to numerical instability,
- particularly when there is a large number of knots (and λ is small or zero).


- In practice, especially for OLS fitting, it is advisable to work with equivalent bases that have more stable numerical properties.
- The most common choice is the B-spline basis.


B-spline bases of degree 1, 2 and 3 for the case of seven irregularly spaced knots.


- Each of these is equivalent to the truncated power basis of the same degree.
- In regression, this means that using B-splines for the columns of X, or a truncated power basis of the same degree, produces identical fits (with knots at the same locations).


Mathematically, this equivalence is quantified as follows.
- Let X_T be the X matrix whose columns are obtained from the truncated power basis, and
- let X_B be the X matrix corresponding to the B-spline basis of the same degree and same knot locations; then

      X_B = X_T L_p,  where L_p is a square invertible matrix.

The penalized spline fit of degree p in terms of B-splines is

      ŷ = X_B(X_Bᵀ X_B + λ L_pᵀ D L_p)⁻¹ X_Bᵀ y

→ this is the basis used in regression packages.
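A sketch of building a B-spline design matrix with SciPy (this assumes SciPy ≥ 1.8, where BSpline.design_matrix is available; the boundary-knot padding convention is my assumption):

    import numpy as np
    from scipy.interpolate import BSpline

    def bspline_design(x, interior_knots, degree):
        # full knot vector: boundary knots repeated degree + 1 times
        a, b = x.min(), x.max()
        t = np.r_[[a] * (degree + 1), interior_knots, [b] * (degree + 1)]
        # one column per B-spline basis function: len(t) - degree - 1 columns
        return BSpline.design_matrix(x, t, degree).toarray()

    x = np.linspace(0.0, 1.0, 50)
    XB = bspline_design(x, np.array([0.25, 0.5, 0.75]), degree=3)
    print(XB.shape)   # (50, 7): degree + 1 + number of interior knots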



Natural Cubic Spline


- A natural cubic spline is a modification of a cubic spline that adds a linearity constraint beyond the boundary knots (here 0 and 1).
- The other knots are called interior knots.
- Linearity is enforced through the constraints that the spline f satisfy f″ = f‴ = 0 at the boundary knots.


Cubic smoothing spline

- A spline-basis method that avoids the knot selection issue by using a maximal set of knots (n knots).
- Among all functions f(x) with two continuous derivatives, select the f̂(x) that minimizes

      Σ_{i=1}^n {yᵢ − f(xᵢ)}² + λ ∫ f″(x)² dx

- The regularization controls the complexity of the fit by penalizing the curvature of the function f.
- The minimizer of this penalized sum of squares is a natural cubic spline with knots at the xᵢ.


- The smoothing parameter λ controls the tradeoff between closeness to the data and complexity; there are two special cases:
- λ = 0: f can be any function that interpolates the data (very rough).
- λ = ∞: the least-squares line fit (since no second derivative can be tolerated).
- The function is over-parametrized, since n knots imply n degrees of freedom.
- The penalty term translates into a penalty on the spline coefficients, which are shrunk toward the linear fit.


Since the solution is a natural spline, it can take the form

      f(x) = Σ_{j=1}^n Bⱼ(x) βⱼ

where the Bⱼ(x) are an n-dimensional set of basis functions for representing this family of natural splines.
With this representation, the criterion for the smoothing spline reduces to

      RSS(β, λ) = (y − Bβ)ᵀ(y − Bβ) + λβᵀΩβ

where Bᵢⱼ = Bⱼ(xᵢ) and {Ω}ₗₘ = ∫ Bₗ″(x) Bₘ″(x) dx


- The solution is

      β̂ = (BᵀB + λΩ)⁻¹Bᵀy

- The fitted smoothing spline is given by

      f̂(x) = Σ_{j=1}^n Bⱼ(x) β̂ⱼ

- Efficient computation in O(n) operations can be realized by solving

      (BᵀB + λΩ)β = Bᵀy

  via a Cholesky decomposition (exploiting the banded structure of B and Ω).
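The normal-equations solve can be sketched with SciPy's Cholesky routines (a dense illustration; the O(n) cost quoted above relies on the banded structure, which a dense factorization does not exploit):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def smoothing_spline_coefs(B, Omega, y, lam):
        # solve (B'B + lam * Omega) beta = B'y via a Cholesky factorization
        A = B.T @ B + lam * Omega      # symmetric positive definite for lam > 0
        c, low = cho_factor(A)
        return cho_solve((c, low), B.T @ y)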


General form of penalized spline

The general definition of a penalized spline fit is B(x)β̂, where

      β̂ = argmin_β Σ_{i=1}^n [yᵢ − B(xᵢ)β]² + λβᵀDβ

and D is symmetric positive semidefinite and λ > 0.
- In the case of the truncated power spline basis, D = diag(0ₚ₊₁, 1_K).
- In smoothing splines, D = Ω defines the penalty.


When applying splines, there are two basic choices to make:
- the spline model: the degree and knot locations;
- the penalty: the form of the penalty.

Once these choices have been made, there follow two secondary choices:
- the basis functions: truncated power functions or B-splines;
- the basis functions used in the computations.


Linear smoothers

A penalized spline fit is a linear function of the data y:

      ŷ = Sλ y,  with  Sλ = X(XᵀX + λD)⁻¹Xᵀ

- where X corresponds, for example, to the p-th degree truncated spline basis;
- Sλ is usually called the smoother matrix.

In general,

      ŷ = Ly

where L is an n × n matrix that doesn't depend on y directly (though it may depend on y through a data-driven choice of λ). Such an estimator is called a linear smoother.

Error of the smoothers


Let f̂ be an estimator of f obtained from

      yᵢ = f(xᵢ) + εᵢ,  E(εᵢ) = 0,  var(εᵢ) = σ²

An important quantity of interest is the error incurred by an estimator with respect to a given target. The most common measure of error is the mean squared error (MSE),

      MSE{f̂(x)} = E[{f̂(x) − f(x)}²]

which has the advantage of admitting the decomposition

      MSE{f̂(x)} = [E{f̂(x)} − f(x)]² + var{f̂(x)}

i.e., the squared bias plus the variance.



- The entire curve is of interest → so it is common to measure the error globally, across several values of x.
- The mean integrated squared error (MISE) is one possibility:

      MISE{f̂(·)} = ∫_χ MSE{f̂(x)} dx

- When only the errors at the observations are considered:

      MSSE{f̂(·)} = E Σ_{i=1}^n {f̂(xᵢ) − f(xᵢ)}²


- Let f̂ = [f̂(x₁), …, f̂(xₙ)]ᵀ denote the vector of fitted values, and
- let f = [f(x₁), …, f(xₙ)]ᵀ denote the vector of unknown values; then

      MSSE(f̂) = E‖f̂ − f‖²

- For a linear smoother, f̂ = Ly and

      MSSE(f̂) = Σ_{i=1}^n [E{f̂(xᵢ)} − f(xᵢ)]² + Σ_{i=1}^n var{f̂(xᵢ)}
               = ‖(L − I)f‖² + σ² tr(LLᵀ)


- The bias is given by

      Bias(f̂) = f − E(f̂) = f − Lf

- The covariance is

      cov(f̂) = L cov(y) Lᵀ = σ² LLᵀ

- Its diagonal contains the pointwise variances at the xᵢ.

Degrees of freedom of a smoother


- For a penalized spline,

      df_fit = tr{(XᵀX + λD)⁻¹XᵀX} = tr(Sλ)

- For K knots and degree p,

      tr(S₀) = p + 1 + K

- At the other extreme,

      tr(Sλ) → p + 1  as  λ → ∞

- So for λ > 0,

      p + 1 < df_fit < p + 1 + K
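A sketch of computing df_fit = tr(Sλ) for a penalized spline design (X and D as in the earlier sketches):

    import numpy as np

    def df_fit(X, D, lam):
        # effective degrees of freedom tr(S_lambda) of the penalized fit
        S = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
        return np.trace(S)

    # df decreases from p + 1 + K at lam = 0 toward p + 1 as lam grows:
    # for lam in (0.0, 0.1, 1.0, 10.0): print(df_fit(X, D, lam))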

- Different smoothing-parameter values can lead to fits of similar appearance; such fits have roughly the same degrees of freedom.


Cross validation

- The most common measure of the goodness of fit of a regression curve is

      RSS = (1/n) Σ_{i=1}^n (yᵢ − ŷᵢ)² = (1/n)‖y − ŷ‖²

- It is minimized by λ = 0, for which ŷᵢ = yᵢ, 1 ≤ i ≤ n:
- a solution close to interpolation.


- Cross-validation allows the estimation of λ when f̂(x; λ) is used as a nonparametric regression estimator at x. Let f̂₋ᵢ denote the fit computed with the i-th observation left out, and consider

      RSS(λ) = (1/n) Σ_{i=1}^n {yᵢ − f̂₋ᵢ(xᵢ; λ)}²

- The cross-validation criterion is

      CV(λ) = (1/n) Σ_{i=1}^n {yᵢ − f̂₋ᵢ(xᵢ; λ)}² = (1/n) Σ_{i=1}^n [ {yᵢ − f̂λ(xᵢ)} / {1 − Sλ(i, i)} ]²

- The chosen λ is the one that minimizes CV(λ) over λ ≥ 0.


A computationally efficient variant can be obtained using a simplified version in which the Sλ(i, i) are replaced by their average,

      (1/n) Σ_{i=1}^n Sλ(i, i) = (1/n) tr(Sλ)

This leads to the generalized cross-validation criterion

      GCV(λ) = (1/n) Σ_{i=1}^n [ {(I − Sλ)y}ᵢ / {1 − n⁻¹tr(Sλ)} ]² = RSS(λ) / {1 − n⁻¹tr(Sλ)}²
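Choosing λ by minimizing GCV over a grid takes only a few lines (a sketch; the grid endpoints are arbitrary assumptions):

    import numpy as np

    def gcv(X, D, y, lam):
        # GCV(lam) = n^{-1} ||(I - S)y||^2 / (1 - n^{-1} tr(S))^2
        n = len(y)
        S = X @ np.linalg.solve(X.T @ X + lam * D, X.T)
        resid = y - S @ y
        return (resid @ resid / n) / (1.0 - np.trace(S) / n) ** 2

    # lams = np.logspace(-4, 4, 50)
    # lam_hat = min(lams, key=lambda lam: gcv(X, D, y, lam))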


Selection with a criterion

Our interest is in

      MSSE(f̂) = ‖(L − I)f‖² + σ² tr(LLᵀ)

where L = Sλ; however, with RSS = ‖y − f̂‖²,

      E(RSS) = E‖f̂ − y‖² = MSSE(f̂) + σ²(n − 2 df_fit)

where df_fit = tr(Sλ) = tr(L).



It follows that if σ̂² is an unbiased estimate of σ², then

      IC = RSS + 2σ̂² df_fit

is an unbiased estimator of

      MSSE(f̂) + nσ²

Since nσ² doesn't depend on Sλ, minimizing IC is approximately equivalent to minimizing MSSE(f̂).


For penalized splines this leads to the criterion

      Cp(λ) = RSS(λ) + 2σ̂² df_fit(λ)

for selecting λ. We denote by λ̂_Cp the smoothing parameter obtained by minimizing Cp(λ).
As an estimate of σ² we take

      σ̂² = RSS(λ) / df_res(λ)

Selection with a criterion

GCV can be written in an approximately equivalent form:

      GCV(λ) = RSS(λ) / {1 − n⁻¹tr(Sλ)}² ≈ RSS(λ) + 2σ̂²(λ) df_fit

The main difference is that GCV estimates σ² using RSS(λ), whereas Cp(λ) requires a prior estimate of σ².


This can be extended to other selection criteria, for example

      AIC(λ) = log(RSS(λ)) + (2/n) df_fit


Other Bases

Assume f is defined on the unit interval. Under some regularity conditions, f admits a Fourier series representation

      f(x) = β₀ + Σ_{j=1}^∞ {βⱼˢ sin(jπx) + βⱼᶜ cos(jπx)}

For higher values of j, the functions sin(jπx) and cos(jπx) become more oscillatory → they account for the finer structure in f.


- For smoother f, the corresponding high-frequency coefficients are small, so the series can be truncated:

      f̂(x) = β̂₀ + Σ_{j=1}^J {β̂ⱼˢ sin(jπx) + β̂ⱼᶜ cos(jπx)}

- β̂ⱼˢ, β̂ⱼᶜ (1 ≤ j ≤ J) and β̂₀ are obtained by least squares.
- The value of J is the smoothing parameter in this case.
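A least-squares Fourier fit in the same style (the choice J = 6 and the simulated data are assumptions for illustration):

    import numpy as np

    def fourier_design(x, J):
        # columns 1, sin(j*pi*x), cos(j*pi*x) for j = 1, ..., J
        cols = [np.ones_like(x)]
        for j in range(1, J + 1):
            cols += [np.sin(j * np.pi * x), np.cos(j * np.pi * x)]
        return np.column_stack(cols)

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x**2) + rng.normal(0.0, 0.2, 200)

    F = fourier_design(x, J=6)                     # larger J = rougher fit
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    f_hat = F @ coef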


Radial Basis functions

An extension of the truncated power functions is the basis

      1, x, …, xᵖ, |x − k₁|ᵖ, …, |x − k_K|ᵖ

where

      |x − kᵢ|ᵖ = r(|x − kᵢ|),  with r(u) = uᵖ

This shows that the basis functions |x − kᵢ|ᵖ (1 ≤ i ≤ K) depend only on the distance |x − kᵢ| and the univariate function r.
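The radial form makes the design matrix easy to write down; a sketch (the knots and degree are arbitrary choices):

    import numpy as np

    def radial_design(x, knots, p):
        # columns 1, x, ..., x^p, then r(|x - k_i|) with r(u) = u^p
        poly = [x**j for j in range(p + 1)]
        radial = [np.abs(x - k)**p for k in knots]
        return np.column_stack(poly + radial)

    x = np.linspace(0.0, 1.0, 101)
    X = radial_design(x, knots=np.array([0.25, 0.5, 0.75]), p=3)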


The extension to the multivariate case, x ∈ ℝᵈ and k₁, …, k_K ∈ ℝᵈ, is straightforward:

      r(‖x − kᵢ‖)

- where ‖v‖ = √(vᵀv) is the vector length;
- these functions are radially symmetric;
- they are called radial basis functions.


Cubic approximation

A cubic smoothing spline approximation can be written as

      f̂(x) = β̂₀ + β̂₁x + Σ_{j=1}^n β̂₁ⱼ |x − xⱼ|³

where β̂₀, β̂₁, β̂₁₁, …, β̂₁ₙ minimize

      ‖y − X₀β₀ − X₁β₁‖² + λβ₁ᵀKβ₁  subject to  X₀ᵀβ₁ = 0

where β₀ = [β₀, β₁]ᵀ, β₁ = [β₁₁, …, β₁ₙ]ᵀ, X₀ = [1 xᵢ]_{1≤i≤n} and X₁ = K = [|xᵢ − xⱼ|³]_{1≤i,j≤n}
 


Computational savings can be obtained by specifying a knot sequence k₁, …, k_K and using K = [|kᵢ − kⱼ|³]_{1≤i,j≤K} and X₁ = [|xᵢ − kⱼ|³]_{1≤i≤n, 1≤j≤K}
 
