Spline Regression
Karim Seghouane
School of Mathematics & Statistics
The University of Melbourne
Outline
§3.1 Introduction
§3.2 Motivation
§3.3 Spline
§3.4 Penalized Spline Regression
§3.5 Linear Smoothers
§3.6 Other basis
Introduction
Our interest is the discovery of the underlying trend in the observed data, which are treated as a collection of points on the plane. The trend is described by the regression function
f(x) = E(y | x)
▶ This can also be written as
y_i = f(x_i) + ε_i,   E(ε_i) = 0
▶ and the problem is referred to as nonparametric regression
Motivation
y_i = β_0 + β_1 x_i + ε_i
▶ The design matrix is
X = [ 1 x_1
      ⋮  ⋮
      1 x_n ]
▶ The vector of fitted values is
ŷ = X (X^T X)^{-1} X^T y = Hy
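This computation can be sketched in a few lines of numpy (the simulated data here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 50))
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.2, 50)   # y_i = beta0 + beta1*x_i + eps_i

X = np.column_stack([np.ones_like(x), x])      # design matrix with rows [1, x_i]
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y                                  # fitted values y_hat = H y
```

Note that H is a projection matrix: symmetric, idempotent, with trace equal to the number of regression coefficients.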
y_i = β_0 + β_1 x_i + β_2 x_i² + ε_i
X = [ 1 x_1 x_1²
      ⋮  ⋮   ⋮
      1 x_n x_n² ]
▶ The vector of fitted values is
ŷ = X (X^T X)^{-1} X^T y = Hy
For a linear spline with knots at 0.5, 0.55, …, 0.95 the design matrix becomes
X = [ 1 x_1 (x_1 − 0.5)_+ (x_1 − 0.55)_+ … (x_1 − 0.95)_+
      ⋮  ⋮       ⋮              ⋮                ⋮
      1 x_n (x_n − 0.5)_+ (x_n − 0.55)_+ … (x_n − 0.95)_+ ]
corresponding to the model
f(x) = β_0 + β_1 x + Σ_{i=1}^K β_{1i} (x − k_i)_+
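Such a design matrix is straightforward to build; a numpy sketch (the helper name and evaluation grid are illustrative):

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Columns 1, x, (x - k_1)_+, ..., (x - k_K)_+ of the design matrix."""
    return np.column_stack([np.ones_like(x), x] +
                           [np.maximum(x - k, 0.0) for k in knots])

x = np.linspace(0.0, 1.0, 101)
knots = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]  # as on the slide
X = truncated_linear_basis(x, knots)
```

Each knot column is zero to the left of its knot and grows linearly to the right, which is what gives the fitted function a change of slope at each knot.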
Illustration
▶ The selection of a good basis is usually challenging
▶ Start by trying to choose knots by trial (at range 575 and 600)
▶ The fit is lacking in quality for low values of range
▶ An obvious remedy is to use more knots (at range 500, 550, 600 and 650)
▶ With a larger set of knots (at every 12.5), the fitting procedure has much more flexibility
▶ The resulting plot is heavily overfitted
▶ Pruning the knots (at 612.5, 650, 662.5 and 687.5) overcomes the overfitting issue
▶ This fits the data well without overfitting
▶ It was obtained, however, only after a lot of time-consuming trial and error
Knot selection
Constraints on the knot coefficients β_{1i} that might help avoid this situation lead to the penalized least squares criterion
‖y − Xβ‖² + λ β^T D β
▶ for λ ≥ 0, which has solution
β̂_λ = (X^T X + λD)^{-1} X^T y
▶ Linear penalized spline regression fits vary with the smoothing parameter λ
For a degree-p spline the model becomes
f(x) = β_0 + β_1 x + … + β_p x^p + Σ_{i=1}^K β_{pi} (x − k_i)_+^p
▶ The expression for the fitted values is given by
ŷ = X (X^T X + λD)^{-1} X^T y
where
D = diag(0_{p+1}, 1_K)
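A minimal numpy sketch of this penalized fit (the function name, data and knot locations are illustrative, not from the slides):

```python
import numpy as np

def penalized_spline_fit(x, y, knots, p=1, lam=1.0):
    """Degree-p penalized spline with truncated power basis.

    The penalty matrix D = diag(0_{p+1}, 1_K) penalizes only the
    knot coefficients, leaving the polynomial part unpenalized."""
    X = np.column_stack([x ** j for j in range(p + 1)] +
                        [np.maximum(x - k, 0.0) ** p for k in knots])
    D = np.diag([0.0] * (p + 1) + [1.0] * len(knots))
    beta_hat = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta_hat                       # fitted values y_hat

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)
knots = np.linspace(0.1, 0.9, 8)
y_rough = penalized_spline_fit(x, y, knots, p=1, lam=1e-6)   # nearly unpenalized
y_smooth = penalized_spline_fit(x, y, knots, p=1, lam=1e6)   # heavily penalized
```

As λ grows, the knot coefficients shrink toward zero and the fit approaches an ordinary degree-p polynomial fit, so the residual sum of squares is nondecreasing in λ.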
B-Spline bases
[Figure: B-spline bases of degree 1, 2 and 3 for the case of seven irregularly spaced knots]
Mathematically, this equivalence is quantified as follows:
▶ Let X_T be the design matrix with columns obtained from a truncated power basis, and
▶ let X_B be the design matrix corresponding to the B-spline basis of the same degree and the same knot locations; then X_B = X_T M for some invertible square matrix M, so the two bases span the same column space.
Once these choices have been made, there follow two secondary choices:
▶ The basis functions used to represent the fit: truncated power functions or B-splines
▶ The basis functions used in the computations (the two need not coincide)
Linear smoothers
In general,
ŷ = Ly
where L is an n × n matrix that does not depend on y (although it typically depends on the x_i and on the smoothing parameter λ). An estimator of this form is called a linear smoother.
y_i = f(x_i) + ε_i
An important quantity of interest is the error incurred by an estimator with respect to a given target. The most common measure of error is the mean squared error (MSE)
MSE{f̂(x)} = E{f̂(x) − f(x)}²
and, summing over the design points, the mean summed squared error
MSSE{f̂(·)} = Σ_{i=1}^n E{f̂(x_i) − f(x_i)}²
which decomposes into squared bias and variance terms:
MSSE{f̂} = Σ_{i=1}^n [ {E f̂(x_i) − f(x_i)}² + var{f̂(x_i)} ]
MSSE{f̂} = ‖(L − I)f‖² + σ² tr(LL^T)
▶ At λ = 0,
tr(S_0) = p + 1 + K
▶ At the other extreme,
tr(S_λ) → p + 1 as λ → ∞
▶ So for λ > 0, the effective degrees of freedom df = tr(S_λ) satisfy
p + 1 < df < p + 1 + K
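The two extremes are easy to check numerically; a sketch assuming the truncated-power design matrix and penalty D from the previous slides (knot locations here are illustrative):

```python
import numpy as np

def effective_df(X, D, lam):
    """Effective degrees of freedom df = tr(S_lambda)."""
    S = X @ np.linalg.solve(X.T @ X + lam * D, X.T)   # smoother matrix S_lambda
    return np.trace(S)

x = np.linspace(0.0, 1.0, 60)
knots = np.linspace(0.2, 0.8, 5)                      # K = 5 knots
X = np.column_stack([np.ones_like(x), x] +            # linear spline, p = 1
                    [np.maximum(x - k, 0.0) for k in knots])
D = np.diag([0.0] * 2 + [1.0] * 5)                    # D = diag(0_{p+1}, 1_K)
```

Evaluating `effective_df` at λ = 0 gives p + 1 + K, and pushing λ toward infinity drives it down to p + 1, matching the bounds above.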
Cross validation
CV(λ) = (1/n) Σ_{i=1}^n { y_i − f̂_{−i}(x_i; λ) }²
      = (1/n) Σ_{i=1}^n [ (y_i − f̂_λ(x_i)) / (1 − S_λ(i, i)) ]²
where f̂_{−i} denotes the fit computed with the i-th observation left out.
GCV(λ) = (1/n) Σ_{i=1}^n [ ((I − S_λ) y)_i / (1 − n⁻¹ tr(S_λ)) ]²
       = RSS(λ) / ( n (1 − n⁻¹ tr(S_λ))² )
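Both criteria can be computed from a single fit via the smoother matrix; the leave-one-out shortcut in the CV formula is exact for penalized least squares. A numpy sketch (design matrix, knots and data below are illustrative):

```python
import numpy as np

def cv_gcv(X, D, y, lam):
    """Leave-one-out CV via the hat-diagonal shortcut, and GCV."""
    n = len(y)
    S = X @ np.linalg.solve(X.T @ X + lam * D, X.T)        # smoother matrix S_lambda
    resid = y - S @ y                                      # (I - S_lambda) y
    cv = np.mean((resid / (1.0 - np.diag(S))) ** 2)        # CV(lambda)
    gcv = np.mean((resid / (1.0 - np.trace(S) / n)) ** 2)  # GCV(lambda)
    return cv, gcv

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 30)
X = np.column_stack([np.ones_like(x), x] +
                    [np.maximum(x - k, 0.0) for k in (0.25, 0.5, 0.75)])
D = np.diag([0.0, 0.0, 1.0, 1.0, 1.0])                     # penalize knot coefficients only
cv, gcv = cv_gcv(X, D, y, lam=0.1)
```

In practice both criteria are evaluated on a grid of λ values and the minimizer is selected.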
Our interest is in
MSSE{f̂} = ‖(L − I)f‖² + σ² tr(LL^T)
where L = S_λ; however,
E(RSS) = E‖f̂ − y‖² = MSSE{f̂} + σ² (n − 2 df_fit)
An estimate of the error variance is
σ̂² = RSS(λ) / df_res(λ)
The smoothing parameter can then be chosen as the minimizer of
GCV(λ) = RSS(λ) / ( n (1 − n⁻¹ tr(S_λ))² )
Other basis
Consider instead the basis
1, x, …, x^p, |x − k_1|^p, …, |x − k_K|^p
where each knot function |x − k_i|^p is of the form
r(‖x − k_i‖)
▶ where ‖v‖ = √(v^T v) is the vector length
▶ These functions are radially symmetric
▶ They are called radial basis functions
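A sketch of such a radial basis in numpy (degree and knot locations chosen for illustration):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)
knots = [0.25, 0.5, 0.75]
p = 3
# Columns 1, x, ..., x^p followed by |x - k_i|^p: each knot column is a
# function of the distance |x - k_i| only, hence "radial".
X = np.column_stack([x ** j for j in range(p + 1)] +
                    [np.abs(x - k) ** p for k in knots])
```

Unlike the truncated power functions (x − k)_+^p, each column |x − k_i|^p is symmetric about its knot.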
Cubic approximation
Minimize
‖y − X_0 β_0 − X_1 β_1‖² + λ β_1^T K β_1
subject to
X_0^T β_1 = 0
where β_0 = [β_0, β_1]^T, β_1 = [β_{11}, …, β_{1n}]^T, X_0 = [1, x_i]_{1≤i≤n} and X_1 = K = [ |x_i − x_j|³ ]_{1≤i,j≤n}