Lecture 20
The full data set has 3932 observations. Half of those (1966) are used now; the remaining 1966 are reserved for an out-of-sample comparison of Ridge vs. other prediction methods, done later.
where $\lambda_{\text{Lasso}} \sum_{j=1}^{k} |b_j|$ is the "penalty term."
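The penalty term above can be evaluated directly. The sketch below (synthetic data, not the lecture's data set) computes the Lasso objective, the sum of squared residuals plus $\lambda_{\text{Lasso}} \sum_{j=1}^{k} |b_j|$, for a candidate coefficient vector:

```python
import numpy as np

# Synthetic data; n, k, and the true coefficients are illustrative assumptions
rng = np.random.default_rng(0)
n, k = 100, 5
X = rng.standard_normal((n, k))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.standard_normal(n)

def lasso_objective(b, lam):
    """Penalized SSR: sum of squared residuals plus lam * sum_j |b_j|."""
    resid = y - X @ b
    return resid @ resid + lam * np.sum(np.abs(b))

# The OLS estimate sets lam = 0; larger lam pushes the minimizer toward zero
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(lasso_objective(b_ols, lam=10.0))
```

Minimizing this objective over b (for a given λ) yields the Lasso estimator; the penalty is what shrinks some coefficients exactly to zero.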
subject to the normalization $a^2 + b^2 = 1$.

For two X's that are positively correlated, the resulting choices of a and b are $a = b = 1/\sqrt{2}$. This is shown in the figure.
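The claim that $a = b = 1/\sqrt{2}$ can be checked numerically: the first principal component's weights are the eigenvector of the correlation matrix with the largest eigenvalue. A minimal sketch (the value $\rho = 0.6$ is an assumption; any positive correlation gives the same weights for two standardized X's):

```python
import numpy as np

# Correlation matrix of two standardized, positively correlated X's
rho = 0.6  # assumed value; any rho > 0 works
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# First PC weights = eigenvector of the largest eigenvalue,
# normalized so that a^2 + b^2 = 1 (eigh returns unit-length eigenvectors)
eigvals, eigvecs = np.linalg.eigh(Sigma)
a, b = eigvecs[:, np.argmax(eigvals)]

print(a, b)  # both equal 1/sqrt(2), up to a common sign flip
```

The equal weights reflect the symmetry of the two X's: with only their common positive correlation to exploit, the maximal-variance linear combination weights them identically.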
Principal Components, k > 2
The scree plot is informative (you should look at it) but doesn’t provide a
simple rule for choosing p.
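The quantity a scree plot graphs — each principal component's share of total variance, in descending order — can be sketched as follows (synthetic correlated data; the dimensions are assumptions, not the lecture's test-score data):

```python
import numpy as np

# Synthetic data with correlated columns (an illustrative assumption)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 8))

# Eigenvalues of the sample correlation matrix, largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

# A scree plot graphs each PC's share of total variance, in order
share = eigvals / eigvals.sum()
print(np.round(share, 3))
```

The shares fall off as the PC index rises, but rarely with a sharp kink — which is why the plot is informative yet gives no simple rule for p.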
• The number of principal components p is like the Ridge and Lasso penalty factors $\lambda_{\text{Ridge}}$ and $\lambda_{\text{Lasso}}$ - all are additional parameters needed to implement the procedure.
• Like Ridge and Lasso, p can be estimated by minimizing the m-fold cross-validated estimate of the MSPE.
– For a given value of p, the principal components forecast is obtained by regressing Y on $PC_1, \ldots, PC_p$ using the estimation sample, then using that model to predict in the test sample.
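The steps in the bullets above can be sketched end to end: for each candidate p, compute the first p PCs on the estimation folds, regress Y on them, predict in the held-out fold, and pick the p that minimizes the cross-validated MSPE. Everything below is an illustration on synthetic data, not the lecture's data set:

```python
import numpy as np

# Synthetic data with correlated regressors (an illustrative assumption)
rng = np.random.default_rng(2)
n, k = 300, 10
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, k))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(n)

# Fixed m-fold split of the estimation sample (m = 5 here)
folds = np.array_split(rng.permutation(n), 5)

def pc_scores(X_train, X_test, p):
    """First p principal components, computed from the training folds only."""
    mu, sd = X_train.mean(0), X_train.std(0)
    Z_tr, Z_te = (X_train - mu) / sd, (X_test - mu) / sd
    # Eigenvectors of the sample correlation matrix, largest eigenvalues first
    _, vecs = np.linalg.eigh(np.corrcoef(Z_tr, rowvar=False))
    W = vecs[:, ::-1][:, :p]
    return Z_tr @ W, Z_te @ W

def cv_mspe(p):
    """m-fold cross-validated MSPE of the regression of y on PC_1, ..., PC_p."""
    errs = []
    for f in folds:
        mask = np.ones(n, bool)
        mask[f] = False                      # mask selects the estimation folds
        PC_tr, PC_te = pc_scores(X[mask], X[~mask], p)
        A = np.column_stack([np.ones(mask.sum()), PC_tr])
        beta = np.linalg.lstsq(A, y[mask], rcond=None)[0]
        pred = np.column_stack([np.ones((~mask).sum()), PC_te]) @ beta
        errs.append(np.mean((y[~mask] - pred) ** 2))
    return np.mean(errs)

# Choose p by minimizing the cross-validated MSPE
p_hat = min(range(1, k + 1), key=cv_mspe)
print("CV-selected number of PCs:", p_hat)
```

Note that the PC weights are recomputed inside each fold from the estimation data only, so the held-out fold never influences the components it is predicted with.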
Predicting test scores

• OLS: 78.2
• Principal Components: 39.7