18 CV & Model Selection
Last time we talked about how to use the F test to test between two different models, when
one is nested inside the other. However, often we have many possible models to choose from,
e.g., polynomials of different orders, or regressions using different subsets of the available predictors. The training error

\hat{R}_{train} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2    (1)

measures how well the model does on the exact same observations it was built from. This is like
seeing a test's answers before you take it! It is not a good estimate of what we really care about,
which is the out-of-sample or prediction risk (i.e., the error on predicting a new, future Y we
haven't seen yet),

R = R(\hat{m}) = E[(Y - \hat{m}(X))^2]    (2)

where (X, Y) is a new observation that was not used to construct \hat{m}. The prediction risk is
also known as the generalization error.

\hat{R}_{train} can be extremely different from R when there are many features. As you'd expect,
\hat{R}_{train} systematically underestimates R, and it can always be made smaller by fitting larger, more complex models.
Now imagine that we get a new data set at the same X_i's but with new errors:

Y' = X\beta + \epsilon'

where \epsilon and \epsilon' are independent but identically distributed. The design matrix is the same, the
true parameters \beta are the same, but the noise is different. This might seem a bit strange, but it's
similar to observing a new (X, Y).

How well can we predict these new values Y_i'? The predicted values using our model are still \hat{m}(X_i).
Define the out-of-sample prediction error by

\hat{R}_{out} = \frac{1}{n} \sum_{i=1}^{n} (Y_i' - \hat{m}(X_i))^2 .

As we would expect,

E[\hat{R}_{train}] < E[\hat{R}_{out}].
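To see this concretely, here is a small simulation in the spirit of the setup above (a sketch; the sample size, the true curve, the noise level, and the choice of a degree-10 fit are all my own illustrative choices):

set.seed(42)
n = 50
x = runif(n,-1,1)
f = function(x){ x^3 }                # the true regression function
y = f(x) + rnorm(n,0,0.5)             # original responses
out = lm(y ~ poly(x,10))              # a deliberately flexible model
Rtrain = mean((y - fitted(out))^2)    # training error
ynew = f(x) + rnorm(n,0,0.5)          # new responses at the SAME x's
Rout = mean((ynew - fitted(out))^2)   # error on the new responses
c(Rtrain, Rout)                       # Rtrain is typically the smaller of the two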
The next theorem quantifies the gap.

Theorem 1

E[\hat{R}_{train}] = E\left[ \frac{1}{n} \sum_{i=1}^{n} (Y_i' - \hat{m}(X_i))^2 \right] - \frac{2\sigma^2 q}{n}    (4)

where q is the number of coefficients in the model.
Proof. Y_i' is independent of Y_i, but has the same distribution. This tells us that E[Y_i'] = E[Y_i] and
Var[Y_i'] = Var[Y_i], but Cov[Y_i', \hat{m}(X_i)] = 0. So

E[(Y_i' - \hat{m}(X_i))^2] = Var[Y_i] + Var[\hat{m}(X_i)] + (E[Y_i] - E[\hat{m}(X_i)])^2    (9)
                           = E[(Y_i - \hat{m}(X_i))^2] + 2 Cov[Y_i, \hat{m}(X_i)],    (10)

where the second line uses Var[Y_i - \hat{m}(X_i)] = Var[Y_i] + Var[\hat{m}(X_i)] - 2 Cov[Y_i, \hat{m}(X_i)]. Now

Cov(Y, \hat{Y}) = Cov(Y, HY) = Cov(Y, Y) H^T = \sigma^2 H

since H is symmetric, and so Cov[Y_i, \hat{m}(X_i)] = \sigma^2 H_{ii}. So

E\left[ \frac{1}{n} \sum_{i=1}^{n} (Y_i' - \hat{m}(X_i))^2 \right] = E\left[ \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2 \right] + \frac{2}{n} \sigma^2 \operatorname{tr} H = E\left[ \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2 \right] + \frac{2q}{n} \sigma^2,

since \operatorname{tr} H = q. Rearranging gives (4).
The term (2/n)σ 2 q is called the optimism of the model — the amount by which the training
error systematically under-estimates its true expected squared error. Notice that the optimism:
• Grows with \sigma^2: more noise gives the model more opportunities to seem to fit well by capitalizing
on chance.
• Shrinks with n: at any fixed level of noise, more data makes it harder to pretend the fit is
better than it really is.
• Grows with q: every extra parameter is another control which can be adjusted to fit the
noise.
Minimizing the training error completely ignores the bias from optimism, so it is guaranteed to
pick models which are too large and predict poorly out of sample.
We can now state an estimate of the risk. Define

C_p = \hat{R}_{train} + \frac{2 \hat{\sigma}^2 q}{n}    (11)

which is known as Mallows' C_p. We see that E[C_p] \approx E[\hat{R}_{out}]. You can compute C_p in R as follows:
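For instance, along the same lines as the Cp function given in Section 5 (a sketch; here y is the response, x the predictor, and out the fitted model):

out = lm(y ~ x)
n = length(y)
q = length(coef(out))                 # number of estimated coefficients
sigma2hat = summary(out)$sigma^2      # estimate of the noise variance
Rtrain = mean((y - fitted(out))^2)    # training error
Cp = Rtrain + 2*sigma2hat*q/n         # equation (11)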
3 Cross-Validation
Mallows' C_p is nice, but our derivation assumed that the linear model is correct. Is there a way to
estimate prediction error without assuming the model is correct? Yes! It's called cross-validation,
and it is one of the most powerful tools in statistics & data science.
The basic idea is this. The training error n^{-1} \sum_i (Y_i - \hat{m}(X_i))^2 is biased because Y_i and \hat{m}(X_i)
are dependent. Cross-validation methods break the data into a training set, used to get \hat{m}, and a test
set, used to get the Y_i's. This makes them independent and mimics what it is like to predict a new observation.
There are two main flavors of cross-validation: K-fold cross-validation and leave-one-out cross-
validation.
3.1 K-fold Cross-Validation
K-fold cross-validation goes as follows.
• Randomly divide the data into K equally-sized parts, or “folds”. A common choice is K = 5
or K = 10.
• For each fold j, fit the model using all the data except fold j, and record the average squared
error it makes on the held-out fold.
• Average these errors over the K folds; this average is the cross-validation estimate of the risk.
In other words, divide the data into K groups B_1, ..., B_K. For j \in \{1, ..., K\}, estimate \hat{m} from
the data \{B_1, ..., B_{j-1}, B_{j+1}, ..., B_K\}. Then let

\hat{G}_j = \frac{1}{n_j} \sum_{i \in B_j} (Y_i - \hat{m}(X_i))^2

where n_j is the number of observations in B_j. The K-fold cross-validation score is the average
\frac{1}{K} \sum_{j=1}^{K} \hat{G}_j.
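If you want to write your own K-fold cross-validation code (an excellent exercise, as noted below), a minimal sketch might look like this, assuming the data sit in a data frame D with columns y and x and that we fit a simple linear model:

K = 5
n = nrow(D)
fold = sample(rep(1:K, length.out = n))   # randomly assign each row to a fold
G = rep(0, K)
for(j in 1:K){
train = D[fold != j, ]                    # all the data except fold j
test = D[fold == j, ]
fit = lm(y ~ x, data = train)             # estimate m-hat without fold j
G[j] = mean((test$y - predict(fit, newdata = test))^2)   # error on fold j
}
KFoldCV = mean(G)                         # average over the K folds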
3.2 Leave-One-Out Cross-Validation
Leave-one-out cross-validation (LOOCV) is the special case K = n: each fold is a single observation,
so we refit the model n times, each time leaving out one point and predicting it from the other n - 1.
Computing LOOCV this way sounds painful. Fortunately, for least-squares fits there is a simple, amazing shortcut formula:
LOOCV = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Y_i - \hat{m}(X_i)}{1 - H_{ii}} \right)^2 .
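The shortcut is exact for least-squares fits from lm, and it is worth checking it once against the brute-force definition. Here is a small sketch (the simulated data are my own illustrative choices):

set.seed(1)
n = 50
x = runif(n,-1,1)
y = 1 + 2*x + rnorm(n,0,0.5)
out = lm(y ~ x)
## the shortcut formula
shortcut = mean(((y - fitted(out))/(1 - hatvalues(out)))^2)
## brute force: refit n times, leaving out one observation each time
brute = mean(sapply(1:n, function(i){
fit = lm(y ~ x, subset = -i)
(y[i] - predict(fit, newdata = data.frame(x = x[i])))^2
}))
c(shortcut, brute)   # the two agree up to rounding error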
To see the connection to C_p, suppose all the leverages are roughly equal, so that H_{ii} \approx \operatorname{tr} H / n = q/n \equiv \gamma.
By doing a Taylor series we see that (1 - \gamma)^{-2} \approx 1 + 2\gamma. Hence,

LOOCV \approx \frac{1 + 2\gamma}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2    (12)
      = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2 + 2\gamma \cdot \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2    (13)
      \approx \text{training error} + \frac{2 \hat{\sigma}^2 q}{n} = C_p,    (14)

where the last step uses \gamma = q/n and the fact that the training error is approximately \hat{\sigma}^2.
So, Mallows' C_p can be thought of as an approximation to cross-validation.
4 R Practicalities
You can get the LOOCV from R as follows:
out = lm(y ~ x)
numerator = y - fitted(out)
denominator = 1 - hatvalues(out)
LOOCV = mean( (numerator/denominator)^2 )
To get C_p I suggest you use the code given after (11), or the Cp function defined in Section 5. To get K-fold cross-validation you can either
write your own code (excellent idea!) or use the boot package together with glm as follows:
library(boot)
out = glm(y ~ x,data = D)
cv.glm(D,out,K=5)$delta[1]
This looks pretty strange, but it will give you the value that you want: cv.glm returns two numbers
in delta, the raw cross-validation estimate of prediction error and a bias-adjusted version, and we take
the first. (There may be an easier way. Let me know if you find one.) Here is an example. We will
generate data from a quadratic, fit polynomials up to order 10, and then plot the LOOCV and the
K-fold cross-validation scores using K = 5.
library("boot")
plot(x,y)
LOOCV = rep(0,10)
KFoldCV = rep(0,10)
5
## fit polynomials and get the cross-validation scores
for(j in 1:10){
out = glm(y ~ poly(x,j),data=D)
print(summary(out))
LOOCV[j] = mean(((y - fitted(out))/(1-hatvalues(out)))^2)
KFoldCV[j] = cv.glm(D,out,K=5)$delta[1]
}
## plot them
plot(1:10,LOOCV,type="l",lwd=3)
lines(1:10,KFoldCV,lwd=3,col="blue")
[Figure 1: scatter plot of the simulated data, y against x.]
Figure 2: LOOCV is in black. K-fold is in blue. The x-axis is the order of the polynomial.
5 Another Example
Here we write small functions that compute C_p and the LOOCV score from a fitted lm object,
simulate data from a cubic, and compare the two criteria for polynomial fits of degree 1 through 5.
Cp = function(y,W){
### Cp from output of lm
n = length(W$residuals)
p = length(W$coefficients)
sigma = summary(W)$sigma
train = mean( (y - fitted(W))^2)
mallows = train + 2*sigma^2*p/n
return(mallows)
}
CV = function(y,W){
### leave-one-out cross-validation from output of lm
numerator = y - fitted(W)
denominator = 1 - hatvalues(W)
LOOCV = mean( (numerator/denominator)^2 )
return(LOOCV)
}
pdf("AnotherExample.pdf")
par(mfrow=c(2,1))
n = 100
x = runif(n,-1,1)
y = x^3 + rnorm(n,0,.1)
plot(x,y)
outcp = rep(0,5)
outcv = rep(0,5)
for(i in 1:5){
out = lm(y ~ poly(x,i))
outcp[i] = Cp(y,out)
outcv[i] = CV(y,out)
}
a = min(c(outcp,outcv))
b = max(c(outcp,outcv))
plot(1:5,outcp,type="l",lwd=3,ylim=c(a,b),xlab="Degree",ylab="Error")
lines(1:5,outcv,col="red",lwd=3)
dev.off()
[Figure: top panel, scatter plot of the cubic example data (y against x); bottom panel, Cp (black) and
LOOCV (red) as a function of polynomial degree.]
6 Inference after Selection with Data Splitting
All of the inferential statistics we have done in earlier lectures presumed that our choice of model
was completely fixed, and not at all dependent on the data. If different data sets would lead us
to use different models, and our data are (partly) random, then which model we’re using is also
random. This leads to some extra uncertainty in, say, our estimate of the slope on X1 , which is not
accounted for by our formulas for the sampling distributions, hypothesis tests, confidence sets, etc.
A very common response to this problem, among practitioners, is to ignore it, or at least hope
it doesn’t matter. This can be OK, if the data-generating distribution forces us to pick one model
with very high probability, or if all of the models we might pick are very similar to each other.
Otherwise, ignoring it leads to nonsense.
Here, for instance, I simulate 200 data points where the Y variable is a standard Gaussian, and
there are 100 independent predictor variables, all also standard Gaussians, independent of each
other and of Y :
n = 200
p = 100
y = rnorm(n)
x = matrix(rnorm(n*p),nrow=n)
df = data.frame(y=y,x)
mdl = lm(y~., data=df)
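One way to pick out the predictors whose t-statistics are significant at the 0.05 level, and to collect
their column indices in a vector stars for refitting (my own reconstruction of this step), is:

pvals = summary(mdl)$coefficients[-1, 4]   # p-values for the 100 slope t-tests
stars = 1 + which(pvals < 0.05)            # +1 because column 1 of df is y
## refit using just those columns: lm(y ~ ., data = df[, c(1, stars)])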
Of the 100 predictors, 5 have t-statistics which are significant at the 0.05 level or less. (The
expected number would be 5.) If we select the model using just those variables we get
##
## Call:
## lm(formula = y ~ ., data = df[, c(1, stars)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.53035 -0.75081 0.03042 0.58347 2.63677
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03084 0.07092 0.435 0.6641
## X21 -0.13821 0.07432 -1.860 0.0644
## X25 0.12472 0.06945 1.796 0.0741
## X41 0.13696 0.07279 1.882 0.0614
## X83 -0.03067 0.07239 -0.424 0.6722
## X88 0.14585 0.07040 2.072 0.0396
##
## Residual standard error: 0.9926 on 194 degrees of freedom
## Multiple R-squared: 0.06209, Adjusted R-squared: 0.03792
## F-statistic: 2.569 on 5 and 194 DF, p-value: 0.02818
Notice that final overall F statistic: it's testing whether including those variables fits better
than an intercept-only model, and saying it thinks it does, with a clearly significant p-value. This
is the case even though, by construction, the response is completely independent of all the predictors.
This is not a fluke: if you re-run my simulation many times, the p-values of the full F test will not
be uniformly distributed (as they would be if we always tested the model with all 100 predictors), but
rather will have a distribution strongly shifted over to the left. Similarly, if we looked at the
confidence intervals, they would be much too narrow.
These issues do not go away if the true model isn’t “everything is independent of everything
else”, but rather has some structure. Because we picked the model to predict well on this data,
if we then run hypothesis tests on that same data, they’ll be too likely to tell us everything is
significant, and our confidence intervals will be too narrow. Doing statistical inference on the same
data we used to select our model is just broken. It may not always be as spectacularly broken as
in my demo above, but it’s still broken.
There are three ways around this. One is to pretend the issue doesn't exist; as I said, this
is popular, but it's got nothing else to recommend it. Another is to not do tests or confidence
intervals at all. The third approach, which is in many ways the simplest, is to use data splitting.
Data splitting is (for regression) a very simple procedure:
• Randomly split the data into two halves.
• Calculate your favorite model selection criterion for all your candidate models using only the
first part of the data. Pick one model as the winner.
• Re-estimate the winner, and calculate all your inferential statistics, using only the other half
of the data.
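Here is a small sketch of data splitting in R (the simulated data, the candidate polynomial models, and
the use of LOOCV as the selection criterion are all my own illustrative choices):

set.seed(7)
n = 200
x = runif(n,-1,1)
y = x^3 + rnorm(n,0,.1)
D = data.frame(x=x,y=y)
half = sample(1:n, n/2)                     # a random half for model selection
Dsel = D[half,]
Dinf = D[-half,]
## step 1: pick the polynomial degree by LOOCV on the selection half
loocv = rep(0,5)
for(j in 1:5){
fit = lm(y ~ poly(x,j), data=Dsel)
loocv[j] = mean(((Dsel$y - fitted(fit))/(1 - hatvalues(fit)))^2)
}
best = which.min(loocv)
## step 2: re-estimate the winner on the other half and do inference there
final = lm(y ~ poly(x,best), data=Dinf)
summary(final)                              # tests and CIs come from the held-out half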
7 History
Cross-validation goes back in statistics into the 1950s, if not earlier, but did not become formalized
as a tool until the 1970s, with the work of Stone (1974). It was adopted, along with many other
statistical ideas, by computer scientists during the period in the late 1980s–early 1990s when the
modern area of “machine learning” emerged from (parts of) earlier areas called “artificial intelli-
gence”, “pattern recognition”, “connectionism”, “neural networks”, or indeed “machine learning”.
Subsequently, many of the scientific descendants of the early machine learners forgot where their
ideas came from, to the point where many people now think cross-validation is something computer
science contributed to data analysis.