Resampling Methods: Prof. Asim Tewari IIT Bombay

Resampling methods such as cross-validation and the bootstrap are used for model assessment and selection. Cross-validation splits the data into training and validation sets so that a model's performance is judged on observations that were not used to fit it, which helps guard against overfitting. Leave-one-out cross-validation uses a single observation for validation each time. The bootstrap repeatedly samples observations with replacement to estimate properties of an estimator, such as its standard error. It is most useful for complex models; for simple linear models the standard errors can be calculated directly, so it adds little there.
Resampling Methods

Prof. Asim Tewari


IIT Bombay

Asim Tewari, IIT Bombay ME 781: Engineering Data Mining and Applications
Resampling
• Resampling involves repeatedly drawing
samples from a training set and refitting a
model of interest on each sample

• Can be computationally expensive, because the
model must be refit many times

• Resampling methods
– Cross-validation
– Bootstrap

Resampling
• Model assessment: The process of evaluating
a model’s performance

• Model selection: The process of selecting the
proper level of flexibility.

Resampling
Cross-Validation
• The Validation Set Approach
– It involves randomly dividing the available set of
observations into two parts, a training set and a
validation set or hold-out set.

A schematic display of the validation set approach. A set of n observations is randomly split
into a training set (shown in blue, containing observations 7, 22, and 13, among others) and a
validation set (shown in beige, and containing observation 91, among others). The statistical
learning method is fit on the training set, and its performance is evaluated on the validation
set.
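A minimal sketch of the validation set approach, assuming NumPy is available and using a polynomial least-squares fit (`np.polyfit`) as a stand-in for "the statistical learning method". The helper name `validation_set_mse`, the 50/50 split, and the synthetic data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def validation_set_mse(x, y, degree=1, train_frac=0.5, seed=0):
    """Estimate test MSE from one random training/validation split (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))              # shuffle observation indices at random
    n_train = int(train_frac * len(x))
    train, val = idx[:n_train], idx[n_train:]  # training set and hold-out (validation) set
    coefs = np.polyfit(x[train], y[train], degree)  # fit on the training set only
    pred = np.polyval(coefs, x[val])                # predict the held-out observations
    return np.mean((y[val] - pred) ** 2)            # validation MSE

# Hypothetical synthetic data: y = 2 + 3x + Gaussian noise with variance 1
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 1, size=100)

mse = validation_set_mse(x, y, degree=1)
print(f"validation MSE: {mse:.3f}")
```

Changing `seed` changes which observations land in each set, and with it the estimate.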
Resampling
Cross-Validation
• The Validation Set Approach

Left: Validation error estimates for a single split into training and validation data sets. Right:
The validation method was repeated ten times, each time using a different random split of the
observations into a training set and a validation set. This illustrates the variability in the
estimated test MSE that results from this approach.
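The variability described in this caption can be reproduced with a sketch like the one below. It repeats the split ten times, using seeds 0 to 9 as an assumed stand-in for "a different random split", on hypothetical data whose true relationship is quadratic, and compares polynomial fits of degree 1 to 3. The helper name `repeated_validation_mse` is illustrative.

```python
import numpy as np

def repeated_validation_mse(x, y, degree, n_splits=10, train_frac=0.5):
    """Validation MSE for n_splits different random splits (hypothetical helper)."""
    mses = []
    for seed in range(n_splits):
        rng = np.random.default_rng(seed)          # a different random split per seed
        idx = rng.permutation(len(x))
        n_train = int(train_frac * len(x))
        train, val = idx[:n_train], idx[n_train:]
        coefs = np.polyfit(x[train], y[train], degree)
        mses.append(np.mean((y[val] - np.polyval(coefs, x[val])) ** 2))
    return np.array(mses)

# Hypothetical data with a quadratic true relationship
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=100)
y = 1 + x - 0.5 * x**2 + rng.normal(0, 1, size=100)

results = {d: repeated_validation_mse(x, y, d) for d in (1, 2, 3)}
for d, mses in results.items():
    # The spread across splits shows how variable a single-split estimate is
    print(f"degree {d}: min {mses.min():.2f}, max {mses.max():.2f}")
```

For a fixed degree, the ten numbers differ only because of the split, which is exactly the variability the right-hand panel illustrates.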
Resampling
Cross-Validation
• The Validation Set Approach
– The validation estimate of the test error rate can be
highly variable, depending on which observations are
included in the training set and which in the
validation set.
– In the validation approach, only a subset of the
observations is used to fit the model. This is a
problem because statistical methods tend to perform
worse when trained on fewer observations, so the
validation error may overestimate the test error of
the model fit on the entire data set.

Resampling
Cross-Validation
• Leave-One-Out Cross-Validation

A schematic display of LOOCV. A set of n data points is repeatedly split into a training set
(shown in blue) containing all but one observation, and a validation set that contains only that
observation (shown in beige). The test error is then estimated by averaging the n resulting
MSEs. The first training set contains all but observation 1, the second training set contains all
but observation 2, and so forth.
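Following the description in this caption, LOOCV can be sketched as n refits, each leaving out one observation and scoring the fit on that observation alone; the estimate is the average of the n squared errors. The polynomial fit, the helper name `loocv_mse`, and the data are assumptions for illustration.

```python
import numpy as np

def loocv_mse(x, y, degree=1):
    """Leave-one-out CV estimate of test MSE for a polynomial fit (hypothetical helper)."""
    n = len(x)
    sq_errors = np.empty(n)
    for i in range(n):
        train = np.ones(n, dtype=bool)
        train[i] = False                             # training set: all but observation i
        coefs = np.polyfit(x[train], y[train], degree)
        pred_i = np.polyval(coefs, x[i])             # validation set: observation i alone
        sq_errors[i] = (y[i] - pred_i) ** 2          # MSE_i for this split
    return sq_errors.mean()                          # average of the n MSE_i values

# Hypothetical synthetic data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2 + 3 * x + rng.normal(0, 1, size=30)

cv = loocv_mse(x, y, degree=1)
print(f"LOOCV estimate: {cv:.3f}")
```

Unlike the validation set approach, the splits involve no randomness, so repeated runs give the same estimate; the price is n model fits.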
The Bootstrap Method
• Can be used to estimate the standard errors of the
coefficients, but it is not very useful for linear models,
since their coefficient standard errors can be
computed directly from formulas.
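To make this comparison concrete, the sketch below estimates the standard error of a simple linear regression slope two ways: by bootstrapping (resampling (x, y) pairs with replacement and refitting) and by the usual least-squares formula, SE(slope) = sqrt(s^2 / sum((x_i - mean(x))^2)) with s^2 = RSS / (n - 2). The synthetic data, B = 1000 bootstrap replicates, and the helper names are assumptions.

```python
import numpy as np

def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    xc = x - x.mean()
    b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    return y.mean() - b1 * x.mean(), b1

def formula_se_slope(x, y):
    """Standard error of the slope from the usual least-squares formula."""
    b0, b1 = ols_fit(x, y)
    rss = np.sum((y - b0 - b1 * x) ** 2)
    sigma2 = rss / (len(x) - 2)                   # estimate of the noise variance
    return np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))

def bootstrap_se_slope(x, y, n_boot=1000, seed=0):
    """Bootstrap standard error of the slope (resampling (x, y) pairs)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # n indices drawn with replacement
        slopes[b] = ols_fit(x[idx], y[idx])[1]    # refit on the bootstrap data set
    return slopes.std(ddof=1)                     # spread of the bootstrap slopes

# Hypothetical data that satisfies the linear model's assumptions
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 1, size=100)

se_formula = formula_se_slope(x, y)
se_boot = bootstrap_se_slope(x, y)
print(f"formula SE: {se_formula:.4f}, bootstrap SE: {se_boot:.4f}")
```

When the model's assumptions hold, as in this synthetic example, the two numbers should be close, so the 1000 refits buy little over the one-line formula.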

The Bootstrap Method

A graphical illustration of the bootstrap approach on a small sample containing n = 3 observations. Each bootstrap data set contains n observations, sampled with replacement from the original data set. Each bootstrap data set is used to obtain an estimate of α.
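The resampling scheme in this illustration can be sketched generically. The slides do not define α here, so `alpha_stat` below is a hypothetical placeholder statistic, and the three observations are made up. The bootstrap standard error is taken as the sample standard deviation of the B bootstrap estimates; the helper names are assumptions.

```python
import numpy as np

def bootstrap_sample(data, rng):
    """One bootstrap data set: n rows drawn with replacement from the n original rows."""
    n = len(data)
    return data[rng.integers(0, n, size=n)]

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap estimates of `statistic` and their standard error."""
    rng = np.random.default_rng(seed)
    estimates = np.array([statistic(bootstrap_sample(data, rng)) for _ in range(n_boot)])
    return estimates, estimates.std(ddof=1)

# Hypothetical sample with n = 3 observations, each a pair of measurements
data = np.array([[1.0, 2.0],
                 [3.0, 1.5],
                 [2.0, 4.0]])

def alpha_stat(sample):
    # Placeholder for the estimate of alpha: mean difference between the two columns
    return np.mean(sample[:, 0] - sample[:, 1])

estimates, se = bootstrap_se(data, alpha_stat, n_boot=1000)
print(f"bootstrap SE of the placeholder statistic: {se:.3f}")
```

Rows are resampled as whole units, so each observation's measurements stay together in every bootstrap data set, and some original observations may appear more than once while others are left out.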

