Cross Validation: Chandan B K, under the guidance of Mrs. Sridevi S, Asst. Professor, Department of Computer Science Engineering

This document discusses various cross-validation techniques used to evaluate machine learning models. It defines cross-validation as a resampling method that evaluates models by training and testing on different subsets of the data. The main types discussed are leave-one-out cross-validation, k-fold cross-validation, stratified k-fold cross-validation, and time series cross-validation. K-fold cross-validation randomly divides data into k groups, trains on k-1 and tests on the remaining group, and averages results over k trials. Stratified k-fold aims to maintain class distributions across folds. Time series cross-validation forecasts sequentially on a rolling basis.


Cross Validation
By Chandan B K
Submitted under the guidance of
Mrs. Sridevi S
Asst. Professor, Department of Computer Science Engineering
Contents
 Cross validation
 Types of cross validation
 Leave-one-out cross validation
 Hold-out method
 K-fold cross validation
 Stratified k-fold cross validation
 Time series cross validation
Cross Validation
 Cross-validation is a resampling technique that gives us confidence in a model's efficiency and accuracy on unseen data.
 It is a method for evaluating machine learning models by training several models on subsets of the available input data and evaluating them on the complementary subsets.
 It involves reserving a particular sample of the dataset on which the model is not trained; the model is then tested on this sample before it is finalized.
Steps Involved In Cross Validation
 Shuffle the dataset randomly.
 Split the dataset into k groups.
 For each unique group:
 Take the group as the hold-out (test) set.
 Take the remaining groups as the training set.
 Fit a model on the training set and evaluate it on the test set.
 Retain the evaluation score and discard the model.
 Summarize the skill of the model using the sample of evaluation scores.
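The steps above can be sketched with the standard library alone. The dataset, k = 4, and the mean-predictor "model" below are illustrative assumptions, not part of the slides:

```python
import random

def k_fold_scores(data, k, seed=0):
    data = data[:]                            # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)         # step 1: shuffle randomly
    folds = [data[i::k] for i in range(k)]    # step 2: split into k groups
    scores = []
    for i in range(k):                        # step 3: each group is the hold-out once
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = sum(train) / len(train)       # "fit": predict the training mean
        mae = sum(abs(x - model) for x in test) / len(test)   # evaluate on the test set
        scores.append(mae)                    # retain the score, discard the model
    return scores

scores = k_fold_scores(list(range(20)), k=4)
print(len(scores), sum(scores) / len(scores))  # summarize: mean of the k scores
```

The final averaging step is what the later slides call "summarizing the skill of the model".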
Advantages and Disadvantages of Cross Validation
Pros
 Reduces overfitting.
 Supports hyperparameter tuning.

Cons
 Increases training time.
 Computationally expensive.
Cross Validation Techniques
Exhaustive methods
 Leave-one-out cross-validation
 Leave-p-out cross-validation
Non-exhaustive methods
 Hold-out method
 K-fold cross validation
 Stratified k-fold cross validation
 Time series cross validation
Leave One Out Cross Validation
 Leave-one-out cross-validation is a special case of cross-validation where the number of folds equals the number of instances in the data set.
 In the general leave-p-out scheme, if there are n data points, n − p points are used for training in each iteration and the remaining p points are used for validation.
 In leave-one-out, only a single data point is used as the test set, i.e. p = 1.
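A minimal sketch of what the splits look like when p = 1: with n points there are n iterations, and each point is the single-element test set exactly once (the five-point dataset is an illustrative choice):

```python
def leave_one_out(n):
    # Yield (train_indices, test_indices) for each of the n iterations.
    for i in range(n):
        test = [i]                                  # one point held out
        train = [j for j in range(n) if j != i]     # the other n - 1 points
        yield train, test

splits = list(leave_one_out(5))
print(len(splits))   # 5 iterations for 5 data points
```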
Leave One Out Cross Validation
Pros and Cons:
 A large number of iterations for large data sets.
 A low-bias approach.
 Requires more computational power.
 No randomness in the test sets.
Hold Out Method
 In this approach we divide our entire dataset into two
parts viz training data and testing data.
 The size of training data is set more than twice that of
testing data, so the data is split in the ratio of 70:30 or
80:20.
 In this approach, the data is first shuffled randomly
before splitting. As the model is trained on a different
combination of data points.
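The shuffle-then-cut split described above can be sketched as follows; the 70:30 ratio and the seed are illustrative choices:

```python
import random

def holdout_split(n, train_frac=0.7, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # shuffle before splitting
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]        # train indices, test indices

train, test = holdout_split(100)
print(len(train), len(test))   # 70 30
```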
Hold Out Method
Pros and Cons:
 The model can give different results every time it is trained, which can be a source of instability.
 We can never be sure that the training set we picked is representative of the whole dataset.
 When the dataset is not large, the test data may contain important information that is lost because the model is never trained on it.
 The hold-out method works well when you have a very large dataset or are building an initial model in a data science project.
K-Fold Cross Validation
 In k-fold cross-validation, the data is divided into k subsets.
 Each time, one of the k subsets is used as the validation set and the other k − 1 subsets form the training set.
 The evaluation metric is averaged over all k trials to estimate the overall performance of the model.
K-Fold Cross Validation
 A large value of k means less bias but higher variance; it also means more data samples are used for training, giving better and more precise outcomes.
 The true error is estimated as the average error rate over the test examples.
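A quick arithmetic sketch of the first point: with n samples and k folds, each trial trains on roughly n − n/k samples, so the training sets grow as k increases (n = 120 is an illustrative choice):

```python
n = 120
# Training-set size per trial for several k; k = n is leave-one-out.
sizes = {k: n - n // k for k in (2, 5, 10, n)}
print(sizes)   # {2: 60, 5: 96, 10: 108, 120: 119}
```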
K-Fold Cross Validation
Pros and Cons:
 Computation time is limited: the process is repeated only k times (e.g. 10 times when k = 10).
 Reduced bias.
 The variance of the estimate is reduced as k increases.
 The training algorithm is computationally intensive, as it has to be rerun from scratch k times.
Stratified K-Fold Cross Validation
 Stratified sampling is a sampling technique where the samples are selected in the same proportion as they appear in the population.
 Stratified k-fold is used when random shuffling and splitting alone are not sufficient and we want the correct distribution of data in each fold.
 For regression problems, folds are selected so that the mean response value is approximately equal in all folds.
 For classification problems, folds are selected to have the same proportion of class labels.
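One simple way to preserve class proportions, sketched with the standard library: group the indices by class label and deal each class round-robin across the folds. The 2:1 label ratio and k = 2 below are illustrative assumptions:

```python
from collections import Counter

def stratified_folds(labels, k):
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)   # group indices by label
    for indices in by_class.values():          # deal each class across the folds
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds

labels = ["a"] * 8 + ["b"] * 4                 # 2:1 class ratio
folds = stratified_folds(labels, k=2)
for f in folds:
    print(Counter(labels[i] for i in f))       # each fold: 4 of "a", 2 of "b"
```

Each fold keeps the dataset's 2:1 ratio, which is exactly the property plain random splitting cannot guarantee on small or imbalanced data.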
Stratified K-Fold Cross Validation
Pros and Cons:
 It can improve different models through hyperparameter tuning.
 It helps us compare models.
 It helps in reducing both bias and variance.
Time Series Cross Validation
 Cross-validating a time-series model means cross-validation on a rolling basis.
 We start with a small subset of data for training, forecast the later data points, and then check the accuracy of the forecasted points.
 The same forecasted data points are then included in the next training set, and subsequent data points are forecasted.
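The rolling procedure above can be sketched as follows: the training window always contains everything seen so far, the test block is the next stretch of points, and the window grows after each round. The initial window of 4 and forecast horizon of 2 are illustrative choices:

```python
def rolling_splits(n, initial=4, horizon=2):
    start = initial
    while start + horizon <= n:
        train = list(range(start))                   # everything seen so far
        test = list(range(start, start + horizon))   # the next block to forecast
        yield train, test
        start += horizon                             # fold the block into training

splits = list(rolling_splits(10))
for train, test in splits:
    print(len(train), test)   # 4 [4, 5] / 6 [6, 7] / 8 [8, 9]
```

Note that, unlike the other schemes, the data is never shuffled: the training set always precedes the test block in time.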
Thank You
