DS Notes Unit - V
MODEL EVALUATION
GENERALIZATION ERROR
Generalization error is the error a model makes on new, unseen data, as opposed to the data it was trained on. The goal of model evaluation is to estimate this error, so that we know how well the model can be expected to perform in practice.
EVALUATION METRICS
Model evaluation metrics are used to measure the performance of a machine learning model, which is an integral component of any data science project. They aim to estimate the generalization accuracy of a model on future (unseen) data.
Confusion Matrix
A confusion matrix is a matrix representation of the prediction results of any binary classification test. It is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
The confusion matrix itself is relatively simple to understand, but the related terminology can be
confusing.
Each prediction can be one of four outcomes, based on how it matches up to the actual value:
True Positive (TP): the model predicted positive and the actual value is positive.
False Positive (FP): the model predicted positive but the actual value is negative (a Type I error).
True Negative (TN): the model predicted negative and the actual value is negative.
False Negative (FN): the model predicted negative but the actual value is positive (a Type II error).
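As a minimal sketch, assuming scikit-learn is available, the following computes a confusion matrix and a few related metrics from made-up true and predicted labels:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
# Hypothetical true and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))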
HYPOTHESIS TESTING
A Hypothesis is a speculation or theory based on insufficient evidence that lends itself to further testing and experimentation. With further testing, a hypothesis can usually be proven true or false.
A Null Hypothesis is a hypothesis that says there is no statistical significance between the two
variables in the hypothesis. It is the hypothesis that the researcher is trying to disprove.
Ideally, we would always reject the null hypothesis when it is false, and we would accept the null hypothesis when it is true.
Even though hypothesis tests are meant to be reliable, there are two types of errors that can occur.
These errors are known as Type I and Type II errors.
For example, when examining the effectiveness of a drug, the null hypothesis would be that the drug has no effect on the disease.
The first kind of error involves the rejection of a null hypothesis that is true. Let us go back to the example of a drug being used to treat a disease. If we reject the null hypothesis in this situation, then we claim that the drug does have some effect on the disease. But if the null hypothesis is true, then, in reality, the drug does not combat the disease at all. The rejection of a true null hypothesis is called a Type I error, or an error of the first kind.
The other kind of error occurs when we accept a false null hypothesis. This sort of error is called a Type II error and is also referred to as an error of the second kind.
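As an illustration only (the notes do not prescribe a particular test), a two-sample t-test from SciPy on made-up drug and placebo measurements could look like the sketch below; rejecting a true null hypothesis here would be a Type I error, while failing to reject a false one would be a Type II error.
from scipy import stats
# Hypothetical outcome measurements for a treated group and a placebo group
drug = [23.1, 24.5, 22.8, 25.0, 24.2, 23.7, 24.9, 23.3]
placebo = [21.9, 22.4, 21.5, 22.8, 22.0, 21.7, 22.5, 22.1]
# Null hypothesis: the drug has no effect (both groups have the same mean)
t_stat, p_value = stats.ttest_ind(drug, placebo)
alpha = 0.05  # significance level, i.e. the accepted probability of a Type I error
if p_value < alpha:
    print("Reject the null hypothesis: the drug appears to have an effect")
else:
    print("Fail to reject the null hypothesis")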
CROSS VALIDATION
Cross-validation is a technique for assessing how a statistical analysis generalises to an independent data set. It is a technique for evaluating machine learning models by training several models on subsets of the available input data and evaluating them on the complementary subsets of the data. Using cross-validation, there is a high chance that we can detect over-fitting with ease.
First, I would like to introduce you to a golden rule: never mix training and test data. Your first step should always be to isolate the test data set and use it only for final evaluation. Cross-validation will thus be performed on the training set.
Initially, the entire training data set is broken up into k equal parts. The first part is kept as the holdout (testing) set and the remaining k-1 parts are used to train the model. The trained model is then tested on the holdout set. This process is repeated k times, each time changing the holdout set. Thus, every data point gets an equal opportunity to be included in the test set.
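A minimal sketch of this splitting procedure, assuming scikit-learn's KFold and a small made-up dataset and classifier, is shown below; each iteration trains on k-1 folds and tests on the remaining one.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
# Hypothetical toy data: 10 samples, 2 features, binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on the holdout fold
print("Fold accuracies:", scores)
print("Mean accuracy:", np.mean(scores))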
Usually, k is equal to 3 or 5. It can be extended even to higher values like 10 or 15, but this becomes computationally expensive and time-consuming. Let us have a look at how we can implement this with a few lines of Python code and the scikit-learn API.
We pass the model or classifier object, the features, the labels and the parameter cv, which indicates the K for K-fold cross-validation. The method will return a list of k accuracy values, one for each iteration. In general, we take the average of them and use it as a consolidated cross-validation score.
from sklearn.model_selection import cross_val_score
import numpy as np
# model, X_train and y_train are assumed to be defined earlier
print(np.mean(cross_val_score(model, X_train, y_train, cv=5)))
What is Overfitting?
When a model performs very well for training data but has poor performance with test data (new
data), it is known as overfitting. In this case, the machine learning model learns the details and
noise in the training data such that it negatively affects the performance of the model on test data.
Overfitting can happen due to low bias and high variance.
Reasons for Overfitting
Data used for training is not cleaned and contains noise (garbage values) in it
What is Underfitting?
When a model has not learned the patterns in the training data well and is unable to generalize
well on the new data, it is known as underfitting. An underfit model has poor performance on the
training data and will result in unreliable predictions. Underfitting occurs due to high bias and low
variance.
Reasons for Underfitting
Data used for training is not cleaned and contains noise (garbage values) in it
Ridge Regression
Ridge regression is a model tuning method that is used to analyse data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, the least-squares estimates are unbiased but their variances are large, which results in predicted values being far away from the actual values.
Min(||Y - X(theta)||^2 + lambda * ||theta||^2)
Lambda is the penalty term. The lambda given here is denoted by the alpha parameter in the ridge function.
For any type of regression machine learning model, the usual regression equation forms the base
which is written as:
Y = XB + e
Where Y is the dependent variable, X represents the independent variables, B is the regression coefficients to be estimated, and e represents the errors or residuals.
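As a minimal sketch with scikit-learn (the data values below are made up, and the alpha argument corresponds to the lambda penalty above):
import numpy as np
from sklearn.linear_model import Ridge
# Hypothetical data: 6 samples with two strongly correlated features
X = np.array([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8], [6, 12.1]])
y = np.array([3.0, 5.1, 7.2, 8.9, 11.1, 13.0])
ridge = Ridge(alpha=1.0)  # alpha is the L2 penalty term
ridge.fit(X, y)
print("Coefficients:", ridge.coef_)
print("Intercept:", ridge.intercept_)
print("Predictions:", ridge.predict(X))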
In the accompanying worksheet example, the predictions for the input data are shown in column J. In fact, the values in range J2:J19 can be calculated by the array formula
=H2+MMULT(A2:D19,H3:H6)
The same predictions can also be obtained with
=RidgePred(A2:D19,A2:D19,E2:E19,H9)
Real Statistics Function: The Real Statistics Resource Pack provides the following functions.
RidgeMSE(Rx, Ry, lambda) = MSE of the Ridge regression defined by the x data in Rx, y data
in Ry and the given lambda value.
RidgePred(Rx0, Rx, Ry, lambda): returns an array of predicted y values for the x data in range
Rx0 based on the Ridge regression model defined by Rx, Ry and lambda; if Rx0 contains only
one row then only one y value is returned.
GRID SEARCH
Grid search is a technique used to identify the optimal hyperparameters of a model, i.e. the hyperparameters that result in the most accurate predictions. Unlike model parameters, hyperparameters cannot be learned directly from the training data; to find the best hyperparameters, we create a model for each combination of hyperparameter values.
Grid search is thus considered a very traditional hyperparameter optimization method, since we basically try all possible combinations. The models are then evaluated through cross-validation. The model boasting the best accuracy is naturally considered to be the best.
A model hyperparameter is a characteristic of a model that is external to the model and whose value cannot be estimated from the data.
Cross validation
We have mentioned that cross-validation is used to evaluate the performance of the models.
Cross-validation measures how a model generalizes itself to an independent dataset. We use
cross-validation to get a good estimate of how well a predictive model performs.
With this method, we have a pair of datasets: an independent dataset and a training dataset. We
can partition a single dataset to yield the two sets. These partitions are of the same size and are
referred to as folds. A model in consideration is trained on all folds, bar one.
The excluded fold is then used to test the model. This process is repeated until all folds have been used as the test set. The average performance of the model across all folds is then used to estimate the model's overall performance.
In a technique known as k-fold cross-validation, a user specifies the number of folds, represented by k. This means that when k = 5, there are 5 folds.
Figure: k-fold cross-validation with k = 5.
The example given below is a basic implementation of grid search. We first specify the
hyperparameters we seek to examine. Then we provide a set of values to test.
1. Load dataset.
My first step is loading the dataset using from sklearn.datasets import load_iris and iris = load_iris(). The iris dataset is built into the scikit-learn library in Python. The data is stored in a 150 x 4 array.
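A short sketch of this step (the variable names X and y are my own):
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)  # (150, 4): 150 samples, 4 features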
2. Import GridSearchCV.
from sklearn.model_selection import GridSearchCV
3. Set estimator parameters.
In this implementation, we use the rbf kernel of the SVR model. rbf stands for the radial basis
function. It introduces some form of non-linearity to the model, which is useful since the data in use is non-linear. By this, we mean that the relationship between the features and the target does not follow a simple straight line.
from sklearn.svm import SVR
estimator=SVR(kernel='rbf')
4. Specify hyperparameters and range of values.
For the rbf kernel, the three hyperparameters to tune are C, epsilon, and gamma. We can give each one several values to choose from.
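For example, a hypothetical grid of values (the specific numbers below are placeholders, not taken from the notes) could be:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'epsilon': [0.01, 0.1, 1],
    'gamma': [0.001, 0.01, 0.1]
}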
5. Evaluation.
We do this through grid.fit(X, y), which fits and cross-validates the model for every combination of the specified hyperparameter values.
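Putting the steps together, a minimal end-to-end sketch (the grid values and cv=5 are my own choices, not prescribed by the notes) might look like this:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
iris = load_iris()
X, y = iris.data, iris.target
param_grid = {
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 1],
    'gamma': [0.001, 0.01, 0.1]
}
# Evaluate every combination in param_grid with 5-fold cross-validation
grid = GridSearchCV(estimator=SVR(kernel='rbf'), param_grid=param_grid, cv=5)
grid.fit(X, y)
print("Best hyperparameters:", grid.best_params_)
print("Best cross-validation score:", grid.best_score_)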