DS Notes Unit - V
MODEL EVALUATION
GENERALIZATION ERROR
Generalization error is the error a model makes on new, unseen data, as opposed to the data it was trained on. The goal of model evaluation is to estimate this error, so that we know how well the model can be expected to perform in practice.
EVALUATION METRICS
Model evaluation metrics are used to measure the performance of a machine learning model, which is an integral component of any data science project. They aim to estimate the generalization accuracy of a model on future (unseen) data.
Confusion Matrix
A confusion matrix is a matrix representation of the prediction results of any binary classification test. It is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
The confusion matrix itself is relatively simple to understand, but the related terminology can be
confusing.
Each prediction can be one of four outcomes, based on how it matches up to the actual value:
True Positive (TP): the model predicted positive and the actual value is positive.
False Positive (FP): the model predicted positive but the actual value is negative (a Type I error).
True Negative (TN): the model predicted negative and the actual value is negative.
False Negative (FN): the model predicted negative but the actual value is positive (a Type II error).
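As a minimal sketch, assuming scikit-learn is available, the following computes a confusion matrix and a few related metrics from made-up true and predicted labels:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
# Hypothetical true and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))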
HYPOTHESIS TESTING
A Hypothesis is a speculation or theory based on insufficient evidence that lends itself to further testing and experimentation. With further testing, a hypothesis can usually be proven true or false.
A Null Hypothesis is a hypothesis that says there is no statistical significance between the two
variables in the hypothesis. It is the hypothesis that the researcher is trying to disprove.
Ideally, we would always reject the null hypothesis when it is false, and we would accept the null hypothesis when it is true.
Even though hypothesis tests are meant to be reliable, there are two types of errors that can occur.
These errors are known as Type I and Type II errors.
For example, when examining the effectiveness of a drug, the null hypothesis would be that the drug has no effect on the disease.
The first kind of error involves the rejection of a null hypothesis that is true. Let us go back to the example of a drug being used to treat a disease. If we reject the null hypothesis in this situation, then we claim that the drug does have some effect on the disease. But if the null hypothesis is true, then, in reality, the drug does not combat the disease at all. The rejection of a true null hypothesis is called a Type I error, or an error of the first kind.
The other kind of error occurs when we accept a false null hypothesis. This sort of error is called a Type II error and is also referred to as an error of the second kind.
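As an illustration only (the notes do not prescribe a particular test), a two-sample t-test from SciPy on made-up drug and placebo measurements could look like the sketch below; rejecting a true null hypothesis here would be a Type I error, while failing to reject a false one would be a Type II error.
from scipy import stats
# Hypothetical outcome measurements for a treated group and a placebo group
drug = [23.1, 24.5, 22.8, 25.0, 24.2, 23.7, 24.9, 23.3]
placebo = [21.9, 22.4, 21.5, 22.8, 22.0, 21.7, 22.5, 22.1]
# Null hypothesis: the drug has no effect (both groups have the same mean)
t_stat, p_value = stats.ttest_ind(drug, placebo)
alpha = 0.05  # significance level, i.e. the accepted probability of a Type I error
if p_value < alpha:
    print("Reject the null hypothesis: the drug appears to have an effect")
else:
    print("Fail to reject the null hypothesis")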
CROSS VALIDATION
Cross-validation is a technique for assessing how a statistical analysis generalises to an independent data set. It is a technique for evaluating machine learning models by training several models on subsets of the available input data and evaluating them on the complementary subsets of the data. Using cross-validation, there is a high chance that we can detect over-fitting with ease.
First, I would like to introduce you to a golden rule: never mix training and test data. Your first step should always be to isolate the test data set and use it only for final evaluation. Cross-validation will thus be performed on the training set.
Initially, the entire training data set is broken up into k equal parts. The first part is kept as the holdout (testing) set and the remaining k-1 parts are used to train the model. The trained model is then tested on the holdout set. This process is repeated k times, each time changing the holdout set. Thus, every data point gets an equal opportunity to be included in the test set.
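A minimal sketch of this splitting procedure, assuming scikit-learn's KFold and a small made-up dataset and classifier, is shown below; each iteration trains on k-1 folds and tests on the remaining one.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
# Hypothetical toy data: 10 samples, 2 features, binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on the holdout fold
print("Fold accuracies:", scores)
print("Mean accuracy:", np.mean(scores))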
Usually, k is equal to 3 or 5. It can be extended even to higher values like 10 or 15, but this becomes computationally expensive and time-consuming. Let us have a look at how we can implement this with a few lines of Python code and the scikit-learn API.
We pass the model or classifier object, the features, the labels and the parameter cv, which indicates the K for K-fold cross-validation. The method will return a list of k accuracy values, one for each iteration. In general, we take the average of them and use it as a consolidated cross-validation score.
from sklearn.model_selection import cross_val_score
import numpy as np
# model, X_train and y_train are assumed to be defined earlier
print(np.mean(cross_val_score(model, X_train, y_train, cv=5)))
What is Overfitting?
When a model performs very well for training data but has poor performance with test data (new
data), it is known as overfitting. In this case, the machine learning model learns the details and
noise in the training data such that it negatively affects the performance of the model on test data.
Overfitting can happen due to low bias and high variance.
Reasons for Overfitting
Data used for training is not cleaned and contains noise (garbage values) in it
What is Underfitting?
When a model has not learned the patterns in the training data well and is unable to generalize
well on the new data, it is known as underfitting. An underfit model has poor performance on the
training data and will result in unreliable predictions. Underfitting occurs due to high bias and low
variance.
Reasons for Underfitting
Data used for training is not cleaned and contains noise (garbage values) in it
Ridge Regression
Ridge regression is a model tuning method that is used to analyse data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, the least-squares estimates are unbiased but their variances are large, which results in predicted values being far away from the actual values.
Min(||Y - X(theta)||^2 + lambda * ||theta||^2)
Lambda is the penalty term. The lambda given here is denoted by the alpha parameter in the ridge function.
For any type of regression machine learning model, the usual regression equation forms the base
which is written as:
Y = XB + e
Where Y is the dependent variable, X represents the independent variables, B is the regression coefficients to be estimated, and e represents the errors or residuals.
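As a minimal sketch with scikit-learn (the data values below are made up, and the alpha argument corresponds to the lambda penalty above):
import numpy as np
from sklearn.linear_model import Ridge
# Hypothetical data: 6 samples with two strongly correlated features
X = np.array([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8], [6, 12.1]])
y = np.array([3.0, 5.1, 7.2, 8.9, 11.1, 13.0])
ridge = Ridge(alpha=1.0)  # alpha is the L2 penalty term
ridge.fit(X, y)
print("Coefficients:", ridge.coef_)
print("Intercept:", ridge.intercept_)
print("Predictions:", ridge.predict(X))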
In the accompanying worksheet example, the predictions for the input data are shown in column J. In fact, the values in range J2:J19 can be calculated by the array formula
=H2+MMULT(A2:D19,H3:H6)
The same predictions can also be obtained with
=RidgePred(A2:D19,A2:D19,E2:E19,H9)
Real Statistics Function: The Real Statistics Resource Pack provides the following functions.
RidgeMSE(Rx, Ry, lambda) = MSE of the Ridge regression defined by the x data in Rx, y data
in Ry and the given lambda value.
RidgePred(Rx0, Rx, Ry, lambda): returns an array of predicted y values for the x data in range
Rx0 based on the Ridge regression model defined by Rx, Ry and lambda; if Rx0 contains only
one row then only one y value is returned.
GRID SEARCH
Grid search is a technique used to identify the optimal hyperparameters of a model, i.e. the hyperparameters that result in the most accurate predictions. Unlike model parameters, hyperparameters cannot be learned directly from the training data; to find the best hyperparameters, we create a model for each combination of hyperparameter values.
Grid search is thus considered a very traditional hyperparameter optimization method, since we basically try all possible combinations. The models are then evaluated through cross-validation. The model boasting the best accuracy is naturally considered to be the best.
A model hyperparameter is a characteristic of a model that is external to the model and whose value cannot be estimated from the data.
Cross validation
We have mentioned that cross-validation is used to evaluate the performance of the models.
Cross-validation measures how a model generalizes itself to an independent dataset. We use
cross-validation to get a good estimate of how well a predictive model performs.
With this method, we have a pair of datasets: an independent dataset and a training dataset. We
can partition a single dataset to yield the two sets. These partitions are of the same size and are
referred to as folds. A model in consideration is trained on all folds, bar one.
The excluded fold is then used to test the model. This process is repeated until all folds have been used as the test set. The average performance of the model across all folds is then used to estimate the model's overall performance.
In a technique known as k-fold cross-validation, a user specifies the number of folds, represented by k. This means that when k = 5, there are 5 folds.
Figure: k-fold cross-validation with k = 5.
The example given below is a basic implementation of grid search. We first specify the
hyperparameters we seek to examine. Then we provide a set of values to test.
1. Load dataset.
My first step is loading the dataset using from sklearn.datasets import load_iris and iris = load_iris(). The iris dataset is built into the scikit-learn library in Python. The data is stored in a 150 x 4 array.
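A short sketch of this step (the variable names X and y are my own):
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)  # (150, 4): 150 samples, 4 features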
2. Import GridSearchCV.
from sklearn.model_selection import GridSearchCV
3. Set estimator parameters.
In this implementation, we use the rbf kernel of the SVR model. rbf stands for the radial basis
function. It introduces some form of non-linearity to the model, which is useful since the data in use is non-linear. By this, we mean that the relationship between the features and the target does not follow a simple straight line.
from sklearn.svm import SVR
estimator=SVR(kernel='rbf')
4. Specify hyperparameters and range of values.
For the rbf kernel, the three hyperparameters to tune are C, epsilon, and gamma. We can give each one several values to choose from.
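For example, a hypothetical grid of values (the specific numbers below are placeholders, not taken from the notes) could be:
param_grid = {
    'C': [0.1, 1, 10, 100],
    'epsilon': [0.01, 0.1, 1],
    'gamma': [0.001, 0.01, 0.1]
}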
5. Evaluation.
We do this through grid.fit(X, y), which fits and cross-validates the model for every combination of the specified hyperparameter values.
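Putting the steps together, a minimal end-to-end sketch (the grid values and cv=5 are my own choices, not prescribed by the notes) might look like this:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
iris = load_iris()
X, y = iris.data, iris.target
param_grid = {
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 1],
    'gamma': [0.001, 0.01, 0.1]
}
# Evaluate every combination in param_grid with 5-fold cross-validation
grid = GridSearchCV(estimator=SVR(kernel='rbf'), param_grid=param_grid, cv=5)
grid.fit(X, y)
print("Best hyperparameters:", grid.best_params_)
print("Best cross-validation score:", grid.best_score_)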