0% found this document useful (0 votes)

18 views6 pages

Evaluation Measures For Machine Learning Models

The document outlines various evaluation measures for classification problems in machine learning, including True Positives, True Negatives, False Positives, and False Negatives, along with metrics like Accuracy, Precision, Recall, and F1-Score. It emphasizes the importance of the Confusion Matrix for assessing model performance and discusses the trade-offs between Precision and Recall. Additionally, it introduces ROC-AUC as a valuable metric for evaluating binary classifiers, particularly in imbalanced datasets.

Uploaded by

nabinkoirala53

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views6 pages

Evaluation Measures For Machine Learning Models

Uploaded by

nabinkoirala53

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

" True Positives (TP): The number of positive instances

correctly
classified as positive.
" Example: People having tumor correcty classified

Evaluation Measures For 1.

Classification
"True Negatives (TN): The number of negative instances correctly
classified as negative.
"Example: People not having tumor correctly classified so
Machine Learning Models Problems:
Binary and "False Positives (FP): The number of negative instances incorrectly
classified as positive (Type / Error).
Multiclass " Example: Aperson with a tumor wrongly classified as healthy
Classification "False Negatives (FN): The number of positive instances incorrectly
classified as negative (Type ll Error).
" Example: Ahealthy patient wrongly classified having tumor
Error:
FP + FN
error = , N= (TP+ FP + TN + FN)

" Accuracy: Percentage of correctly classifiedinstances. Confusion Matrix: Aperformance measurement tool for evaluating
TP +TN 1. classification algorithms.
=1- error
accuracy =
N Classification It provides a tabular summary of the predictions made by a
model compared to the actual labels.
" Nota preferred performance measure for classifiers. Why? Problems:
1. "Confusion Matrix for Binary classification: a 2x2 table that
" Imbalanced datasets: In a dataset with 95% negative and 5% Confusion
Classification positive cases, a model predicting all instances as negative compares the predicted values to the actualalues.
Problems: would have 95% accuracy but would completely fail to identIfy Matrix " Each row represents an actual class as recorded in the test set
the positive cases. while each column represents a predicted class as predicted by
Binary and "Skewed Datasets: Some classes are more frequent than
the classifier.
Multiclass
others. Predicted-’ Positive Negative
Classification distribution of
" Accuracy does not account for the Actual
predicted classes versus actual classes, making it less AGUI Postive Negatire Positive 35 15 50

informative in cases where the class balance is critical. True Posltlve (TP) False Negative (FN)} Negative 20 30 50
POtyos
4S 100
" Accuracy is appropriate when: False Positlve (FP) True Negative (TN)
" The dataset is balanced.
" Simplicity is preferred for initial evaluations.
" True
Positive Rate
positive instances that(TPR): Measures the
the model correctly proportion of actual " True Negative Rate (TNR):
TPR = TP identifies
TP as positive.
" High TNR means the model is
1. good at avoiding false alarms
" Concerned with Actual Positive TP + FN (e.g., correctly identifying non-fraudulent
transactions as not
Classification "Out of actual
identifying the positive class 1 fraudulent).
Problems: positives, determines how many did the model Classification "False Positive Rate (FPR): Measures the
correctly identify. negative instances that the model incorrectlyproportion
of actual
Binary and " High TPR => The model is good at Problems: FP classifies as positive.
PP
Multiclass identifying patients with a disease). identifying positives (e.g., Binary and FPR =
Actual Negative TN + FP
Classification True Negative Rate (TNR): Measures the
proportion of actual Multiclass " Concerned with error made in the
negative class
negative instances that the model correctly identifies as negative. Classification " Qut of all actual negatives,
determines how many were
TNR =
TN TN incorrectly classified as positive.
" Low FPR is desirable, especially in scenarios where false
Actual Negative TN + FP positives have significant consequences (e.g., mistakenly
" Concerned with identifying the negative class
" Out of all actual negatives, determines how many did the flagging non-spam emails as spam).
model correctly classify

" False Negative Rate (FNR): Measures the proportion of actual " Precision: Proportion of true positive predictions to total positive
positive instances that the model incorrectly FN
classifiesas negative. predictions (accuracy of positive predictions).
TP TP
FN
FNR precision =
Actual Postive TP + FN all positive predictions TP+ FP
1. 1 " Measures the total number of actual positive predictions
Concerned with errors made in the positive class Classification among all positive prediction (true and false)
Classification "Qut of all actual positives, determines how many were
incorrectly classified as negative Problems: " Hlgh precision => model makes very few false positive erors
Problems: (minimizes false positive).
" Low FNR is critical in scenarios where missing positives has Binary and
Binary and severe consequences (e.g., failing to detect a serious disease). Multiclass
Multiclass "Recall (Sensitlvity or TPR): Measures the ability of the model to
TPR and FNR sum to 1: Classification identify all relevant instances.
Classification TPR + FNR= 1 TP
recall
" TNR and FPR sum to 1: TP +FN
correctly predicted.
TNR+ FPR= 1 " Measures how many actual positives are
positive cases
" High recall => model misses very few
(minimizes false negatives).
dde-offbetween
comes at the cost of Precision
the other. and Recall: improving one metric
If the
model labels more
1. recall, it might also increaseinstances as
false positives,positive to increase
Classification Conversely,
in labeling
to increase reducing
precision, the model might beprecision. 1
TN
Negativa
Predlcted
Positive FP

Problems: "
positives, which could lower recall.
F1-Score: Balances the trade-off between
stricter
Classification
Binary and
of precision and recal.precision and recall Problems:
" Harmonic mean Negative
Multiclass Binary and Actual
F1= precisonx recall
Classification =2x
precison + recal! Multiclass Precision
PrGsion recall Classification Positvg
5
(e.g. 3 out of 4)

TP + FN + FP CFN TP
" Particularly useful when the dataset is imbalanced, and both Rocal!
(e.g., 3 out of 5)
precision and recallare important.
" Favours classifiers having similar
precision and recall

Example 1- Brain Tumour detection: " Example 2- Spam detection: Classify mail as spam
" Goal: To identify brain tumour -True Positive: A person with brain " True Positive: Aspam mail classified as spam
tumour correctly classified " True Negative: Anon-spam mail correctly classified
1 " True Negative: A healthy person correctly classified. 1. "False Positive: Anon-spam incorrectly classified as spam
Classification "False Positive: Healthy person wrongly classified as having Classification " False Negative: Aspam incorrectly classified as non-spam
tumour " In the detection of spam mail, it is okay if any spam mail
Problems: Problems:
" False Negative: A patient with tumour classified as healthy remains undetected (false negative), but what if we miss any
Binary and " We do not want to miss patients with tumour. False Negative Binary and critical mailbecause it is classified as spam (false positive)?
Multiclass should be as low as possible. Multiclass " False positive should be as low as possible
Classification " Precision can be low but recallshould be high. Classification " Precision more important than recall. Precision should be
high; recall can be low.
Predicted’ Brain Tumour No Braln
Predlcted’ Spam Not Spam
Actual Tumour
Actual
Brain Tumour TP FN FN
Spam
No Brain Tumour FF TN FF N
Not Spam
Figure : Confusion Matrix for Braln Tumour Detection Flgure: Confuslon Matrix for Spam Detection
1/22/2025
" Example 3- Search engines
True Positive: ?? "For a multi-class classification problem with k classes, the
" True Negative: ?? confusion matrix 0s a k x k table where:
1. "False Positive: ?? " Rows represent the actual classes.
Classification " False Negative: ?? " Columns represent the predicted classes.
"Each cell C[t,i] contains the count of instances where:
Problems: " Precision is key to
ensure only relevant results are shown, True class =
Binary and while recallensures no relevant results are missed. Confusion
"Predicted class =j
Multiclass "Example 4- Credit Card Fraud Detection Matrix for
True Positive: ?? Multiclass
Classification " True Negative: ?? Classification Predicted 'A Predicted BPredicted c
"False Positive: ??
" False Negative: ??
Actual A C[o,0] C[o,1] C[o,2]
" We do not want to miss any fraud
Actual B C[1,0] C[1,1] C[1,2]
transactions. Therefore, alC C[2,0]
we want False-Negative to be as low as possible. In these ACua c[2,1] C[2,2]
situations, we can compromise with the low precision, but
recallshould be high.

a) Class-wise Metrics b) Precision (Positive Predictive Value): Proportion of correctly

" For each class i: predicted instances for class i:
TPi
1. True Positives (TP): Correctly predicted instances of class i: Precision =
TP = C[L1 TPi+ FPi
2. False Negatives (FN): Actual instances of class imisclassified ) Recall(Sensitivity or True Posithve Rate): Proportion of actual
instances of class li correctly predicted:
Metrics as another class:
Metrics TP:
Recall, = TPi + Fri
Derived from FN,=C[4J) Derived from
d) F1-Score: Harmonic mean of precision and recal
Confusion Confusion
Matrix
3. False Positives (FP): Instances of other classes misclassIfied
as class i: Matrix F1 = 2x Precision, XRecall,
PrecisionË + Recall
FP,= cU,1
4. True Negatives (TN): All other correctly predicted instances
(not class i):
TN, = Total instances - (TP,+ FPi + FN)
1/22/2025
a) Accuracy: Proportion of correctly predicted instances:
Accuracy Total},TP;
Error functions for Continuous Target Variable
Instances " Mean Squared Error (MSE)
b) Macro-Averaged Metrics: Simple average of metrics across all
Global classes (treating all classes equally):, MSE =
k
Metrics Macro Precision = ) Precisions " Penalizes large errors more than small ones (quadratic scaling).
Derived from " Commonly used in linear regression.
Confusion Macro Recall =Recall, "Mean Absolute Error (MAE)
Matrix 2. Regression
Problems MAE =)y- |
Macro F1=F (=1
"Measures average absolute differences between predictions
=1 and actual values.
" Robust to outliers compared to MSE

Error functions for Continuous Target Variable Error functions for Continuous Target Variable
"Root Mean Squared Error (RMSE) " Huber Loss
n

if ly - | s6,
RMSE =
ni=1 L.9) =
" Provides the error in the same units as the target variable.
öly -9l-8 ifly-9|>8
"Combines MSE and MAE characte ristics to handle outliers
" Amplifies large errors more than MSE due to the square root. effectively.
2. Regression" Mean Absolute Percentage Error (MAPE) 2. Regression Log-Cosh Loss
Problerns L.9) =)log(cosh(91 -yi))
MAPE =
SP10
" Represents the error as a percentage of the actualvalues.
Problems
" Similar to MSE but less sensitive to lange outliers.

" Sensitive to small actual values (y)

1/22/2025
"Recelver Operating characteristlc (ROC): evaluation metric for
binary classifiers " What is the signiflcance of ROC-AUC?
" ROCcurve plots the true positive rate (TPR- another name for " Threshold-Independent: Unlike precision and recall, ROC-AUC
1, recall) against the false positive rate (FPR) evaluates model performance across all classification
1, thresholds.
" Demonstrate the trade-off between sensitivity (TPR) and
Classification specificity (FPR) as the decision threshold is varied. Classification "Comparison of Models: ROC-AUC provides a single scalar
Problems: AUC (Area Under the Curve): Quantifies the overallability of the
value to compare multiple classifiers.
Problems:
Binary and modelto discriminate between positive and negative classes. Binary and
" Balanced View: It considers both true positives and false
positives, offering a balanced evaluation metric.
Multiclass Ranges between0 and 1: Multiclass Applications
Classification AUC = 1.0: Perfect classifier
" AUC = 0.5: Random classifier (no Classification "Binary classification Problems: Commonly used in medical
" AUC < 0.5: worse than random,
discrimination power) diagnosis, fraud detection, and other fields where
indicating the modle classification thresholds vary.
might be mislabelling the classes. " Imbalanced Datasets: More reliable than accuracy for
datasets with class imbalances, as it evaluates performance
across all thresholds.

" Example of ROC-AUC: For a spam detection model: " m

" High TPR (Recall): Ensures most spam emails are flagged.
" Low FPR: Minimizes the number of legitimate emails
1 incorrectly marked as spam. The ROC-AUC provides a holistic 1.
Classification view of the model'seffectiveness.
Limitations of ROC-AUC: Classification
Problems: does not consider the actual costs of false positives or Problems:
Binary and Binary and
Multiclass Multiclass
Classification Classification

Lesson 4 - Performance Metrics
No ratings yet
Lesson 4 - Performance Metrics
46 pages
Timber Home Living 2015-09-10
No ratings yet
Timber Home Living 2015-09-10
84 pages
KNN Evaluation
No ratings yet
KNN Evaluation
51 pages
Predictive Analytics PPT Info
No ratings yet
Predictive Analytics PPT Info
3 pages
Cycle Sheet - 1 DB
67% (3)
Cycle Sheet - 1 DB
13 pages
Lec 12 13 Evaluation Measures
No ratings yet
Lec 12 13 Evaluation Measures
45 pages
CH 6
No ratings yet
CH 6
24 pages
Notes 03
No ratings yet
Notes 03
38 pages
BSC ML CH1
No ratings yet
BSC ML CH1
63 pages
Lecture 2 Classifier Performance Metrics
No ratings yet
Lecture 2 Classifier Performance Metrics
60 pages
Assignment 5
No ratings yet
Assignment 5
22 pages
ML Unit 3
No ratings yet
ML Unit 3
127 pages
Unit 5 Classification PDF
No ratings yet
Unit 5 Classification PDF
131 pages
9 Roc Auc
No ratings yet
9 Roc Auc
27 pages
Unit 4 Learning
No ratings yet
Unit 4 Learning
100 pages
جلسه 13
No ratings yet
جلسه 13
76 pages
ML CH 5
No ratings yet
ML CH 5
45 pages
Confusion Matrix and Performance Evaluation Metrics
No ratings yet
Confusion Matrix and Performance Evaluation Metrics
13 pages
WINSEM2024-25 CBS3006 ETH VL2024250505168 2025-01-09 Reference-Material-IV
No ratings yet
WINSEM2024-25 CBS3006 ETH VL2024250505168 2025-01-09 Reference-Material-IV
20 pages
Confusion Matrix and Outliers
No ratings yet
Confusion Matrix and Outliers
32 pages
Confusion Matrix V 2.0
No ratings yet
Confusion Matrix V 2.0
14 pages
COnfusion Matrix
No ratings yet
COnfusion Matrix
32 pages
HITEC PowerPRO2700 - 2016 PDF
100% (4)
HITEC PowerPRO2700 - 2016 PDF
55 pages
Confusion Matrix
No ratings yet
Confusion Matrix
43 pages
Lec5 Classification
No ratings yet
Lec5 Classification
27 pages
06-FSSR DS610 2024 2025T1 Metrics
No ratings yet
06-FSSR DS610 2024 2025T1 Metrics
24 pages
JNN 5.2 Confusion Matrix and Performance Evaluation Metrics
No ratings yet
JNN 5.2 Confusion Matrix and Performance Evaluation Metrics
13 pages
Lecture - (3-4) Evaluation Metrices Classification and Regression
No ratings yet
Lecture - (3-4) Evaluation Metrices Classification and Regression
28 pages
Chapter 7 - LAST
No ratings yet
Chapter 7 - LAST
29 pages
Accuracy and Error Measures
No ratings yet
Accuracy and Error Measures
14 pages
12-Confusion Matrix
No ratings yet
12-Confusion Matrix
3 pages
21-General Approach To Classification, Classification by Decision Tree Induction-17-02-2025
No ratings yet
21-General Approach To Classification, Classification by Decision Tree Induction-17-02-2025
15 pages
19-Performance Metrics
No ratings yet
19-Performance Metrics
23 pages
Chater 3 Class 10
No ratings yet
Chater 3 Class 10
4 pages
Module 7 - Evaluation Measures
No ratings yet
Module 7 - Evaluation Measures
27 pages
Imbalance Problem
No ratings yet
Imbalance Problem
13 pages
UNIT-1-2.Binary Classification and Related Tasks
No ratings yet
UNIT-1-2.Binary Classification and Related Tasks
22 pages
Confusion Matrix
No ratings yet
Confusion Matrix
11 pages
Chapter 3 Model Evaluation Final
No ratings yet
Chapter 3 Model Evaluation Final
30 pages
Accuracy Precision and Recall
No ratings yet
Accuracy Precision and Recall
21 pages
Classification Algorithm in Machine Learning
No ratings yet
Classification Algorithm in Machine Learning
13 pages
Session 2 Evaluation Boosting Bagging Contemporary Business Anaytics
No ratings yet
Session 2 Evaluation Boosting Bagging Contemporary Business Anaytics
17 pages
Comprehensive Guide On Confusion Matrix 1657202063
No ratings yet
Comprehensive Guide On Confusion Matrix 1657202063
5 pages
L22 KNN+Metrics
No ratings yet
L22 KNN+Metrics
18 pages
Binary Classification PDF
No ratings yet
Binary Classification PDF
27 pages
11.2 - Classification Evaluation Metrics
No ratings yet
11.2 - Classification Evaluation Metrics
22 pages
Performance Measure For A Classification Model.
No ratings yet
Performance Measure For A Classification Model.
5 pages
Dimensioning and Tolerances
No ratings yet
Dimensioning and Tolerances
51 pages
Risk Security and Regulatory Compliance
No ratings yet
Risk Security and Regulatory Compliance
12 pages
Atlas Copco Pf4000 Manual
67% (6)
Atlas Copco Pf4000 Manual
476 pages
Performance Metrics (Classification) : Enrique J. de La Hoz D
100% (1)
Performance Metrics (Classification) : Enrique J. de La Hoz D
30 pages
Confusion Matrix
No ratings yet
Confusion Matrix
5 pages
Lecture 5
No ratings yet
Lecture 5
21 pages
Confusion Matrix and Classification Evaluation Metrics
No ratings yet
Confusion Matrix and Classification Evaluation Metrics
16 pages
Wa0013.
No ratings yet
Wa0013.
9 pages
Understanding The Confusion Matrix in Machine Learning
No ratings yet
Understanding The Confusion Matrix in Machine Learning
4 pages
Electrostatics (Formula Sheet)
No ratings yet
Electrostatics (Formula Sheet)
6 pages
Chs 10 - Lesson 2
No ratings yet
Chs 10 - Lesson 2
43 pages
Ai DS 2 Book-Chpt-5
No ratings yet
Ai DS 2 Book-Chpt-5
17 pages
Confusion Matrix
No ratings yet
Confusion Matrix
14 pages
309-00A Exhaust System - 1.5L EcoBoost
No ratings yet
309-00A Exhaust System - 1.5L EcoBoost
42 pages
Confusion Matrix
No ratings yet
Confusion Matrix
18 pages
Learning Best Practices For Model Evaluation and Hyper-Parameter Tuning
No ratings yet
Learning Best Practices For Model Evaluation and Hyper-Parameter Tuning
20 pages
Measurement Instrumentation and Sensors Handbook Two Volume Set 2nd Edition John G. Webster (Editor) Instant Download
No ratings yet
Measurement Instrumentation and Sensors Handbook Two Volume Set 2nd Edition John G. Webster (Editor) Instant Download
42 pages
Nonincendive Circuit Parameters: Planning and Installation Guide For Tricon v9-v10 Systems
No ratings yet
Nonincendive Circuit Parameters: Planning and Installation Guide For Tricon v9-v10 Systems
26 pages
Confusion Matrix and Performance Evaluation Metrics
No ratings yet
Confusion Matrix and Performance Evaluation Metrics
13 pages
Confusion Matrix and Performance Evaluation Metrics
No ratings yet
Confusion Matrix and Performance Evaluation Metrics
13 pages
IO Wheel Balancer WB220L - CE - 1.1 - ENG - Set910710984
No ratings yet
IO Wheel Balancer WB220L - CE - 1.1 - ENG - Set910710984
18 pages
WRC 107 Tips
No ratings yet
WRC 107 Tips
4 pages
Dr. Devang Sharma
No ratings yet
Dr. Devang Sharma
6 pages
Worksheet On Force
No ratings yet
Worksheet On Force
3 pages
9100 Manual
No ratings yet
9100 Manual
11 pages
Varioklav 75s and 135s
No ratings yet
Varioklav 75s and 135s
6 pages
Realtek Driver For Windows 10
No ratings yet
Realtek Driver For Windows 10
5 pages
Icd Tutorial
No ratings yet
Icd Tutorial
42 pages
Logistic Regression - Validating
No ratings yet
Logistic Regression - Validating
2 pages
TUPLE
No ratings yet
TUPLE
16 pages
1000-4 European Union EN12975
No ratings yet
1000-4 European Union EN12975
26 pages
DF0LS35 - Celestial Navigation
No ratings yet
DF0LS35 - Celestial Navigation
12 pages
Motor Current Calculator
No ratings yet
Motor Current Calculator
2 pages
Automated Face Mask Detection: A Project by Nishant Goel Under The Guidance of Dr. Anil Kumar
No ratings yet
Automated Face Mask Detection: A Project by Nishant Goel Under The Guidance of Dr. Anil Kumar
21 pages
Muravyl Installation ENG
No ratings yet
Muravyl Installation ENG
10 pages
Ethanolamine and Phosphoethanolamine Inhibit Mitochondrial Function in Vitro - Implications For Mitochondrial Dysfunction Hypothesis in Depression and Bipolar Disorder - ScienceDirect
No ratings yet
Ethanolamine and Phosphoethanolamine Inhibit Mitochondrial Function in Vitro - Implications For Mitochondrial Dysfunction Hypothesis in Depression and Bipolar Disorder - ScienceDirect
6 pages
Evolution of Object Approach: C Programming Language
No ratings yet
Evolution of Object Approach: C Programming Language
35 pages
LTspice Tutorial Part 4 - Intermediate Circuits
No ratings yet
LTspice Tutorial Part 4 - Intermediate Circuits
23 pages
CS5371 Theory of Computation
No ratings yet
CS5371 Theory of Computation
2 pages
M.S.Ramaiah Institute of Technology Department of Management Studies
No ratings yet
M.S.Ramaiah Institute of Technology Department of Management Studies
5 pages
Jurnal Spasial: Volume 6, Nomor 1, April
No ratings yet
Jurnal Spasial: Volume 6, Nomor 1, April
7 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Evaluation Measures For Machine Learning Models

Uploaded by

Evaluation Measures For Machine Learning Models

Uploaded by

" True Positives (TP): The number of positive instances

Evaluation Measures For 1.

a) Class-wise Metrics b) Precision (Positive Predictive Value): Proportion of correctly

" Sensitive to small actual values (y)

" Example of ROC-AUC: For a spam detection model: " m

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.