Evaluation Measures For Machine Learning Models
1. Classification Problems: Binary and Multiclass Classification

- Accuracy: Percentage of correctly classified instances.
  accuracy = (TP + TN) / N = 1 - error
  - Example: people having a tumour correctly classified as positive.
- Not a preferred performance measure for classifiers. Why?
  - Imbalanced datasets: In a dataset with 95% negative and 5% positive cases, a model predicting all instances as negative would have 95% accuracy but would completely fail to identify the positive cases.
  - Skewed datasets: Some classes are more frequent than others.
  - Accuracy does not account for the distribution of predicted classes versus actual classes, making it less informative in cases where the class balance is critical.
- Accuracy is appropriate when:
  - The dataset is balanced.
  - Simplicity is preferred for initial evaluations.

Confusion Matrix
- Confusion Matrix: A performance measurement tool for evaluating classification algorithms.
- It provides a tabular summary of the predictions made by a model compared to the actual labels.
- Confusion matrix for binary classification: a 2x2 table that compares the predicted values to the actual values.
- Each row represents an actual class as recorded in the test set, while each column represents a predicted class as predicted by the classifier.

                      Predicted Positive     Predicted Negative
  Actual Positive     True Positive (TP)     False Negative (FN)
  Actual Negative     False Positive (FP)    True Negative (TN)

  Example:
                      Predicted Positive     Predicted Negative     Total
  Actual Positive     35                     15                     50
  Actual Negative     20                     30                     50
  Total               55                     45                     100

Accuracy for this example is worked out in the sketch below.
" True
Positive Rate
positive instances that(TPR): Measures the
the model correctly proportion of actual " True Negative Rate (TNR):
TPR = TP identifies
TP as positive.
" High TNR means the model is
1. good at avoiding false alarms
" Concerned with Actual Positive TP + FN (e.g., correctly identifying non-fraudulent
transactions as not
Classification "Out of actual
identifying the positive class 1 fraudulent).
Problems: positives, determines how many did the model Classification "False Positive Rate (FPR): Measures the
correctly identify. negative instances that the model incorrectlyproportion
of actual
Binary and " High TPR => The model is good at Problems: FP classifies as positive.
PP
Multiclass identifying patients with a disease). identifying positives (e.g., Binary and FPR =
Actual Negative TN + FP
Classification True Negative Rate (TNR): Measures the
proportion of actual Multiclass " Concerned with error made in the
negative class
negative instances that the model correctly identifies as negative. Classification " Qut of all actual negatives,
determines how many were
TNR =
TN TN incorrectly classified as positive.
" Low FPR is desirable, especially in scenarios where false
Actual Negative TN + FP positives have significant consequences (e.g., mistakenly
" Concerned with identifying the negative class
" Out of all actual negatives, determines how many did the flagging non-spam emails as spam).
model correctly classify
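A short sketch of TPR, TNR, and FPR, reusing the illustrative counts from the example table.

    TP, FN, FP, TN = 35, 15, 20, 30   # example counts from the table above

    TPR = TP / (TP + FN)   # true positive rate: 35 / 50 = 0.70
    TNR = TN / (TN + FP)   # true negative rate: 30 / 50 = 0.60
    FPR = FP / (TN + FP)   # false positive rate: 20 / 50 = 0.40

    print(f"TPR = {TPR:.2f}, TNR = {TNR:.2f}, FPR = {FPR:.2f}")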
" False Negative Rate (FNR): Measures the proportion of actual " Precision: Proportion of true positive predictions to total positive
positive instances that the model incorrectly FN
classifiesas negative. predictions (accuracy of positive predictions).
TP TP
FN
FNR precision =
Actual Postive TP + FN all positive predictions TP+ FP
1. 1 " Measures the total number of actual positive predictions
Concerned with errors made in the positive class Classification among all positive prediction (true and false)
Classification "Qut of all actual positives, determines how many were
incorrectly classified as negative Problems: " Hlgh precision => model makes very few false positive erors
Problems: (minimizes false positive).
" Low FNR is critical in scenarios where missing positives has Binary and
Binary and severe consequences (e.g., failing to detect a serious disease). Multiclass
Multiclass "Recall (Sensitlvity or TPR): Measures the ability of the model to
TPR and FNR sum to 1: Classification identify all relevant instances.
Classification TPR + FNR= 1 TP
recall
" TNR and FPR sum to 1: TP +FN
correctly predicted.
TNR+ FPR= 1 " Measures how many actual positives are
positive cases
" High recall => model misses very few
(minimizes false negatives).
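A sketch of FNR, precision, and recall for the same illustrative counts, including a check of the complementary relations.

    TP, FN, FP, TN = 35, 15, 20, 30   # example counts from the table above

    FNR = FN / (TP + FN)         # 15 / 50 = 0.30
    precision = TP / (TP + FP)   # 35 / 55 ≈ 0.64
    recall = TP / (TP + FN)      # 35 / 50 = 0.70 (identical to TPR)

    TPR = TP / (TP + FN)
    TNR = TN / (TN + FP)
    FPR = FP / (TN + FP)
    assert abs((TPR + FNR) - 1.0) < 1e-9   # TPR + FNR = 1
    assert abs((TNR + FPR) - 1.0) < 1e-9   # TNR + FPR = 1

    print(f"FNR = {FNR:.2f}, precision = {precision:.2f}, recall = {recall:.2f}")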
- Trade-off between Precision and Recall: improving one metric often comes at the cost of the other.
  - If the model labels more instances as positive to increase recall, it might also increase false positives, reducing precision.
  - Conversely, to increase precision, the model might be stricter in labeling positives, which could lower recall.
- F1-Score: Balances the trade-off between precision and recall.
  - Harmonic mean of precision and recall:
    F1 = 2 x (precision x recall) / (precision + recall) = 2TP / (2TP + FP + FN)
  - Particularly useful when the dataset is imbalanced and both precision and recall are important.
  - Favours classifiers having similar precision and recall.
  - Both forms of the formula are compared in the sketch below.

Figure: Confusion matrix illustrating precision (e.g., 3 correct out of 4 predicted positives) versus recall (e.g., 3 correct out of 5 actual positives).
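A sketch comparing the harmonic-mean form of F1 with the equivalent count-based form, again using the illustrative counts from the running example.

    TP, FN, FP = 35, 15, 20   # example counts from the table above

    precision = TP / (TP + FP)
    recall = TP / (TP + FN)

    f1 = 2 * precision * recall / (precision + recall)
    f1_counts = 2 * TP / (2 * TP + FP + FN)   # algebraically the same value

    print(f"F1 (harmonic mean) = {f1:.3f}")        # ≈ 0.667
    print(f"F1 (from counts)   = {f1_counts:.3f}")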
- Example 1 - Brain Tumour Detection:
  - Goal: To identify brain tumours.
  - True Positive: A person with a brain tumour correctly classified.
  - True Negative: A healthy person correctly classified.
  - False Positive: A healthy person wrongly classified as having a tumour.
  - False Negative: A patient with a tumour classified as healthy.
  - We do not want to miss patients with a tumour: False Negatives should be as low as possible.
  - Precision can be low, but recall should be high.

                           Predicted Brain Tumour   Predicted No Brain Tumour
  Actual Brain Tumour      TP                       FN
  Actual No Brain Tumour   FP                       TN

  Figure: Confusion Matrix for Brain Tumour Detection

- Example 2 - Spam Detection: Classify mail as spam.
  - True Positive: A spam mail classified as spam.
  - True Negative: A non-spam mail correctly classified.
  - False Positive: A non-spam mail incorrectly classified as spam.
  - False Negative: A spam mail incorrectly classified as non-spam.
  - In spam detection, it is acceptable if some spam mail remains undetected (false negative), but what if we miss a critical mail because it is classified as spam (false positive)?
  - False positives should be as low as possible.
  - Precision is more important than recall: precision should be high; recall can be low.

                      Predicted Spam   Predicted Not Spam
  Actual Spam         TP               FN
  Actual Not Spam     FP               TN

  Figure: Confusion Matrix for Spam Detection
" Example 3- Search engines
True Positive: ?? "For a multi-class classification problem with k classes, the
" True Negative: ?? confusion matrix 0s a k x k table where:
1. "False Positive: ?? " Rows represent the actual classes.
Classification " False Negative: ?? " Columns represent the predicted classes.
"Each cell C[t,i] contains the count of instances where:
Problems: " Precision is key to
ensure only relevant results are shown, True class =
Binary and while recallensures no relevant results are missed. Confusion
"Predicted class =j
Multiclass "Example 4- Credit Card Fraud Detection Matrix for
True Positive: ?? Multiclass
Classification " True Negative: ?? Classification Predicted 'A Predicted BPredicted c
"False Positive: ??
" False Negative: ??
Actual A C[o,0] C[o,1] C[o,2]
" We do not want to miss any fraud
Actual B C[1,0] C[1,1] C[1,2]
transactions. Therefore, alC C[2,0]
we want False-Negative to be as low as possible. In these ACua c[2,1] C[2,2]
situations, we can compromise with the low precision, but
recallshould be high.
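A minimal sketch of building a k x k confusion matrix C[i][j] (row = actual class, column = predicted class); the label lists y_true and y_pred are made-up illustrative data, not taken from the slides.

    labels = ["A", "B", "C"]                       # the k classes
    index = {label: i for i, label in enumerate(labels)}

    # Illustrative (made-up) actual and predicted labels.
    y_true = ["A", "A", "B", "B", "C", "C", "A", "B"]
    y_pred = ["A", "B", "B", "B", "C", "A", "A", "C"]

    k = len(labels)
    C = [[0] * k for _ in range(k)]
    for actual, predicted in zip(y_true, y_pred):
        C[index[actual]][index[predicted]] += 1    # C[i][j]: true class i, predicted class j

    for i, row in enumerate(C):
        print(f"Actual {labels[i]}: {row}")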
Error Functions for Continuous Target Variables

2. Regression Problems

- Root Mean Squared Error (RMSE):
  RMSE = sqrt( (1/n) * Σ_{i=1..n} (y_i - ŷ_i)^2 )
  - Provides the error in the same units as the target variable.
  - Penalizes large errors more heavily than MAE because the errors are squared before averaging.
- Mean Absolute Percentage Error (MAPE):
  MAPE = (100/n) * Σ_{i=1..n} |(y_i - ŷ_i) / y_i|
  - Represents the error as a percentage of the actual values.
- Huber Loss:
  L(y, ŷ) = (1/2)(y - ŷ)^2            if |y - ŷ| ≤ δ
          = δ|y - ŷ| - (1/2)δ^2       if |y - ŷ| > δ
  - Combines MSE and MAE characteristics to handle outliers effectively.
- Log-Cosh Loss:
  L(y, ŷ) = Σ_i log(cosh(ŷ_i - y_i))
  - Similar to MSE but less sensitive to large outliers.

All four functions are implemented in the sketch below.