Evaluation Measures For Machine Learning Models

The document outlines various evaluation measures for classification problems in machine learning, including True Positives, True Negatives, False Positives, and False Negatives, along with metrics like Accuracy, Precision, Recall, and F1-Score. It emphasizes the importance of the Confusion Matrix for assessing model performance and discusses the trade-offs between Precision and Recall. Additionally, it introduces ROC-AUC as a valuable metric for evaluating binary classifiers, particularly in imbalanced datasets.


" True Positives (TP): The number of positive instances

correctly
classified as positive.
" Example: People having tumor correcty classified

Evaluation Measures For 1.


Classification
"True Negatives (TN): The number of negative instances correctly
classified as negative.
"Example: People not having tumor correctly classified so
Machine Learning Models Problems:
Binary and "False Positives (FP): The number of negative instances incorrectly
classified as positive (Type / Error).
Multiclass " Example: Aperson with a tumor wrongly classified as healthy
Classification "False Negatives (FN): The number of positive instances incorrectly
classified as negative (Type ll Error).
" Example: Ahealthy patient wrongly classified having tumor
Error:
FP + FN
error = , N= (TP+ FP + TN + FN)
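The four counts and the error rate can be computed directly from lists of true and predicted labels. Below is a minimal sketch in Python, assuming binary labels encoded as 1 (positive) and 0 (negative); the label values are illustrative.

# Minimal sketch: TP, TN, FP, FN and the error rate for binary labels
# encoded as 1 = positive, 0 = negative (illustrative data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

N = TP + FP + TN + FN
error = (FP + FN) / N
print(TP, TN, FP, FN, error)  # 3 3 1 1 0.25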

" Accuracy: Percentage of correctly classifiedinstances. Confusion Matrix: Aperformance measurement tool for evaluating
TP +TN 1. classification algorithms.
=1- error
accuracy =
N Classification It provides a tabular summary of the predictions made by a
model compared to the actual labels.
" Nota preferred performance measure for classifiers. Why? Problems:
1. "Confusion Matrix for Binary classification: a 2x2 table that
" Imbalanced datasets: In a dataset with 95% negative and 5% Confusion
Classification positive cases, a model predicting all instances as negative compares the predicted values to the actualalues.
Problems: would have 95% accuracy but would completely fail to identIfy Matrix " Each row represents an actual class as recorded in the test set
the positive cases. while each column represents a predicted class as predicted by
Binary and "Skewed Datasets: Some classes are more frequent than
the classifier.
Multiclass
others. Predicted-’ Positive Negative
Classification distribution of
" Accuracy does not account for the Actual
predicted classes versus actual classes, making it less AGUI Postive Negatire Positive 35 15 50

informative in cases where the class balance is critical. True Posltlve (TP) False Negative (FN)} Negative 20 30 50
POtyos
4S 100
" Accuracy is appropriate when: False Positlve (FP) True Negative (TN)
" The dataset is balanced.
" Simplicity is preferred for initial evaluations.
" True
Positive Rate
positive instances that(TPR): Measures the
the model correctly proportion of actual " True Negative Rate (TNR):
TPR = TP identifies
TP as positive.
" High TNR means the model is
1. good at avoiding false alarms
" Concerned with Actual Positive TP + FN (e.g., correctly identifying non-fraudulent
transactions as not
Classification "Out of actual
identifying the positive class 1 fraudulent).
Problems: positives, determines how many did the model Classification "False Positive Rate (FPR): Measures the
correctly identify. negative instances that the model incorrectlyproportion
of actual
Binary and " High TPR => The model is good at Problems: FP classifies as positive.
PP
Multiclass identifying patients with a disease). identifying positives (e.g., Binary and FPR =
Actual Negative TN + FP
Classification True Negative Rate (TNR): Measures the
proportion of actual Multiclass " Concerned with error made in the
negative class
negative instances that the model correctly identifies as negative. Classification " Qut of all actual negatives,
determines how many were
TNR =
TN TN incorrectly classified as positive.
" Low FPR is desirable, especially in scenarios where false
Actual Negative TN + FP positives have significant consequences (e.g., mistakenly
" Concerned with identifying the negative class
" Out of all actual negatives, determines how many did the flagging non-spam emails as spam).
model correctly classify
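These rates follow directly from the four counts; a minimal sketch using the counts from the example confusion matrix above (35, 15, 20, 30):

# Sketch: TPR, TNR and FPR from the four confusion-matrix counts (illustrative counts).
TP, FN, FP, TN = 35, 15, 20, 30

TPR = TP / (TP + FN)  # true positive rate (sensitivity / recall)
TNR = TN / (TN + FP)  # true negative rate (specificity)
FPR = FP / (TN + FP)  # false positive rate
print(TPR, TNR, FPR)  # 0.7 0.6 0.4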

" False Negative Rate (FNR): Measures the proportion of actual " Precision: Proportion of true positive predictions to total positive
positive instances that the model incorrectly FN
classifiesas negative. predictions (accuracy of positive predictions).
TP TP
FN
FNR precision =
Actual Postive TP + FN all positive predictions TP+ FP
1. 1 " Measures the total number of actual positive predictions
Concerned with errors made in the positive class Classification among all positive prediction (true and false)
Classification "Qut of all actual positives, determines how many were
incorrectly classified as negative Problems: " Hlgh precision => model makes very few false positive erors
Problems: (minimizes false positive).
" Low FNR is critical in scenarios where missing positives has Binary and
Binary and severe consequences (e.g., failing to detect a serious disease). Multiclass
Multiclass "Recall (Sensitlvity or TPR): Measures the ability of the model to
TPR and FNR sum to 1: Classification identify all relevant instances.
Classification TPR + FNR= 1 TP
recall
" TNR and FPR sum to 1: TP +FN
correctly predicted.
TNR+ FPR= 1 " Measures how many actual positives are
positive cases
" High recall => model misses very few
(minimizes false negatives).
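A sketch of precision and recall computed with scikit-learn, assuming the positive class is labelled 1 (illustrative data):

# Sketch: precision and recall with scikit-learn (illustrative labels, positive class = 1).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)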
• Trade-off between Precision and Recall: improving one metric often comes at the cost of the other.
  • If the model labels more instances as positive to increase recall, it might also increase false positives, reducing precision.
  • Conversely, to increase precision, the model might be stricter in labeling positives, which could lower recall.

• F1-Score: Balances the trade-off between precision and recall.
  • Harmonic mean of precision and recall:

      F1 = 2 x (precision x recall) / (precision + recall)

  • Particularly useful when the dataset is imbalanced and both precision and recall are important.
  • Favours classifiers having similar precision and recall.

  Figure: Precision = TP / (TP + FP) computed over the predicted-positive column (e.g., 3 out of 4), and Recall = TP / (TP + FN) computed over the actual-positive row (e.g., 3 out of 5) of a confusion matrix.
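A minimal worked sketch of the F1 computation, using the illustrative numbers from the figure placeholder above (3 of 4 predicted positives correct, 3 of 5 actual positives found):

# Sketch: F1 as the harmonic mean of precision and recall (illustrative numbers).
precision = 3 / 4  # 3 of 4 predicted positives are correct
recall = 3 / 5     # 3 of 5 actual positives are found

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.667, pulled toward the smaller of the two values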

• Example 1 - Brain Tumour Detection: Goal: to identify brain tumours.
  • True Positive: A person with a brain tumour correctly classified.
  • True Negative: A healthy person correctly classified.
  • False Positive: A healthy person wrongly classified as having a tumour.
  • False Negative: A patient with a tumour classified as healthy.
  • We do not want to miss patients with a tumour, so False Negatives should be as low as possible.
  • Precision can be low, but recall should be high.

                              Predicted Brain Tumour   Predicted No Brain Tumour
    Actual Brain Tumour       TP                       FN
    Actual No Brain Tumour    FP                       TN
    Figure: Confusion matrix for brain tumour detection.

• Example 2 - Spam Detection: Classify mail as spam.
  • True Positive: A spam mail classified as spam.
  • True Negative: A non-spam mail correctly classified.
  • False Positive: A non-spam mail incorrectly classified as spam.
  • False Negative: A spam mail incorrectly classified as non-spam.
  • In spam detection it is acceptable if some spam mail remains undetected (false negative), but missing a critical mail because it is classified as spam (false positive) is costly.
  • False positives should be as low as possible.
  • Precision is more important than recall: precision should be high; recall can be low.

                              Predicted Spam   Predicted Not Spam
    Actual Spam               TP               FN
    Actual Not Spam           FP               TN
    Figure: Confusion matrix for spam detection.
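The two examples above favour recall and precision respectively. One practical lever is the classifier's decision threshold; the sketch below (illustrative probabilities, scikit-learn assumed) shows how lowering the threshold raises recall while raising it improves precision:

# Sketch: the decision threshold trades precision against recall (illustrative data).
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_prob = np.array([0.9, 0.6, 0.4, 0.8, 0.3, 0.2, 0.55, 0.45, 0.35, 0.1])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred, zero_division=0)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

A low threshold suits the tumour example (few missed positives), while a high threshold suits the spam example (few false alarms).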
• Example 3 - Search Engines:
  • True Positive: ??  True Negative: ??  False Positive: ??  False Negative: ??
  • Precision is key to ensure only relevant results are shown, while recall ensures no relevant results are missed.

• Example 4 - Credit Card Fraud Detection:
  • True Positive: ??  True Negative: ??  False Positive: ??  False Negative: ??
  • We do not want to miss any fraudulent transactions, so we want False Negatives to be as low as possible. In these situations we can compromise with low precision, but recall should be high.

• Confusion Matrix for Multiclass Classification:
  • For a multi-class classification problem with k classes, the confusion matrix is a k x k table where:
    • Rows represent the actual classes.
    • Columns represent the predicted classes.
  • Each cell C[i, j] contains the count of instances where: true class = i and predicted class = j.

                 Predicted A   Predicted B   Predicted C
    Actual A     C[0,0]        C[0,1]        C[0,2]
    Actual B     C[1,0]        C[1,1]        C[1,2]
    Actual C     C[2,0]        C[2,1]        C[2,2]
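A sketch of building the k x k matrix for a three-class problem, assuming scikit-learn is available (class names and labels are illustrative):

# Sketch: confusion matrix for a 3-class problem (illustrative labels).
from sklearn.metrics import confusion_matrix

classes = ["A", "B", "C"]
y_true = ["A", "B", "C", "A", "B", "C", "A", "B"]
y_pred = ["A", "B", "C", "B", "B", "A", "A", "C"]

# Rows are actual classes, columns are predicted classes, ordered as in `classes`.
print(confusion_matrix(y_true, y_pred, labels=classes))
# [[2 1 0]
#  [0 2 1]
#  [1 0 1]]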

Metrics Derived from the Confusion Matrix

a) Class-wise Metrics. For each class i:
   1. True Positives (TP_i): Correctly predicted instances of class i:
          TP_i = C[i, i]
   2. False Negatives (FN_i): Actual instances of class i misclassified as another class:
          FN_i = sum of C[i, j] over all j != i
   3. False Positives (FP_i): Instances of other classes misclassified as class i:
          FP_i = sum of C[j, i] over all j != i
   4. True Negatives (TN_i): All other correctly predicted instances (not class i):
          TN_i = Total instances - (TP_i + FP_i + FN_i)

b) Precision (Positive Predictive Value): Proportion of correctly predicted instances among all instances predicted as class i:
          Precision_i = TP_i / (TP_i + FP_i)

c) Recall (Sensitivity or True Positive Rate): Proportion of actual instances of class i correctly predicted:
          Recall_i = TP_i / (TP_i + FN_i)

d) F1-Score: Harmonic mean of precision and recall:
          F1_i = 2 x (Precision_i x Recall_i) / (Precision_i + Recall_i)
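A sketch of the class-wise quantities computed directly from a k x k confusion matrix, assuming numpy (the matrix values are illustrative):

# Sketch: per-class TP, FN, FP, TN and precision/recall/F1 from a confusion matrix.
import numpy as np

cm = np.array([[2, 1, 0],   # rows: actual classes, columns: predicted classes
               [0, 2, 1],
               [1, 0, 1]])
total = cm.sum()

for i in range(cm.shape[0]):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp  # actual class i predicted as some other class
    fp = cm[:, i].sum() - tp  # other classes predicted as class i
    tn = total - (tp + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    print(f"class {i}: TP={tp} FN={fn} FP={fp} TN={tn} "
          f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")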
Global Metrics Derived from the Confusion Matrix

a) Accuracy: Proportion of correctly predicted instances:
          Accuracy = (sum of TP_i over all classes i) / Total instances

b) Macro-Averaged Metrics: Simple average of the per-class metrics across all k classes (treating all classes equally):
          Macro Precision = (1/k) x sum of Precision_i over i = 1..k
          Macro Recall    = (1/k) x sum of Recall_i over i = 1..k
          Macro F1        = (1/k) x sum of F1_i over i = 1..k

2. Regression Problems: Error Functions for a Continuous Target Variable

• Mean Squared Error (MSE):
          MSE = (1/n) x sum of (y_i - yhat_i)^2 over i = 1..n
  • Penalizes large errors more than small ones (quadratic scaling).
  • Commonly used in linear regression.

• Mean Absolute Error (MAE):
          MAE = (1/n) x sum of |y_i - yhat_i| over i = 1..n
  • Measures the average absolute difference between predictions and actual values.
  • Robust to outliers compared to MSE.

Error Functions for a Continuous Target Variable (continued)

• Root Mean Squared Error (RMSE):
          RMSE = sqrt( (1/n) x sum of (y_i - yhat_i)^2 over i = 1..n )
  • Provides the error in the same units as the target variable.
  • Like MSE, it weights large errors more heavily than small ones because of the squaring.

• Mean Absolute Percentage Error (MAPE):
          MAPE = (100/n) x sum of |(y_i - yhat_i) / y_i| over i = 1..n
  • Represents the error as a percentage of the actual values.
  • Sensitive to small actual values (y_i close to zero).

• Huber Loss:
          L_delta(y, yhat) = (1/2)(y - yhat)^2                     if |y - yhat| <= delta
          L_delta(y, yhat) = delta x |y - yhat| - (1/2) x delta^2   if |y - yhat| > delta
  • Combines MSE and MAE characteristics to handle outliers effectively.

• Log-Cosh Loss:
          L(y, yhat) = sum of log(cosh(yhat_i - y_i)) over i = 1..n
  • Similar to MSE but less sensitive to large outliers.


• Receiver Operating Characteristic (ROC): An evaluation metric for binary classifiers.
  • The ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR).
  • It demonstrates the trade-off between sensitivity (TPR) and the false positive rate (1 - specificity) as the decision threshold is varied.
• AUC (Area Under the Curve): Quantifies the overall ability of the model to discriminate between positive and negative classes.
  • Ranges between 0 and 1:
    • AUC = 1.0: perfect classifier.
    • AUC = 0.5: random classifier (no discrimination power).
    • AUC < 0.5: worse than random, indicating the model might be mislabelling the classes.

• What is the significance of ROC-AUC?
  • Threshold-independent: Unlike precision and recall, ROC-AUC evaluates model performance across all classification thresholds.
  • Comparison of models: ROC-AUC provides a single scalar value to compare multiple classifiers.
  • Balanced view: It considers both true positives and false positives, offering a balanced evaluation metric.
• Applications:
  • Binary classification problems: Commonly used in medical diagnosis, fraud detection, and other fields where classification thresholds vary.
  • Imbalanced datasets: More reliable than accuracy for datasets with class imbalance, as it evaluates performance across all thresholds.
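A sketch of the ROC curve points and the AUC with scikit-learn (illustrative labels and predicted probabilities):

# Sketch: ROC curve and AUC with scikit-learn (illustrative scores).
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.8, 0.3, 0.2, 0.55, 0.45, 0.35, 0.1]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))              # area under the ROC curve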

" Example of ROC-AUC: For a spam detection model: " m


" High TPR (Recall): Ensures most spam emails are flagged.
" Low FPR: Minimizes the number of legitimate emails
1 incorrectly marked as spam. The ROC-AUC provides a holistic 1.
Classification view of the model'seffectiveness.
Limitations of ROC-AUC: Classification
Problems: does not consider the actual costs of false positives or Problems:
Binary and Binary and
Multiclass Multiclass
Classification Classification
