
Chapter 5

Model Evaluation

By: Yeshambel A.
Introduction

• Evaluation aims at selecting the most appropriate learning scheme for a specific problem

• We evaluate a model's ability to generalize what it has learned from the training set to new, unseen instances

• It also allows comparing multiple classifiers on a specific domain (e.g., to find the best algorithm for a given application task)

Absolute and Mean Square Error
• Refers to the error committed when classifying an object into the desired class
• Error is defined as the difference between the desired value and the predicted value:
    e_i = desired value − predicted value
• Mean absolute error is the sum of the absolute values of the errors over all n instances in the test set, divided by n:
    MAE = (1/n) * Σ |e_i|
• Mean square error is defined analogously, using the squared errors:
    MSE = (1/n) * Σ e_i^2
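A minimal sketch of both error measures in plain Python, using hypothetical desired/predicted values:

    # Hypothetical desired vs. predicted values for a small test set
    desired   = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5,  0.0, 2.0, 8.0]

    errors = [d - p for d, p in zip(desired, predicted)]  # e_i = desired - predicted
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mse = sum(e * e for e in errors) / n   # mean square error
    print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")  # MAE = 0.500, MSE = 0.375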

Accuracy

• It assumes equal cost for all classes

• It is misleading on unbalanced datasets
• It doesn't differentiate between different types of errors

• Examples:
  • Cancer dataset: 10,000 instances, 9,990 normal, 10 ill; if our model classified all instances as normal, accuracy would be 99.9% (see the sketch below)
  • Medical diagnosis: 95% healthy, 5% diseased
  • e-Commerce: 99% do not buy, 1% buy
  • Security: 99.999% of citizens are not terrorists
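A minimal Python sketch of the cancer-dataset case above, showing how the accuracy figure misleads:

    # 9,990 normal and 10 ill instances; the model predicts "normal" for everything
    y_true = ["normal"] * 9990 + ["ill"] * 10
    y_pred = ["normal"] * 10000

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy = {accuracy:.1%}")  # 99.9%, yet every ill patient is missed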

Binary classification Confusion Matrix

• For a binary task, the confusion matrix has one row per actual class and one column per predicted class:

                   Predicted Yes   Predicted No
    Actual Yes     TP              FN
    Actual No      FP              TN
Binary classification Confusion Matrix

• True Positive Rate: TPR = TP / (TP + FN)

• False Positive Rate: FPR = FP / (FP + TN)

• Overall success rate (Accuracy) = (TP + TN) / (TP + FP + TN + FN)

• Error rate = 1 − success rate
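These four quantities as a minimal sketch over assumed counts:

    # Assumed binary counts for illustration
    TP, FN, FP, TN = 40, 10, 5, 45

    tpr = TP / (TP + FN)                        # true positive rate
    fpr = FP / (FP + TN)                        # false positive rate
    accuracy = (TP + TN) / (TP + FP + TN + FN)  # overall success rate
    error_rate = 1 - accuracy
    print(tpr, fpr, accuracy, error_rate)       # 0.8 0.1 0.85 0.15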

Sensitivity & Specificity

• Sensitivity

• Measures the classifier ability to detect positive classes (its positivity)

• Specificity

• The specificity measures how accurate is the classifier in not detecting too many
false positives (it measures its negativity)
• high specificity are used to confirm the results of sensitive
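Both measures from the same assumed counts as in the previous sketch:

    TP, FN, FP, TN = 40, 10, 5, 45  # assumed counts

    sensitivity = TP / (TP + FN)  # 0.8: how well positives are detected
    specificity = TN / (TN + FP)  # 0.9: how well negatives are kept negative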
Recall & Precision
• It is used by information retrieval researches to measure accuracy of a search
engine, they define the recall as (number of relevant documents retrieved)
divided by ( total number of relevant documents)
• Recall (also called Sensitivity in some fields) measures the proportion of actual
positives which are correctly identified as such (e.g. the percentage of sick
people who are identified as having the condition);
• Precession of class Yes in classification can defined as the number of instance
classified correctly as class Yes divided by the total number of instances
classified as Yes by the classifier
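A small sketch of the information-retrieval reading, with assumed document counts:

    # Assumed: 30 documents retrieved, 20 of them relevant,
    # out of 50 relevant documents in the whole collection
    relevant_retrieved = 20
    retrieved_total = 30
    relevant_total = 50

    recall = relevant_retrieved / relevant_total      # 20/50 = 0.40
    precision = relevant_retrieved / retrieved_total  # 20/30 ≈ 0.67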

F-measure

• The F-measure is the harmonic mean (an average of rates) of precision and recall, so it takes account of both measures:

    F1-score = 2 * (precision * recall) / (precision + recall)

• It takes every kind of outcome into account except the true negatives (TN appears in neither precision nor recall)
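Continuing the assumed precision/recall values from the sketch above:

    precision, recall = 20 / 30, 20 / 50

    f1 = 2 * (precision * recall) / (precision + recall)
    print(f"F1 = {f1:.3f}")  # 0.500, the harmonic mean of 0.667 and 0.400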
Notes on Metrics

• As we can see, True Positive Rate = Recall = Sensitivity: all of them measure how good the classifier is at finding true positives.

• When the FP rate increases, specificity and precision decrease, and vice versa.

• This does not mean that specificity and precision are correlated:
  • For example, on unbalanced datasets precision can be very low while specificity is high (a numeric illustration follows below),
  • because the number of instances in the negative class is much higher than the number of positive instances.
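As a numeric illustration with assumed counts: with 9,990 negatives and 10 positives, a classifier that flags 100 instances as positive and gets 8 of them right has FP = 92 and TN = 9,898, so specificity = 9898/9990 ≈ 99.1% while precision = 8/100 = 8%.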
Multiclass classification

• For a multiclass prediction task, the result is usually displayed in a confusion matrix with a row and a column for each class:
  • Each matrix element shows the number of test instances for which the actual class is the row and the predicted class is the column
  • Good results correspond to large numbers down the diagonal and small values (ideally zero) in the rest of the matrix

    Classified as    a       b       c
    a                TPaa    FNab    FNac
    b                FPab    TNbb    FNbc
    c                FPac    FNcb    TNcc
Multiclass classification

• For example, in a three-class task {a, b, c} with the confusion matrix above, if we select a to be the class of interest, then, using the cell labels of the matrix:
  • TP = TPaa, FN = FNab + FNac, FP = FPab + FPac, and the remaining cells count as TN

• Note that we don't care about the values FNcb and FNbc, as we are concerned with evaluating how the classifier performs on class a, so misclassifications between the other classes are outside our interest (see the sketch below).
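A minimal sketch of these per-class counts over an assumed 3×3 matrix (rows = actual, columns = predicted, class order a, b, c):

    # Assumed 3x3 confusion matrix; rows = actual class, columns = predicted class
    cm = [[50,  3,  2],   # a
          [ 4, 40,  6],   # b
          [ 1,  5, 44]]   # c

    a = 0  # index of the class of interest
    TP = cm[a][a]
    FN = sum(cm[a][j] for j in range(3) if j != a)  # actual a, predicted b or c
    FP = sum(cm[i][a] for i in range(3) if i != a)  # predicted a, actually b or c
    TN = sum(cm[i][j] for i in range(3)
                      for j in range(3) if i != a and j != a)
    print(TP, FN, FP, TN)  # 50 5 5 95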

Multiclass classification

• To calculate overall model performance, we combine the per-class metrics into a single average.
• Macro averaging computes the performance of each class separately and then averages over the total number of classes.
• Averaged per category (macro average):
  • Gives equal weight to each class, including rare ones

Cont..

• The macro average of recall is computed by taking the average of the recall values calculated for each class.

• The weighted average multiplies the metric of each class by that class's number of occurrences, sums these products, and divides by the total number of instances (see the sketch below).
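A minimal sketch of macro vs. weighted averaging of recall, with assumed per-class values and counts:

    # Assumed per-class recall values and class occurrence counts
    recalls = {"a": 0.90, "b": 0.60, "c": 0.30}
    support = {"a": 900, "b": 80, "c": 20}

    macro = sum(recalls.values()) / len(recalls)  # 0.60, equal weight per class
    weighted = (sum(recalls[c] * support[c] for c in recalls)
                / sum(support.values()))          # 0.864, pulled toward class a
    print(macro, weighted)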

Multiclass classification

• Micro Average:
  • Obtained by pooling the true positives (TP), false positives (FP), and false negatives (FN) of all classes before computing the metric; the micro-averaged F-measure is the harmonic mean of micro-averaged precision and recall (see the sketch below)
  • Micro averaging gives equal weight to each sample regardless of its class

  • Micro averages are therefore dominated by the classes with the largest numbers of samples
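A minimal sketch of micro averaging, pooling assumed per-class counts before computing the metrics:

    # Assumed per-class counts; micro averaging pools them first
    counts = {"a": {"TP": 50, "FP": 5, "FN": 5},
              "b": {"TP": 40, "FP": 8, "FN": 10},
              "c": {"TP": 44, "FP": 3, "FN": 7}}

    TP = sum(v["TP"] for v in counts.values())
    FP = sum(v["FP"] for v in counts.values())
    FN = sum(v["FN"] for v in counts.values())

    micro_p = TP / (TP + FP)   # micro precision
    micro_r = TP / (TP + FN)   # micro recall
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    print(micro_p, micro_r, micro_f1)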

THANK
YOU

By: Alemwork M.
