Chapter 5 Model Evaluation
By: Yeshambel A.
Introduction
• Evaluation aims at selecting the most appropriate learning scheme for a specific
problem
• We evaluate a model's ability to generalize what it has learned from the training set
to new, unseen instances
Absolute and Mean Squared Error
Refers to the error committed in assigning an object to the desired class
Error is defined as the difference between the desired value and
the predicted value
Mean absolute error is the mean of the absolute values of the errors over
the whole test set; mean squared error is the mean of the squared errors
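These definitions can be sketched in plain Python; the example values below are hypothetical.

```python
# Minimal sketch of the two error metrics defined above,
# assuming lists of desired and predicted values.
def mean_absolute_error(y_true, y_pred):
    """Mean of |desired - predicted| over the test set."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Mean of (desired - predicted)^2 over the test set."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]   # desired values (hypothetical)
y_pred = [2.5, 5.0, 4.0]   # predicted values (hypothetical)
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
```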
Accuracy
• Ex 1: accuracy can be misleading on imbalanced datasets
• Cancer dataset: 10,000 instances, 9,990 normal and 10 ill. If our model classifies every instance
as normal, accuracy is still 99.9%
• Medical diagnosis: 95% healthy, 5% diseased
• e-Commerce: 99% do not buy, 1% buy
• Security: 99.999% of citizens are not terrorists
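The cancer-dataset example above can be reproduced in a few lines: a trivial classifier that predicts "normal" for everyone still scores 99.9% accuracy.

```python
# Sketch of the imbalanced-accuracy pitfall from Ex 1.
y_true = ["normal"] * 9990 + ["ill"] * 10
y_pred = ["normal"] * 10000  # the model ignores the minority class entirely

# accuracy = fraction of instances where prediction matches the actual label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.999
```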
Binary classification Confusion Matrix

              Predicted Yes   Predicted No
Actual Yes    TP              FN
Actual No     FP              TN
Sensitivity & Specificity
• Sensitivity = TP / (TP + FN): the proportion of actual positives correctly detected
• Specificity = TN / (TN + FP)
• The specificity measures how well the classifier avoids flagging too many
false positives (it measures how accurately it identifies negatives)
• Tests with high specificity are used to confirm the results of sensitive tests
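Both measures follow directly from the confusion-matrix counts; the counts below are hypothetical.

```python
# Sensitivity and specificity from binary confusion-matrix counts.
def sensitivity(tp, fn):
    # proportion of actual positives that were detected
    return tp / (tp + fn)

def specificity(tn, fp):
    # proportion of actual negatives that were correctly rejected
    return tn / (tn + fp)

tp, fn, tn, fp = 80, 20, 90, 10  # hypothetical counts
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```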
Recall & Precision
• These measures are used by information retrieval researchers to measure the accuracy of a
search engine: they define recall as (number of relevant documents retrieved)
divided by (total number of relevant documents)
• Recall (also called sensitivity in some fields) measures the proportion of actual
positives that are correctly identified as such (e.g. the percentage of sick
people who are identified as having the condition)
• Precision of class Yes in classification is defined as the number of instances
correctly classified as class Yes divided by the total number of instances the
classifier labelled as Yes
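The information-retrieval definitions above can be sketched with a hypothetical query: 30 documents retrieved, 20 of them relevant, out of 50 relevant documents in the collection.

```python
# Precision and recall in information-retrieval terms (hypothetical counts).
def precision(relevant_retrieved, total_retrieved):
    # fraction of retrieved documents that are actually relevant
    return relevant_retrieved / total_retrieved

def recall(relevant_retrieved, total_relevant):
    # fraction of all relevant documents that were retrieved
    return relevant_retrieved / total_relevant

print(precision(20, 30))  # ~0.667
print(recall(20, 50))     # 0.4
```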
F-measure
• F1-score = 2 * (precision * recall) /
(precision + recall), i.e. the harmonic mean of precision and recall
• It takes into account all cases except the
true negatives (it is insensitive to TN)
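A one-line sketch of the formula, evaluated on hypothetical precision and recall values:

```python
# F1 as the harmonic mean of precision and recall.
def f1_score(p, r):
    # true negatives never appear in this formula
    return 2 * p * r / (p + r)

# harmonic mean of 2/3 and 0.4 (hypothetical values) is 0.5
print(f1_score(2 / 3, 0.4))
```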
Notes on Metrics
• For a multiclass prediction task, the result is usually displayed in a confusion matrix
with a row and a column for each class
• Each matrix element shows the number of test instances for which the actual class is the row
and the predicted class is the column
• Good results correspond to large numbers down the diagonal and small values (ideally zero)
in the rest of the matrix
Classified as    a       b       c
Actual a         TPaa    FNab    FNac
Actual b         FPab    TNbb    FNbc
Actual c         FPac    FNcb    TNcc
Multiclass classification
• For example, in a three-class task {a, b, c} with the confusion matrix shown above, suppose we
select a as the class of interest
• Note that we do not care about the values FNcb and FNbc: since we are concerned with evaluating
how the classifier performs on class a, misclassifications between the other classes are
out of our interest
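Treating class a as the class of interest reduces the 3x3 matrix to binary counts. A sketch with a hypothetical confusion matrix (rows = actual class, columns = predicted class):

```python
labels = ["a", "b", "c"]
# Hypothetical 3x3 confusion matrix: rows = actual, columns = predicted.
cm = [
    [8, 1, 1],   # actual a
    [2, 7, 1],   # actual b
    [0, 2, 8],   # actual c
]

i = labels.index("a")                 # class of interest
tp = cm[i][i]                         # actual a, predicted a
fn = sum(cm[i]) - tp                  # actual a, predicted b or c
fp = sum(row[i] for row in cm) - tp   # predicted a, actually b or c
# everything else, including the b/c confusions we do not care about
tn = sum(map(sum, cm)) - tp - fn - fp
print(tp, fn, fp, tn)  # 8 2 2 18
```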
Multiclass classification
• Micro Average:
• Obtained by pooling the true positives (TP), false positives (FP), and false negatives (FN)
across all classes; the micro-averaged F-measure is the harmonic mean of micro-averaged
precision and recall
• Micro-averaging gives equal weight to each sample regardless of its class
• Micro-averaged scores are therefore dominated by the classes with the largest number of samples
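The pooling step can be sketched as follows, assuming hypothetical per-class (TP, FP, FN) counts:

```python
# Micro-averaged F1: pool TP/FP/FN across classes, then compute one
# precision/recall pair and take their harmonic mean.
def micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per class."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p = tp / (tp + fp)   # micro-averaged precision
    r = tp / (tp + fn)   # micro-averaged recall
    return 2 * p * r / (p + r)

counts = [(8, 2, 2), (7, 3, 3), (8, 1, 2)]  # hypothetical per-class counts
print(micro_f1(counts))
```

Because the counts are summed before dividing, a class with many samples contributes proportionally more, which is exactly why the large classes dominate the score.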
THANK YOU
By: Alemwork M.