
Chapter 5

Model Evaluation

By: Yeshambel A.
Introduction

• Evaluation aims at selecting the most appropriate learning scheme for a specific problem

• We evaluate a model's ability to generalize what it has learned from the training set to new, unseen instances

• It also allows comparing multiple classifiers on a specific domain (e.g., to find the best algorithm for a given application task)

Absolute and Mean Square Error
• Refers to the error committed when classifying an object into the desired class
• Error is defined as the difference between the desired value and the predicted value:
    e_i = desired value − predicted value
• Mean absolute error is the sum of the absolute values of the errors over all n instances in the test set, divided by n:
    MAE = (1/n) * Σ |e_i|
• Mean square error is defined analogously, using the squared errors:
    MSE = (1/n) * Σ e_i^2
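A minimal sketch of both error measures in plain Python, using hypothetical desired/predicted values:

    # Hypothetical desired vs. predicted values for a small test set
    desired   = [3.0, -0.5, 2.0, 7.0]
    predicted = [2.5,  0.0, 2.0, 8.0]

    errors = [d - p for d, p in zip(desired, predicted)]  # e_i = desired - predicted
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mse = sum(e * e for e in errors) / n   # mean square error
    print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")  # MAE = 0.500, MSE = 0.375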

Accuracy

• It assumes equal cost for all classes

• It is misleading on unbalanced datasets
• It doesn't differentiate between different types of errors

• Examples:
  • Cancer dataset: 10,000 instances, 9,990 normal, 10 ill; if our model classified all instances as normal, accuracy would be 99.9% (see the sketch below)
  • Medical diagnosis: 95% healthy, 5% diseased
  • e-Commerce: 99% do not buy, 1% buy
  • Security: 99.999% of citizens are not terrorists
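A minimal Python sketch of the cancer-dataset case above, showing how the accuracy figure misleads:

    # 9,990 normal and 10 ill instances; the model predicts "normal" for everything
    y_true = ["normal"] * 9990 + ["ill"] * 10
    y_pred = ["normal"] * 10000

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy = {accuracy:.1%}")  # 99.9%, yet every ill patient is missed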

Binary classification Confusion Matrix

• For a binary task, the confusion matrix has one row per actual class and one column per predicted class:

                   Predicted Yes   Predicted No
    Actual Yes     TP              FN
    Actual No      FP              TN
Binary classification Confusion Matrix

• True Positive Rate: TPR = TP / (TP + FN)

• False Positive Rate: FPR = FP / (FP + TN)

• Overall success rate (Accuracy) = (TP + TN) / (TP + FP + TN + FN)

• Error rate = 1 − success rate
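These four quantities as a minimal sketch over assumed counts:

    # Assumed binary counts for illustration
    TP, FN, FP, TN = 40, 10, 5, 45

    tpr = TP / (TP + FN)                        # true positive rate
    fpr = FP / (FP + TN)                        # false positive rate
    accuracy = (TP + TN) / (TP + FP + TN + FN)  # overall success rate
    error_rate = 1 - accuracy
    print(tpr, fpr, accuracy, error_rate)       # 0.8 0.1 0.85 0.15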

Sensitivity & Specificity

• Sensitivity

• Measures the classifier ability to detect positive classes (its positivity)

• Specificity

• The specificity measures how accurate is the classifier in not detecting too many
false positives (it measures its negativity)
• high specificity are used to confirm the results of sensitive
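Both measures from the same assumed counts as in the previous sketch:

    TP, FN, FP, TN = 40, 10, 5, 45  # assumed counts

    sensitivity = TP / (TP + FN)  # 0.8: how well positives are detected
    specificity = TN / (TN + FP)  # 0.9: how well negatives are kept negative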
Recall & Precision
• It is used by information retrieval researches to measure accuracy of a search
engine, they define the recall as (number of relevant documents retrieved)
divided by ( total number of relevant documents)
• Recall (also called Sensitivity in some fields) measures the proportion of actual
positives which are correctly identified as such (e.g. the percentage of sick
people who are identified as having the condition);
• Precession of class Yes in classification can defined as the number of instance
classified correctly as class Yes divided by the total number of instances
classified as Yes by the classifier
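A small sketch of the information-retrieval reading, with assumed document counts:

    # Assumed: 30 documents retrieved, 20 of them relevant,
    # out of 50 relevant documents in the whole collection
    relevant_retrieved = 20
    retrieved_total = 30
    relevant_total = 50

    recall = relevant_retrieved / relevant_total      # 20/50 = 0.40
    precision = relevant_retrieved / retrieved_total  # 20/30 ≈ 0.67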

F-measure

• The F-measure is the harmonic mean (an average of rates) of precision and recall, so it takes account of both measures:

    F1-score = 2 * (precision * recall) / (precision + recall)

• It takes every kind of outcome into account except the true negatives (TN appears in neither precision nor recall)
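Continuing the assumed precision/recall values from the sketch above:

    precision, recall = 20 / 30, 20 / 50

    f1 = 2 * (precision * recall) / (precision + recall)
    print(f"F1 = {f1:.3f}")  # 0.500, the harmonic mean of 0.667 and 0.400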
Notes on Metrics

• As we can see, True Positive Rate = Recall = Sensitivity: all of them measure how good the classifier is at finding true positives.

• When the FP rate increases, specificity and precision decrease, and vice versa.

• This does not mean that specificity and precision are correlated:
  • For example, on unbalanced datasets precision can be very low while specificity is high (a numeric illustration follows below),
  • because the number of instances in the negative class is much higher than the number of positive instances.
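As a numeric illustration with assumed counts: with 9,990 negatives and 10 positives, a classifier that flags 100 instances as positive and gets 8 of them right has FP = 92 and TN = 9,898, so specificity = 9898/9990 ≈ 99.1% while precision = 8/100 = 8%.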
Multiclass classification

• For a multiclass prediction task, the result is usually displayed in a confusion matrix with a row and a column for each class:
  • Each matrix element shows the number of test instances for which the actual class is the row and the predicted class is the column
  • Good results correspond to large numbers down the diagonal and small values (ideally zero) in the rest of the matrix

    Classified as    a       b       c
    a                TPaa    FNab    FNac
    b                FPab    TNbb    FNbc
    c                FPac    FNcb    TNcc
Multiclass classification

• For example, in a three-class task {a, b, c} with the confusion matrix above, if we select a to be the class of interest, then, using the cell labels of the matrix:
  • TP = TPaa, FN = FNab + FNac, FP = FPab + FPac, and the remaining cells count as TN

• Note that we don't care about the values FNcb and FNbc, as we are concerned with evaluating how the classifier performs on class a, so misclassifications between the other classes are outside our interest (see the sketch below).
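A minimal sketch of these per-class counts over an assumed 3×3 matrix (rows = actual, columns = predicted, class order a, b, c):

    # Assumed 3x3 confusion matrix; rows = actual class, columns = predicted class
    cm = [[50,  3,  2],   # a
          [ 4, 40,  6],   # b
          [ 1,  5, 44]]   # c

    a = 0  # index of the class of interest
    TP = cm[a][a]
    FN = sum(cm[a][j] for j in range(3) if j != a)  # actual a, predicted b or c
    FP = sum(cm[i][a] for i in range(3) if i != a)  # predicted a, actually b or c
    TN = sum(cm[i][j] for i in range(3)
                      for j in range(3) if i != a and j != a)
    print(TP, FN, FP, TN)  # 50 5 5 95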

Multiclass classification

• To calculate overall model performance, we combine the per-class metrics into a single average.
• Macro averaging computes the performance of each class separately and then averages over the total number of classes.
• Averaged per category (macro average):
  • Gives equal weight to each class, including rare ones

Cont..

• The macro average of recall is computed by taking the average of the recall values calculated for each class.

• The weighted average multiplies the metric of each class by that class's number of occurrences, sums these products, and divides by the total number of instances (see the sketch below).
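A minimal sketch of macro vs. weighted averaging of recall, with assumed per-class values and counts:

    # Assumed per-class recall values and class occurrence counts
    recalls = {"a": 0.90, "b": 0.60, "c": 0.30}
    support = {"a": 900, "b": 80, "c": 20}

    macro = sum(recalls.values()) / len(recalls)  # 0.60, equal weight per class
    weighted = (sum(recalls[c] * support[c] for c in recalls)
                / sum(support.values()))          # 0.864, pulled toward class a
    print(macro, weighted)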

Multiclass classification

• Micro Average:
  • Obtained by pooling the true positives (TP), false positives (FP), and false negatives (FN) of all classes before computing the metric; the micro-averaged F-measure is the harmonic mean of micro-averaged precision and recall (see the sketch below)
  • Micro averaging gives equal weight to each sample regardless of its class

  • Micro averages are therefore dominated by the classes with the largest numbers of samples
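A minimal sketch of micro averaging, pooling assumed per-class counts before computing the metrics:

    # Assumed per-class counts; micro averaging pools them first
    counts = {"a": {"TP": 50, "FP": 5, "FN": 5},
              "b": {"TP": 40, "FP": 8, "FN": 10},
              "c": {"TP": 44, "FP": 3, "FN": 7}}

    TP = sum(v["TP"] for v in counts.values())
    FP = sum(v["FP"] for v in counts.values())
    FN = sum(v["FN"] for v in counts.values())

    micro_p = TP / (TP + FP)   # micro precision
    micro_r = TP / (TP + FN)   # micro recall
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    print(micro_p, micro_r, micro_f1)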

THANK
YOU

By: Alemwork M.
