
Evaluation Metrics:

Evaluation metrics are used to measure the quality of statistical or machine learning models. Evaluating machine learning models or algorithms is essential for any project. There are many different types of evaluation metrics available to test a model.

Confusion Matrix:

 It is an N x N matrix, where N is the number of classes being predicted.

 It gives us a matrix as output and describes the complete performance of the model.

 Accuracy: the proportion of the total number of predictions that were correct.
 True Positive: a positive class instance correctly predicted as positive.
 False Negative: a positive class instance incorrectly predicted as negative.
 False Positive: a negative class instance incorrectly predicted as positive.
 True Negative: a negative class instance correctly predicted as negative.

 Sensitivity: also known as the True Positive Rate or Recall. It is a measure of the positive examples labeled as positive by the classifier (the proportion of actual positive cases that are correctly identified). Higher is better.

 Specificity: also known as the True Negative Rate. It is a measure of the negative examples labeled as negative by the classifier (the proportion of actual negative cases that are correctly identified). Higher is better.

 Positive Predictive Value or Precision: the proportion of cases predicted as positive that are actually positive.

 Negative Predictive Value: the proportion of cases predicted as negative that are actually negative.
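As an illustration (not part of the original notes), here is a minimal Python sketch using scikit-learn's confusion_matrix; the label arrays y_true and y_pred are made-up examples for a binary problem.

from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
sensitivity = tp / (tp + fn)                    # recall / true positive rate
specificity = tn / (tn + fp)                    # true negative rate
precision   = tp / (tp + fp)                    # positive predictive value
npv         = tn / (tn + fn)                    # negative predictive value

print(accuracy, sensitivity, specificity, precision, npv)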

Type I and Type II Error:


Type I error occurs when the null hypothesis is true but is rejected. [A Type I error, or false positive, is asserting something as true when it is actually false.]

Type II error occurs when the null hypothesis is false but erroneously fails to be rejected. [A Type II error occurs when the null hypothesis is actually false but is accepted as true by the testing.]

Scenario/Problem Statement: medical trials for a drug which is a cure for cancer.

Type I error: predicting that a cure is found when that is not the case.

Type II error: predicting that a cure is not found when in fact it is the case.

In this scenario, a Type I error is not a major issue; it could be corrected later with more trials. A Type II error is more serious, because the drug could be discarded as offering no cure when in fact the cure could save millions of lives. The risk of committing a Type I error is represented by your alpha level (the p-value below which you reject the null hypothesis).

To control these types of errors, a chosen alpha level is used. Increasing the sample size can also reduce the risk and change the balance between these types of errors.

P-Value:

When you perform a hypothesis test in statistics, a p-value can help you determine the strength of your results. The p-value is a number between 0 and 1; its value denotes the strength of the evidence. The claim which is on trial is called the Null Hypothesis.

* A low p-value (< 0.05) indicates strong evidence against the null hypothesis, which means we can reject the null hypothesis.

* A high p-value (> 0.05) indicates weak evidence against the null hypothesis, which means we fail to reject the null hypothesis.

* A p-value close to 0.05 is marginal and could go either way.

To put it another way,

High p-values: your data are likely with a true null.

Low p-values: your data are unlikely with a true null.
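As a small illustrative sketch (not from the original notes), a one-sample t-test with scipy.stats; the sample values and the hypothesised mean of 50 are made up.

from scipy import stats

# Hypothetical measurements; null hypothesis: the true mean is 50
sample = [51.2, 49.8, 52.5, 50.9, 53.1, 48.7, 52.0, 51.5]

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Compare the p-value with the chosen alpha level
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
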
F1 Score:

The F1 score combines recall (sensitivity) and precision into a single number. It is a good choice when you seek a balance between precision and recall, and it is commonly used to measure a test's accuracy.

The F1 score is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

The range of the F1 score is [0, 1]. It tells you how precise your classifier is (how many of its positive predictions are correct), as well as how robust it is (whether it misses a significant number of positive instances).

High precision with low recall gives you an extremely accurate classifier, but one that misses a large number of instances that are difficult to classify. The greater the F1 score, the better the performance of the model.
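A minimal sketch (with assumed labels, not from the notes) showing the harmonic-mean formula alongside scikit-learn's f1_score:

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)

print(f1_manual, f1_score(y_true, y_pred))  # both values agree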

Classification Accuracy:
It is the ratio of the number of correct predictions to the total number of input samples.

It works well only if there are roughly equal numbers of samples belonging to each class (a balanced dataset).
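For completeness, the same idea with scikit-learn's accuracy_score (illustrative labels only):

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# correct predictions / total predictions; most meaningful on balanced classes
print(accuracy_score(y_true, y_pred))  # 0.75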

Logarithmic Loss:

Logarithmic Loss, or Log Loss, works by penalising false classifications. It works well for multi-class classification. When working with Log Loss, the classifier must assign a probability to each class for every sample. Suppose there are N samples belonging to M classes; then the Log Loss is calculated as below:

Log Loss = -(1/N) * sum over i = 1..N, j = 1..M of [ y_ij * log(p_ij) ]

where,

y_ij indicates whether sample i belongs to class j or not

p_ij indicates the probability of sample i belonging to class j

Log Loss has no upper bound and it exists on the range [0, ∞). Log Loss nearer to 0 indicates higher accuracy, whereas Log Loss further from 0 indicates lower accuracy. In general, minimising Log Loss gives greater accuracy for the classifier.
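A minimal sketch (made-up class probabilities) that evaluates the formula above directly and checks it against scikit-learn's log_loss:

import numpy as np
from sklearn.metrics import log_loss

# Hypothetical 3-class problem: true class index per sample and predicted probabilities per class
y_true = [0, 2, 1, 2]
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# One-hot encode y_true so that y[i, j] = 1 when sample i belongs to class j
y = np.eye(3)[y_true]

manual = -np.mean(np.sum(y * np.log(p), axis=1))
print(manual, log_loss(y_true, p))  # both values agree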

 Gain and Lift Charts:


Gain and Lift charts are used to evaluate the performance of a classification model. They measure how much better one can expect to do with the predictive model compared with using no model at all. They are very popular metrics in marketing analytics, but they are not restricted to marketing analysis and can be used in other domains as well, such as risk modeling and supply chain analytics. They also help to find the best predictive model among multiple challenger models.

Steps to build a lift/gain chart:

Step 1: Calculate the predicted probability for each observation.

Step 2: Rank these probabilities in decreasing order.

Step 3: Build deciles, with each group having roughly 10% of the observations.

Step 4: Calculate the response rate at each decile for Good (responders), Bad (non-responders) and total.

Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set.

Lift measures how much better one can expect to do with the predictive model compared with no model. It is the ratio of the gain % to the random expectation % at a given decile level. The random expectation at the xth decile is x%.
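An illustrative sketch of these steps with pandas; the scores and responses are randomly generated, not real data:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Step 1: hypothetical predicted probabilities and actual responses
df = pd.DataFrame({"prob": rng.random(1000)})
df["response"] = (rng.random(1000) < df["prob"]).astype(int)  # responders concentrate at high scores

# Steps 2-3: rank by probability (decreasing) and cut into 10 deciles
df = df.sort_values("prob", ascending=False).reset_index(drop=True)
df["decile"] = pd.qcut(df.index, 10, labels=range(1, 11))

# Step 4: responders per decile, then cumulative gain and lift
responders = df.groupby("decile", observed=True)["response"].sum()
cum_gain = responders.cumsum() / df["response"].sum() * 100   # % of all responders captured so far
lift = cum_gain / (np.arange(1, 11) * 10)                     # gain % / random expectation %

print(pd.DataFrame({"cum_gain_pct": cum_gain.round(1), "lift": lift.round(2)}))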

Area Under Curve [AUC-ROC Curve]:

Area Under Curve (AUC) is one of the most widely used metrics for evaluation. It is used for binary classification problems. The AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example.

AUC has a range of [0, 1]. The greater the value, the better the performance of the model.

The ROC curve is the plot of sensitivity against (1 - specificity). (1 - specificity) is also known as the false positive rate, and sensitivity is also known as the true positive rate. This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds.

The biggest advantage of using the ROC curve is that it is independent of changes in the proportion of responders.
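A minimal sketch with scikit-learn (assumed labels and scores, not from the notes):

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc = roc_auc_score(y_true, y_score)

# fpr = 1 - specificity, tpr = sensitivity; one point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print("AUC:", auc)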

Mean Absolute Error:

Mean Absolute Error is the average of the absolute differences between the original values and the predicted values. It gives us a measure of how far the predictions were from the actual output.

Mean Squared Error:

Mean Squared Error (MSE) is quite similar to Mean Absolute Error, the only difference being that MSE takes the average of the squares of the differences between the original values and the predicted values.

The advantage of MSE is that its gradient is easier to compute, whereas Mean Absolute Error requires more complicated linear programming tools to compute the gradient.

Since we take the square of the error, the effect of larger errors becomes more pronounced than that of smaller errors, so the model can focus more on the larger errors.
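A short numpy sketch (made-up values) computing both errors side by side:

import numpy as np

# Hypothetical actual and predicted values from a regression model
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae = np.mean(np.abs(y_true - y_pred))   # average absolute difference
mse = np.mean((y_true - y_pred) ** 2)    # average squared difference; large errors dominate

print(mae, mse)  # 0.5 0.375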

K-S Chart:

The Kolmogorov-Smirnov (K-S) chart measures the performance of classification models. More accurately, K-S is a measure of the degree of separation between the positive and negative distributions. In most classification models the K-S will fall between 0 and 100; the higher the value, the better the model is at separating the positive from the negative cases.
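One common way to compute it (an assumption here, since the notes do not give a formula) is the maximum gap between the cumulative true positive rate and false positive rate over all thresholds; a small sketch with assumed data:

import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

fpr, tpr, _ = roc_curve(y_true, y_score)
ks = np.max(tpr - fpr) * 100   # expressed on the 0-100 scale used in the notes

print("K-S statistic:", ks)
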
Gini Coefficient:

The Gini coefficient also relates to assessing classifier models. It is directly related to the area under the ROC curve mentioned above: the Gini coefficient is calculated from the area under the curve (AUC) as 2*AUC - 1. A 74% area under the curve becomes a Gini coefficient of 48%, which is fair. The Gini coefficient hence effectively ranges between 0% and 100%; a Gini above 60% indicates a good model.
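Continuing the AUC sketch above, the Gini coefficient follows directly (assumed data):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1   # e.g. an AUC of 0.74 corresponds to a Gini of 0.48

print(auc, gini)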

R-Squared:

R Squared is used to determine the strength of the relationship between the predictors and the target. In simple terms, it lets us know how good a regression model is when compared to simply predicting the average. R Squared is computed from the residual sum of squares and the total sum of squares:

R Squared = 1 - (SSR / SST)

Where,

 SSR (Sum of Squares of Residuals) is the sum of the squares of the differences between the actual observed value (y) and the predicted value (ŷ).
 SST (Total Sum of Squares) is the sum of the squares of the differences between the actual observed value (y) and the average of the observed y values (y_avg).

SSR is the fitting criterion for a regression line: the regression algorithm chooses the best regression line for a given set of observations by comparing the SSR of candidate lines. The line with the least value of SSR is the best-fitting line.
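A minimal numpy sketch (made-up observations) of the SSR/SST calculation:

import numpy as np

# Hypothetical observed and predicted values
y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 4.7, 3.0, 6.8])

ssr = np.sum((y - y_hat) ** 2)        # residual sum of squares
sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares

r_squared = 1 - ssr / sst
print(r_squared)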

DisadvantageThe value of R Squared never decreases. Adding new independent variables


will result in an increased value of R Squared.

Adjusted R Squared:

Adjusted R Squared has the capability to decrease with the addition of less significant variables, thus resulting in a more reliable and accurate evaluation.

Degrees of Freedom: we can define this as the minimum number of data points or observations required to generate a valid regression model.

Adjusted R Squared = 1 - [ (1 - R Squared) * (n - 1) / (n - k - 1) ]

Where,

 k is the number of independent variables
 n is the number of observations

Adjusted R Squared makes use of the degrees of freedom to compensate for, and penalize, the inclusion of a bad variable.

For a given R Squared, the value of Adjusted R Squared decreases as k increases, so the adjustment acts as a penalizing factor for a bad variable and a rewarding factor for a good or significant variable. Adjusted R Squared is thus a better model evaluator and relates the variables to the target more reliably than R Squared.
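Continuing the R Squared sketch above, with hypothetical counts for n and k (assumed values for illustration only):

n = 40          # number of observations
k = 3           # number of independent variables
r_squared = 0.80

adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(adj_r_squared)  # 0.7833...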
