
F1 – SCORE

VKC
AK
KK
AGENDA
• Introducing the F1 score
• Accuracy
• Imbalanced data example
• Solving imbalanced data
• Precision and Recall: foundations of the F1 score
• The F1 score: combining Precision and Recall
• Conclusion
INTRODUCING THE F1 SCORE

The F1 score is a machine learning metric that can be used to evaluate classification models. Although many metrics exist for classification models, this presentation shows how the F1 score is calculated and when using it adds value.

The F1 score combines the precision and recall of a classifier into a single metric by taking their harmonic mean. It is primarily used to compare the performance of two classifiers.

Suppose that classifier A has a higher recall, and classifier B has a higher precision. In this case, the F1 scores of the two classifiers can be used to determine which one produces better results.
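To make this comparison concrete, here is a minimal Python sketch (the precision and recall values for classifiers A and B are invented for illustration; the F1 formula used here is the one defined later in this deck):

# Hypothetical metrics: classifier A has higher recall, classifier B higher precision.
precision_a, recall_a = 0.60, 0.90
precision_b, recall_b = 0.90, 0.55

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(f"F1 of A: {f1(precision_a, recall_a):.3f}")  # 0.720
print(f"F1 of B: {f1(precision_b, recall_b):.3f}")  # 0.683

On these made-up numbers, classifier A would be preferred.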
ACCURACY

• Accuracy is a metric for classification models that measures the percentage of predictions that are correct out of the total number of predictions made.

• As an example, if 90% of your predictions are correct, your accuracy is simply 90%.

• Accuracy is a useful metric only when you have a roughly equal distribution of classes in your classification data.

• This means that if you have a use case in which you observe many more data points of one class than of another, accuracy is no longer a useful metric. Let's see an example to illustrate this.
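As a quick reference before that example, here is a minimal sketch of the accuracy computation in Python (the label lists are invented for illustration):

# Accuracy = correct predictions / total predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(f"Accuracy: {accuracy:.0%}")  # 80% -> 8 of 10 predictions are correct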
IMBALANCED DATA EXAMPLE

• Imagine you are working on the sales data of a website. You know that 99% of website visitors don't buy and that only 1% of visitors buy something. You are building a classification model to predict which website visitors are buyers and which are just lookers.

• Now imagine a model that doesn't work very well. It predicts that 100% of your visitors are just lookers and that 0% of your visitors are buyers. It is clearly a very wrong and useless model.

 Accuracy is not a good metric to use


when you have class imbalance.
7

What would happen if we used the accuracy formula on this model? Your model has predicted only 1% wrongly: all the buyers have been misclassified as lookers. The percentage of correct predictions is therefore 99%.

The problem here is that an accuracy of 99% sounds like a great result, whereas your model performs very poorly.

In conclusion: accuracy is not a good metric to use when you have class imbalance.
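This trap is easy to reproduce in a minimal sketch (the 1,000-visitor sample below is invented for illustration):

# 0 = looker, 1 = buyer: 990 lookers and 10 buyers.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # The useless model: it predicts "looker" for everyone.

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
buyers_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"Accuracy: {accuracy:.0%}")      # 99%
print(f"Buyers found: {buyers_found}")  # 0

The model finds zero buyers, yet accuracy reports 99%.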
SOLVING IMBALANCED DATA

• Solving imbalanced data through resampling
One way to solve class imbalance problems is to work on your sample. With specific sampling methods, you can resample your data set in such a way that the data is no longer imbalanced (see the sketch below).

• Solving imbalanced data through metrics
Another way to solve class imbalance problems is to use better evaluation metrics like the F1 score, which take into account not only the number of prediction errors that your model makes, but also the type of errors that are made.
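To illustrate the resampling route, here is a minimal sketch of random oversampling using only Python's standard library (the data set is invented; in practice, dedicated libraries offer more principled resampling methods such as undersampling or SMOTE):

import random

random.seed(42)

# Invented imbalanced data set: 990 lookers (label 0), 10 buyers (label 1).
majority = [(visitor_id, 0) for visitor_id in range(990)]
minority = [(visitor_id, 1) for visitor_id in range(990, 1000)]

# Randomly duplicate minority samples until both classes are the same size.
extra = random.choices(minority, k=len(majority) - len(minority))
balanced = majority + minority + extra

print(len(balanced))                        # 1980
print(sum(label for _, label in balanced))  # 990 buyers after oversampling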
PRECISION & RECALL: FOUNDATIONS OF THE F1 SCORE

PRECISION: THE FIRST PART OF THE F1 SCORE
• Precision is the first part of the F1 score. It can also be used as an individual machine learning metric. Its formula is:

Precision = TP / (TP + FP)

(TP = true positives, FP = false positives)

• A model that is not precise may find a lot of the positives, but its selection method is noisy: it also wrongly detects many positives that aren't actually positives.
• A precise model is very "pure": maybe it does not find all the positives, but the ones that the model does class as positive are very likely to be correct.

RECALL: THE SECOND PART OF THE F1 SCORE
• Recall is the second component of the F1 score, although recall can also be used as an individual machine learning metric. The formula for recall is:

Recall = TP / (TP + FN)

(FN = false negatives)

• A model with high recall succeeds in finding all the positive cases in the data, even though it may also wrongly identify some negative cases as positive cases.
• A model with low recall is not able to find all (or a large part of) the positive cases in the data.
THE F1 SCORE: COMBINING PRECISION AND RECALL

• Precision and Recall are the two building blocks of the F1 score. The goal of the F1 score is to combine the precision and recall metrics into a single metric. At the same time, the F1 score has been designed to work well on imbalanced data.

• F1 score formula
The F1 score is defined as the harmonic mean of precision and recall. As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean, and it is often useful when computing the average of rates. In the F1 score, we compute the average of precision and recall. They are both rates, which makes the harmonic mean a logical choice. The F1 score formula is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Since the F1 score is an average of Precision and Recall, it gives equal weight to Precision and Recall:

• A model will obtain a high F1 score if both Precision and Recall are high.

• A model will obtain a low F1 score if both Precision and Recall are low.

• A model will obtain a medium F1 score if one of Precision and Recall is low and the other is high. Because the harmonic mean is pulled toward the smaller of the two values, this medium score sits closer to the low value than an arithmetic mean would.
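The pull toward the smaller value is easy to verify in a short sketch (the precision/recall pairs are invented):

def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

for precision, recall in [(0.9, 0.9), (0.2, 0.2), (0.9, 0.2)]:
    arithmetic = (precision + recall) / 2
    harmonic = f1_score(precision, recall)
    print(f"P={precision:.1f} R={recall:.1f} -> F1={harmonic:.2f}, arithmetic mean={arithmetic:.2f}")

# P=0.9 R=0.9 -> F1=0.90, arithmetic mean=0.90
# P=0.2 R=0.2 -> F1=0.20, arithmetic mean=0.20
# P=0.9 R=0.2 -> F1=0.33, arithmetic mean=0.55

Note how the 0.9/0.2 model scores 0.33 rather than the 0.55 an arithmetic mean would suggest.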
SUMMARY

In conclusion, when you have the possibility to do so, you should definitely look at multiple metrics for each of the models that you try out. Each metric has advantages and disadvantages, and each of them will give you specific information on the strengths and weaknesses of your model.

The real difficulty of choice occurs when doing automated model training, or when using Grid Search for tuning models. In those cases, you'll have to specify a single metric that you want to optimize.

In this case, my advice would be to have a good look at multiple different metrics on one or a few sample models. Then, when you understand the implications for your specific use case, you can choose one metric for optimization or tuning.

If you move your model to production for long-term use, you should regularly come back to do model maintenance and verify that the model is still behaving as it should.
THANK YOU
