F1 - Score
VKC
AK
KK
AGENDA
Introducing the F1 score
Accuracy
Imbalanced data example
Solving imbalanced data
Precision and Recall: foundations of the F1 score
The F1 score: combining Precision and Recall
Conclusion
INTRODUCING THE F1 SCORE
The F1 score is a machine learning metric used to evaluate classification models. Although many metrics exist for classification models, in what follows you will discover how the F1 score is calculated and when it adds value.
Suppose that classifier A has a higher recall and classifier B has a higher precision. In this case, the F1 scores of the two classifiers can be used to determine which one produces better results.
ACCURACY
• Accuracy is a metric for classification models that measures the number of correct predictions as a percentage of the total number of predictions made.
• As an example, if 90% of your predictions are correct, your accuracy is simply 90%.
• Accuracy is a useful metric only when the classes in your data are roughly equally distributed.
• This means that if you have a use case in which you observe far more data points of one class than of another, accuracy is no longer a useful metric. A sketch of the computation follows, and the next section illustrates the problem with an example.
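As a quick illustration, here is a minimal Python sketch of the accuracy computation; the label vectors are made up for the example:

import numpy as np

# Hypothetical true labels and predictions (1 = positive class, 0 = negative class)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 1])

# Accuracy = correct predictions / total predictions
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.0%}")  # 80%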
IMBALANCED DATA EXAMPLE
• Imagine you are working on the sales data of a
website. You know that 99% of website visitors don’t
buy and that only 1% of visitors buy something. You
are building a classification model to predict which
website visitors are buyers and which are just lookers.
• Now imagine a model that doesn’t work very well: it predicts that 100% of your visitors are just lookers and that 0% are buyers. It is clearly a useless model.
What would happen if we used the accuracy formula on this model? The model got only 1% of its predictions wrong: all the buyers were misclassified as lookers. The percentage of correct predictions is therefore 99%.
The problem here is that an accuracy of 99% sounds like a great result, whereas your model performs very poorly.
In conclusion: accuracy is not a good metric to use when you have class imbalance.
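To make this concrete, here is a small sketch with simulated labels (not real sales data) showing how the all-lookers model still scores roughly 99% accuracy:

import numpy as np

# Simulated visitor labels: roughly 1% buyers (1), 99% lookers (0).
# The numbers are illustrative, not real sales data.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)

# A useless model that predicts "looker" for every single visitor
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_true == y_pred)
print(f"Buyers in the data: {y_true.sum()}")
print(f"Accuracy of the all-lookers model: {accuracy:.1%}")  # roughly 99%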
SOLVING IMBALANCED DATA
Solving imbalanced data through resampling
One way to address class imbalance is to work on your sample: with specific resampling methods, you can resample your data set so that it is no longer imbalanced, as the sketch below illustrates.
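As an illustrative sketch (the data here is synthetic, and oversampling is just one option), the minority class can be oversampled with scikit-learn's resample utility:

import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced data set: 990 lookers (0) and 10 buyers (1)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([0] * 990 + [1] * 10)

# Oversample the minority class (with replacement) to match the majority
X_minority, y_minority = X[y == 1], y[y == 1]
X_up, y_up = resample(X_minority, y_minority, replace=True, n_samples=990, random_state=0)

X_balanced = np.vstack([X[y == 0], X_up])
y_balanced = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_balanced))  # [990 990]

Undersampling the majority class works the same way with replace=False, and dedicated libraries such as imbalanced-learn offer more advanced resampling methods.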
FOUNDATIONS OF THE F1 SCORE

PRECISION: THE FIRST PART OF THE F1 SCORE
• Precision is the first part of the F1 score. It can also be used as an individual machine learning metric. Its formula is:

Precision = TP / (TP + FP)

RECALL: THE SECOND PART OF THE F1 SCORE
• Recall is the second component of the F1 score, although recall can also be used as an individual machine learning metric. The formula for recall is:

Recall = TP / (TP + FN)

where TP, FP, and FN are the counts of true positives, false positives, and false negatives.
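A minimal sketch of both formulas in Python, using hypothetical prediction vectors:

import numpy as np

# Hypothetical prediction vectors (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 1])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

precision = tp / (tp + fp)  # of all predicted positives, how many are correct?
recall = tp / (tp + fn)     # of all actual positives, how many were found?
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")  # 0.60, 0.75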
THE F1 SCORE FORMULA
The F1 score is defined as the harmonic mean of precision and recall. As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates.
In the F1 score, we average precision and recall. Since both are rates, the harmonic mean is a logical choice. The F1 score formula is:

F1 = 2 * (Precision * Recall) / (Precision + Recall)
• A model will obtain a high F1 score if both Precision and Recall are high
• A model will obtain a low F1 score if both Precision and Recall are low
• A model will obtain a medium F1 score if one of Precision and Recall is low and the other is high; because the harmonic mean is pulled toward the lower value, the score ends up closer to the weaker of the two, as the sketch below shows
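A short sketch that computes the F1 score by hand, checks it against scikit-learn's f1_score using the vectors from the previous sketch, and shows how one low value drags the score down:

from sklearn.metrics import f1_score
import numpy as np

# F1 by hand: the harmonic mean of precision and recall
precision, recall = 0.60, 0.75
f1_manual = 2 * precision * recall / (precision + recall)
print(f"F1 (by hand): {f1_manual:.2f}")  # 0.67

# The same value from scikit-learn, using the vectors from the previous sketch
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 1])
print(f"F1 (sklearn): {f1_score(y_true, y_pred):.2f}")  # 0.67

# The harmonic mean punishes imbalance: one low value drags the F1 score down
print(2 * 0.9 * 0.1 / (0.9 + 0.1))  # 0.18, far below the arithmetic mean of 0.5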
SUMMARY
In conclusion, whenever possible, you should look at multiple metrics for each of the models that you try out. Each metric has advantages and disadvantages, and each will give you specific information on the strengths and weaknesses of your model.
The real difficulty arises when doing automated model training, or when using Grid Search to tune models. In those cases, you'll have to specify a single metric to optimize, as in the sketch below.
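A minimal sketch of optimizing a single metric in a scikit-learn Grid Search, assuming you have settled on the F1 score; the classifier, parameter grid, and synthetic data are placeholders:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data with a 90/10 class imbalance
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="f1",  # the single metric that Grid Search will optimize
    cv=5,
)
search.fit(X, y)
print(search.best_params_, f"best F1: {search.best_score_:.2f}")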
In this case, my advice would be to have a good look at multiple different metrics of one or a
few sample models. Then, when you understand the implications for your specific use case, you
can choose one metric for optimization or tuning.
If you move your model to production for long-term use, you should regularly come back for model maintenance and verify that the model is still behaving as it should.
THANK YOU