Loss
THIS BOOK
This book is written (typed) by
Ari, who hails from the South
and has keen interests in
Computer Science, Biology, and
Tamil Literature. Occasionally,
he updates his website, where
you can reach out to him.
https://arihara-sudhan.github.io
LOSS FUNCTION
A loss function serves as a measure of how well a model is performing. It quantifies the difference between the predicted values and the target values. We learned a little about it in the MLP book. The objective of our training is to reduce the loss. We also learned about optimization algorithms such as Gradient Descent, SGD, Mini-Batch Gradient Descent, SGD with Momentum, AdaGrad, RMSProp and Adam. There are even more optimization algorithms focused on reducing the loss. Obviously, the loss function is the feedback-giver for the network, by means of which the parameters are tuned. Remember, a loss function is not an evaluation metric. Another term used is Cost Function. A loss function captures the difference between the actual and predicted values for a single datum, whereas a cost function aggregates the difference over the entire training dataset.
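As a rough illustration (a minimal plain-Python sketch; the function names and numbers are made up, not from the book), the loss is computed for one datum, while the cost aggregates it over the dataset:

```python
def squared_error(pred, target):
    # Loss: the penalty for a single datum
    return (pred - target) ** 2

def cost(loss_fn, preds, targets):
    # Cost: the per-datum losses aggregated (here, averaged) over the dataset
    return sum(loss_fn(p, t) for p, t in zip(preds, targets)) / len(preds)

preds   = [2.5, 0.0, 2.1, 7.8]   # made-up predictions
targets = [3.0, -0.5, 2.0, 8.0]  # made-up targets

print(squared_error(preds[0], targets[0]))  # loss for one datum: 0.25
print(cost(squared_error, preds, targets))  # cost for the dataset: 0.1375
```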
☆ MEAN SQUARED ERROR
Mean Squared Error calculates the average of the squared differences between predicted and actual values. Because each difference is squared, large errors are amplified: it gives a quadratic penalty.
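A minimal NumPy sketch (with made-up numbers) showing how a single large error dominates the MSE because of the squaring:

```python
import numpy as np

def mse(y_pred, y_true):
    # Average of squared differences
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 10.0])  # one prediction is far off

print(mse(y_pred, y_true))  # 9.0075 -- the single error of 6 alone contributes 36/4 = 9
```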
☆ MEAN ABSOLUTE ERROR
Mean Absolute Error gives a linear penalty. A linear penalty means that the error term grows in direct proportion to the deviation between the predicted and actual values. This is because MAE takes the absolute difference between each predicted value and the actual value, rather than squaring the difference as in MSE. We can also say that each error contributes to the overall loss directly as it is: no matter how large or small the error, it is added linearly, without amplification. If the error (the difference between the predicted and actual values) is 3, it contributes exactly 3 units to the loss. If the error is 6, it contributes 6 units to the loss. This makes MAE less sensitive to large errors and outliers. Where MAE gives a linear penalty, MSE gives a quadratic penalty: there, if the error is 6, the penalty is 6 * 6 = 36. The graph of MAE versus error is symmetric and forms a V-shape around zero, with the minimum reached when the error is zero. The problem with MAE is that it is not differentiable at zero.
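A minimal MAE sketch in NumPy (again with made-up numbers), alongside the linear-versus-quadratic comparison described above:

```python
import numpy as np

def mae(y_pred, y_true):
    # Average of absolute differences
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 10.0])  # same outlier as before

print(mae(y_pred, y_true))  # 1.575 -- the outlier adds only 6/4 = 1.5

# Linear vs. quadratic penalty for a single error
for err in (3.0, 6.0):
    print(err, "-> MAE adds", abs(err), "| MSE adds", err ** 2)
```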
☆ HUBER LOSS
Huber Loss combines the essence of both Mean Squared Error and Mean Absolute Error. It is quadratic for small errors and linear for large errors, and it is differentiable everywhere. It has a parameter delta that sets where it switches from the quadratic (MSE-like) regime to the linear (MAE-like) regime.
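A sketch of Huber loss in NumPy (the delta value of 1.0 is just an assumed default, not taken from the book):

```python
import numpy as np

def huber(y_pred, y_true, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it; the two pieces meet
    # smoothly at |error| = delta, so the loss stays differentiable there
    err = y_pred - y_true
    quadratic = 0.5 * err ** 2
    linear = delta * (np.abs(err) - 0.5 * delta)
    return np.mean(np.where(np.abs(err) <= delta, quadratic, linear))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 10.0])
print(huber(y_pred, y_true))  # small errors are squared, the outlier is penalized linearly
```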
☆ FOCAL LOSS
Balanced Cross-Entropy Loss adjusts the loss by assigning a
higher weight to the underrepresented class, which helps mitigate
class imbalance to some extent. However, the problem is that it
still treats both easy and hard examples in the same way.
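For the binary case, a balanced cross-entropy can be sketched like this (the weight alpha = 0.75 and the probabilities are made-up values; in practice alpha is usually set from the class frequencies):

```python
import numpy as np

def balanced_bce(p, y, alpha=0.75):
    # Weight the rare positive class by alpha and the frequent negative class
    # by (1 - alpha); alpha = 0.5 recovers plain BCE scaled by 0.5
    p = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    return np.mean(-(alpha * y * np.log(p) + (1 - alpha) * (1 - y) * np.log(1 - p)))

y = np.array([1.0, 0.0, 1.0, 0.0])  # 1 = underrepresented class
p = np.array([0.9, 0.2, 0.3, 0.8])  # predicted probability of class 1
print(balanced_bce(p, y))
```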
Focal Loss modifies the standard cross-entropy loss by adding a factor that down-weights the loss for well-classified examples, so training focuses more on the misclassified examples, especially the hard ones. It is defined as: