
BASIC LOSS FUNCTIONS

THIS BOOK
This book is written (typed) by
Ari, who hails from the South
and has keen interests in
Computer Science, Biology, and
Tamil Literature. Occasionally,
he updates his website, where
you can reach out to him.
https://arihara-sudhan.github.io
LOSS FUNCTION
Loss function serves as a measure of how well a model is
performing. It quantifies the difference between the predicted
values and the target values. We have learned of it a little bit in
MLP book. The objective of our training is to reduce the loss. We
also learned of optimization algorithms such as Gradient Descent,
SGD, Mini Batch Gradient Descent, SGD with Momentum,
AdaGrad, RMSProp and Adam. There are even more
optimization algorithms focused to reduce loss. Obviously, loss
function is the feedback-giver for the network by means of which
the parameters are tuned. Remember, loss function is not an
evaluation metric. Another term used is Cost Function. The loss
function is to capture the difference between the actual and
predicted values for a single datum whereas cost functions
aggregate the difference for the entire training dataset.
☆ MEAN SQUARED ERROR
Mean Squared Error calculates the average of the squared
differences between predicted and actual values:

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

In the last MLP topic, we had a short introduction to error; the
model performs well on the given data when this error is reduced.
It is okay to have (y − ŷ) as the error. But why do we square it?
And why do we average the sum? Squaring the differences between actual
and predicted values makes larger errors stand out more. For
instance, an error of 4 becomes 16 after squaring, while an error of
1 remains 1. This helps the model focus on reducing large errors,
which are often more impactful on overall performance. Without
squaring, positive and negative errors would cancel each other
out. Squaring makes all errors positive, so we get a more accurate
representation of total error regardless of direction
(underestimation or overestimation). Averaging divides the total
error by the number of data points n, resulting in a mean value that
doesn’t change simply because there are more (or fewer) data
points. This makes the error measure comparable across datasets
of any size.
We can either implement it ourselves or use the built-in one.
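A minimal from-scratch sketch in plain Python (in practice, frameworks provide this as a built-in, e.g. PyTorch's nn.MSELoss):

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# Example: errors of 2 and 4 become 4 and 16 after squaring.
print(mse([0, 0], [2, 4]))  # (4 + 16) / 2 = 10.0
```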

MSE is a convex function, meaning that it has a single global


minimum and no local minima. This property makes it easier to
optimize using gradient-based optimization techniques like
gradient descent. MSE is differentiable everywhere, meaning its
gradient (derivative) can be computed at every point. This is
crucial for gradient-based optimization algorithms (such as
stochastic gradient descent). MSE squares the error for each data
point, which means that larger errors (outliers) have a
disproportionately large impact on the loss. A single large error
can significantly increase the MSE value. Thus, it becomes
sensitive to outliers.
The curve of the MSE loss against the error is a parabola: it grows
quadratically on both sides of zero.
☆ MEAN ABSOLUTE ERROR
Mean Absolute Error calculates the average of the absolute
differences between predicted and actual values:

MAE = (1/n) Σ |yᵢ − ŷᵢ|

It gives linear penalty. A linear penalty means that the error term
grows in direct proportion to the deviation between the predicted
and actual values. This is because the MAE calculates the
absolute difference between each predicted value and the actual
value, rather than squaring the difference as in MSE. We can also
say, each error contributes to the overall loss directly as it is, no
matter how large or small the error, it’s added linearly without
amplification. If the error (difference between the predicted and
actual values) is 3, then it contributes exactly 3 units to the loss. If
the error is 6, it contributes 6 units to the loss. This makes MAE
less sensitive to large errors and outliers. While MAE applies a
Linear Penalty, MSE applies a Quadratic Penalty: in MSE, an error
of 6 incurs a penalty of 6 × 6 = 36. The graph of MAE versus error
is symmetrical and forms a V-shape around zero, with the minimum
achieved when the error is zero. The problem with MAE is that it is
not differentiable at zero.
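A minimal plain-Python sketch of MAE (PyTorch exposes this as nn.L1Loss):

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Each error contributes linearly: errors of 3 and -3 average to 3.
print(mae([0, 0], [3, -3]))  # 3.0
```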
☆ HUBER LOSS
Huber Loss combines the essence of both Mean Squared Error
and Mean Absolute Error: it is quadratic for small errors and
linear for large errors, and it is differentiable everywhere. A
parameter delta (δ) controls the transition between the MSE-like
and MAE-like regions:

L(e) = ½ e²             if |e| ≤ δ
L(e) = δ (|e| − ½ δ)    otherwise

Huber Loss is smooth, as there is no abrupt transition between


the quadratic and linear regions.
The smoothness property allows for more stable training
compared to non-smooth loss functions like MAE. Huber Loss is
convex, meaning that it has a single global minimum. This
property ensures that optimization algorithms like gradient
descent will converge to the optimal solution.
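A plain-Python sketch of the piecewise definition above (PyTorch provides this as nn.HuberLoss):

```python
def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        if e <= delta:
            total += 0.5 * e ** 2                # quadratic (MSE-like) region
        else:
            total += delta * (e - 0.5 * delta)   # linear (MAE-like) region
    return total / len(y_true)

print(huber([0.0], [0.5]))  # small error: 0.5 * 0.25 = 0.125
print(huber([0.0], [3.0]))  # large error: 1.0 * (3 - 0.5) = 2.5
```

Note that at |e| = δ both branches agree in value and slope, which is what makes the transition smooth.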
☆ CROSS ENTROPY LOSS
Cross-Entropy Loss measures how well the predicted probability
distribution matches the true distribution. When the predicted
probability for the true class is high, the loss is low. If the
predicted probability is low, the loss is high, indicating that the
model's prediction is far from the true label. For a true class t and
a predicted distribution p, the loss for one sample is L = −log(pₜ).
Cross-Entropy Loss is non-linear but convex in the logits when
combined with a softmax output and one-hot targets, which makes it
suitable for optimization with gradient-based methods like
stochastic gradient descent (SGD).
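A minimal sketch for a single sample, assuming the model already outputs a probability distribution (PyTorch's nn.CrossEntropyLoss instead takes raw logits and applies softmax internally):

```python
import math

def cross_entropy(p_pred, true_class):
    """Cross-entropy for one sample: -log of the predicted
    probability assigned to the true class."""
    return -math.log(p_pred[true_class])

# Confident and correct -> low loss; uncertain -> higher loss.
print(cross_entropy([0.1, 0.8, 0.1], 1))   # -log(0.8), small
print(cross_entropy([0.1, 0.8, 0.1], 0))   # -log(0.1), large
```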
☆ BINARY CROSS ENTROPY LOSS
Binary Cross-Entropy Loss is a specific form of Cross-Entropy
Loss used for binary classification tasks. It measures the
dissimilarity between the predicted probability of the positive class
and the true label. If the model predicts a probability close to 1 for
the correct class, the loss is minimal. Conversely, if the prediction
is far from the correct class, the loss increases, indicating that the
model's prediction deviates from the true label. Binary Cross-
Entropy Loss is convex and works well with gradient-based
optimization methods like stochastic gradient descent (SGD) for
training binary classifiers. For binary classification, we typically
want to model the probability of one class (usually the positive
class) using a single output neuron with a sigmoid activation
function. This gives us a probability value between 0 and 1, which
we then compare against the true label (0 or 1). In contrast, Cross-
Entropy Loss for multi-class classification assumes that each class
has its own output neuron, and it compares the predicted
probability distribution across all classes. This would not be
suitable for binary classification where only two outcomes (0 or 1)
are considered.
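A plain-Python sketch of BCE over sigmoid outputs (PyTorch provides nn.BCELoss, or nn.BCEWithLogitsLoss when working with raw logits):

```python
import math

def bce(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, averaged over samples.
    y_true holds labels in {0, 1}; p_pred holds sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A coin-flip prediction (p = 0.5) costs log(2) per sample.
print(bce([1, 0], [0.5, 0.5]))
```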
☆ BALANCED CROSS ENTROPY
Balanced Cross-Entropy Loss is an extension of Binary Cross-
Entropy Loss that takes class imbalance into account. It assigns
different weights to the positive and negative classes, allowing the
model to focus more on the underrepresented class:

L = −[w₁ · y · log(p) + w₂ · (1 − y) · log(1 − p)]

The weights w1 and w2 are typically set inversely proportional to


the class frequencies to balance the impact of each class. For
example, if the dataset is highly imbalanced with more negatives
than positives, you might set w1 (positive class weight) higher than
w2 (negative class weight). We can implement it like the following:
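A minimal sketch in plain Python, with w_pos and w_neg playing the roles of w1 and w2 (PyTorch's nn.BCEWithLogitsLoss offers a related pos_weight argument):

```python
import math

def balanced_bce(y_true, p_pred, w_pos, w_neg, eps=1e-12):
    """Binary cross-entropy with per-class weights:
    w_pos scales the positive (y=1) term, w_neg the negative (y=0) term."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(w_pos * y * math.log(p)
                   + w_neg * (1 - y) * math.log(1 - p))
    return total / len(y_true)

# With w_pos = 2, a mistake on a positive sample costs twice as much.
print(balanced_bce([1], [0.5], w_pos=2.0, w_neg=1.0))  # 2 * log(2)
print(balanced_bce([0], [0.5], w_pos=2.0, w_neg=1.0))  # log(2)
```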

☆ FOCAL LOSS
Balanced Cross-Entropy Loss adjusts the loss by assigning a
higher weight to the underrepresented class, which helps mitigate
class imbalance to some extent. However, the problem is that it
still treats both easy and hard examples in the same way.
Focal Loss modifies the standard cross-entropy loss by adding a
factor that down-weights the loss for well-classified examples and
focuses more on the misclassified examples, especially the hard
ones. It is defined as:

FL(pₜ) = −αₜ (1 − pₜ)^γ log(pₜ)

where pₜ is the predicted probability of the true class, γ ≥ 0 is
the focusing parameter, and αₜ is an optional class-balancing weight.

Focal Loss achieves this by focusing more on hard examples and


down-weighting easy examples, allowing the model to better learn
from rare or difficult examples, thus handling extreme class
imbalance more effectively.
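A plain-Python sketch of the binary form, using the common defaults γ = 2 and α = 0.25 (libraries such as torchvision ship a ready-made version as sigmoid_focal_loss):

```python
import math

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: the (1 - p_t)^gamma factor shrinks the
    loss of well-classified samples, leaving hard samples dominant."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        p_t = p if y == 1 else 1 - p         # prob. of the true class
        a_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)

# An easy example (p_t = 0.9) is heavily down-weighted
# compared with a hard one (p_t = 0.1).
print(focal_loss([1], [0.9]), focal_loss([1], [0.1]))
```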
☆ CONTRASTIVE LOSS
The core idea behind Contrastive Loss is to encourage the model
to output a small distance for similar pairs and a large distance for
dissimilar pairs. This is typically used in problems like face
verification, where you want the model to learn to distinguish
between similar and dissimilar instances (e.g., whether two faces
belong to the same person).
L = yᵢ · D² + (1 − yᵢ) · max(0, m − D)²

where D is the Euclidean distance between the two samples. For
similar pairs (yᵢ = 1), the loss is proportional to the square of
the Euclidean distance, i.e., we want to reduce the distance for
similar pairs. For dissimilar pairs (yᵢ = 0), the loss is
proportional to the square of the difference between the margin m
and the Euclidean distance, ensuring the distance between dissimilar
pairs exceeds the margin m. If the distance already exceeds m, the
loss is zero.

In training models with Contrastive Loss, the key idea is to select


pairs of data points — these are typically positive pairs (similar)
and negative pairs (dissimilar). Proper pair selection is crucial for
the model's performance because the learning task is built around
how well the model learns to distinguish between similar and
dissimilar instances.
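A minimal sketch for a single pair of precomputed embeddings, in plain Python (in a real pipeline the embeddings would come from the model):

```python
import math

def contrastive_loss(x1, x2, y, margin=1.0):
    """Contrastive loss for one pair of embeddings.
    y = 1 marks a similar pair, y = 0 a dissimilar pair."""
    d = math.dist(x1, x2)  # Euclidean distance between embeddings
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2

# Dissimilar pair closer than the margin -> penalized.
print(contrastive_loss([0, 0], [0.5, 0], y=0, margin=1.0))  # 0.25
# Dissimilar pair already beyond the margin -> zero loss.
print(contrastive_loss([0, 0], [2, 0], y=0, margin=1.0))    # 0.0
```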
☆ TRIPLET LOSS
Triplet Loss is another powerful loss function used in metric
learning, particularly to learn an embedding space where similar
items are close together, and dissimilar items are far apart. It's
commonly used in tasks like face verification, image retrieval, and
few-shot learning. The goal of Triplet Loss is to minimize the
distance between an anchor sample and a positive sample (same
class), while maximizing the distance between the anchor and a
negative sample (different class). This is achieved by ensuring that
the anchor-positive pair is closer in the embedding space than the
anchor-negative pair by a given margin.
Given a triplet (A, P, N):
A: Anchor sample
P: Positive sample (same class as the anchor)
N: Negative sample (different class from the anchor)

the loss is:

L = max(0, ∥f(A) − f(P)∥ − ∥f(A) − f(N)∥ + α)

f(x) represents the embedding function of the model for a sample


x. ∥f(A)−f(P)∥ is the Euclidean distance between the anchor and
positive samples. ∥f(A)−f(N)∥ is the Euclidean distance between
the anchor and negative samples. alpha is a margin that ensures
the negative sample is sufficiently far away from the anchor (this
margin is typically a hyperparameter).
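A plain-Python sketch on precomputed embeddings f(A), f(P), f(N) (PyTorch provides this as nn.TripletMarginLoss):

```python
import math

def triplet_loss(a, p, n, alpha=0.2):
    """Triplet loss on embeddings a (anchor), p (positive), n (negative)."""
    d_ap = math.dist(a, p)  # anchor-positive distance
    d_an = math.dist(a, n)  # anchor-negative distance
    return max(0.0, d_ap - d_an + alpha)

# Negative already far beyond the margin -> zero loss.
print(triplet_loss([0, 0], [0, 0], [5, 0], alpha=0.2))  # 0.0
# Positive and negative equally far -> loss equals the margin.
print(triplet_loss([0, 0], [1, 0], [1, 0], alpha=0.2))  # 0.2
```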

Training with Triplet Loss should produce an embedding space in
which samples of the same class cluster together while different
classes are pushed apart.


MERCI
