CM20315 - Machine Learning

Prof. Simon Prince


5. Loss functions
Log and exp functions
• Log and exp are both monotonically increasing functions (plots omitted)

• Two rules used later:
  $\log[a \cdot b] = \log[a] + \log[b]$
  $\exp[a + b] = \exp[a] \cdot \exp[b]$
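A quick numerical check of the two rules (the values here are arbitrary):

```python
import numpy as np

a, b = 3.7, 0.21
print(np.log(a * b), np.log(a) + np.log(b))   # two equal values
print(np.exp(a + b), np.exp(a) * np.exp(b))   # two equal values
```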
Regression

• Univariate regression problem (one output, real value)


Graph regression

• Multivariate regression problem (>1 output, real value)


Text classification

• Binary classification problem (two discrete classes)


Music genre classification

• Multiclass classification problem (discrete classes, >2 possible values)


Loss function
• Training dataset of $I$ pairs of input/output examples:
  $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}$

• Loss function or cost function measures how bad the model is:
  $L\left[\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}, \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\right]$, or for short: $L[\boldsymbol{\phi}]$

• Returns a scalar that is smaller when the model maps inputs to outputs better
Training
• Loss function $L[\boldsymbol{\phi}]$ returns a scalar that is smaller when the model maps inputs to outputs better

• Find the parameters that minimize the loss:
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\, L[\boldsymbol{\phi}]$

Example: 1D Linear regression loss function

Loss function:
  $L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \left(\phi_0 + \phi_1 x_i - y_i\right)^2$

“Least squares loss function”
Example: 1D Linear regression training

• Start with an initial guess for the parameters and iteratively adjust them to decrease the loss. This technique is known as gradient descent.
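A minimal sketch of this training loop, assuming synthetic data and an illustrative learning rate (none of these values come from the slides):

```python
import numpy as np

# Synthetic 1D data from the line y = 0.5 + 1.5 x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)
y = 0.5 + 1.5 * x + 0.1 * rng.standard_normal(20)

phi0, phi1 = 0.0, 0.0      # intercept and slope, initial guess
alpha = 0.1                # learning rate

for _ in range(2000):
    residual = phi0 + phi1 * x - y              # f[x_i, phi] - y_i
    # Gradient of the least squares loss L[phi] = sum(residual**2)
    dphi0 = 2.0 * np.sum(residual)
    dphi1 = 2.0 * np.sum(residual * x)
    phi0 -= alpha * dphi0 / len(x)              # step downhill
    phi1 -= alpha * dphi1 / len(x)

print(phi0, phi1)                               # close to the true 0.5, 1.5
print(np.sum((phi0 + phi1 * x - y) ** 2))       # final loss
```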


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
How to construct loss functions
• Model predicts output y given input x
• Better: model predicts a conditional probability distribution
  $Pr(\mathbf{y}|\mathbf{x})$
  over outputs $\mathbf{y}$ given inputs $\mathbf{x}$

• Loss function aims to make the observed outputs have high probability under this distribution
How can a model predict a probability distribution?
1. Pick a known parametric distribution to model the output y,
   e.g., the normal distribution with parameters $\boldsymbol{\theta} = \{\mu, \sigma^2\}$:
   $Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$

2. Use the model to predict the parameters of this probability distribution:
   $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
Maximum likelihood criterion
• Choose the parameters that make the training outputs most probable:
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[\prod_{i=1}^{I} Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$

• When we consider this probability as a function of the parameters $\boldsymbol{\phi}$, we call it a likelihood.

Problem:
• The terms in this product might all be small
• The product might get so small that we can’t easily represent it in floating-point arithmetic
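A quick illustration of the problem, assuming 1000 training examples that each have likelihood 0.01 (made-up numbers):

```python
import numpy as np

probs = np.full(1000, 0.01)        # likelihood of each training example
print(np.prod(probs))              # 0.0 -- underflows double precision
print(np.sum(np.log(probs)))       # about -4605.2 -- no underflow
```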
The log function is monotonic
• The maximum of the logarithm of a function is in the same place as the maximum of the function itself
Maximum log-likelihood
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$
• Now it’s a sum of terms, so it doesn’t matter so much if the individual terms are small

Minimizing negative log-likelihood
• By convention, we minimize things (i.e., a loss):
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])$
Inference
• But now the model predicts a probability distribution
• We need an actual prediction (point estimate)
• Take the peak of the probability distribution (i.e., the mean for a normal):
  $\hat{\mathbf{y}} = \underset{\mathbf{y}}{\mathrm{argmax}}\, Pr(\mathbf{y} | \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}])$
Recipe for loss functions
1. Choose a suitable probability distribution $Pr(\mathbf{y}|\boldsymbol{\theta})$ defined over the domain of the outputs $\mathbf{y}$, with distribution parameters $\boldsymbol{\theta}$
2. Set the machine learning model $\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$ to predict those parameters: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
3. Train by minimizing the negative log-likelihood:
   $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[-\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$
4. For inference, return the full distribution $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \hat{\boldsymbol{\phi}}])$ or the point estimate at its maximum
Example 1: univariate regression

• Predict a scalar output: $y \in \mathbb{R}$

• Sensible probability distribution: the normal distribution
  $Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$
• Use the network to predict the mean: $\mu = f[x, \boldsymbol{\phi}]$

• Negative log-likelihood:
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - f[x_i, \boldsymbol{\phi}])^2}{2\sigma^2}\right]\right] = \frac{1}{2\sigma^2}\sum_{i=1}^{I} \left(y_i - f[x_i, \boldsymbol{\phi}]\right)^2 + \text{const.}$

• Least squares! Minimizing the negative log-likelihood of a normal distribution and least squares give the same parameters.
Estimating variance
• Perhaps surprisingly, the variance term $\sigma^2$ disappeared from the minimization: it only scales the loss and adds a constant, so it does not move the minimum

• But we could learn it alongside the model parameters:
  $\hat{\boldsymbol{\phi}}, \hat{\sigma}^2 = \underset{\boldsymbol{\phi}, \sigma^2}{\mathrm{argmin}} \left[-\sum_{i=1}^{I} \log Pr(y_i \,|\, f[x_i, \boldsymbol{\phi}], \sigma^2)\right]$
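A small check that the Gaussian negative log-likelihood and least squares differ only by a scale and an additive constant (the predictions below are made up):

```python
import numpy as np

y     = np.array([1.0, 2.0, 3.0])    # observed outputs
y_hat = np.array([1.1, 1.8, 3.2])    # model predictions f[x_i, phi]
sigma2 = 0.5                          # fixed variance

least_squares = np.sum((y - y_hat) ** 2)
nll = np.sum(0.5 * np.log(2 * np.pi * sigma2) + (y - y_hat) ** 2 / (2 * sigma2))

# nll = least_squares / (2 sigma^2) + (I/2) log(2 pi sigma^2), with I = 3
print(nll, least_squares / (2 * sigma2) + 1.5 * np.log(2 * np.pi * sigma2))
```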


Heteroscedastic regression
• Above, we assumed that the noise variance $\sigma^2$ is the same everywhere.
• But we could make the noise level a function of the input x.
• Build a model with two outputs: one predicts the mean, $\mu = f_1[x, \boldsymbol{\phi}]$, and one predicts the variance, e.g., $\sigma^2 = f_2[x, \boldsymbol{\phi}]^2$ (squaring keeps the variance positive)
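A minimal sketch of the resulting loss, assuming the network’s two raw outputs are called out1 (mean) and out2 (squared to give the variance); the numbers here are made up:

```python
import numpy as np

def heteroscedastic_nll(y, out1, out2):
    mu = out1                          # first output: mean
    sigma2 = out2 ** 2                 # second output squared: positive variance
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2))

y    = np.array([1.0, 2.0, 3.0])
out1 = np.array([1.1, 1.9, 3.2])       # stand-ins for network outputs
out2 = np.array([0.3, 0.5, 1.0])
print(heteroscedastic_nll(y, out1, out2))
```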
Example 2: binary classification

• Goal: predict which of two classes the input x belongs to

• Domain: $y \in \{0, 1\}$
• Sensible distribution: the Bernoulli distribution, with one parameter $\lambda \in [0, 1]$ giving the probability that $y = 1$:
  $Pr(y|\lambda) = \lambda^{y}(1-\lambda)^{1-y}$
Problem:
• The output of the neural network can be any real number
• The parameter $\lambda$ must lie in $[0, 1]$

Solution:
• Pass the network output through the logistic sigmoid function, which maps “anything” to $[0, 1]$:
  $\lambda = \mathrm{sig}\left[f[x, \boldsymbol{\phi}]\right] = \frac{1}{1 + \exp\left[-f[x, \boldsymbol{\phi}]\right]}$

• Negative log-likelihood: the binary cross-entropy loss
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \left[y_i \log \lambda_i + (1 - y_i)\log(1 - \lambda_i)\right]$, where $\lambda_i = \mathrm{sig}\left[f[x_i, \boldsymbol{\phi}]\right]$

• Inference: choose $y = 1$ where $\lambda$ is greater than 0.5, otherwise $y = 0$
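A sketch of the whole binary-classification pipeline: raw network score, sigmoid, binary cross-entropy, point estimate (the scores and labels are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, scores):
    lam = sigmoid(scores)                       # lambda_i = Pr(y_i = 1)
    return -np.sum(y * np.log(lam) + (1 - y) * np.log(1 - lam))

y      = np.array([1.0, 0.0, 1.0, 1.0])        # true classes
scores = np.array([2.0, -1.0, 0.3, -0.2])      # raw network outputs
print(binary_cross_entropy(y, scores))
print((sigmoid(scores) > 0.5).astype(int))     # point estimates: [1 0 1 0]
```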


Example 3: multiclass classification

• Goal: predict which of K classes the input x belongs to

• Domain: $y \in \{1, 2, \ldots, K\}$
• Sensible distribution: the categorical distribution, with K parameters $\lambda_1, \ldots, \lambda_K$, each in $[0, 1]$, that sum to one:
  $Pr(y = k) = \lambda_k$
Problem:
• The outputs of the neural network can be any real numbers
• The parameters must lie in $[0, 1]$ and sum to one

Solution:
• Pass the K network outputs through the softmax function, which maps “anything” to K values in $[0, 1]$ that sum to one:
  $\lambda_k = \mathrm{softmax}_k[\mathbf{z}] = \frac{\exp[z_k]}{\sum_{k'=1}^{K} \exp[z_{k'}]}$

• Negative log-likelihood: the multiclass cross-entropy loss
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\left[\mathrm{softmax}_{y_i}\left[\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\right]\right]$

• Inference: choose the class with the largest probability
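A sketch of softmax plus multiclass cross-entropy for a single example with K = 3 classes (the logits and label are made up):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 0.5, -1.0])    # raw network outputs, one per class
lam = softmax(logits)
print(lam, lam.sum())                  # K values in [0, 1] summing to one

y = 0                                  # true class index
print(-np.log(lam[y]))                 # cross-entropy loss for this example
print(np.argmax(lam))                  # inference: most probable class
```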


Other data types
• The same recipe applies to other types of output data: choose a probability distribution whose domain matches the domain of the output (summary table omitted)
Multiple outputs
• Treat each output dimension as independent:
  $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]) = \prod_{d} Pr(y_d \,|\, f_d[\mathbf{x}, \boldsymbol{\phi}])$

• The negative log-likelihood becomes a sum of terms:
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \sum_{d} \log Pr(y_{id} \,|\, f_d[\mathbf{x}_i, \boldsymbol{\phi}])$
Example 4: multivariate regression
• Goal: predict a multivariate target $\mathbf{y} \in \mathbb{R}^{D}$
• Solution: treat each dimension independently

• Make a network with D outputs to predict the mean of the normal distribution for each dimension
Example 4: multivariate regression
• What if the outputs vary in magnitude?
• E.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others, so it dominates the loss
• Could learn a separate variance for each dimension…
• …or rescale the targets before training, and then rescale the output in the opposite way (see the sketch below)
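A minimal sketch of the rescaling approach, assuming we standardize each target dimension (the data here is made up):

```python
import numpy as np

Y = np.array([[70.0, 1.75],            # weight in kilos, height in meters
              [55.0, 1.62],
              [90.0, 1.88]])

mean, std = Y.mean(axis=0), Y.std(axis=0)
Y_scaled = (Y - mean) / std            # train against these targets instead

y_pred_scaled = Y_scaled[0]            # stand-in for a network prediction
y_pred = y_pred_scaled * std + mean    # rescale output in the opposite way
print(y_pred)                          # recovers [70.0, 1.75]
```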
Cross Entropy

• Kullback-Leibler (KL) divergence -- a measure of the “distance” between two probability distributions $q(z)$ and $p(z)$:
  $D_{KL}\left[q \,\|\, p\right] = \int q(z) \log\frac{q(z)}{p(z)}\, dz$

• Splitting the logarithm gives two terms:
  $D_{KL}\left[q \,\|\, p\right] = \int q(z) \log q(z)\, dz - \int q(z) \log p(z)\, dz$
• The second term, $-\int q(z) \log p(z)\, dz$, is the cross-entropy; it is the only term that depends on $p$, so minimizing the cross-entropy also minimizes the KL divergence.

Cross entropy in machine learning
• Minimizing the cross-entropy between the empirical distribution of the training outputs and the model distribution $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}])$ is the same as minimizing the negative log-likelihood: both views lead to the same loss function and the same parameters (the minimum of the negative log-likelihood).
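A quick check of the decomposition for discrete distributions: cross-entropy equals the entropy of q plus the KL divergence, so minimizing cross-entropy over p also minimizes the KL divergence (both distributions below are made up):

```python
import numpy as np

q = np.array([0.2, 0.5, 0.3])          # "true" (empirical) distribution
p = np.array([0.1, 0.6, 0.3])          # model distribution

kl            = np.sum(q * np.log(q / p))
entropy       = -np.sum(q * np.log(q))
cross_entropy = -np.sum(q * np.log(p))

print(kl, cross_entropy - entropy)     # equal up to rounding
```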
Next up
• We have models with parameters!
• We have loss functions!
• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model