CM20315 - Machine Learning

Prof. Simon Prince


5. Loss functions
Log and exp functions
• Log and exp are both monotonically increasing functions (plots omitted)

• Two rules used later:
  $\log[a \cdot b] = \log[a] + \log[b]$
  $\exp[a + b] = \exp[a] \cdot \exp[b]$
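A quick numerical check of the two rules (the values here are arbitrary):

```python
import numpy as np

a, b = 3.7, 0.21
print(np.log(a * b), np.log(a) + np.log(b))   # two equal values
print(np.exp(a + b), np.exp(a) * np.exp(b))   # two equal values
```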
Regression

• Univariate regression problem (one output, real value)


Graph regression

• Multivariate regression problem (>1 output, real value)


Text classification

• Binary classification problem (two discrete classes)


Music genre classification

• Multiclass classification problem (discrete classes, >2 possible values)


Loss function
• Training dataset of $I$ pairs of input/output examples:
  $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}$

• Loss function or cost function measures how bad the model is:
  $L\left[\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}, \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]\right]$, or for short: $L[\boldsymbol{\phi}]$

• Returns a scalar that is smaller when the model maps inputs to outputs better
Training
• Loss function $L[\boldsymbol{\phi}]$ returns a scalar that is smaller when the model maps inputs to outputs better

• Find the parameters that minimize the loss:
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}}\, L[\boldsymbol{\phi}]$

Example: 1D Linear regression loss function

Loss function:
  $L[\boldsymbol{\phi}] = \sum_{i=1}^{I} \left(\phi_0 + \phi_1 x_i - y_i\right)^2$

“Least squares loss function”
Example: 1D Linear regression training

• Start with an initial guess for the parameters and iteratively adjust them to decrease the loss. This technique is known as gradient descent.
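A minimal sketch of this training loop, assuming synthetic data and an illustrative learning rate (none of these values come from the slides):

```python
import numpy as np

# Synthetic 1D data from the line y = 0.5 + 1.5 x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)
y = 0.5 + 1.5 * x + 0.1 * rng.standard_normal(20)

phi0, phi1 = 0.0, 0.0      # intercept and slope, initial guess
alpha = 0.1                # learning rate

for _ in range(2000):
    residual = phi0 + phi1 * x - y              # f[x_i, phi] - y_i
    # Gradient of the least squares loss L[phi] = sum(residual**2)
    dphi0 = 2.0 * np.sum(residual)
    dphi1 = 2.0 * np.sum(residual * x)
    phi0 -= alpha * dphi0 / len(x)              # step downhill
    phi1 -= alpha * dphi1 / len(x)

print(phi0, phi1)                               # close to the true 0.5, 1.5
print(np.sum((phi0 + phi1 * x - y) ** 2))       # final loss
```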


Loss functions
• Maximum likelihood
• Recipe for loss functions
• Example 1: univariate regression
• Example 2: binary classification
• Example 3: multiclass classification
• Other types of data
• Multiple outputs
• Cross entropy
How to construct loss functions
• Model predicts output y given input x
• Better: model predicts a conditional probability distribution
  $Pr(\mathbf{y}|\mathbf{x})$
  over outputs $\mathbf{y}$ given inputs $\mathbf{x}$

• Loss function aims to make the observed outputs have high probability under this distribution
How can a model predict a probability distribution?
1. Pick a known parametric distribution to model the output y,
   e.g., the normal distribution with parameters $\boldsymbol{\theta} = \{\mu, \sigma^2\}$:
   $Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$

2. Use the model to predict the parameters of this probability distribution:
   $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
Maximum likelihood criterion
• Choose the parameters that make the training outputs most probable:
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[\prod_{i=1}^{I} Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$

• When we consider this probability as a function of the parameters $\boldsymbol{\phi}$, we call it a likelihood.

Problem:
• The terms in this product might all be small
• The product might get so small that we can’t easily represent it in floating-point arithmetic
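A quick illustration of the problem, assuming 1000 training examples that each have likelihood 0.01 (made-up numbers):

```python
import numpy as np

probs = np.full(1000, 0.01)        # likelihood of each training example
print(np.prod(probs))              # 0.0 -- underflows double precision
print(np.sum(np.log(probs)))       # about -4605.2 -- no underflow
```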
The log function is monotonic
• The maximum of the logarithm of a function is in the same place as the maximum of the function itself
Maximum log-likelihood
  $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmax}} \left[\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$
• Now it’s a sum of terms, so it doesn’t matter so much if the individual terms are small

Minimizing negative log-likelihood
• By convention, we minimize things (i.e., a loss):
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])$
Inference
• But now the model predicts a probability distribution
• We need an actual prediction (point estimate)
• Take the peak of the probability distribution (i.e., the mean for a normal):
  $\hat{\mathbf{y}} = \underset{\mathbf{y}}{\mathrm{argmax}}\, Pr(\mathbf{y} | \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}])$
Recipe for loss functions
1. Choose a suitable probability distribution $Pr(\mathbf{y}|\boldsymbol{\theta})$ defined over the domain of the outputs $\mathbf{y}$, with distribution parameters $\boldsymbol{\theta}$
2. Set the machine learning model $\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$ to predict those parameters: $\boldsymbol{\theta} = \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]$
3. Train by minimizing the negative log-likelihood:
   $\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[-\sum_{i=1}^{I} \log Pr(\mathbf{y}_i | \mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}])\right]$
4. For inference, return the full distribution $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \hat{\boldsymbol{\phi}}])$ or the point estimate at its maximum
Example 1: univariate regression

• Predict a scalar output: $y \in \mathbb{R}$

• Sensible probability distribution: the normal distribution
  $Pr(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$
• Use the network to predict the mean: $\mu = f[x, \boldsymbol{\phi}]$

• Negative log-likelihood:
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - f[x_i, \boldsymbol{\phi}])^2}{2\sigma^2}\right]\right] = \frac{1}{2\sigma^2}\sum_{i=1}^{I} \left(y_i - f[x_i, \boldsymbol{\phi}]\right)^2 + \text{const.}$

• Least squares! Minimizing the negative log-likelihood of a normal distribution and least squares give the same parameters.
Estimating variance
• Perhaps surprisingly, the variance term $\sigma^2$ disappeared from the minimization: it only scales the loss and adds a constant, so it does not move the minimum

• But we could learn it alongside the model parameters:
  $\hat{\boldsymbol{\phi}}, \hat{\sigma}^2 = \underset{\boldsymbol{\phi}, \sigma^2}{\mathrm{argmin}} \left[-\sum_{i=1}^{I} \log Pr(y_i \,|\, f[x_i, \boldsymbol{\phi}], \sigma^2)\right]$
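A small check that the Gaussian negative log-likelihood and least squares differ only by a scale and an additive constant (the predictions below are made up):

```python
import numpy as np

y     = np.array([1.0, 2.0, 3.0])    # observed outputs
y_hat = np.array([1.1, 1.8, 3.2])    # model predictions f[x_i, phi]
sigma2 = 0.5                          # fixed variance

least_squares = np.sum((y - y_hat) ** 2)
nll = np.sum(0.5 * np.log(2 * np.pi * sigma2) + (y - y_hat) ** 2 / (2 * sigma2))

# nll = least_squares / (2 sigma^2) + (I/2) log(2 pi sigma^2), with I = 3
print(nll, least_squares / (2 * sigma2) + 1.5 * np.log(2 * np.pi * sigma2))
```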


Heteroscedastic regression
• Above, we assumed that the noise variance $\sigma^2$ is the same everywhere.
• But we could make the noise level a function of the input x.
• Build a model with two outputs: one predicts the mean, $\mu = f_1[x, \boldsymbol{\phi}]$, and one predicts the variance, e.g., $\sigma^2 = f_2[x, \boldsymbol{\phi}]^2$ (squaring keeps the variance positive)
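A minimal sketch of the resulting loss, assuming the network’s two raw outputs are called out1 (mean) and out2 (squared to give the variance); the numbers here are made up:

```python
import numpy as np

def heteroscedastic_nll(y, out1, out2):
    mu = out1                          # first output: mean
    sigma2 = out2 ** 2                 # second output squared: positive variance
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2))

y    = np.array([1.0, 2.0, 3.0])
out1 = np.array([1.1, 1.9, 3.2])       # stand-ins for network outputs
out2 = np.array([0.3, 0.5, 1.0])
print(heteroscedastic_nll(y, out1, out2))
```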
Example 2: binary classification

• Goal: predict which of two classes the input x belongs to

• Domain: $y \in \{0, 1\}$
• Sensible distribution: the Bernoulli distribution, with one parameter $\lambda \in [0, 1]$ giving the probability that $y = 1$:
  $Pr(y|\lambda) = \lambda^{y}(1-\lambda)^{1-y}$
Problem:
• The output of the neural network can be any real number
• The parameter $\lambda$ must lie in $[0, 1]$

Solution:
• Pass the network output through the logistic sigmoid function, which maps “anything” to $[0, 1]$:
  $\lambda = \mathrm{sig}\left[f[x, \boldsymbol{\phi}]\right] = \frac{1}{1 + \exp\left[-f[x, \boldsymbol{\phi}]\right]}$

• Negative log-likelihood: the binary cross-entropy loss
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \left[y_i \log \lambda_i + (1 - y_i)\log(1 - \lambda_i)\right]$, where $\lambda_i = \mathrm{sig}\left[f[x_i, \boldsymbol{\phi}]\right]$

• Inference: choose $y = 1$ where $\lambda$ is greater than 0.5, otherwise $y = 0$
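A sketch of the whole binary-classification pipeline: raw network score, sigmoid, binary cross-entropy, point estimate (the scores and labels are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, scores):
    lam = sigmoid(scores)                       # lambda_i = Pr(y_i = 1)
    return -np.sum(y * np.log(lam) + (1 - y) * np.log(1 - lam))

y      = np.array([1.0, 0.0, 1.0, 1.0])        # true classes
scores = np.array([2.0, -1.0, 0.3, -0.2])      # raw network outputs
print(binary_cross_entropy(y, scores))
print((sigmoid(scores) > 0.5).astype(int))     # point estimates: [1 0 1 0]
```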


Example 3: multiclass classification

• Goal: predict which of K classes the input x belongs to

• Domain: $y \in \{1, 2, \ldots, K\}$
• Sensible distribution: the categorical distribution, with K parameters $\lambda_1, \ldots, \lambda_K$, each in $[0, 1]$, that sum to one:
  $Pr(y = k) = \lambda_k$
Problem:
• The outputs of the neural network can be any real numbers
• The parameters must lie in $[0, 1]$ and sum to one

Solution:
• Pass the K network outputs through the softmax function, which maps “anything” to K values in $[0, 1]$ that sum to one:
  $\lambda_k = \mathrm{softmax}_k[\mathbf{z}] = \frac{\exp[z_k]}{\sum_{k'=1}^{K} \exp[z_{k'}]}$

• Negative log-likelihood: the multiclass cross-entropy loss
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \log\left[\mathrm{softmax}_{y_i}\left[\mathbf{f}[\mathbf{x}_i, \boldsymbol{\phi}]\right]\right]$

• Inference: choose the class with the largest probability
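A sketch of softmax plus multiclass cross-entropy for a single example with K = 3 classes (the logits and label are made up):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 0.5, -1.0])    # raw network outputs, one per class
lam = softmax(logits)
print(lam, lam.sum())                  # K values in [0, 1] summing to one

y = 0                                  # true class index
print(-np.log(lam[y]))                 # cross-entropy loss for this example
print(np.argmax(lam))                  # inference: most probable class
```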


Other data types
• The same recipe applies to other types of output data: choose a probability distribution whose domain matches the domain of the output (summary table omitted)
Multiple outputs
• Treat each output dimension as independent:
  $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}]) = \prod_{d} Pr(y_d \,|\, f_d[\mathbf{x}, \boldsymbol{\phi}])$

• The negative log-likelihood becomes a sum of terms:
  $L[\boldsymbol{\phi}] = -\sum_{i=1}^{I} \sum_{d} \log Pr(y_{id} \,|\, f_d[\mathbf{x}_i, \boldsymbol{\phi}])$
Example 4: multivariate regression
• Goal: predict a multivariate target $\mathbf{y} \in \mathbb{R}^{D}$
• Solution: treat each dimension independently

• Make a network with D outputs to predict the mean of the normal distribution for each dimension
Example 4: multivariate regression
• What if the outputs vary in magnitude?
• E.g., predict weight in kilos and height in meters
• One dimension has much bigger numbers than the others, so it dominates the loss
• Could learn a separate variance for each dimension…
• …or rescale the targets before training, and then rescale the output in the opposite way (see the sketch below)
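A minimal sketch of the rescaling approach, assuming we standardize each target dimension (the data here is made up):

```python
import numpy as np

Y = np.array([[70.0, 1.75],            # weight in kilos, height in meters
              [55.0, 1.62],
              [90.0, 1.88]])

mean, std = Y.mean(axis=0), Y.std(axis=0)
Y_scaled = (Y - mean) / std            # train against these targets instead

y_pred_scaled = Y_scaled[0]            # stand-in for a network prediction
y_pred = y_pred_scaled * std + mean    # rescale output in the opposite way
print(y_pred)                          # recovers [70.0, 1.75]
```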
Cross Entropy

• Kullback-Leibler (KL) divergence -- a measure of the “distance” between two probability distributions $q(z)$ and $p(z)$:
  $D_{KL}\left[q \,\|\, p\right] = \int q(z) \log\frac{q(z)}{p(z)}\, dz$

• Splitting the logarithm gives two terms:
  $D_{KL}\left[q \,\|\, p\right] = \int q(z) \log q(z)\, dz - \int q(z) \log p(z)\, dz$
• The second term, $-\int q(z) \log p(z)\, dz$, is the cross-entropy; it is the only term that depends on $p$, so minimizing the cross-entropy also minimizes the KL divergence.

Cross entropy in machine learning
• Minimizing the cross-entropy between the empirical distribution of the training outputs and the model distribution $Pr(\mathbf{y}|\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}])$ is the same as minimizing the negative log-likelihood: both views lead to the same loss function and the same parameters (the minimum of the negative log-likelihood).
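A quick check of the decomposition for discrete distributions: cross-entropy equals the entropy of q plus the KL divergence, so minimizing cross-entropy over p also minimizes the KL divergence (both distributions below are made up):

```python
import numpy as np

q = np.array([0.2, 0.5, 0.3])          # "true" (empirical) distribution
p = np.array([0.1, 0.6, 0.3])          # model distribution

kl            = np.sum(q * np.log(q / p))
entropy       = -np.sum(q * np.log(q))
cross_entropy = -np.sum(q * np.log(p))

print(kl, cross_entropy - entropy)     # equal up to rounding
```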
Next up
• We have models with parameters!
• We have loss functions!
• Now let’s find the parameters that give the smallest loss
• Training, learning, or fitting the model