CM20315 - Machine Learning

Prof. Simon Prince


6. Fitting models
Regression

• Univariate regression problem (one output, real value)


Graph regression

• Multivariate regression problem (>1 output, real values)


Text classification

• Binary classification problem (two discrete classes)


Music genre classification

• Multiclass classification problem (discrete classes, >2 possible values)


Loss function
• Training dataset of I pairs of input/output examples: $\{x_i, y_i\}_{i=1}^{I}$

• Loss function or cost function measures how bad the model is:

$L[\phi; \{x_i, y_i\}_{i=1}^{I}]$, or for short: $L[\phi]$

Returns a scalar that is smaller when the model maps inputs to outputs better
Training
• Loss function: $L[\phi]$
Returns a scalar that is smaller when the model maps inputs to outputs better

• Find the parameters that minimize the loss:

$\hat{\phi} = \underset{\phi}{\operatorname{argmin}} \; L[\phi]$
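As a concrete sketch, here is what such a loss looks like in code, assuming a least-squares loss and a hypothetical 1D linear model (the function names are illustrative, not from the slides):

```python
import numpy as np

def model(x, phi):
    # Hypothetical stand-in for f[x, phi]: a 1D linear model
    return phi[0] + phi[1] * x

def loss(phi, x, y):
    # Returns a scalar that is smaller when the model
    # maps inputs to outputs better
    return np.sum((model(x, phi) - y) ** 2)
```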

Example: 1D Linear regression loss function

Loss function:

$L[\phi] = \sum_{i=1}^{I} (\phi_0 + \phi_1 x_i - y_i)^2$

“Least squares loss function”


Example: 1D Linear regression training

This technique is known as gradient descent
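A minimal sketch of this training loop for the 1D linear regression with the least-squares loss; the gradient is written out by hand and the data are made up for illustration:

```python
import numpy as np

# Made-up toy data
x = np.array([0.1, 0.4, 0.5, 0.7, 0.9])
y = np.array([0.2, 0.5, 0.7, 0.8, 1.0])

phi = np.array([0.0, 0.0])   # [intercept, slope]
alpha = 0.1                  # learning rate

for step in range(100):
    residual = phi[0] + phi[1] * x - y
    # Derivatives of the least-squares loss w.r.t. phi[0] and phi[1]
    grad = np.array([2 * residual.sum(), 2 * (residual * x).sum()])
    phi = phi - alpha * grad  # move downhill

print(phi)  # approaches the least-squares solution
```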


Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Gradient descent
Step 1: Compute derivatives (slopes of function) with
respect to the parameters:

$\frac{\partial L}{\partial \phi}$

Step 2: Update parameters according to the rule:

$\phi \leftarrow \phi - \alpha \frac{\partial L}{\partial \phi}$

where $\alpha$ = step size, or learning rate if fixed

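A minimal sketch of these two steps in code, assuming nothing about the model: the derivatives here are taken numerically by finite differences (an illustration, not how real frameworks compute gradients):

```python
import numpy as np

def gradient_descent(loss, phi, alpha=0.01, steps=1000, eps=1e-6):
    """Plain gradient descent with finite-difference derivatives."""
    phi = np.asarray(phi, dtype=float).copy()
    for _ in range(steps):
        # Step 1: compute the derivative of the loss w.r.t. each parameter
        grad = np.zeros_like(phi)
        for j in range(len(phi)):
            d = np.zeros_like(phi)
            d[j] = eps
            grad[j] = (loss(phi + d) - loss(phi - d)) / (2 * eps)
        # Step 2: update the parameters against the gradient
        phi = phi - alpha * grad
    return phi
```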
Line Search
Step 1: Compute derivatives (slopes of function) with
respect to the parameters:

$\frac{\partial L}{\partial \phi}$

Step 2: Update parameters according to the rule:

$\phi \leftarrow \phi - \alpha \frac{\partial L}{\partial \phi}$

where the step size $\alpha$ is now chosen at each iteration by searching along the descent direction
Line Search (bracketing)
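A sketch of the bracketing idea, under the assumption that the loss is unimodal along the search direction: repeatedly shrink an interval known to contain the minimum. This is a simple trisection variant; the slides' exact bracketing scheme may differ:

```python
def line_search_bracket(g, a=0.0, b=1.0, tol=1e-5):
    """Find the step size minimizing g on [a, b] by shrinking a bracket.

    Assumes g has a single minimum in [a, b] (unimodal).
    """
    while b - a > tol:
        # Probe two interior points; the minimum cannot lie in the
        # outer third beyond the worse of the two, so discard it.
        p = a + (b - a) / 3
        q = b - (b - a) / 3
        if g(p) < g(q):
            b = q  # minimum lies in [a, q]
        else:
            a = p  # minimum lies in [p, b]
    return (a + b) / 2

alpha = line_search_bracket(lambda t: (t - 0.3) ** 2)  # ~0.3
```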
Convex problems

[Figure: examples of convex and non-convex loss functions]

• In 1D, the test for convexity is that the 2nd derivative is positive everywhere

Convexity in higher dimensions
Test for convexity is that the Hessian (the matrix
of 2nd derivatives) is positive definite everywhere,
i.e., all of its eigenvalues are positive.
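A small sketch of this test, assuming we can form the Hessian: positive definiteness is conveniently checked via eigenvalues. For least-squares linear regression the Hessian is constant, so convexity can be verified once:

```python
import numpy as np

def is_positive_definite(hessian):
    # Convexity requires a positive (semi-)definite Hessian everywhere;
    # for positive definiteness, all eigenvalues must be > 0
    return bool(np.all(np.linalg.eigvalsh(hessian) > 0))

# For least-squares linear regression the Hessian is constant:
x = np.array([0.1, 0.4, 0.5, 0.7, 0.9])
H = 2 * np.array([[len(x), x.sum()],
                  [x.sum(), (x ** 2).sum()]])
print(is_positive_definite(H))  # True, so the loss is convex
```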
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Gabor model
• Gradient descent reaches the global minimum if we start in the right “valley”
• Otherwise, it descends to a local minimum
• Or gets stuck near a saddle point
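For reference, a sketch of a two-parameter Gabor-style model of this kind; the specific constants are an assumption borrowed from the accompanying book's example rather than taken from these slides:

```python
import numpy as np

def gabor(x, phi):
    # Sinusoid under a Gaussian envelope, controlled by two parameters.
    # The constants 0.06 and 32.0 follow the book's example and are an
    # assumption here, not taken from these slides.
    z = phi[0] + 0.06 * phi[1] * x
    return np.sin(z) * np.exp(-(z ** 2) / 32.0)
```

Its least-squares loss surface has many valleys, so where gradient descent ends up depends on the initial parameters.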
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
IDEA: add noise

• Stochastic gradient descent:

$\phi \leftarrow \phi - \alpha \sum_{i \in \mathcal{B}_t} \frac{\partial \ell_i[\phi]}{\partial \phi}$

• Compute gradient based on only a subset of points – a mini-batch
• Work through the dataset, sampling without replacement
• One pass through the data is called an epoch
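A sketch of this procedure, assuming a hypothetical per-batch gradient function grad_batch(phi, x_batch, y_batch) such as the linear-regression gradient from earlier:

```python
import numpy as np

def sgd(grad_batch, phi, x, y, alpha=0.05, batch_size=2, epochs=10):
    """Mini-batch SGD: sample without replacement; one pass = one epoch."""
    rng = np.random.default_rng(0)
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)          # sampling without replacement
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient computed from the mini-batch only
            phi = phi - alpha * grad_batch(phi, x[batch], y[batch])
    return phi
```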
Stochastic gradient descent

[Figure: loss trajectories before (full batch descent) and after (SGD), with a fixed learning rate]

Properties of SGD
• Can escape from local minima
• Adds noise, but updates are still sensible, as each is based on part of the data
• Uses all data equally
• Less computationally expensive
• Seems to find better solutions

• Doesn’t converge in the traditional sense

• Learning rate schedule – decrease the learning rate over time, as sketched below
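One simple way to implement such a schedule (a hypothetical step-decay; many variants exist):

```python
def step_decay(alpha0=0.1, drop=0.5, every=20):
    # Returns a schedule that halves the learning rate every 20 epochs
    return lambda epoch: alpha0 * drop ** (epoch // every)

alpha_at = step_decay()
print(alpha_at(0), alpha_at(20), alpha_at(40))  # 0.1 0.05 0.025
```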
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Momentum
• Weighted sum of this gradient and previous gradients:

$m \leftarrow \beta \cdot m + (1-\beta) \frac{\partial L}{\partial \phi}$
$\phi \leftarrow \phi - \alpha \cdot m$
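A one-step sketch of this update, with beta controlling how much of the previous gradients is retained:

```python
def momentum_step(phi, m, grad, alpha=0.05, beta=0.9):
    # m is an exponentially weighted sum of current and previous gradients
    m = beta * m + (1 - beta) * grad
    phi = phi - alpha * m
    return phi, m
```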
Nesterov accelerated momentum
• Momentum is kind of like a
prediction of where we are going

• Move in the predicted direction,

THEN measure the gradient
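A sketch of one common formulation: evaluate the gradient at the looked-ahead position rather than at the current parameters (grad_fn is a hypothetical gradient function):

```python
def nesterov_step(phi, m, grad_fn, alpha=0.05, beta=0.9):
    # Measure the gradient at the predicted (looked-ahead) position,
    # not at the current parameters
    lookahead = phi - alpha * beta * m
    m = beta * m + (1 - beta) * grad_fn(lookahead)
    phi = phi - alpha * m
    return phi, m
```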
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Adaptive moment estimation (Adam)
Normalized gradients
• Measure mean and pointwise squared gradient:

$m \leftarrow \frac{\partial L}{\partial \phi}, \quad v \leftarrow \left(\frac{\partial L}{\partial \phi}\right)^2$

• Normalize:

$\phi \leftarrow \phi - \alpha \cdot \frac{m}{\sqrt{v} + \epsilon}$
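A sketch of this update without momentum; dividing the gradient by the root of its pointwise square makes each coordinate move by roughly ±alpha regardless of gradient magnitude, i.e., this is close to sign gradient descent:

```python
import numpy as np

def normalized_step(phi, grad, alpha=0.01, eps=1e-8):
    m = grad               # mean (no momentum yet)
    v = grad ** 2          # pointwise squared gradient
    # Each coordinate moves by roughly +/- alpha, whatever
    # the gradient magnitude
    return phi - alpha * m / (np.sqrt(v) + eps)
```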
Adaptive moment estimation (Adam)
• Compute mean and pointwise squared gradients with momentum:

$m \leftarrow \beta \cdot m + (1-\beta) \frac{\partial L}{\partial \phi}, \quad v \leftarrow \gamma \cdot v + (1-\gamma) \left(\frac{\partial L}{\partial \phi}\right)^2$

• Moderate near the start of the sequence (bias correction):

$\tilde{m} = \frac{m}{1-\beta^{t+1}}, \quad \tilde{v} = \frac{v}{1-\gamma^{t+1}}$

• Update the parameters:

$\phi \leftarrow \phi - \alpha \cdot \frac{\tilde{m}}{\sqrt{\tilde{v}} + \epsilon}$
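Putting the pieces together, a sketch of one Adam step (beta and gamma are the momentum coefficients for the mean and squared gradient; epsilon prevents division by zero):

```python
import numpy as np

def adam_step(phi, grad, m, v, t, alpha=0.001, beta=0.9, gamma=0.999,
              eps=1e-8):
    # Moving averages of the gradient and its pointwise square
    m = beta * m + (1 - beta) * grad
    v = gamma * v + (1 - gamma) * grad ** 2
    # Bias correction moderates the estimates near the start (small t)
    m_hat = m / (1 - beta ** (t + 1))
    v_hat = v / (1 - gamma ** (t + 1))
    phi = phi - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return phi, m, v
```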
Hyperparameters
• Choice of learning algorithm
• Learning rate
• Momentum