CM20315 - Machine Learning

Prof. Simon Prince


6. Fitting models
Regression

• Univariate regression problem (one output, real value)


Graph regression

• Multivariate regression problem (>1 output, real values)


Text classification

• Binary classification problem (two discrete classes)


Music genre classification

• Multiclass classification problem (discrete classes, >2 possible values)


Loss function
• Training dataset of I pairs of input/output examples: $\{x_i, y_i\}_{i=1}^{I}$

• Loss function or cost function measures how bad the model is:

$L[\phi; \{x_i, y_i\}_{i=1}^{I}]$, or for short: $L[\phi]$

Returns a scalar that is smaller when the model maps inputs to outputs better
Training
• Loss function: $L[\phi]$
Returns a scalar that is smaller when the model maps inputs to outputs better

• Find the parameters that minimize the loss:

$\hat{\phi} = \underset{\phi}{\operatorname{argmin}} \; L[\phi]$
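As a concrete sketch, here is what such a loss looks like in code, assuming a least-squares loss and a hypothetical 1D linear model (the function names are illustrative, not from the slides):

```python
import numpy as np

def model(x, phi):
    # Hypothetical stand-in for f[x, phi]: a 1D linear model
    return phi[0] + phi[1] * x

def loss(phi, x, y):
    # Returns a scalar that is smaller when the model
    # maps inputs to outputs better
    return np.sum((model(x, phi) - y) ** 2)
```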

Example: 1D Linear regression loss function

Loss function:

$L[\phi] = \sum_{i=1}^{I} (\phi_0 + \phi_1 x_i - y_i)^2$

“Least squares loss function”


Example: 1D Linear regression training

This technique is known as gradient descent
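A minimal sketch of this training loop for the 1D linear regression with the least-squares loss; the gradient is written out by hand and the data are made up for illustration:

```python
import numpy as np

# Made-up toy data
x = np.array([0.1, 0.4, 0.5, 0.7, 0.9])
y = np.array([0.2, 0.5, 0.7, 0.8, 1.0])

phi = np.array([0.0, 0.0])   # [intercept, slope]
alpha = 0.1                  # learning rate

for step in range(100):
    residual = phi[0] + phi[1] * x - y
    # Derivatives of the least-squares loss w.r.t. phi[0] and phi[1]
    grad = np.array([2 * residual.sum(), 2 * (residual * x).sum()])
    phi = phi - alpha * grad  # move downhill

print(phi)  # approaches the least-squares solution
```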


Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Gradient descent
Step 1: Compute derivatives (slopes of function) with
respect to the parameters:

$\frac{\partial L}{\partial \phi}$

Step 2: Update parameters according to the rule:

$\phi \leftarrow \phi - \alpha \frac{\partial L}{\partial \phi}$

where $\alpha$ = step size, or learning rate if fixed

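A minimal sketch of these two steps in code, assuming nothing about the model: the derivatives here are taken numerically by finite differences (an illustration, not how real frameworks compute gradients):

```python
import numpy as np

def gradient_descent(loss, phi, alpha=0.01, steps=1000, eps=1e-6):
    """Plain gradient descent with finite-difference derivatives."""
    phi = np.asarray(phi, dtype=float).copy()
    for _ in range(steps):
        # Step 1: compute the derivative of the loss w.r.t. each parameter
        grad = np.zeros_like(phi)
        for j in range(len(phi)):
            d = np.zeros_like(phi)
            d[j] = eps
            grad[j] = (loss(phi + d) - loss(phi - d)) / (2 * eps)
        # Step 2: update the parameters against the gradient
        phi = phi - alpha * grad
    return phi
```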
Line Search
Step 1: Compute derivatives (slopes of function) with
respect to the parameters:

$\frac{\partial L}{\partial \phi}$

Step 2: Update parameters according to the rule:

$\phi \leftarrow \phi - \alpha \frac{\partial L}{\partial \phi}$

where the step size $\alpha$ is now chosen at each iteration by searching along the descent direction
Line Search (bracketing)
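A sketch of the bracketing idea, under the assumption that the loss is unimodal along the search direction: repeatedly shrink an interval known to contain the minimum. This is a simple trisection variant; the slides' exact bracketing scheme may differ:

```python
def line_search_bracket(g, a=0.0, b=1.0, tol=1e-5):
    """Find the step size minimizing g on [a, b] by shrinking a bracket.

    Assumes g has a single minimum in [a, b] (unimodal).
    """
    while b - a > tol:
        # Probe two interior points; the minimum cannot lie in the
        # outer third beyond the worse of the two, so discard it.
        p = a + (b - a) / 3
        q = b - (b - a) / 3
        if g(p) < g(q):
            b = q  # minimum lies in [a, q]
        else:
            a = p  # minimum lies in [p, b]
    return (a + b) / 2

alpha = line_search_bracket(lambda t: (t - 0.3) ** 2)  # ~0.3
```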
Convex problems

[Figure: examples of convex and non-convex loss functions]

• In 1D, the test for convexity is that the 2nd derivative is positive everywhere

Convexity in higher dimensions
Test for convexity is that the Hessian (the matrix
of 2nd derivatives) is positive definite everywhere,
i.e., all of its eigenvalues are positive.
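A small sketch of this test, assuming we can form the Hessian: positive definiteness is conveniently checked via eigenvalues. For least-squares linear regression the Hessian is constant, so convexity can be verified once:

```python
import numpy as np

def is_positive_definite(hessian):
    # Convexity requires a positive (semi-)definite Hessian everywhere;
    # for positive definiteness, all eigenvalues must be > 0
    return bool(np.all(np.linalg.eigvalsh(hessian) > 0))

# For least-squares linear regression the Hessian is constant:
x = np.array([0.1, 0.4, 0.5, 0.7, 0.9])
H = 2 * np.array([[len(x), x.sum()],
                  [x.sum(), (x ** 2).sum()]])
print(is_positive_definite(H))  # True, so the loss is convex
```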
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Gabor model
• Gradient descent reaches the global minimum if we start in the right “valley”
• Otherwise, it descends to a local minimum
• Or gets stuck near a saddle point
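For reference, a sketch of a two-parameter Gabor-style model of this kind; the specific constants are an assumption borrowed from the accompanying book's example rather than taken from these slides:

```python
import numpy as np

def gabor(x, phi):
    # Sinusoid under a Gaussian envelope, controlled by two parameters.
    # The constants 0.06 and 32.0 follow the book's example and are an
    # assumption here, not taken from these slides.
    z = phi[0] + 0.06 * phi[1] * x
    return np.sin(z) * np.exp(-(z ** 2) / 32.0)
```

Its least-squares loss surface has many valleys, so where gradient descent ends up depends on the initial parameters.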
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
IDEA: add noise

• Stochastic gradient descent:

$\phi \leftarrow \phi - \alpha \sum_{i \in \mathcal{B}_t} \frac{\partial \ell_i[\phi]}{\partial \phi}$

• Compute gradient based on only a subset of points – a mini-batch
• Work through the dataset, sampling without replacement
• One pass through the data is called an epoch
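A sketch of this procedure, assuming a hypothetical per-batch gradient function grad_batch(phi, x_batch, y_batch) such as the linear-regression gradient from earlier:

```python
import numpy as np

def sgd(grad_batch, phi, x, y, alpha=0.05, batch_size=2, epochs=10):
    """Mini-batch SGD: sample without replacement; one pass = one epoch."""
    rng = np.random.default_rng(0)
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)          # sampling without replacement
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient computed from the mini-batch only
            phi = phi - alpha * grad_batch(phi, x[batch], y[batch])
    return phi
```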
Stochastic gradient descent

[Figure: loss trajectories before (full batch descent) and after (SGD), with a fixed learning rate]

Properties of SGD
• Can escape from local minima
• Adds noise, but updates are still sensible, as each is based on part of the data
• Uses all data equally
• Less computationally expensive
• Seems to find better solutions

• Doesn’t converge in the traditional sense

• Learning rate schedule – decrease the learning rate over time, as sketched below
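One simple way to implement such a schedule (a hypothetical step-decay; many variants exist):

```python
def step_decay(alpha0=0.1, drop=0.5, every=20):
    # Returns a schedule that halves the learning rate every 20 epochs
    return lambda epoch: alpha0 * drop ** (epoch // every)

alpha_at = step_decay()
print(alpha_at(0), alpha_at(20), alpha_at(40))  # 0.1 0.05 0.025
```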
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Momentum
• Weighted sum of this gradient and previous gradients:

$m \leftarrow \beta \cdot m + (1-\beta) \frac{\partial L}{\partial \phi}$
$\phi \leftarrow \phi - \alpha \cdot m$
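A one-step sketch of this update, with beta controlling how much of the previous gradients is retained:

```python
def momentum_step(phi, m, grad, alpha=0.05, beta=0.9):
    # m is an exponentially weighted sum of current and previous gradients
    m = beta * m + (1 - beta) * grad
    phi = phi - alpha * m
    return phi, m
```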
Nesterov accelerated momentum
• Momentum is kind of like a
prediction of where we are going

• Move in the predicted direction,

THEN measure the gradient
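A sketch of one common formulation: evaluate the gradient at the looked-ahead position rather than at the current parameters (grad_fn is a hypothetical gradient function):

```python
def nesterov_step(phi, m, grad_fn, alpha=0.05, beta=0.9):
    # Measure the gradient at the predicted (looked-ahead) position,
    # not at the current parameters
    lookahead = phi - alpha * beta * m
    m = beta * m + (1 - beta) * grad_fn(lookahead)
    phi = phi - alpha * m
    return phi, m
```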
Fitting models
• Maths overview
• Gradient descent algorithm
• Linear regression example
• Gabor model example
• Stochastic gradient descent
• Momentum
• Adam
Adaptive moment estimation (Adam)
Normalized gradients
• Measure mean and pointwise squared gradient:

$m \leftarrow \frac{\partial L}{\partial \phi}, \quad v \leftarrow \left(\frac{\partial L}{\partial \phi}\right)^2$

• Normalize:

$\phi \leftarrow \phi - \alpha \cdot \frac{m}{\sqrt{v} + \epsilon}$
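A sketch of this update without momentum; dividing the gradient by the root of its pointwise square makes each coordinate move by roughly ±alpha regardless of gradient magnitude, i.e., this is close to sign gradient descent:

```python
import numpy as np

def normalized_step(phi, grad, alpha=0.01, eps=1e-8):
    m = grad               # mean (no momentum yet)
    v = grad ** 2          # pointwise squared gradient
    # Each coordinate moves by roughly +/- alpha, whatever
    # the gradient magnitude
    return phi - alpha * m / (np.sqrt(v) + eps)
```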
Adaptive moment estimation (Adam)
• Compute mean and pointwise squared gradients with momentum:

$m \leftarrow \beta \cdot m + (1-\beta) \frac{\partial L}{\partial \phi}, \quad v \leftarrow \gamma \cdot v + (1-\gamma) \left(\frac{\partial L}{\partial \phi}\right)^2$

• Moderate near the start of the sequence (bias correction):

$\tilde{m} = \frac{m}{1-\beta^{t+1}}, \quad \tilde{v} = \frac{v}{1-\gamma^{t+1}}$

• Update the parameters:

$\phi \leftarrow \phi - \alpha \cdot \frac{\tilde{m}}{\sqrt{\tilde{v}} + \epsilon}$
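Putting the pieces together, a sketch of one Adam step (beta and gamma are the momentum coefficients for the mean and squared gradient; epsilon prevents division by zero):

```python
import numpy as np

def adam_step(phi, grad, m, v, t, alpha=0.001, beta=0.9, gamma=0.999,
              eps=1e-8):
    # Moving averages of the gradient and its pointwise square
    m = beta * m + (1 - beta) * grad
    v = gamma * v + (1 - gamma) * grad ** 2
    # Bias correction moderates the estimates near the start (small t)
    m_hat = m / (1 - beta ** (t + 1))
    v_hat = v / (1 - gamma ** (t + 1))
    phi = phi - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return phi, m, v
```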
Hyperparameters
• Choice of learning algorithm
• Learning rate
• Momentum