
Matrix Calculus lecture notes:

How can we use so many derivatives?


… a couple of applications
… and the “adjoint method”

Matrix Calculus, IAP 2023


Profs. Steven G. Johnson & Alan Edelman, MIT

Newton’s method: Nonlinear equations via linearization
(scalar in → scalar out)

18.01: solving f(x) = 0:

1. Linearize: f(x+δx) ≈ f(x) + f′(x)δx
2. Solve the linear equation: f(x) + f′(x)δx = 0 ⇒ δx = –f(x)/f′(x)
3. Update x: x ⟵ x – f(x)/f′(x)

[Figure courtesy of Ralf Pfeifer on Wikimedia. License: CC BY-SA. Some rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use.]
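As a concrete illustration (not part of the original slides), here is a minimal Julia sketch of this scalar iteration; the function name, tolerances, and the √2 example are our own illustrative choices.

```julia
# Minimal sketch of 1D Newton's method: x ← x − f(x)/f′(x).
# Tolerance, iteration cap, and the example below are illustrative.
function newton1d(f, f′, x; tol=1e-12, maxiter=50)
    for _ in 1:maxiter
        abs(f(x)) < tol && break
        x -= f(x) / f′(x)        # linearize, solve, update
    end
    return x
end

newton1d(x -> x^2 - 2, x -> 2x, 1.0)   # ≈ 1.4142135623730951 (√2)
```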
Multidimensional Newton’s method: Real world is nonlinear!
(vector in → vector out)

18.06: solving f(x) = 0, where x ∈ ℝⁿ (input = vector) and f and 0 ∈ ℝⁿ (output = vector)

1. Linearize: f(x+δx) ≈ f(x) + f′(x)δx, where f′(x) is the Jacobian matrix
2. Solve the linear equation: f(x) + f′(x)δx = 0 ⇒ δx = –f′(x)⁻¹f(x) (inverse Jacobian times f)
3. Update x: x ⟵ x – f′(x)⁻¹f(x)

That’s it! Once we have the Jacobian, we just solve a linear system on each step.

Converges amazingly fast: doubles the number of correct digits (squares the error) on each step (“quadratic convergence”)!

Caveat: needs a starting guess close enough to the root (google “Newton fractal”…)
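A hedged Julia sketch of the multidimensional iteration, assuming we can supply the Jacobian analytically; the `newton` name, tolerances, and the toy circle/line system are illustrative, not from the lecture.

```julia
using LinearAlgebra

# Sketch of multidimensional Newton: solve the linear system J(x) δx = −f(x)
# on each step, then update x. Constants and the example are illustrative.
function newton(f, J, x; tol=1e-10, maxiter=100)
    for _ in 1:maxiter
        fx = f(x)
        norm(fx) < tol && break
        x = x - J(x) \ fx      # δx = −J(x)⁻¹ f(x), via a linear solve, not an explicit inverse
    end
    return x
end

# Toy example: intersect the unit circle x₁² + x₂² = 1 with the line x₁ = x₂.
f(x) = [x[1]^2 + x[2]^2 - 1, x[1] - x[2]]
J(x) = [2x[1] 2x[2]; 1.0 -1.0]            # analytical Jacobian
newton(f, J, [1.0, 0.3])                  # ≈ [0.7071…, 0.7071…]
```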
Nonlinear optimization: min f(x) (or maximize), x ∈ ℝⁿ

–∇f points downhill (steepest descent). Even if we have n = 10⁶ parameters x, we can evolve them all simultaneously in the downhill direction.

Reverse-mode / adjoint / left-to-right / backpropagation: computing ∇f costs about the same as evaluating f(x) once.

Makes large-scale optimization practical: training neural nets, optimizing the shape of an airplane wing, portfolio optimization…

[Figure: contours of f(x), with –∇f arrows pointing downhill. This image is in the public domain.]
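To make the cost claim tangible, here is a small illustration (our own, not from the slides) using the reverse-mode AD package Zygote.jl on an arbitrary toy objective; the objective and step size are arbitrary choices.

```julia
using Zygote   # reverse-mode AD ("backpropagation") in Julia

# Toy smooth objective with a million parameters (illustrative only).
loss(x) = sum(abs2, x .- 1) + 0.1 * sum(abs2, x[2:end] .- x[1:end-1])

x = randn(10^6)
∇f = Zygote.gradient(loss, x)[1]   # cost: roughly a small constant multiple of one evaluation of loss
x .-= 0.1 .* ∇f                    # one steepest-descent step for all 10⁶ parameters at once
```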
Nonlinear optimization: Lots of complications

● How far do we “step” in the –∇f direction?
  ○ Line search: minα f(x–α∇f) — backtrack if not improved (sketched in code below)
  ○ and/or limit the step size to a trust region, grown/shrunk as needed
  ○ Details are tricky to get right
● Constraints: min f(x) subject to gₖ(x) ≤ 0
  ○ Algorithms still need the gradients ∇gₖ!
● Faster convergence by “remembering” previous steps
  ○ Steepest descent tends to “zig-zag” in narrow valleys
  ○ “Momentum” terms & conjugate gradients — simple “memory”
  ○ Fancier: estimate the second-derivative “Hessian matrix” from the sequence of ∇f changes: the BFGS algorithm
● Lots of refinements & competing algorithms…
  ○ try out multiple (pre-packaged) algorithms on your problem!

[Figure: slow convergence of steepest descent, zig-zagging downhill in a narrow valley. This image is in the public domain.]
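Here is a minimal sketch (our own, not the lecture's algorithm) of steepest descent with a backtracking line search; the halving factor, tolerances, and the narrow-valley example are arbitrary illustrative choices.

```julia
using LinearAlgebra

# Steepest descent with backtracking line search: step in the −∇f direction,
# halving α until f actually decreases. All constants here are illustrative.
function steepest_descent(f, ∇f, x; α0=1.0, tol=1e-8, maxiter=1000)
    for _ in 1:maxiter
        g = ∇f(x)
        norm(g) < tol && break
        fx, α = f(x), α0
        while f(x .- α .* g) ≥ fx && α > 1e-12
            α /= 2                       # backtrack if the step did not improve f
        end
        x = x .- α .* g
    end
    return x
end

# Toy usage on a narrow quadratic valley (watch it zig-zag):
f_valley(x) = x[1]^2 + 25x[2]^2
∇f_valley(x) = [2x[1], 50x[2]]
steepest_descent(f_valley, ∇f_valley, [1.0, 1.0])   # → approximately [0, 0]
```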
Some parting advice:

Often, the main trick is finding the right mathematical formulation of your problem — i.e. what function, what constraints, what parameters? — which lets you exploit the best algorithms.

…but if you have many (> 10) parameters, always use an analytical gradient (not finite differences!) … computed efficiently in reverse mode.
Engineering/physical optimization

● Design parameters p: geometry, materials, forces, unknowns…
● Physical model(s): solid mechanics, chemical reactions, heat transport, electromagnetism, acoustics, fluid flow… (e.g. a linear model: A(p)x = b(p))
● Model solution x(p): forces, displacements, concentrations, temperatures, electric/magnetic fields, pressures, velocities, …
● Design objective f(x(p)): strength, speed, power, efficiency, dissipation loss, match to experiment, …

Maximize/minimize f(x(p)) using the gradient ∇ₚf
… ∇ₚf computed by reverse-mode/“adjoint” methods
Example: “Topology optimization” of a chair
…optimizing every voxel to support weight with minimal material
(either voxel “density” or a “level-set” function)

[© Source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use.]
[© Joris Laarman. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use.]
Adjoint differentiation
(yet another example of left-to-right/reverse-mode differentiation)

Example: gradient of a scalar f(x(p)) where A(p)x = b, i.e. f(A(p)⁻¹b)

● df = f′(x) dx = f′(x) d(A⁻¹) b = – f′(x) A⁻¹ dA A⁻¹ b
  [here f′(x) is a row vector, dA is a matrix, A⁻¹b = x, and f′(x)A⁻¹ is the “adjoint” solution vᵀ]
● “Adjoint method”: just multiply left-to-right! df = – (f′(x) A⁻¹) dA x
  ○ i.e. solve the “adjoint equation” Aᵀv = f′(x)ᵀ for v (“adjoint” meaning “transpose”)
  ○ … then df = –vᵀ dA x
  ○ For any given parameter pₖ, ∂f/∂pₖ = –vᵀ (∂A/∂pₖ) x (& usually ∂A/∂pₖ is very sparse)
● i.e. it takes only two solves to get both f and ∇f 🤓
  ○ Solve Ax = b once to get f(x), then solve one more time with Aᵀ for v
  ○ … then all derivatives ∂f/∂pₖ are just some cheap dot products
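To make the two solves concrete, here is a minimal Julia sketch under our own naming assumptions (none of these names appear in the lecture): `gradf(x)` returns the column vector ∇f(x) = f′(x)ᵀ, `A(p)` builds the matrix, and `dA(p, k)` returns ∂A/∂pₖ.

```julia
using LinearAlgebra

# Adjoint (reverse-mode) gradient of f(x(p)) where A(p) x = b:
# one forward solve, one adjoint solve, then cheap products per parameter.
function adjoint_gradient(gradf, A, b, dA, p)
    Ap = A(p)                 # in practice, factor Ap once and reuse the factorization
    x = Ap \ b                # forward solve:  A x = b
    v = Ap' \ gradf(x)        # adjoint solve:  Aᵀ v = ∇f(x)
    ∇pf = [-dot(v, dA(p, k) * x) for k in eachindex(p)]   # ∂f/∂pₖ = −vᵀ (∂A/∂pₖ) x
    return x, ∇pf
end
```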
Don’t use right-to-left “forward-mode” derivatives with lots of parameters!

∂f/∂pₖ = – f′(x) (A⁻¹ (∂A/∂pₖ x)) = one solve per parameter pₖ! ☹
  [f′(x) is a row vector; each A⁻¹(∂A/∂pₖ x) is a linear solve with a different right-hand side]

Right-to-left (a.k.a. forward mode) is better with 1 input & many outputs.
Left-to-right (a.k.a. backward mode, adjoint, backpropagation) is better with 1 output & many inputs.

(Note: Using dual numbers is forward mode. Most AD uses the term “forward” if it is forward mode, e.g. ForwardDiff.jl in Julia is forward mode; jax.jacfwd in Python is forward mode.)
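For contrast, the same gradient computed in the right-to-left ordering (a sketch reusing the hypothetical `gradf`, `A`, `dA` helpers from the previous sketch): the linear solve now sits inside the loop over parameters.

```julia
using LinearAlgebra

# Right-to-left ("forward") ordering: one linear solve per parameter,
# so the cost grows with length(p). Sketch for comparison only.
function forward_gradient(gradf, A, b, dA, p)
    Ap = A(p)
    x = Ap \ b
    return [-dot(gradf(x), Ap \ (dA(p, k) * x)) for k in eachindex(p)]   # a solve per pₖ ☹
end
```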
Don’t use finite differences with lots of parameters!

∂f/∂pₖ ≈ [ f(p + ε eₖ) – f(p) ] / ε   (eₖ = unit vector, ε = small number)

… requires one solve x(p + ε eₖ) for each parameter pₖ ☹

… even worse if you use fancier finite-difference approximations
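For completeness, here is the naive forward-difference gradient as a sketch (the helper name and ε are our own), which makes the one-solve-per-parameter cost explicit:

```julia
# Naive forward-difference gradient: every component needs a fresh
# evaluation of f, i.e. a fresh model solve. The step ε is a delicate choice.
function fd_gradient(f, p; ε=1e-6)
    f0 = f(p)
    g = similar(p)
    for k in eachindex(p)
        pk = copy(p)
        pk[k] += ε                  # p + ε eₖ
        g[k] = (f(pk) - f0) / ε     # one extra solve per parameter ☹
    end
    return g
end
```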
Adjoint differentiation with nonlinear equations

Example: gradient of a scalar f(x(p)), where x(p) ∈ ℝⁿ solves g(p,x) = 0 ∈ ℝⁿ

● g(p,x) = 0 ⟹ dg = ∂g/∂p dp + ∂g/∂x dx = 0 ⟹ dx = –(∂g/∂x)⁻¹ ∂g/∂p dp
  [a.k.a. the “implicit-function theorem”; ∂g/∂x is the Jacobian matrix, whose inverse is also used in a Newton solver for x!]
● df = f′(x) dx = – ( f′(x) (∂g/∂x)⁻¹ ) ∂g/∂p dp, where f′(x)(∂g/∂x)⁻¹ = vᵀ is the “adjoint” solution
  ⟹ adjoint equation: (∂g/∂x)ᵀv = f′(x)ᵀ
● i.e. it takes only two solves to get both f and ∇f 🤓
  ○ one nonlinear solve for x, and one linear solve for v!
  ○ … then all derivatives ∂f/∂pₖ = –vᵀ ∂g/∂pₖ are just some cheap dot products
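A hedged sketch of this recipe, assuming user-supplied Jacobian functions `gx(p, x) = ∂g/∂x` (n×n) and `gp(p, x) = ∂g/∂p` (n × number of parameters), a gradient function `gradf(x) = ∇f(x)`, and a nonlinear solver `solve_x(p)` (e.g. the Newton sketch above); all of these names are our own.

```julia
# Adjoint gradient of f(x(p)) where x solves g(p, x) = 0:
# one nonlinear solve for x, one linear (adjoint) solve for v.
function nonlinear_adjoint_gradient(gradf, gx, gp, solve_x, p)
    x = solve_x(p)                    # nonlinear solve for x(p), e.g. by Newton's method
    v = gx(p, x)' \ gradf(x)          # adjoint solve: (∂g/∂x)ᵀ v = ∇f(x)
    ∇pf = -(gp(p, x)' * v)            # ∂f/∂pₖ = −vᵀ ∂g/∂pₖ: cheap products
    return x, ∇pf
end
```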
You need to understand adjoint methods even if you use AD

● Helps you understand when to use forward vs. reverse mode!
● Many physical models call large software packages written over decades in various languages, and cannot be differentiated automatically by AD
  ○ You often just need to supply a “vector–Jacobian product” yᵀdx for the physics, or even just part of the physics, and then AD will differentiate the rest and apply the chain rule for you (see the sketch below)
● Often models involve approximate calculations, but AD tools don’t know this & spend extra effort trying to differentiate the error in your approximation
  ○ If you solve for x by an iterative method (e.g. Newton), it is inefficient for AD to backpropagate through the iteration … instead, you want to take the derivative of the underlying equation g(p,x) = 0
  ○ For discretized physics (e.g. finite-element methods), it is often more efficient (and sufficiently accurate) to apply the adjoint method to the continuous physics (“differentiate-then-discretize”)
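As an illustration of supplying just a vector–Jacobian product, here is a hedged sketch using Julia's ChainRulesCore package to give reverse-mode AD the adjoint rule for a linear solve; the `linsolve` wrapper and this particular rule are our own example, not from the lecture. With such a rule defined, an AD tool that consumes ChainRules rules (e.g. Zygote) can differentiate the surrounding code without ever looking inside the solver.

```julia
using ChainRulesCore, LinearAlgebra

linsolve(A, b) = A \ b     # stand-in for "the physics": x solves A x = b

# Custom reverse-mode rule: given the output cotangent x̄ (think ∇ₓf), return
# the vector–Jacobian products with respect to A and b via one adjoint solve.
function ChainRulesCore.rrule(::typeof(linsolve), A, b)
    x = A \ b
    function linsolve_pullback(x̄)
        v = A' \ unthunk(x̄)             # adjoint solve: Aᵀ v = x̄
        return NoTangent(), -v * x', v   # (no tangent for linsolve itself, Ā = −v xᵀ, b̄ = v)
    end
    return x, linsolve_pullback
end
```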
MIT OpenCourseWare
https://ocw.mit.edu

18.S096 Matrix Calculus for Machine Learning and Beyond


Independent Activities Period (IAP) 2023

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
