
Deep Generative Models
Lecture 14
Roman Isachenko
Moscow Institute of Physics and Technology
2022 – 2023
Recap of previous lecture
Adjoint functions:
a_z(t) = ∂L/∂z(t);   a_θ(t) = ∂L/∂θ(t).

Theorem (Pontryagin)
da_z(t)/dt = −a_z(t)^T · ∂f(z(t), t, θ)/∂z;   da_θ(t)/dt = −a_z(t)^T · ∂f(z(t), t, θ)/∂θ.

Forward pass
z(t_1) = ∫_{t_0}^{t_1} f(z(t), t, θ) dt + z_0   ⇒ ODE Solver

Backward pass (all three integrals are computed jointly by a single ODE Solver)
∂L/∂θ(t_0) = a_θ(t_0) = −∫_{t_1}^{t_0} a_z(t)^T · ∂f(z(t), t, θ)/∂θ(t) dt + 0;
∂L/∂z(t_0) = a_z(t_0) = −∫_{t_1}^{t_0} a_z(t)^T · ∂f(z(t), t, θ)/∂z(t) dt + ∂L/∂z(t_1);
z(t_0) = −∫_{t_1}^{t_0} f(z(t), t, θ) dt + z_1.

Chen R. T. Q. et al. Neural Ordinary Differential Equations, 2018


Recap of previous lecture
Continuous-in-time normalizing flows
dz(t)/dt = f(z(t), t, θ);   d log p(z(t), t)/dt = −tr( ∂f(z(t), t, θ)/∂z(t) ).

Theorem (Picard)
If f is uniformly Lipschitz continuous in z and continuous in t, then the ODE has a unique solution.

Forward transform + log-density
[x; log p(x|θ)] = [z; log p(z)] + ∫_{t_0}^{t_1} [ f(z(t), t, θ); −tr( ∂f(z(t), t, θ)/∂z(t) ) ] dt.

Hutchinson’s trace estimator
log p(z(t_1)) = log p(z(t_0)) − ∫_{t_0}^{t_1} E_{p(ϵ)} [ ϵ^T (∂f/∂z) ϵ ] dt.

Grathwohl W. et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models, 2018
Recap of previous lecture
SDE basics
Let us define a stochastic process x(t) with initial condition x(0) ∼ p_0(x):
dx = f(x, t) dt + g(t) dw,
where w(t) is the standard Wiener process (Brownian motion):
w(t) − w(s) ∼ N(0, (t − s) · I),   dw = ϵ · √dt, where ϵ ∼ N(0, I).

Langevin dynamics
Let x_0 be a random vector. Then, under mild regularity conditions and for small enough η, samples from the dynamics
x_{t+1} = x_t + (η/2) · ∇_{x_t} log p(x_t|θ) + √η · ϵ,   ϵ ∼ N(0, I),
will come from p(x|θ).
The density p(x|θ) is a stationary distribution for the Langevin SDE.

Welling M. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011
Stochastic differential equation (SDE)

Statement
Let x_0 be a random vector. Then samples from the dynamics
x_{t+1} = x_t + (η/2) · ∇_{x_t} log p(x_t|θ) + √η · ϵ,   ϵ ∼ N(0, I),
will come from p(x|θ) under mild regularity conditions, for small enough η and large enough t.

The density p(x|θ) is a stationary distribution for this SDE.

Song Y. Generative Modeling by Estimating Gradients of the Data Distribution, blog post, 2021
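To make the update above concrete, here is a minimal NumPy sketch of Langevin dynamics sampling. It assumes some score estimator score_fn(x) ≈ ∇_x log p(x|θ) is available; the function name and hyperparameters are illustrative, not part of the lecture.

import numpy as np

def langevin_sampling(score_fn, x0, eta=1e-2, n_steps=1000):
    """Iterate x <- x + (eta/2) * score(x) + sqrt(eta) * eps, with eps ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        eps = np.random.randn(*x.shape)
        x = x + 0.5 * eta * score_fn(x) + np.sqrt(eta) * eps
    return x

# Sanity check: for a standard Gaussian the score is -x, so the chain
# should produce approximately standard normal samples.
sample = langevin_sampling(lambda x: -x, x0=np.zeros(2))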
Outline

1. Score matching

2. Noise conditioned score network

3. Gaussian diffusion process

Generative models zoo
(taxonomy diagram)
▶ Likelihood-based models
  ▶ Tractable density: autoregressive models, normalizing flows
  ▶ Approximate density: VAEs, diffusion models
▶ Implicit density models: GANs
Score matching
We could sample from the model using Langevin dynamics if we had access to ∇_x log p(x|θ).

Fisher divergence
D_F(π, p) = (1/2) · E_π ∥∇_x log p(x|θ) − ∇_x log π(x)∥_2^2 → min_θ

Let us introduce the score function s(x, θ) = ∇_x log p(x|θ).

Problem: we do not know ∇_x log π(x).

Song Y. Generative Modeling by Estimating Gradients of the Data Distribution, blog post, 2021
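In practice the score function is approximated by a neural network whose output has the same dimensionality as its input. A minimal PyTorch sketch is given below; the architecture (a small MLP for 2D toy data) is purely illustrative and not specified in the lecture.

import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """A toy score model s(x, θ) ≈ ∇_x log p(x|θ) for 2D data."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),   # output lives in the same space as x
        )

    def forward(self, x):
        return self.net(x)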
Score matching
Theorem (implicit score matching)
Under some regularity conditions, it holds that
(1/2) · E_π ∥s(x, θ) − ∇_x log π(x)∥_2^2 = E_π [ (1/2) · ∥s(x, θ)∥_2^2 + tr(∇_x s(x, θ)) ] + const.

Proof (only for 1D)
E_π [s(x) − ∇_x log π(x)]^2 = E_π [ s(x)^2 + (∇_x log π(x))^2 − 2 · s(x) · ∇_x log π(x) ]

E_π [s(x) · ∇_x log π(x)] = ∫ π(x) · ∇_x log p(x) · ∇_x log π(x) dx
= ∫ ∇_x log p(x) · ∇_x π(x) dx
= [π(x) · ∇_x log p(x)]_{−∞}^{+∞} − ∫ ∇_x^2 log p(x) · π(x) dx
= −E_π [∇_x^2 log p(x)] = −E_π [∇_x s(x)],
where the boundary term vanishes under the regularity conditions.

Hence
(1/2) · E_π [s(x) − ∇_x log π(x)]^2 = E_π [ (1/2) · s(x)^2 + ∇_x s(x) ] + const.

Hyvarinen A. Estimation of non-normalized statistical models by score matching, 2005


Score matching
Theorem (implicit score matching)
(1/2) · E_π ∥s(x, θ) − ∇_x log π(x)∥_2^2 = E_π [ (1/2) · ∥s(x, θ)∥_2^2 + tr(∇_x s(x, θ)) ] + const

Here ∇_x s(x, θ) = ∇_x^2 log p(x|θ) is a Hessian matrix.
1. The left-hand side is intractable due to the unknown π(x) – addressed by denoising score matching.
2. The right-hand side is expensive due to the Hessian matrix – addressed by sliced score matching.

Sliced score matching (Hutchinson’s trace estimation)
tr(∇_x s(x, θ)) = E_{p(ϵ)} [ ϵ^T · ∇_x s(x, θ) · ϵ ]

Song Y. Sliced Score Matching: A Scalable Approach to Density and Score Estimation, 2019
Song Y. Generative Modeling by Estimating Gradients of the Data Distribution, blog post, 2021
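The trace term can be estimated with vector–Jacobian products, so the Hessian is never materialized. Below is a minimal PyTorch sketch of the sliced score matching objective; it assumes a score model such as the ScoreNet above, and the function name and defaults are illustrative.

import torch

def sliced_score_matching_loss(score_net, x, n_projections=1):
    """(1/2)·||s(x, θ)||² + ϵᵀ ∇_x s(x, θ) ϵ, averaged over the batch."""
    x = x.requires_grad_(True)
    s = score_net(x)                                   # (batch, dim)
    loss = 0.5 * (s ** 2).sum(dim=1)
    for _ in range(n_projections):
        eps = torch.randn_like(x)                      # random projection ϵ ~ N(0, I)
        # ϵᵀ ∇_x s(x, θ) ϵ via a vector–Jacobian product (no explicit Hessian)
        vjp = torch.autograd.grad((s * eps).sum(), x, create_graph=True)[0]
        loss = loss + (vjp * eps).sum(dim=1) / n_projections
    return loss.mean()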
Denoising score matching

Let us perturb the original data x ∼ π(x) with Gaussian noise:
x′ = x + σ · ϵ,   ϵ ∼ N(0, I),   p(x′|x, σ) = N(x′|x, σ^2 · I),
π(x′|σ) = ∫ π(x) · p(x′|x, σ) dx.

Then the solution of
(1/2) · E_{π(x′|σ)} ∥s(x′, θ, σ) − ∇_{x′} log π(x′|σ)∥_2^2 → min_θ
satisfies s(x′, θ, σ) ≈ s(x′, θ, 0) = s(x, θ) if σ is small enough.

Vincent P. A Connection Between Score Matching and Denoising Autoencoders, 2010


Denoising score matching

Theorem
E_{π(x′|σ)} ∥s(x′, θ, σ) − ∇_{x′} log π(x′|σ)∥_2^2 =
= E_{π(x)} E_{p(x′|x,σ)} ∥s(x′, θ, σ) − ∇_{x′} log p(x′|x, σ)∥_2^2 + const(θ)

Gradient of the noise kernel
∇_{x′} log p(x′|x, σ) = ∇_{x′} log N(x′|x, σ^2 · I) = −(x′ − x)/σ^2

▶ The RHS does not require computing ∇_{x′} log π(x′|σ), or even ∇_{x′} log π(x′).
▶ s(x′, θ, σ) tries to denoise a corrupted sample x′.
▶ The score function s(x′, θ, σ) is parametrized by σ. How do we build it?

Vincent P. A Connection Between Score Matching and Denoising Autoencoders, 2010
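Putting the theorem and the noise-kernel gradient together gives a simple regression objective. Here is a minimal PyTorch sketch for a single noise level; it assumes a score model score_net(x, sigma) conditioned on σ (a hypothetical signature — how σ-conditioning is handled is the topic of the next section).

import torch

def dsm_loss(score_net, x, sigma):
    """Denoising score matching for one noise level σ."""
    eps = torch.randn_like(x)
    x_noisy = x + sigma * eps                     # x′ = x + σ·ϵ
    target = -(x_noisy - x) / sigma ** 2          # ∇_{x′} log p(x′|x, σ) = −(x′ − x)/σ²
    s = score_net(x_noisy, sigma)
    return 0.5 * ((s - target) ** 2).sum(dim=1).mean()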


Denoising score matching
▶ If σ is small, the score estimate is inaccurate in low-density regions and Langevin dynamics will probably fail to jump between modes.
▶ If σ is large, the perturbation covers low-density regions and helps with multimodal distributions, but we learn a too heavily corrupted distribution.

Song Y. Generative Modeling by Estimating Gradients of the Data Distribution, blog post, 2021
Noise conditioned score network
▶ Define a sequence of noise levels: σ_1 > σ_2 > · · · > σ_L.
▶ Perturb the original data with the different noise levels to get π(x′|σ_1), . . . , π(x′|σ_L).
▶ Train a denoising score function s(x′, θ, σ_l) for each noise level:
Σ_{l=1}^{L} σ_l^2 · E_{π(x)} E_{p(x′|x,σ_l)} ∥s(x′, θ, σ_l) − ∇_{x′} log p(x′|x, σ_l)∥_2^2 → min_θ
▶ Sample with annealed Langevin dynamics (for l = 1, . . . , L).

Song Y. et al. Generative Modeling by Estimating Gradients of the Data Distribution, 2019
Noise conditioned score network
Training: loss function
Σ_{l=1}^{L} σ_l^2 · E_{π(x)} E_ϵ ∥ s_l + ϵ/σ_l ∥_2^2,
where
▶ s_l = s(x + σ_l · ϵ, θ, σ_l);
▶ ∇_{x′} log p(x′|x, σ_l) = −(x′ − x)/σ_l^2 = −ϵ/σ_l.

Inference: annealed Langevin dynamics, running the Langevin updates for l = 1, . . . , L, from the largest noise level to the smallest (a sketch follows below).
(figure: generated samples)

Song Y. et al. Improved Techniques for Training Score-Based Generative Models, 2020
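A minimal PyTorch sketch of annealed Langevin dynamics is given below. It assumes a trained score_net(x, sigma); the step-size schedule step_l = η · σ_l² / σ_L² is a common choice (an assumption here, not stated on the slide), and all names and constants are illustrative.

import torch

@torch.no_grad()
def annealed_langevin(score_net, shape, sigmas, eta=2e-5, steps_per_level=100):
    """sigmas must be ordered from the largest noise level to the smallest."""
    x = torch.randn(shape)                              # start from pure noise
    for sigma in sigmas:                                # σ_1 > σ_2 > ... > σ_L
        step = eta * (sigma / sigmas[-1]) ** 2          # assumed step-size schedule
        for _ in range(steps_per_level):
            noise = torch.randn_like(x)
            x = x + 0.5 * step * score_net(x, sigma) + step ** 0.5 * noise
    return x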
Forward Gaussian diffusion process
Let x_0 = x ∼ π(x), β ∈ (0, 1). Define the Markov chain
x_t = √(1 − β) · x_{t−1} + √β · ϵ,   where ϵ ∼ N(0, I);
q(x_t|x_{t−1}) = N(x_t | √(1 − β) · x_{t−1}, β · I).

Statement 1
Applying the Markov chain to samples from any π(x), we get x_∞ ∼ p_∞(x) = N(0, I). Here p_∞(x) is a stationary distribution:
p_∞(x) = ∫ q(x|x′) · p_∞(x′) dx′.

Statement 2
Denote ᾱ_t = Π_{s=1}^{t} (1 − β_s). Then
x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ϵ,   where ϵ ∼ N(0, I),
q(x_t|x_0) = N(x_t | √ᾱ_t · x_0, (1 − ᾱ_t) · I).
We can sample at any timestamp using only x_0!

Sohl-Dickstein J. Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015
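Statement 2 is what makes training practical: x_t can be sampled in closed form from x_0. A minimal PyTorch sketch is below; the linear β-schedule and the tensor shapes are assumptions made only for illustration.

import torch

def sample_xt(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) = N(√ᾱ_t · x_0, (1 − ᾱ_t) · I)."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]    # ᾱ_t = Π_{s≤t} (1 − β_s)
    eps = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * eps

betas = torch.linspace(1e-4, 0.02, 1000)                # assumed linear schedule
x0 = torch.randn(16, 3, 32, 32)                         # stand-in for a data batch
xt = sample_xt(x0, t=500, betas=betas)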
Forward Gaussian diffusion process
Diffusion refers to the flow of particles from high-density regions towards low-density regions.

1. x_0 = x ∼ π(x);
2. x_t = √(1 − β) · x_{t−1} + √β · ϵ, where ϵ ∼ N(0, I), t ≥ 1;
3. x_T ∼ p_∞(x) = N(0, I), where T ≫ 1.

If we are able to invert this process, we obtain a way to sample x ∼ π(x) starting from noise p_∞(x) = N(0, I).
Now our goal is to revert this process.

Das A. An introduction to Diffusion Probabilistic Models, blog post, 2021
Reverse Gaussian diffusion process

Let us define the reverse process
p(x_{t−1}|x_t, θ) = N(x_{t−1} | μ(x_t, θ, t), σ^2(x_t, θ, t)).

Forward process:
1. x_0 = x ∼ π(x);
2. x_t = √(1 − β) · x_{t−1} + √β · ϵ, where ϵ ∼ N(0, I), t ≥ 1;
3. x_T ∼ p_∞(x) = N(0, I).

Reverse process:
1. x_T ∼ p_∞(x) = N(0, I);
2. x_{t−1} = μ(x_t, θ, t) + σ(x_t, θ, t) · ϵ;
3. x_0 = x ∼ π(x).

Note: the forward process does not have any learnable parameters!

Weng L. What are Diffusion Models?, blog post, 2021
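Given learned mean and scale functions, ancestral sampling from the reverse process is a simple loop. The sketch below assumes hypothetical callables mu(x, t) and sigma(x, t); dropping the noise at the final step is a common convention, not something stated on the slide.

import torch

@torch.no_grad()
def reverse_sampling(mu, sigma, shape, T):
    """Run x_T → x_{T−1} → ... → x_0 using x_{t−1} = μ(x_t, t) + σ(x_t, t)·ϵ."""
    x = torch.randn(shape)                       # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        eps = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
        x = mu(x, t) + sigma(x, t) * eps
    return x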
Gaussian diffusion model as VAE

▶ Let us treat z = (x_1, . . . , x_T) as a latent variable (note: each x_t has the same size as x).
▶ Variational posterior distribution (note: it has no learnable parameters):
q(z|x) = q(x_1, . . . , x_T|x_0) = Π_{t=1}^{T} q(x_t|x_{t−1}).
▶ Probabilistic model:
p(x, z|θ) = p(x|z, θ) · p(z|θ).
▶ Generative distribution and prior:
p(x|z, θ) = p(x_0|x_1, θ);   p(z|θ) = Π_{t=2}^{T} p(x_{t−1}|x_t, θ) · p(x_T).

Das A. An introduction to Diffusion Probabilistic Models, blog post, 2021
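With this VAE view, the standard evidence lower bound applies directly to the definitions above (this bound is not written on the slide, but it is the usual one for any latent-variable model):
log p(x|θ) ≥ E_{q(z|x)} [ log p(x|z, θ) ] − KL( q(z|x) ∥ p(z|θ) ),
and training the diffusion model then amounts to maximizing this bound over θ.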
Summary
▶ Score matching proposes to minimize the Fisher divergence in order to learn the score function.
▶ Sliced score matching and denoising score matching are two techniques that make fitting the Fisher divergence scalable.
▶ A noise conditioned score network uses multiple noise levels and annealed Langevin dynamics to fit the score function.
▶ The Gaussian diffusion process is a Markov chain that injects a special form of Gaussian noise into the samples.
▶ The reverse process allows us to sample from the real distribution π(x) using samples from noise.
