Diffusion Model Clearly Explained! - by Steins - Medium
Steins · Follow
7 min read · Dec 26, 2022
Jason Allen’s AI-generated work, “Théâtre D’opéra Spatial,” won first place in the digital category at the
Colorado State Fair. [1]
The rise of the Diffusion Model can be regarded as the main factor for the recent
breakthrough in the AI generative artworks field.
In this article, I’m going to explain how it works with illustrative diagrams.
https://medium.com/@steinsfu/diffusion-model-clearly-explained-cd331bd41166 1/26
Updates
[15-Jun-2023] Added a detailed derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) for the stepwise denoising term in the Loss Function section.
Table of Contents
- Overview
- Forward Diffusion Process
— Closed-Form Formula
- Reverse Diffusion Process
— Loss Function
— Simplified Loss
- The U-Net Model
— Dataset
— Training
— Reverse Diffusion
- Summary
- Appendix
- References
Overview
The training of the Diffusion Model can be divided into two parts: the forward diffusion process and the reverse diffusion process.

Forward Diffusion Process
The forward diffusion process gradually adds Gaussian noise to the input image x₀ step by step, for a total of T steps. The process produces a sequence of noisy image samples x₁, …, x_T.
When T → ∞, the final result becomes a completely noisy image, as if it were sampled from an isotropic Gaussian distribution.
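As a sketch, the stepwise forward process can be written as a loop that repeatedly mixes the current image with fresh Gaussian noise. The linear `betas` schedule and all names below are illustrative assumptions, not taken from the article:

```python
import numpy as np

# Hypothetical linear noise schedule beta_1..beta_T (an assumption for illustration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)

def forward_diffuse_stepwise(x0, t, rng=np.random.default_rng(0)):
    """Apply t forward steps: x_s = sqrt(1 - beta_s) * x_{s-1} + sqrt(beta_s) * eps."""
    x = x0.astype(np.float64)
    for s in range(t):
        eps = rng.standard_normal(x.shape)  # fresh i.i.d. standard normal noise each step
        x = np.sqrt(1.0 - betas[s]) * x + np.sqrt(betas[s]) * eps
    return x

x0 = np.ones((8, 8))                  # stand-in "image"
xT = forward_diffuse_stepwise(x0, T)  # after T steps the signal is essentially gone
print(xT.mean(), xT.std())            # roughly standard-normal statistics
```

After all T steps the sample mean is near 0 and the sample standard deviation near 1, matching the claim that x_T is approximately isotropic Gaussian.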
But instead of designing an algorithm to iteratively add noise to the image, we can
use a closed-form formula to directly sample a noisy image at a specific time step t.
Closed-Form Formula
The closed-form sampling formula can be derived using the Reparameterization
Trick.
Reparameterization trick
Express xₜ using the reparameterization trick
Note:
all the ε are i.i.d. (independent and identically distributed) standard normal random
variables.
It is important to distinguish them using different symbols and subscripts because they are
independent and their values could be different after sampling.
Some people find this step difficult to understand. Here I will show you how it
works:
Let’s denote these two terms using X and Y. They can be regarded as samples from two different normal distributions, i.e. X ~ N(0, αₜ(1-αₜ₋₁)I) and Y ~ N(0, (1-αₜ)I).
Recall that the sum of two normally distributed (independent) random variables is
also normally distributed. i.e. if Z = X + Y, then Z ~ N(0, σ²ₓ+σ²ᵧ).
Therefore we can merge them together and express the merged normal distribution
in the reparameterized form. This is how we combine the two terms.
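A quick numerical check of this merging step: the variance of the sum of two independent zero-mean Gaussians is simply the sum of their variances. The concrete variance values here are arbitrary stand-ins for αₜ(1-αₜ₋₁) and (1-αₜ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Illustrative variances standing in for alpha_t * (1 - alpha_{t-1}) and (1 - alpha_t).
var_x, var_y = 0.3, 0.5
X = np.sqrt(var_x) * rng.standard_normal(n)  # X ~ N(0, var_x)
Y = np.sqrt(var_y) * rng.standard_normal(n)  # Y ~ N(0, var_y), independent of X
Z = X + Y
print(Z.var())  # close to var_x + var_y = 0.8
```

This is exactly why the two noise terms can be merged into a single Gaussian in the reparameterized form.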
Repeating these steps will give us the following formula which depends only on the
input image x₀:
Now we can directly sample xₜ at any time step using this formula, and this makes the forward process much faster.
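The closed-form formula xₜ = √ᾱₜ · x₀ + √(1-ᾱₜ) · ε, where ᾱₜ is the cumulative product of αₛ = 1 - βₛ, can be sketched as follows (the schedule and function names are illustrative assumptions):

```python
import numpy as np

# Hypothetical schedule; alpha_bar_t is the cumulative product of alpha_s = 1 - beta_s.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t in one shot: x_t = sqrt(alpha_bar_t)*x0 + sqrt(1 - alpha_bar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

x0 = np.ones((8, 8))
x_500, eps = q_sample(x0, 500)  # one call instead of 500 iterative noising steps
print(x_500.shape)
```

A single call replaces hundreds of iterative steps, which is what makes training on random time steps practical.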
Reverse Diffusion Process
Unlike the forward process, we cannot use q(xₜ₋₁|xₜ) to reverse the noise since it is intractable (uncomputable).
Thus we need to train a neural network pθ(xₜ₋₁|xₜ) to approximate q(xₜ₋₁|xₜ). The approximation pθ(xₜ₋₁|xₜ) follows a normal distribution, and its mean and variance are set as follows:
Loss Function
We can define our loss as a Negative Log-Likelihood:
Negative log-likelihood
This setup is very similar to the one in VAE: instead of optimizing the intractable loss function itself, we can optimize the Variational Lower Bound.
By expanding the variational lower bound, we found that it can be represented with
the following three terms:
1. L_T: Constant term
Since q has no learnable parameters and p is just a Gaussian noise probability, this term will be a constant during training and thus can be ignored.
2. Lₜ₋₁: Stepwise denoising term
This term compares the target denoising step q and the approximated denoising
step pθ.
Note that by conditioning on x₀, q(xₜ₋₁|xₜ, x₀) becomes tractable.
After a series of derivations, the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) is shown above.
For those who want to see the step-by-step derivation of the mean μ̃ₜ (yellow box), please see the Appendix.
To approximate the target denoising step q, we only need to approximate its mean using a neural network. So we set the approximated mean μθ to be in the same form as the target mean μ̃ₜ (with a learnable neural network εθ):
Approximated mean
The comparison between the target mean and the approximated mean can be done
using a mean squared error (MSE):
Experimentally, better results can be achieved by ignoring the weighting term and
simply comparing the target and predicted noises with MSE.
So, it turns out that to approximate the desired denoising step q, we just need to approximate the noise εₜ using a neural network εθ.
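The resulting noise-matching loss is just a mean squared error between the true noise and the network's prediction. A minimal sketch (names are illustrative, not from the article):

```python
import numpy as np

def simplified_loss(eps_true, eps_pred):
    """L_simple = || eps - eps_theta(x_t, t) ||^2, the weighting term dropped."""
    return np.mean((eps_true - eps_pred) ** 2)

rng = np.random.default_rng(0)
eps = rng.standard_normal((8, 8))           # the true noise added in the forward pass
print(simplified_loss(eps, eps))            # 0.0 for a perfect prediction
print(simplified_loss(eps, np.zeros((8, 8))))  # positive otherwise
```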
3. L₀: Reconstruction term
This is the reconstruction loss of the last denoising step, and in practice it can be ignored during training.
Simplified Loss
So the final simplified training objective is as follows:
“We find that training our models on the true variational bound yields better codelengths than training on the simplified objective, as expected, but the latter yields the best sample quality.” [2]
Training
1. A random time step t will be selected for each training sample (image).
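A single training step under the simplified objective can be sketched as below. Here `eps_model` is a placeholder for the noise-prediction network (a U-Net in the article); the schedule and names are assumptions for illustration:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def train_step(x0, eps_model, rng):
    t = rng.integers(1, T)                   # 1. pick a random time step t
    eps = rng.standard_normal(x0.shape)      # 2. sample the noise to be added
    # 3. build x_t with the closed-form forward formula
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    # 4. MSE between the true noise and the model's prediction
    loss = np.mean((eps - eps_model(x_t, t)) ** 2)
    return loss

rng = np.random.default_rng(0)
# A dummy "network" that always predicts zero noise, just to exercise the step.
loss = train_step(np.ones((8, 8)), lambda x, t: np.zeros_like(x), rng)
print(loss)
```

In a real implementation the loss would be backpropagated through `eps_model`; this sketch only shows how the target is constructed.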
Reverse Diffusion
We can generate images from noise using the above algorithm. The following diagram illustrates it:
Sampling illustration
Note that in the last step, we simply output the learned mean μθ(x₁, 1) without adding noise to it.
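The reverse loop can be sketched as follows. Again `eps_model` is a placeholder network; the schedule and the common choice σₜ² = βₜ for the sampling variance are assumptions for illustration:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample(eps_model, shape, rng):
    x = rng.standard_normal(shape)  # start from pure Gaussian noise x_T
    for t in range(T - 1, -1, -1):
        eps_hat = eps_model(x, t)
        # learned mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean               # last step: output the mean, no noise added
    return x

# Dummy zero-noise "network", just to exercise the loop shape.
img = sample(lambda x, t: np.zeros_like(x), (8, 8), np.random.default_rng(0))
print(img.shape)
```

Note how the `t == 0` branch mirrors the remark above: the final step outputs the learned mean directly.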
Summary
Here are some main takeaways from this article:
The Diffusion model is divided into two parts: forward diffusion and reverse
diffusion.
Appendix
The following is the detailed derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) in the stepwise denoising term in the Loss Function section.
Derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀)
References
[1] K. Roose, “An a.i.-generated picture won an art prize. artists aren’t happy.,” The
New York Times, 02-Sep-2022. [Online]. Available:
https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-
artists.html.
[2] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” arXiv:2006.11239, 2020.
[3] S. Karagiannakos and N. Adaloglou, “How diffusion models work: The math from scratch,” AI Summer, 29-Sep-2022. [Online]. Available: https://theaisummer.com/diffusion-models.
[4] L. Weng, “What are diffusion models?,” Lil’Log, 11-Jul-2021. [Online]. Available:
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/.
[5] A. Seff, “What are diffusion models?,” YouTube, 20-Apr-2022. [Online]. Available:
https://www.youtube.com/watch?v=fbLgFrlTnGU.
[6] Outlier, “Diffusion models | paper explanation | math explained,” YouTube, 06-
Jun-2022. [Online]. Available: https://www.youtube.com/watch?v=HoKDTa5jHvg.