
Diffusion Model Clearly Explained!


How does AI artwork work? Understanding the tech behind the rise of AI-generated
art.

Steins · 7 min read · Dec 26, 2022

Jason Allen’s AI-generated work, “Théâtre D’opéra Spatial,” won first place in the digital category at the
Colorado State Fair. [1]

The rise of the Diffusion Model can be regarded as the main driver of the recent
breakthrough in the field of AI-generated art.

In this article, I’m going to explain how it works with illustrative diagrams.


Updates
[15-Jun-2023] Added detailed derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) for the
stepwise denoising term in the Loss Function.

Table of Contents
- Overview
- Forward Diffusion Process
— Closed-Form Formula
- Reverse Diffusion Process
— Loss Function
— Simplified Loss
- The U-Net Model
— Dataset
— Training
— Reverse Diffusion
- Summary
- Appendix
- References

Overview

Overview of the Diffusion Model



The training of the Diffusion Model can be divided into two parts:

1. Forward Diffusion Process → add noise to the image.

2. Reverse Diffusion Process → remove noise from the image.

Forward Diffusion Process

Forward diffusion process

The forward diffusion process gradually adds Gaussian noise to the input image x₀,
step by step, for a total of T steps. The process produces a sequence of noisy image
samples x₁, …, x_T.
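
In the notation of [2], a single forward step simply scales the previous image and adds Gaussian noise with variance βₜ taken from the noise schedule:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t\mathbf{I}\big),
\qquad
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
```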

When T → ∞, the final result becomes a completely noisy image, as if it were sampled
from an isotropic Gaussian distribution.

But instead of designing an algorithm to iteratively add noise to the image, we can
use a closed-form formula to directly sample a noisy image at a specific time step t.

Closed-Form Formula
The closed-form sampling formula can be derived using the Reparameterization
Trick.


Reparameterization trick

With this trick, we can express the sampled image xₜ as follows:


Express xₜ using the reparameterization trick
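
Written out in the standard DDPM notation of [2] (with αₜ = 1 − βₜ and εₜ₋₁ a standard normal sample), the reparameterized step is:

```latex
x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1-\alpha_t}\,\varepsilon_{t-1},
\qquad
\alpha_t = 1-\beta_t,\quad \varepsilon_{t-1}\sim\mathcal{N}(0,\mathbf{I})
```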

Then we can expand it recursively to get the closed-form formula:

Derivation of the closed-form formula

Note:
all the ε are i.i.d. (independent and identically distributed) standard normal random
variables.

It is important to distinguish them using different symbols and subscripts because they are
independent and their values could be different after sampling.

But how do we jump from line 4 to line 5?


Some people find this step difficult to understand. Here I will show you how it
works:

Detailed derivation from line 4 to line 5


Let’s denote these two terms using X and Y. They can be regarded as samples from
two different normal distributions, i.e. X ~ N(0, αₜ(1−αₜ₋₁)I) and Y ~ N(0, (1−αₜ)I).
Recall that the sum of two independent normally distributed random variables is
also normally distributed, i.e. if Z = X + Y, then Z ~ N(0, σ²ₓ+σ²ᵧ).

Therefore we can merge them together and express the merged normal distribution
in the reparameterized form. This is how we combine the two terms.
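
Concretely, since X and Y both have zero mean, their variances simply add:

```latex
\underbrace{\sqrt{\alpha_t(1-\alpha_{t-1})}\,\varepsilon_{t-2}}_{X}
+\underbrace{\sqrt{1-\alpha_t}\,\varepsilon_{t-1}}_{Y}
=\sqrt{\alpha_t(1-\alpha_{t-1})+(1-\alpha_t)}\;\bar{\varepsilon}
=\sqrt{1-\alpha_t\alpha_{t-1}}\;\bar{\varepsilon},
\qquad \bar{\varepsilon}\sim\mathcal{N}(0,\mathbf{I})
```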

Repeating these steps will give us the following formula which depends only on the
input image x₀:

The closed-form formula


Now we can directly sample xₜ at any time step using this formula, and this makes
the forward process much faster.
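
The closed-form formula is xₜ = √ᾱₜ·x₀ + √(1−ᾱₜ)·ε, where ᾱₜ is the cumulative product of α₁…αₜ. Here is a minimal PyTorch-style sketch of this sampling step; the linear β schedule and the name q_sample are illustrative choices, not prescribed by the article:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # noise schedule β_1..β_T (linear, as in [2])
alphas = 1.0 - betas                           # α_t = 1 - β_t
alphas_bar = torch.cumprod(alphas, dim=0)      # ᾱ_t = α_1 * α_2 * ... * α_t

def q_sample(x0: torch.Tensor, t: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
    """Sample x_t directly from x_0 via the closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε."""
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)    # broadcast over (B, C, H, W)
    return torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps

# usage: pick a random step t and noise ε for a batch of images x0
x0 = torch.randn(8, 3, 32, 32)                 # stand-in for a batch of training images
t = torch.randint(0, T, (8,))
eps = torch.randn_like(x0)
xt = q_sample(x0, t, eps)
```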

Reverse Diffusion Process

Reverse diffusion process

Unlike the forward process, we cannot use q(xₜ₋₁|xₜ) to reverse the noise since it is
intractable (uncomputable).


Thus we need to train a neural network pθ(xₜ₋₁|xₜ) to approximate q(xₜ₋₁|xₜ). The
approximation pθ(xₜ₋₁|xₜ) follows a normal distribution, and its mean and variance
are set as follows:

mean and variance of pθ
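
In [2] the mean is learned while the variance is kept fixed (to βₜ, one of the two untrained choices used in [2]) rather than learned:

```latex
p_\theta(x_{t-1}\mid x_t)=\mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2\mathbf{I}\big),
\qquad \sigma_t^2=\beta_t
```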

Loss Function
We can define our loss as a Negative Log-Likelihood:

Negative log-likelihood

This setup is very similar to the one in VAEs. Instead of optimizing the intractable
loss function itself, we can optimize the Variational Lower Bound.

Variational Lower Bound

By optimizing a computable lower bound, we can indirectly optimize the intractable
loss function.


Derivation and expansion of the Variational Lower Bound

By expanding the variational lower bound, we find that it can be represented by
the following three terms:
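
For reference, the standard decomposition (following [2]) is:

```latex
L_{\text{VLB}}
=\underbrace{D_{\text{KL}}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big)}_{L_T}
+\sum_{t=2}^{T}\underbrace{D_{\text{KL}}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big)}_{L_{t-1}}
\;\underbrace{-\,\log p_\theta(x_0\mid x_1)}_{L_0}
```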

1. L_T: Constant term

Since q has no learnable parameters and p is just a Gaussian noise probability, this
term will be a constant during training and thus can be ignored.


2. Lₜ₋₁: Stepwise denoising term

This term compares the target denoising step q and the approximated denoising
step pθ.

Note that by conditioning on x₀, q(xₜ₋₁|xₜ, x₀) becomes tractable.


Details of the stepwise denoising term


After a series of derivations, the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) is shown above.
For those who want to see the step-by-step derivation of the mean μ̃ₜ (yellow box),
please see the Appendix.
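
For reference, in standard DDPM notation (with ᾱₜ = α₁α₂⋯αₜ) the tractable posterior and its mean are:

```latex
q(x_{t-1}\mid x_t,x_0)=\mathcal{N}\!\big(x_{t-1};\ \tilde{\mu}_t(x_t,x_0),\ \tilde{\beta}_t\mathbf{I}\big),
\qquad
\tilde{\beta}_t=\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t

\tilde{\mu}_t
=\frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t
+\frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0
=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon_t\right)
```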

To approximate the target denoising step q, we only need to approximate its mean
using a neural network. So we set the approximated mean μθ to be in the same form
as the target mean μ̃ₜ (with a learnable neural network εθ):

Approximated mean
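
Written out in the same notation, this approximated mean is:

```latex
\mu_\theta(x_t,t)=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon_\theta(x_t,t)\right)
```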

The comparison between the target mean and the approximated mean can be done
using a mean squared error (MSE):


Derivation of the simplified stepwise denoising loss

Experimentally, better results can be achieved by ignoring the weighting term and
simply comparing the target and predicted noises with MSE.

So, it turns out that to approximate the desired denoising step q, we just need to
approximate the noise ε using a neural network εθ.

3. L₀: Reconstruction term

This is the reconstruction loss of the last denoising step and it can be ignored during
training for the following reasons:

It can be approximated using the same neural network used in Lₜ₋₁.

Ignoring it makes the sample quality better and makes it simpler to implement.

Simplified Loss
So the final simplified training objective is as follows:

Simplified training objective
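
In the notation of [2], the simplified objective is a plain MSE between the true and the predicted noise:

```latex
L_{\text{simple}}
=\mathbb{E}_{t,\,x_0,\,\varepsilon}
\Big[\big\|\varepsilon-\varepsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0+\sqrt{1-\bar{\alpha}_t}\,\varepsilon,\ t\big)\big\|^2\Big]
```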

We find that training our models on the true variational bound yields better codelengths
than training on the simplified objective, as expected, but the latter yields the best
sample quality. [2]

The U-Net Model


Dataset
In each epoch:

1. A random time step t will be selected for each training sample (image).

2. Apply the Gaussian noise (corresponding to t) to each image.

3. Convert the time steps to embeddings (vectors).

Dataset for training
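
The article does not spell out how the time-step embedding is computed; a common choice, also used in [2], is a sinusoidal embedding. The sketch below is illustrative (the function name and dimension are my own):

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Sinusoidal embedding of integer time steps t: shape (B,) -> (B, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                    # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (B, dim)

# For each training image we pick a random t, noise the image with q_sample
# (see the closed-form sketch above), and embed t for the U-Net.
t = torch.randint(0, 1000, (8,))
t_emb = timestep_embedding(t)   # fed to the U-Net alongside the noisy image x_t
```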

Training


Training algorithm [2]

The official training algorithm is as above, and the following diagram is an
illustration of how a training step works:


Training step illustration
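
Here is a minimal sketch of one training step in the spirit of the algorithm above; `model` (the U-Net), the optimizer, and the schedule tensors are placeholders carried over from the earlier sketch:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x0, alphas_bar, T=1000):
    """One training step: sample t and ε, noise x0, predict ε, regress with MSE."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)        # random step per image
    eps = torch.randn_like(x0)                                        # target noise
    a_bar = alphas_bar.to(x0.device)[t].view(-1, 1, 1, 1)
    xt = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps       # closed-form noising
    eps_pred = model(xt, t)                                           # U-Net predicts the noise
    loss = F.mse_loss(eps_pred, eps)                                  # simplified objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```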

Reverse Diffusion


Sampling algorithm [2]

We can generate images from noise using the above algorithm. The following
diagram is an illustration of it:


Sampling illustration

Note that in the last step, we simply output the learned mean μθ(x₁, 1) without
adding the noise to it.
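
A sketch of this sampling loop (same placeholder `model` and schedule tensors as before; σₜ² is set to βₜ, and the final step outputs the mean without added noise):

```python
import torch

@torch.no_grad()
def sample(model, shape, betas, alphas, alphas_bar, T=1000):
    """Generate images by reverse diffusion: start from x_T ~ N(0, I), denoise step by step."""
    x = torch.randn(shape)                                            # x_T: pure Gaussian noise
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_pred = model(x, t_batch)                                  # predicted noise ε_θ(x_t, t)
        coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alphas_bar[t])
        mean = (x - coef * eps_pred) / torch.sqrt(alphas[t])          # learned mean μ_θ(x_t, t)
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)     # add σ_t·z with σ_t² = β_t
        else:
            x = mean                                                  # last step: output μ_θ(x₁, 1)
    return x
```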

Summary
Here are some main takeaways from this article:

The Diffusion model is divided into two parts: forward diffusion and reverse
diffusion.

The forward diffusion can be done using the closed-form formula.

The reverse diffusion can be done using a trained neural network.

To approximate the desired denoising step q, we just need to approximate the
noise ε using a neural network εθ.

Training on the simplified loss function yields better sample quality.

Appendix
The following is the detailed derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀) in the
stepwise denoising term in the Loss Function section.



Derivation of the mean μ̃ₜ of q(xₜ₋₁|xₜ, x₀)
References
[1] K. Roose, “An A.I.-generated picture won an art prize. Artists aren’t happy,” The
New York Times, 02-Sep-2022. [Online]. Available:
https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html.

[2] J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,”
arXiv.org, 16-Dec-2020. [Online]. Available: https://arxiv.org/abs/2006.11239.

[3] N. A. Sergios Karagiannakos, “How diffusion models work: The math from
scratch,” AI Summer, 29-Sep-2022. [Online]. Available:
https://theaisummer.com/diffusion-models.

[4] L. Weng, “What are diffusion models?,” Lil’Log, 11-Jul-2021. [Online]. Available:
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/.

[5] A. Seff, “What are diffusion models?,” YouTube, 20-Apr-2022. [Online]. Available:
https://www.youtube.com/watch?v=fbLgFrlTnGU.

[6] Outlier, “Diffusion models | paper explanation | math explained,” YouTube, 06-
Jun-2022. [Online]. Available: https://www.youtube.com/watch?v=HoKDTa5jHvg.

