Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency
Bowen Song1∗, Soo Min Kwon1∗, Zecheng Zhang2, Xinyu Hu3, Qing Qu1, Liyue Shen1
1 University of Michigan, 2 Kumo.AI, 3 Microsoft

ABSTRACT

1 INTRODUCTION
Inverse problems arise from a wide range of applications across many domains, including compu-
tational imaging (Beck & Teboulle, 2009; Afonso et al., 2011), medical imaging (Suetens, 2017;
Ravishankar et al., 2019), and remote sensing (Liu et al., 2021; 2022), to name a few. When solv-
ing these inverse problems, the goal is to reconstruct an unknown signal x∗ ∈ Rn given observed
measurements y ∈ Rm of the form
y = A(x∗ ) + η,
where A(·) : Rn → Rm denotes some forward measurement operator (can be linear or nonlinear)
and η ∈ Rm is additive noise. Usually, we are interested in the case when m < n, which follows
many real-world scenarios. When m < n, the problem is ill-posed and some kind of regularizer (or
prior) is necessary to obtain a meaningful solution.
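As a concrete illustration (a sketch of our own, not part of the experimental setup in this paper), the snippet below builds a simple random-subsampling operator A and simulates noisy measurements y = A(x∗) + η with m < n; all names, shapes, and values are hypothetical.

```python
import torch

# Hypothetical example: a random-subsampling (inpainting-style) forward operator A: R^n -> R^m.
n = 256 * 256 * 3                                        # dimension of the unknown signal x*
keep = torch.rand(n) < 0.3                               # keep roughly 30% of the entries, so m < n
A = lambda x: x[keep]                                    # linear measurement operator

x_star = torch.rand(n)                                   # placeholder for the ground-truth signal
sigma_y = 0.01                                           # measurement noise level
y = A(x_star) + sigma_y * torch.randn(int(keep.sum()))   # y = A(x*) + eta
```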
In the literature, the traditional approach of using hand-crafted priors (e.g. sparsity) is slowly being
replaced by rich, learned priors such as deep generative models. Recently, there has been a lot of
interest in using diffusion models as structural priors due to their state-of-the-art performance in
image generation (Dhariwal & Nichol, 2021a; Karras et al., 2022; Song et al., 2023b; Lou & Ermon,
2023a). Compared to generative adversarial networks (GANs), diffusion models are generally easier
and more stable to train, making them a generative prior that is more readily accessible (Dhariwal
& Nichol, 2021b). The most common approach for using diffusion models as priors is to resort to
posterior sampling, which has been extensively explored in the literature (Song et al., 2022; Chung
et al., 2023a; 2022; Kawar et al., 2022; Song et al., 2023a; Chung et al., 2023b; Meng & Kabashima,
2022; Zhang & Zhou, 2023). However, despite their remarkable success, these techniques exhibit
several limitations. The primary challenge is that the majority of existing works train these models
directly in the pixel space, which requires substantial computational resources and a large volume
of training data (Rombach et al., 2022).
Latent diffusion models (LDMs), which embed data in order to operate in a lower-dimensional
space, present a potential solution to this challenge, along with considerable improvements in com-
putational efficiency (Rombach et al., 2022; Vahdat et al., 2021) by training diffusion models in a
compressed latent space.

∗ Equal Contribution; Corresponding authors: {bowenbw, kwonsm}@umich.edu

Figure 1: Example reconstructions of our algorithm (ReSample) on two noisy inverse problems,
nonlinear deblurring and CT reconstruction, on natural and medical images, respectively.

They can also provide a great amount of flexibility, as they can enable
one to transfer and generalize these models to different domains by fine-tuning on small amounts
of training data (Ruiz et al., 2023). Nevertheless, using LDMs to solve inverse problems poses a
significant challenge. The main difficulty arises from the inherent nonlinearity and nonconvexity
of the decoder, making it challenging to directly apply existing solvers designed for pixel space.
To address this issue, a concurrent work by Rout et al. (2023) recently introduced a posterior sam-
pling algorithm operating in the latent space (PSLD), designed to solve linear inverse problems with
provable guarantees. However, we observe that PSLD may reconstruct images with artifacts in the
presence of measurement noise. Therefore, developing an efficient algorithm capable of addressing
these challenges remains an open research question.
In this work, we introduce a novel algorithm named ReSample, which effectively employs LDMs
as priors for solving general inverse problems. Our algorithm can be viewed as a two-stage process
that incorporates data consistency by (1) solving a hard-constrained optimization problem, ensuring
we obtain the correct latent variable that is consistent with the observed measurements, and (2)
employing a carefully designed resampling scheme to map the measurement-consistent sample back
onto the correct noisy data manifold. As a result, we show that our algorithm can achieve state-of-
the-art performance on various inverse problem tasks and different datasets, compared to existing
algorithms. Notably, owing to the use of latent diffusion models as generative priors, our algorithm
achieves a reduction in memory complexity. Below, we highlight some of our key contributions.
• We propose a novel algorithm that enables us to leverage latent diffusion models for solving
general inverse problems (linear and nonlinear) through hard data consistency.
• Particularly, we carefully design a stochastic resampling scheme that can reliably map the
measurement-consistent samples back onto the noisy data manifold to continue the reverse sam-
pling process. We provide a theoretical analysis to further demonstrate the superiority of the
proposed stochastic resampling technique.
• With extensive experiments on multiple tasks and various datasets, encompassing both natural
and medical images, our proposed algorithm achieves state-of-the-art performance on a variety of
linear and nonlinear inverse problems.
2 BACKGROUND
Denoising Diffusion Probabilistic Models. We first briefly review the fundamentals of dif-
fusion models, namely the denoising diffusion probabilistic model (DDPM) formulation (Ho et al.,
2020). Let x0 ∼ pdata (x) denote samples from the data distribution. Diffusion models start by
progressively perturbing data to noise via Gaussian kernels, which can be written as the variance-
preserving stochastic differential equation (VP-SDE) (Song et al., 2021) of the form
\[ \mathrm{d}x = -\frac{\beta_t}{2}\, x\, \mathrm{d}t + \sqrt{\beta_t}\, \mathrm{d}w, \tag{1} \]
where βt ∈ (0, 1) is the noise schedule, a monotonically increasing function of t, and w is the
standard Wiener process. This is generally defined such that we obtain the data distribution when
t = 0 and obtain a Gaussian distribution when t = T , i.e. xT ∼ N (0, I). The objective of diffusion
models is to learn the corresponding reverse SDE of Equation (1), which is of the form
\[ \mathrm{d}x = \left[ -\frac{\beta_t}{2}\, x - \beta_t \nabla_{x_t} \log p(x_t) \right] \mathrm{d}t + \sqrt{\beta_t}\, \mathrm{d}\bar{w}, \tag{2} \]
where dw̄ is the standard Wiener process running backward in time and ∇xt log p(xt ) is the (Stein)
score function. In practice, we approximate the score function using a neural network sθ parameter-
ized by θ, which can be trained via denoising score matching (Vincent, 2011):
\[ \hat{\theta} = \arg\min_{\theta}\; \mathbb{E}\left[ \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p(x_t \mid x_0) \right\|_2^2 \right], \tag{3} \]
where t is uniformly sampled from [0, T ] and the expectation is taken over t, xt ∼ p(xt |x0 ), and
x0 ∼ pdata (x). Once we have access to the parameterized score function sθ , we can use it to
approximate the reverse-time SDE and simulate it using numerical solvers (e.g. Euler-Maruyama).

Figure 2: Overview of our ReSample algorithm during the reverse sampling process, conditioned
on the data constraints from the measurements. The entire sampling process is conducted in
the latent space upon passing the sample through the encoder. The proposed algorithm performs
hard data consistency at some time steps t via a skipped-step mechanism.
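To illustrate how the learned score is used, here is a minimal sketch of one Euler–Maruyama step of the reverse-time SDE in Equation (2); `score_model` and `beta` are placeholders for the trained network sθ and the noise schedule, and the discretization is our own simplification.

```python
import torch

@torch.no_grad()
def reverse_sde_step(x, t, dt, beta, score_model):
    # One Euler-Maruyama step of dx = [-(beta_t / 2) x - beta_t * score] dt + sqrt(beta_t) dw_bar,
    # integrated backward in time with step size dt > 0.
    beta_t = beta(t)                                   # noise schedule beta_t in (0, 1)
    score = score_model(x, t)                          # approximates grad_x log p_t(x)
    drift = -0.5 * beta_t * x - beta_t * score
    diffusion = (beta_t * dt) ** 0.5 * torch.randn_like(x)
    return x - drift * dt + diffusion                  # step from time t to t - dt
```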
Denoising Diffusion Implicit Models. As the DDPM formulation is known to have a slow sam-
pling process, Song et al. (2020) proposed denoising diffusion implicit models (DDIMs), which define
the diffusion process as a non-Markovian process to remedy this (Ho et al., 2020; Song et al., 2020;
2023b; Lu et al., 2022). This enables a faster sampling process with the sampling steps given by
\[ x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0(x_t) + \sqrt{1 - \bar{\alpha}_{t-1} - \eta\delta_t^2}\; s_\theta(x_t, t) + \eta\delta_t \epsilon, \quad t = T, \ldots, 0, \tag{4} \]
where αt = 1 − βt , ᾱt = ∏_{i=1}^{t} αi , ϵ ∼ N (0, I), η is the temperature parameter, δt controls the
stochasticity of the update step, and x̂0 (xt ) denotes the predicted x0 from xt , which takes the form
\[ \hat{x}_0(x_t) = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t + (1 - \bar{\alpha}_t)\, s_\theta(x_t, t) \right), \tag{5} \]
which is an application of Tweedie’s formula. Here, sθ is usually trained using the epsilon-matching
score objective (Song et al., 2020). We use DDIM as the backbone of our algorithm and show how
we can leverage these update steps for solving inverse problems.
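For concreteness, a sketch of one DDIM reverse update following Equations (4)–(5) is given below; `s_theta` stands in for the trained network and `alpha_bar` for the cumulative products ᾱ (both placeholders), and the implementation simply mirrors the formulas above.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, s_theta, alpha_bar, eta=1.0, delta_t=0.0):
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    # Eq. (5): predict x_0 from x_t (Tweedie-style estimate)
    x0_hat = (x_t + (1.0 - a_t) * s_theta(x_t, t)) / a_t ** 0.5
    # Eq. (4): move from x_t to x_{t-1}
    dir_coeff = (1.0 - a_prev - eta * delta_t ** 2).clamp(min=0.0) ** 0.5
    return a_prev ** 0.5 * x0_hat + dir_coeff * s_theta(x_t, t) + eta * delta_t * torch.randn_like(x_t)
```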
Solving Inverse Problems with Diffusion Models. Given measurements y ∈ Rm from some
forward measurement operator A(·), we can use diffusion models to solve inverse problems by
replacing the score function in Equation (2) with the conditional score function ∇xt log p(xt |y).
Then by Bayes rule, notice that we can write the conditional score as
∇xt log p(xt |y) = ∇xt log p(xt ) + ∇xt log p(y|xt ).
This results in the reverse SDE of the form
\[ \mathrm{d}x = \left[ -\frac{\beta_t}{2}\, x - \beta_t \left( \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t) \right) \right] \mathrm{d}t + \sqrt{\beta_t}\, \mathrm{d}\bar{w}. \]
In the literature, solving this reverse SDE is referred to as posterior sampling. However, the issue
with posterior sampling is that there does not exist an analytical formulation for the likelihood term
∇xt log p(y|xt ). To resolve this, there exist two lines of work: (1) to resort to alternating pro-
jections onto the measurement subspace to avoid using the likelihood directly (Chung et al., 2022;
Kawar et al., 2022; Wang et al., 2022) and (2) to estimate the likelihood under some mild assump-
tions (Chung et al., 2023a; Song et al., 2023a). For example, Chung et al. (2023a) proposed diffusion
posterior sampling (DPS) that uses a Laplacian approximation of the likelihood, which results in the
discrete update steps
\[ x'_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0(x_t) + \sqrt{1 - \bar{\alpha}_{t-1} - \eta\delta_t^2}\; s_\theta(x_t, t) + \eta\delta_t \epsilon, \tag{6} \]
\[ x_{t-1} = x'_{t-1} - \zeta \nabla_{x_t} \left\| y - \mathcal{A}(\hat{x}_0(x_t)) \right\|_2^2, \tag{7} \]
where ζ ∈ R can be viewed as a tunable step-size. However, as previously mentioned, these tech-
niques have limited applicability for real-world problems as they are all built on the pixel space.
Solving Inverse Problems with Latent Diffusion Models. The limited applicability of pixel-
based diffusion models can be tackled by alternatively utilizing more efficient LDMs as generative
priors. The setup for LDMs is the following: given an image x ∈ Rn , we have an encoder E : Rn →
Rk and a decoder D : Rk → Rn where k ≪ n. Let z = E(x) ∈ Rk denote the embedded samples
in the latent space. One way of incorporating LDMs to solve inverse problems would be to replace
the update steps in Equations (6) and (7) with
\[ z'_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{z}_0(z_t) + \sqrt{1 - \bar{\alpha}_{t-1} - \eta\delta_t^2}\; s_\theta(z_t, t) + \eta\delta_t \epsilon, \tag{8} \]
\[ z_{t-1} = z'_{t-1} - \zeta \nabla_{z_t} \left\| y - \mathcal{A}(\mathcal{D}(\hat{z}_0(z_t))) \right\|_2^2. \tag{9} \]
After incorporating LDMs, this can be viewed as a non-linear inverse problem due to the non-
linearity of the decoder D(·). As this builds upon the idea behind DPS, we refer to this algorithm as
Latent-DPS. While this formulation seems to work, we empirically observe that Latent-DPS often
produces reconstructions that are noisy or blurry and inconsistent with the measurements. We
conjecture that since the forward operator involving the decoder is highly nonconvex, the gradient
update may lead towards a local minimum. We provide more insights in Appendix Section D.
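To make the Latent-DPS baseline concrete, a sketch of one update following Equations (8)–(9) might look as follows; `decoder`, `forward_op`, and `s_theta` are placeholders for the pretrained decoder D, the measurement operator A, and the latent score network, and the step is our own simplified rendering.

```python
import torch

def latent_dps_step(z_t, t, y, s_theta, decoder, forward_op, alpha_bar, zeta, eta=1.0, delta_t=0.0):
    z_t = z_t.detach().requires_grad_(True)
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    # Tweedie-style estimate of z_0 from z_t
    z0_hat = (z_t + (1.0 - a_t) * s_theta(z_t, t)) / a_t ** 0.5
    # Eq. (8): unconditional DDIM update in the latent space
    dir_coeff = (1.0 - a_prev - eta * delta_t ** 2).clamp(min=0.0) ** 0.5
    z_prime = a_prev ** 0.5 * z0_hat + dir_coeff * s_theta(z_t, t) + eta * delta_t * torch.randn_like(z_t)
    # Eq. (9): gradient of the measurement residual, taken w.r.t. z_t through the decoder
    residual = torch.sum((y - forward_op(decoder(z0_hat))) ** 2)
    grad = torch.autograd.grad(residual, z_t)[0]
    return (z_prime - zeta * grad).detach()
```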
Hard Data Consistency. Similar to Latent-DPS, our algorithm involves incorporating data con-
sistency into the reverse sampling process of LDMs. However, rather than a gradient update as
shown in Equation (9), we propose to solve an optimization problem on some time steps t:
\[ \hat{z}_0(y) \in \arg\min_{z} \frac{1}{2} \left\| y - \mathcal{A}(\mathcal{D}(z)) \right\|_2^2, \tag{10} \]
where we denote ẑ0 (y) as the sample consistent with the measurements y ∈ Rm . This optimization
problem has been previously explored in other works that use GANs for solving inverse problems,
and can be efficiently solved using iterative solvers such as gradient descent (Bora et al., 2017;
Jalal et al., 2021; Shah et al., 2021; Lempitsky et al., 2018). However, it is well known that solv-
ing this problem starting from a random initial point may lead to unfavorable local minima (Bora
et al., 2017). To address this, we solve Equation (10) starting from an initial point ẑ0 (zt+1 ), where
ẑ0 (zt+1 ) is the estimate of ground-truth latent vector at time 0 based on the sample at time t + 1.
The intuition behind this initialization is that we want to start the optimization process within local
proximity of the global solution of Equation (10), to avoid ending up in a poor local minimum. We term
this overall concept as hard data consistency, as we strictly enforce the measurements via optimiza-
tion, rather than a “soft” approach through gradient update like Latent-DPS. To obtain ẑ0 (zt+1 ), we
use Tweedie’s formula (Efron, 2011) that gives us an approximation of the posterior mean which
takes the following form:
\[ \hat{z}_0(z_t) = \mathbb{E}[z_0 \mid z_t] = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( z_t + (1 - \bar{\alpha}_t) \nabla_{z_t} \log p(z_t) \right). \tag{11} \]
However, we would like to note that performing hard data consistency on every reverse sampling
iteration t may be very costly. To address this, we first observe that as we approach t = T , the
estimated ẑ0 (zt+1 ) can deviate significantly from the ground truth. In this regime, we find that
hard data consistency provides only marginal benefits. Additionally, in the literature, existing works
point out the existence of a three-stage phenomenon (Yu et al., 2023), where they demonstrate that
data consistency is primarily beneficial for the semantic and refinement stages (the latter two stages
when t is closer to 0) of the sampling process. Following this reasoning, we divide T into three
sub-intervals and only apply the optimization in the latter two intervals. This approach provides
both computational efficiency and accurate estimates of ẑ0 (zt+1 ).
Furthermore, even during these two intervals, we observe that we do not need to solve the opti-
mization problem on every iteration t. Because of the continuity of the sampling process, after each
data-consistency optimization step, the samples in the following steps can retain similar semantic or
structural information to some extent. Thus, we “reinforce” the data consistency constraint during
the sampling process via a skipped-step mechanism. Empirically, we see that it is sufficient to per-
form this on every 10 (or so) iterations of t. One can think of hard data consistency as guiding the
sampling process towards the ground truth signal x∗ (or respectively z ∗ ) such that it is consistent
with the given measurements. Lastly, in the presence of measurement noise, minimizing Equa-
tion (10) to zero loss can lead to overfitting the noise. To remedy this, we perform early stopping,
where we only minimize up to a threshold τ based on the noise level. We will discuss the details of
the optimization process in the Appendix. We also observe that an additional Latent-DPS step after
unconditional sampling can (sometimes) marginally increase the overall performance. We perform
an ablation study on the performance of including Latent-DPS in the Appendix.
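As a rough sketch of the hard data consistency step in Equation (10), one can run a gradient-based solver (Adam is used here purely for illustration) initialized at the Tweedie estimate ẑ0(zt+1) and stop early once the residual falls below the threshold τ; `decoder` and `forward_op` are placeholders for D and A.

```python
import torch

def hard_data_consistency(z0_init, y, decoder, forward_op, tau=1e-4, max_iters=500, lr=1e-2):
    # Solve Eq. (10): min_z 0.5 * ||y - A(D(z))||_2^2, starting from the Tweedie estimate z0_init,
    # stopping early once the loss drops below tau to avoid overfitting the measurement noise.
    z = z0_init.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(max_iters):
        optimizer.zero_grad()
        loss = 0.5 * torch.sum((y - forward_op(decoder(z))) ** 2)
        if loss.item() < tau:
            break
        loss.backward()
        optimizer.step()
    return z.detach()          # measurement-consistent latent, i.e., z_hat_0(y)
```

Under the skipped-step mechanism described above, such a routine would only be invoked on roughly every tenth reverse sampling iteration within the latter two sub-intervals.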
Remapping Back to zt . Following the flowchart in Figure 2, the next step is to map the
measurement-consistent sample ẑ0 (y) back onto the data manifold defined by the noisy samples
at time t to continue the reverse sampling process. Doing so would be equivalent to computing
the posterior distribution p(zt |y). To incorporate ẑ0 (y) into the posterior, we propose to construct
an auxiliary distribution p(ẑt |ẑ0 (y), y) to replace p(zt |y). Here, ẑt denotes the remapped sam-
ple and zt′ denotes the unconditional sample before remapping. One simple way of computing this
distribution to obtain ẑt is shown in Proposition 1.
Proposition 1 (Stochastic Encoding). Since the sample ẑt is conditionally independent of the
measurement y given ẑ0 (y), we have that
\[ p(\hat{z}_t \mid \hat{z}_0(y), y) = p(\hat{z}_t \mid \hat{z}_0(y)) = \mathcal{N}\!\left( \sqrt{\bar{\alpha}_t}\, \hat{z}_0(y),\; (1 - \bar{\alpha}_t) I \right). \tag{12} \]
We defer all of the proofs to the Appendix. Proposition 1 provides us with a way of computing ẑt , which
we refer to as stochastic encoding. However, we observe that using stochastic encoding can incur
a high variance when t is farther away from t = 0, where the ground truth signal exists. This large
variance can often lead to noisy image reconstructions. To address this issue, we propose a posterior
sampling technique that reduces the variance by additionally conditioning on zt′ , the unconditional
sample at time t. Here, the intuition is that by using the information in zt′ , we can get closer to the
ground truth zt , which effectively reduces the variance. In Proposition 2, under some mild assumptions,
we show that this new distribution p(ẑt |zt′ , ẑ0 (y), y) is a tractable Gaussian distribution.

Table 1: Quantitative results of super resolution and inpainting on the CelebA-HQ dataset.
Input images have an additive Gaussian noise with σy = 0.01. Best results are in bold and second
best results are underlined.
Proposition 2 (Stochastic Resampling). Suppose that p(zt′ |ẑt , ẑ0 (y), y) is normally distributed
such that p(zt′ |ẑt , ẑ0 (y), y) = N (µt , σt2 ). If we let p(ẑt |ẑ0 (y), y) be a prior for µt , then the
posterior distribution p(ẑt |zt′ , ẑ0 (y), y) is given by
\[ p(\hat{z}_t \mid z'_t, \hat{z}_0(y), y) = \mathcal{N}\!\left( \frac{\sigma_t^2 \sqrt{\bar{\alpha}_t}\, \hat{z}_0(y) + (1 - \bar{\alpha}_t)\, z'_t}{\sigma_t^2 + (1 - \bar{\alpha}_t)},\; \frac{\sigma_t^2 (1 - \bar{\alpha}_t)}{\sigma_t^2 + (1 - \bar{\alpha}_t)}\, I \right). \tag{13} \]
We refer to this new mapping technique as stochastic resampling. Since we do not have access to
σt2 , it serves as a hyperparameter that we tune in our algorithm. The choice of σt2 plays a role of
controlling the tradeoff between prior consistency and data consistency. If σt2 → 0, then we recover
unconditional sampling, and if σt2 → ∞, we recover stochastic encoding. We observe that this new
technique also has several desirable properties, which we rigorously prove in the next section.
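The remapping step can be summarized in a few lines; the sketch below implements both stochastic encoding (Equation 12) and stochastic resampling (Equation 13), where `alpha_bar_t` is the scalar ᾱt and `sigma_t_sq` is the tunable variance hyperparameter discussed above.

```python
import torch

def stochastic_encode(z0_y, alpha_bar_t):
    # Eq. (12): z_hat_t ~ N(sqrt(a_bar_t) * z0_y, (1 - a_bar_t) I)
    return alpha_bar_t ** 0.5 * z0_y + (1.0 - alpha_bar_t) ** 0.5 * torch.randn_like(z0_y)

def stochastic_resample(z0_y, z_t_prime, alpha_bar_t, sigma_t_sq):
    # Eq. (13): posterior that mixes the measurement-consistent z0_y with the unconditional z_t_prime;
    # sigma_t_sq -> 0 recovers unconditional sampling, sigma_t_sq -> inf recovers stochastic encoding.
    var_prior = 1.0 - alpha_bar_t
    denom = sigma_t_sq + var_prior
    mean = (sigma_t_sq * alpha_bar_t ** 0.5 * z0_y + var_prior * z_t_prime) / denom
    var = sigma_t_sq * var_prior / denom
    return mean + var ** 0.5 * torch.randn_like(z0_y)
```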
In Section 3.1, we discussed that stochastic resampling induces less variance than stochastic encod-
ing. Here, we aim to rigorously prove the validity of this statement.
Lemma 1. Let z̃t and ẑt denote the stochastically encoded and resampled image of ẑ0 (y), respec-
tively. If VAR(zt′ ) > 0, then we have that VAR(ẑt ) < VAR(z̃t ).
Theorem 1. If ẑ0 (y) is measurement-consistent such that y = A(D(ẑ0 (y))), i.e. ẑ0 = ẑ0 (zt+1 ) =
ẑ0 (y), then stochastic resampling is unbiased in the sense that E[ẑt |y] = E[zt′ ].
These two results, Lemma 1 and Theorem 1, prove the benefits of stochastic resampling. At a
high-level, these proofs rely on the fact that the posterior distributions of both stochastic encoding and
resampling are Gaussian and compare their respective means and variances. In the following result,
we characterize the variance induced by stochastic resampling, and show that as t → 0, the variance
decreases, giving us a reconstructed image that is of better quality.
Theorem 2. Let z0 denote a sample from the data distribution and zt be a sample from the noisy
perturbed distribution at time t. Then,
\[ \mathrm{Cov}(z_0 \mid z_t) = \frac{(1 - \bar{\alpha}_t)^2}{\bar{\alpha}_t}\, \nabla^2_{z_t} \log p_{z_t}(z_t) + \frac{1 - \bar{\alpha}_t}{\bar{\alpha}_t}\, I. \]
By Theorem 2, notice that since ᾱt is a sequence that increases to 1 as t decreases,
the variance between the ground truth z0 and the estimated ẑ0 decreases to 0 as t → 0, assuming
that ∇²zt log pzt (zt ) < ∞. Following our theory, we empirically show that stochastic resampling
can reconstruct signals that are less noisy than stochastic encoding, as shown in the next section.
4 EXPERIMENTS
We conduct experiments to solve both linear and nonlinear inverse problems on natural and medical
images. We compare our algorithm to several state-of-the-art methods that directly apply the diffu-
sion models that are trained in the pixel space: DPS (Chung et al., 2023a), Manifold Constrained
Gradients (MCG) (Chung et al., 2022), Denoising Diffusion Restoration Models (DDRM) (Kawar
et al., 2022), and Diffusion Model Posterior Sampling (DMPS) (Meng & Kabashima, 2022). We also
compare against an algorithm that uses a plug-and-play approach, applying a pretrained deep denoiser
to inverse problems (ADMM-PnP) (Ahmad et al., 2019), as well as Latent-DPS and Posterior Sampling
with Latent Diffusion (PSLD) (Rout et al., 2023), a concurrent work that also tackles inverse problems
with latent diffusion models. Various quantitative metrics are used for evaluation, including the Learned
Perceptual Image Patch Similarity (LPIPS) distance, peak signal-to-noise ratio (PSNR), and the
structural similarity index (SSIM). Lastly, we conduct ablation studies to compare the performance of
stochastic encoding and our proposed stochastic resampling technique as mentioned in Section 3.1,
and also demonstrate the memory efficiency gained by leveraging LDMs.

Table 2: Quantitative results of Gaussian and nonlinear deblurring on the CelebA-HQ dataset.
Input images have an additive Gaussian noise with σy = 0.01. Best results are in bold and second
best results are underlined. For nonlinear deblurring, some baselines are omitted, as they can only
solve linear inverse problems.

Table 3: Quantitative results of CT reconstruction on the LDCT dataset. Best results are in bold
and second best results are underlined.

                                   Abdominal                    Head                         Chest
Method                             PSNR↑         SSIM↑          PSNR↑         SSIM↑          PSNR↑         SSIM↑
Latent-DPS                         26.80 ±1.09   0.870 ±0.026   28.64 ±5.38   0.893 ±0.058   25.67 ±1.14   0.822 ±0.033
MCG (Chung et al., 2022)           29.41 ±3.14   0.857 ±0.041   28.28 ±3.08   0.795 ±0.116   27.92 ±2.48   0.842 ±0.036
DPS (Chung et al., 2023a)          27.33 ±2.68   0.715 ±0.031   24.51 ±2.77   0.665 ±0.058   24.73 ±1.84   0.682 ±0.113
PnP-UNet (Gilton et al., 2021)     32.84 ±1.29   0.942 ±0.008   33.45 ±3.25   0.945 ±0.023   29.67 ±1.14   0.891 ±0.011
FBP                                26.29 ±1.24   0.727 ±0.036   26.71 ±5.02   0.725 ±0.106   24.12 ±1.14   0.655 ±0.033
FBP-UNet (Jin et al., 2017)        32.77 ±1.21   0.937 ±0.013   31.95 ±3.32   0.917 ±0.048   29.78 ±1.12   0.885 ±0.016
ReSample (Ours)                    35.91 ±1.22   0.965 ±0.007   37.82 ±5.31   0.978 ±0.014   31.72 ±0.912  0.922 ±0.011
Experiments on Natural Images. For the experiments on natural images, we use datasets
FFHQ (Kazemi & Sullivan, 2014), CelebA-HQ (Liu et al., 2015), and LSUN-Bedroom (Yu et al.,
2016), with an image resolution of 256×256×3. We take the pre-trained latent diffusion models LDM-
VQ4 trained on FFHQ and CelebA-HQ provided by Rombach et al. (2022), whose autoencoders
map images to latents of size 64 × 64 × 3, together with DDPMs (Ho et al., 2020) also trained on the
FFHQ and CelebA-HQ training sets. We then sample 100 images from both the FFHQ and CelebA-HQ
validation sets for evaluation. For computing quantitative results, all images are normalized to the range
[0, 1]. All experiments use Gaussian measurement noise with standard deviation σy = 0.01. Due to
limited space, we defer the results on FFHQ and the details of the hyperparameters to the Appendix.
For linear inverse problems, we consider the following tasks: (1) Gaussian deblurring, (2) inpainting
(with a random mask), and (3) super resolution. For Gaussian deblurring, we use a kernel with size
61 × 61 with standard deviation 3.0. For super resolution, we use bicubic downsampling, and for
inpainting, we use a random mask with varying levels of missing pixels. For nonlinear inverse problems,
we consider nonlinear deblurring as proposed by (Chung et al., 2023a). The quantitative results are
displayed in Tables 1 and 2, with qualitative results in Figure 3. In Tables 1 and 2, we can see that
ReSample significantly outperforms all of the baselines across all three metrics on the CelebA-HQ
dataset. We also observe that ReSample performs better than or comparable to all baselines on the
FFHQ dataset as shown in the Appendix. Remarkably, our method excels in handling nonlinear
inverse problems, further demonstrating the flexibility of our algorithm. We further demonstrate the
superiority of ReSample for handling nonlinear inverse problems in Figure 6a, where we show that
we can consistently outperform DPS.
*We have updated the baseline results for PSLD. More details are provided in the Appendix (Section A.4).
Figure 3: Qualitative results of multiple tasks on the LSUN-Bedroom and CelebA-HQ datasets.
All inverse problems have Gaussian measurement noise with variance σy = 0.01.
Figure 4: Qualitative results of CT reconstruction on the LDCT dataset. We annotate the critical
image structures in a red box, and zoom in below the image.
Effectiveness of the Resampling Technique. Here, we validate our theoretical results by con-
ducting ablation studies on stochastic resampling. Specifically, we perform experiments on the
LSUN-Bedroom and CelebA-HQ datasets with tasks of Gaussian deblurring and super-resolution.
As shown in Figure 5, we observe that stochastic resampling reconstructs smoother images with
higher PSNRs compared to stochastic encoding, corroborating our theory.
8
Published as a conference paper at ICLR 2024
Figure 6: Left: Additional results on nonlinear deblurring highlighting the performance of ReSam-
ple. Right: Ablation study on the ReSample frequency on the performance.
Effectiveness of Hard Data Consistency. In Figure 6b, we perform an ablation study on the
ReSample frequency for CT reconstruction. The results are in line with what we expect intuitively:
more ReSample time steps (i.e., more data consistency) lead to more accurate reconstructions.
Memory Efficiency. To demonstrate memory efficiency, we use the command nvidia-smi to
monitor the memory consumption while solving an inverse problem. We present the memory usage
for Gaussian deblurring on the FFHQ dataset in Table 4. Although the entire LDM models occupy
more memory due to the autoencoders, our algorithm itself exhibits memory efficiency, resulting in
lower overall memory usage. This highlights its potential in domains like medical imaging, where
memory plays a crucial role in feasibility.

Table 4: Memory usage of different methods for Gaussian deblurring on the FFHQ dataset.

Model   Algorithm   Model Only   Memory Increment    Total
DDPM    DPS         1953MB       +3416MB (175%)      5369MB
DDPM    MCG         1953MB       +3421MB (175%)      5374MB
DDPM    DMPS        1953MB       +5215MB (267%)      7168MB
DDPM    DDRM        1953MB       +18833MB (964%)     20786MB
LDM     PSLD        3969MB       +5516MB (140%)      9485MB
LDM     ReSample    3969MB       +1040MB (26.2%)     5009MB
5 CONCLUSION
In this paper, we propose ReSample, an algorithm that can effectively leverage LDMs to solve gen-
eral inverse problems. We demonstrated that our algorithm can reconstruct high-quality images
compared to many baselines, including those in the pixel space. One limitation of our method lies
in the computational overhead of hard data consistency, which we leave as a significant challenge
for future work to address and improve upon.
6 REPRODUCIBILITY STATEMENT
To ensure the reproducibility of our results, we thoroughly detail the hyperparameters employed
in our algorithm in the Appendix. Additionally, we provide a comprehensive explanation of the
configuration of all baselines used in our experiments. As we use pre-trained diffusion models
throughout our experiments, they are readily accessible online. Lastly, our code is available at
https://github.com/soominkwon/resample.
7 ACKNOWLEDGEMENTS
BS and LS acknowledge support from U-M MIDAS PODS Grant and U-M MICDE Catalyst Grant,
and computing resource support from NSF ACCESS Program and Google Cloud Research Credits
Program. This work used NCSA Delta GPU through allocation CIS230133 and ELE230011 from
the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) pro-
gram, which is supported by National Science Foundation grants #2138259, #2138286, #2138307,
#2137603, and #2138296. SMK and QQ acknowledge support from U-M START & PODS grants,
NSF CAREER CCF-2143904, NSF CCF-2212066, NSF CCF-2212326, ONR N00014-22-1-2529,
AWS AI Award, and a gift grant from KLA.
REFERENCES
Manya V. Afonso, José M. Bioucas-Dias, and Mário A. T. Figueiredo. An augmented lagrangian
approach to the constrained optimization formulation of imaging inverse problems. IEEE Trans-
actions on Image Processing, 20(3):681–695, 2011. doi: 10.1109/TIP.2010.2076294.
Rizwan Ahmad, Charles A Bouman, Gregery T Buzzard, Stanley H Chan, Edward T Reehorst, and
Philip Schniter. Plug and play methods for magnetic resonance imaging. 2019.
Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse
problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. doi: 10.1137/080716542.
URL https://doi.org/10.1137/080716542.
Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using genera-
tive models. In International Conference on Machine Learning, pp. 537–546. PMLR, 2017.
Paul A. Bromiley. Products and convolutions of gaussian probability density functions. 2013.
Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, and Jong Chul Ye. Improving diffusion models for
inverse problems using manifold constraints. arXiv preprint arXiv:2206.00941, 2022.
Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul
Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh Interna-
tional Conference on Learning Representations, 2023a. URL https://openreview.net/
forum?id=OnD9zGAGT0k.
Hyungjin Chung, Suhyeon Lee, and Jong Chul Ye. Fast diffusion sampler for inverse problems by
geometric decomposition. arXiv preprint arXiv:2303.05754, 2023b.
Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, and Mauricio Delbracio. Prompt-tuning latent
diffusion models for inverse problems. arXiv preprint arXiv:2310.01110, 2023c.
Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. arXiv preprint
arXiv:2105.05233, 2021a.
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances
in Neural Information Processing Systems, 34:8780–8794, 2021b.
Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Asso-
ciation, 106(496):1602–1614, 2011. doi: 10.1198/jasa.2011.tm11181. URL https://doi.
org/10.1198/jasa.2011.tm11181.
Nic Fishman, Leo Klarner, Valentin De Bortoli, Emile Mathieu, and Michael Hutchinson. Diffusion
models for constrained domains. arXiv preprint arXiv:2304.05364, 2023.
Davis Gilton, Gregory Ongie, and Rebecca Willett. Model adaptation for inverse problems in imag-
ing. IEEE Transactions on Computational Imaging, 7:661–674, 2021.
Harshit Gupta, Kyong Hwan Jin, Ha Q Nguyen, Michael T McCann, and Michael Unser. Cnn-based
projected gradient descent for consistent ct image reconstruction. IEEE transactions on medical
imaging, 37(6):1440–1453, 2018.
Yoseob Han and Jong Chul Ye. Framing u-net via deep convolutional framelets: Application to
sparse-view ct. IEEE transactions on medical imaging, 37(6):1418–1429, 2018.
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in
Neural Information Processing Systems, 33:6840–6851, 2020.
Shady Abu Hussein, Tom Tirer, and Raja Giryes. Image-adaptive gan based reconstruction. In
Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 3121–3129, 2020.
Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon
Tamir. Robust compressed sensing mri with deep generative priors. In M. Ranzato,
A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neu-
ral Information Processing Systems, volume 34, pp. 14938–14954. Curran Associates, Inc.,
2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/
file/7d6044e95a16761171b130dcb476a43e-Paper.pdf.
Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. Deep convolutional
neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):
4509–4522, 2017.
Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-
based generative models. arXiv preprint arXiv:2206.00364, 2022.
Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration
models. arXiv preprint arXiv:2201.11793, 2022.
Vahid Kazemi and Josephine Sullivan. One millisecond face alignment with an ensemble of regres-
sion trees. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–
1874, 2014. doi: 10.1109/CVPR.2014.241.
Victor Lempitsky, Andrea Vedaldi, and Dmitry Ulyanov. Deep image prior. In 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018. doi: 10.1109/
CVPR.2018.00984.
Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir:
Image restoration using swin transformer. In Proceedings of the IEEE/CVF international confer-
ence on computer vision, pp. 1833–1844, 2021.
Guan-Horng Liu, Tianrong Chen, Evangelos A Theodorou, and Molei Tao. Mirror diffusion models
for constrained and watermarked generation. arXiv preprint arXiv:2310.01236, 2023.
Wei Liu, Xin Xia, Lu Xiong, Yishi Lu, Letian Gao, and Zhuoping Yu. Automated vehicle sideslip
angle estimation considering signal measurement characteristic. IEEE Sensors Journal, 21(19):
21675–21687, 2021.
Wei Liu, Karoll Quijano, and Melba M Crawford. Yolov5-tassel: detecting tassels in rgb uav im-
agery with improved yolov5 based on transfer learning. IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing, 15:8085–8094, 2022.
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild.
In Proceedings of International Conference on Computer Vision (ICCV), 2015.
Aaron Lou and Stefano Ermon. Reflected diffusion models. arXiv preprint arXiv:2304.04740,
2023a.
Aaron Lou and Stefano Ermon. Reflected diffusion models. arXiv preprint arXiv:2304.04740,
2023b.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A
fast ode solver for diffusion probabilistic model sampling in around 10 steps. arXiv preprint
arXiv:2206.00927, 2022.
Xiangming Meng and Yoshiyuki Kabashima. Diffusion model based posterior sampling for noisy
linear inverse problems. arXiv preprint arXiv:2211.12343, 2022.
Taylor R Moen, Baiyu Chen, David R Holmes III, Xinhui Duan, Zhicong Yu, Lifeng Yu, Shuai
Leng, Joel G Fletcher, and Cynthia H McCollough. Low-dose ct image and projection dataset.
Medical physics, 48(2):902–911, 2021.
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models.
In International Conference on Machine Learning, pp. 8162–8171. PMLR, 2021.
Saiprasad Ravishankar, Jong Chul Ye, and Jeffrey A Fessler. Image reconstruction: From sparsity
to data-adaptive methods and machine learning. Proceedings of the IEEE, 108(1):86–109, 2019.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-
resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.
Matteo Ronchetti. Torchradon: Fast differentiable routines for computed tomography. arXiv preprint
arXiv:2009.14788, 2020.
Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, and Sanjay
Shakkottai. Solving linear inverse problems provably via posterior sampling with latent diffusion
models. arXiv preprint arXiv:2307.00619, 2023.
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman.
Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22500–
22510, 2023.
Viraj Shah, Rakib Hyder, M. Salman Asif, and Chinmay Hegde. Provably convergent algorithms
for solving inverse problems using generative models. arXiv preprint arXiv:2105.06371, 2021.
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv
preprint arXiv:2010.02502, 2020.
Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion
models for inverse problems. In International Conference on Learning Representations, 2023a.
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben
Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint
arXiv:2011.13456, 2021.
Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imag-
ing with score-based generative models. International Conference on Learning Representations,
2022. URL https://openreview.net/forum?id=vaRCHVj0uGI.
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. arXiv preprint
arXiv:2303.01469, 2023b.
Paul Suetens. Fundamentals of medical imaging. Cambridge university press, 2017.
Arash Vahdat, Karsten Kreis, and Jan Kautz. Score-based generative modeling in latent space.
In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.),
Advances in Neural Information Processing Systems, volume 34, pp. 11287–11302. Curran
Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/
paper/2021/file/5dca4c6b9e244d24a30b4c45601d9720-Paper.pdf.
Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Compu-
tation, 23(7):1661–1674, 2011. doi: 10.1162/NECO a 00142.
Yinhuai Wang, Jiwen Yu, and Jian Zhang. Zero-shot image restoration using denoising diffusion
null-space model. arXiv preprint arXiv:2212.00490, 2022.
Haoyu Wei, Florian Schiffers, Tobias Würfl, Daming Shen, Daniel Kim, Aggelos K Katsaggelos,
and Oliver Cossairt. 2-step sparse-view ct reconstruction with a domain-specific perceptual net-
work. arXiv preprint arXiv:2012.04743, 2020.
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun:
Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv
preprint arXiv:1506.03365, 2016.
Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, and Jian Zhang. Freedom: Training-free
energy-guided conditional diffusion model. arXiv preprint arXiv:2303.09833, 2023.
Dan Zhang and Fangfang Zhou. Self-supervised image denoising for real-world images with
context-aware transformer. IEEE Access, 11:14340–14349, 2023.
Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser:
Residual learning of deep cnn for image denoising. IEEE transactions on image processing, 26
(7):3142–3155, 2017.
Bo Zhu, Jeremiah Z Liu, Stephen F Cauley, Bruce R Rosen, and Matthew S Rosen. Image recon-
struction by domain-transform manifold learning. Nature, 555(7697):487–492, 2018.
Appendix
In this section, we present additional experimental results to supplement those presented in the
main paper. In Section A, we present additional qualitative and quantitative results to highlight the
performance of ReSample. In Section B, we briefly discuss some of the implementation details
regarding hard data consistency. In Section C, we outline all of the hyperparameters used to produce
our results. In Section D, we discuss further reasons why Latent-DPS fails to consistently and
accurately recover the underlying image. Lastly, in Section F, we present our deferred proofs.
A ADDITIONAL RESULTS
Here, we provide additional quantitative results on the FFHQ dataset. Similar to the results in the
main text on the CelebA-HQ dataset, ReSample outperforms the baselines across many different
tasks.
Table 5: Comparison of quantitative results on inverse problems on the FFHQ dataset. Input images
have an additive Gaussian noise with σy = 0.01. Best results are in bold and second best results are
underlined.
Table 6: Comparison of quantitative results on inverse problems on the FFHQ dataset. Input images
have an additive Gaussian noise with σy = 0.01. Best results are in bold and second best results are
underlined.
We observe that ReSample outperforms all baselines in nonlinear deblurring and super resolution,
while demonstrating comparable performance to DPS (Chung et al., 2023a) in Gaussian deblurring
and inpainting. An interesting observation is that ReSample exhibits the largest performance gap
from DPS in nonlinear deblurring and the smallest performance gap in random inpainting, mirroring
the pattern we observed in the results of the CelebA-HQ dataset. This may suggest that ReSample
performs better than baselines when the forward operator is more complex.
In this section, we present both qualitative and quantitative results on a more challenging image
inpainting setting. More specifically, we consider the “box” inpainting setting, where the goal is to
recover a ground truth image in which large regions of pixels are missing. We present our results
in Figure 7 and Table 7, where we compare the performance of our algorithm to PSLD (Rout et al.,
2023) (latent-space algorithm) and DPS (Chung et al., 2023a) (pixel-space algorithm). Even in
this challenging setting, we observe that ReSample can outperform the baselines, highlighting the
effectiveness of our method.
Table 7: Comparison of quantitative results for box inpainting on the CelebA-HQ dataset. Input
images have an additive Gaussian noise with σy = 0.01. Best results are in bold and second best
results are underlined.

Method             LPIPS↓   PSNR↑   SSIM↑
DPS                0.127    22.85   0.861
DDRM               0.120    24.33   0.860
DMPS               0.233    22.01   0.803
Latent-DPS         0.199    23.14   0.784
PSLD               0.201    23.99   0.787
ReSample (Ours)    0.093    24.67   0.892
In this section, we present techniques for extending our algorithm to address inverse problems
with high-resolution images. One straightforward method involves using arbitrary-resolution ran-
dom noise as the initial input for the LDM. The convolutional architecture shared by the UNet in
both the LDM and the autoencoder enables the generation of images at arbitrary resolutions, as
demonstrated by Rombach et al. (2022). To illustrate the validity of this approach, we conducted a
random-inpainting experiment with images of dimensions (512 × 512 × 3) and report the results in
Figure 8. In Figure 8, we observe that our method can achieve excellent performance even in this
high-resolution setting. Other possible methods include obtaining an accurate pair of encoders and
decoders that can effectively map higher-resolution images to the low-dimensional space, and then
running our algorithm using the latent diffusion model. We believe that these preliminary results
represent an important step towards using LDMs as generative priors for solving inverse problems
for high-resolution images.
Figure 7: Qualitative results on box inpainting with measurement noise σy = 0.01. These results
highlight the effectiveness of ReSample on more difficult inverse problem tasks.
To better reflect the performance of PSLD (Rout et al., 2023), we re-ran several experiments with
PSLD while collaborating with the original authors to fine-tune hyperparameters.

Figure 8: Additional results on 70% random inpainting with high-resolution images (512 × 512 × 3)
with measurement noise σy = 0.01.

Furthermore,
as the work by Rout et al. (2023) initially employed the stable diffusion (SD) model rather than the
LDM model, we have relabeled the baselines as “PSLD-LDM”. We report the hyperparameters used
to generate the new results, as well as the ones used to generate the previous results in Table 8. The
previous hyperparameters were chosen based on the implementation details provided by Rout et al.
(2023) in Section B.1, where they stated that they used γ = 0.1 and that the hyperparameters were
available in the codebase. However, it has been brought to our attention that this was not the optimal
hyperparameter. Therefore, we additionally tuned the parameter so that each baseline could obtain
the best results. We have also conducted additional experiments directly comparing to PSLD-LDM
for inpainting tasks on the FFHQ 1K validation set and present the results in Table 9. Throughout
these results, we still observe that our algorithm largely outperforms the baselines, including PSLD.
Table 8: Different hyperparameters used to generate PSLD results. SR refers to super resolution
(4×), “previous” and “new” refer to the hyperparameter γ used to generate the old and new results,
respectively.
Table 9: Comparison of quantitative results for different inpainting tasks on the FFHQ 1k validation
set. Best results are in bold and second best results are underlined.
Figure 9: Additional results on super resolution (4×) on the CelebA-HQ dataset with Gaussian
measurement noise of variance σy = 0.01.
Figure 10: Additional results on Gaussian deblurring on the CelebA-HQ dataset with Gaussian
measurement noise of variance σy = 0.01.
Figure 11: Additional results on inpainting with a random mask (70%) on the LSUN-Bedroom
dataset with Gaussian measurement noise of variance σy = 0.01.
Figure 12: Comparison of algorithms on Gaussian deblurring on the FFHQ dataset with Gaussian
measurement noise of variance σy = 0.01.
Figure 13: Additional results on Gaussian deblurring with additive Gaussian noise σy = 0.05.
Figure 14: Additional results on super resolution 4× with additive Gaussian noise σy = 0.05.
Figure 15: Additional results on CT reconstruction with additive Gaussian noise σy = 0.01.
Here, we present some ablation studies regarding computational efficiency and stochastic resam-
pling, amongst others.
Figure 16: An ablation study on a few of the hyperparameters associated with ReSample. Left:
Performance of different values of γ (hyperparameter in stochastic resampling) on the CelebA-HQ
dataset. Right: CT reconstruction performance as a function of the number of timesteps to perform
optimization.
Effectiveness of Stochastic Resampling. Recall that in stochastic resampling, we have one hy-
perparameter σt2 . In Section C, we discuss that our choice for this hyperparameter is
\[ \sigma_t^2 = \gamma\, \frac{1 - \bar{\alpha}_{t-1}}{\bar{\alpha}_t} \left( 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}} \right), \]
where ᾱt is determined by the noise schedule of the diffusion process. Thus, the only param-
eter that we need to choose here is γ. To this end, we perform a study on how γ affects the image
reconstruction quality. The γ term can be interpreted as a parameter that balances the prior consis-
tency with the measurement consistency. In Figure 16 (left), we observe that performance increases
substantially when γ increases from a small value, but plateaus afterwards. We also observe that a larger γ
gives finer details in the images, but may introduce additional noise. In Figure 17, we provide
visual representations corresponding to the choices of γ in order.
Effect of the Skip Step Size. In this section, we conduct an ablation study to investigate the im-
pact of the skip step size in our algorithm. The skip step size denotes the frequency at which we
’skip’ before applying the next hard data consistency. For instance, a skip step size of 10 implies
that we apply hard data consistency every 10 iterations of the reverse sampling process. Intuitively,
one might expect that a smaller skip step size would lead to better reconstructions, as it implies more
frequent hard data consistency steps. To validate this intuition, we conducted an experiment on the effect of
the skip step size on CT reconstruction, and the results are displayed in Figure 18, with correspond-
ing inference times in Table 10. We observe that for skip step sizes ranging from 1 to 10, the results
are very similar. However, considering the significantly reduced time required for a skip step size
of 10 as shown in Table 10, we chose a skip step size of 10 in our experimental setting in order to
balance the trade-off between reconstruction quality and inference time. Lastly, we would like to
note that in Figure 18, a skip step size of 4 exhibits a (very) slight improvement over a skip step
of 1. Since the reverse sampling process is initialized with a random noise vector, there is minor
variability in the results, with the average outcomes for skip step sizes of 1 and 4 being very similar.
Table 10: Inference times and corresponding performance for performing hard data consistency
with varying skip step sizes for chest CT reconstruction.

Figure 18: Observing the effect of the skip step size of hard data consistency optimization on CT
reconstruction. The numbers above the images correspond to the number of hard data consistency
steps per iteration of the reverse sampling process (e.g., 1 refers to hard data consistency on every
step), and the red boxes outline the differences in reconstructions. (Per-image annotations: PSNR
27.63 / 29.95 / 31.30 / 31.73 / 31.80 / 31.79; SSIM 0.869 / 0.897 / 0.918 / 0.922 / 0.923 / 0.922.)
Discussion on Latent-DPS. In the main text, we briefly discussed how adding the Latent-DPS
gradient term into our ReSample algorithm can improve the overall reconstruction quality. Generally,
we observe that adding the Latent-DPS gradient causes a very marginal boost in PSNR, but only
when the learning rate scale is chosen "correctly". Interestingly, we observe that the scaling by ᾱt is
critical to the performance of Latent-DPS. To this end, we perform a brief ablation study on the
learning rate for Latent-DPS, where we choose the learning rate to be kᾱt for some k > 0. We vary k
and test the performance on 50 chest CT images and display the results in Table 13. In Table 13, we
observe that k = 0.5 returns the best results, and should be chosen if one were to adopt the method
of adding Latent-DPS into ReSample.

Table 12: Memory usage of different algorithms with different pretrained models for Gaussian
deblurring on the FFHQ256 and ImageNet512 datasets.

Pretrained Model    Algorithm         Model Only   Memory Increment    Total Memory
DDPM (FFHQ)         DPS               1953MB       +3416MB (175%)      5369MB
DDPM (FFHQ)         MCG               1953MB       +3421MB (175%)      5374MB
DDPM (FFHQ)         DMPS              1953MB       +5215MB (267%)      7168MB
DDPM (FFHQ)         DDRM              1953MB       +18833MB (964%)     20786MB
LDM (FFHQ)          PSLD              3969MB       +5516MB (140%)      9485MB
LDM (FFHQ)          ReSample (Ours)   3969MB       +1040MB (26.2%)     5009MB
DDPM (ImageNet)     DPS               4394MB       +6637MB (151%)      11031MB
DDPM (ImageNet)     MCG               4394MB       +6637MB (151%)      11031MB
DDPM (ImageNet)     DMPS              4394MB       +8731MB (199%)      13125MB
DDPM (ImageNet)     DDRM              4394MB       +4530MB (103%)      8924MB
LDM (ImageNet)      PSLD              5669MB       +5943MB (105%)      11612MB
LDM (ImageNet)      ReSample (Ours)   5669MB       +1322MB (30.1%)     7002MB
Table 13: The effect of the Latent-DPS gradient scale (learning rate) as a function of k > 0. We
study the PSNR and measurement loss (objective function) changes for varying k for chest CT
reconstruction on 50 test samples. The best results are in bold. Note that k = 0 here refers to no
Latent-DPS and only using ReSample.
Training Efficiency of LDMs. To underscore the significance of our work, we conducted an abla-
tion study focusing on the training efficiency of LDMs. This study demonstrates that LDMs require
significantly fewer computational resources for training, a trait particularly beneficial for down-
stream applications like medical imaging. To support this assertion, we present the training time
and performance of LDMs, comparing them to DDPMs (Ho et al., 2020) in the context of medical
image synthesis. For LDMs, we utilized the pretrained autoencoder and LDM architecture provided
by Rombach et al. (2022), training them on 9000 2D CT slices from three organs across 40 patients
sourced from the LDCT training set Moen et al. (2021). In the case of DDPMs, we employed the
codebase provided by Nichol & Dhariwal (2021) to train pixel-based diffusion models on the same
set of 2D CT training slices. The results are presented in comparison with 300 2D slices from 10
patients in the validation set, detailed in Table 14. Observing the results in Table 14, we note that by
utilizing LDMs, we can significantly reduce training time and memory usage while achieving better
generation quality, as measured by the FID score.
Table 14: Comparison of training efficiency and performance of LDMs and DDPMs on CT images,
computed on a V100 GPU.
Recall that the hard data consistency step involves solving the following optimization problem:
\[ \hat{z}_0(y) \in \arg\min_{z} \frac{1}{2} \left\| y - \mathcal{A}(\mathcal{D}(z)) \right\|_2^2, \tag{14} \]
where we initialize using ẑ0 (zt ). Instead of solving for the latent variable z directly, notice that one
possible technique would be to instead solve for vanilla least squares:
\[ \hat{x}_0(y) \in \arg\min_{x} \frac{1}{2} \left\| y - \mathcal{A}(x) \right\|_2^2, \tag{15} \]
where we instead initialize using D(ẑ0 (zt )), where D(·) denotes the decoder. Similarly, our goal
here is to find the x̂0 (y) that is close to D(ẑ0 (zt )) and satisfies the measurement consistency: ∥y −
A(x)∥22 < σ 2 , where σ 2 is an estimated noise level. Throughout the rest of the Appendix, we refer
to the former optimization process as latent optimization and the latter as pixel optimization.
In our experiments, we actually found that these two different formulations yield different results,
in the sense that performing latent optimization gives reconstructions that are “noisy”, yet much
sharper with fine details, whereas pixel optimization gives results that are “smoother” yet blurry
with high-level semantic information. Here, the intuition is that pixel optimization does not directly
change the latent variable, whereas latent optimization directly optimizes over the latent variable.
Moreover, the encoder E(·) can add additional errors to the estimated ẑ0 (y), perhaps throwing the
sample off the data manifold, yielding images that are blurry as a result.
There is also a significant difference in time complexity between these two methods. Since latent
optimization needs to backpropagate through the whole network of the decoder D(·) on every gradi-
ent step, it takes much longer to obtain a local minimum (or converge). Empirically, to balance the
trade-off between reconstruction speed and image quality, we see that using both of these formula-
tions for hard data consistency can not only yield the best results, but also speed up the optimization
process. Since pixel optimization can more easily reach a global optimum, we use it first during the reverse
sampling process and then use latent optimization when we are closer to t = 0 to refine the images
with the finer details.
Lastly, we would like to remark that for pixel optimization, there is a closed-form solution that could
be leveraged under specific settings (Wang et al., 2022). If the forward operator A is linear and can
take the matrix form A and the measurements are noiseless (i.e., y = A(x̂0 (y))), then we can pose
the optimization problem as
\[ \hat{x}_0(y) \in \arg\min_{x} \frac{1}{2} \left\| \mathcal{D}(\hat{z}_0(z_t)) - x \right\|_2^2, \quad \text{s.t. } Ax = y. \tag{16} \]
Then, the solution to this optimization problem is given by
\[ \hat{x}_0(y) = \mathcal{D}(\hat{z}_0(z_t)) - \left( A^{+} A\, \mathcal{D}(\hat{z}_0(z_t)) - A^{+} y \right), \tag{17} \]
where by employing the encoder, we obtain
\[ \hat{z}_0(y) = \mathcal{E}(\hat{x}_0(y)) = \mathcal{E}\!\left( \mathcal{D}(\hat{z}_0(z_t)) - \left( A^{+} A\, \mathcal{D}(\hat{z}_0(z_t)) - A^{+} y \right) \right). \tag{18} \]
This optimization technique does not require iterative solvers and offers great computational effi-
ciency. In the following section, we provide ways in which we can compute a closed-form solution
in the case in which the measurements may be noisy.
Notice that since pixel optimization directly operates in the pixel space, we can use solvers such
as conjugate gradients least squares for linear inverse problems. Let A be the matrix form of the
linear operator A(·). Then, as discussed in the previous subsection, the solution to the optimization
problem in the noiseless setting is given by
\[ \hat{x} = x_0 - \left( A^{+} A x_0 - A^{+} y \right), \tag{19} \]
where A+ = A⊤ (AA⊤ )−1 and (AA⊤ )−1 can be implemented by conjugate gradients. In the
presence of measurement noise, we can relax this solution to
\[ \hat{x} = x_0 - \kappa \left( A^{+} A x_0 - A^{+} y \right), \tag{20} \]
where κ ∈ (0, 1) is a hyperparameter that reduces the impact of the noisy component of the
measurements on x0 (the initial image before optimization, for which we use D(ẑ0 (zt ))).
We use this technique for CT reconstruction, where the forward operator A is the Radon transform
and A⊤ is the unfiltered back projection. However, we can use this conjugate gradient technique
for any linear inverse problem where the matrix A is available.
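For completeness, here is a sketch of the relaxed projection in Equation (20) with the (AA⊤)⁻¹ term applied via a textbook conjugate gradient routine; the dense matrix A is purely illustrative (in practice the operator and its adjoint, e.g., the Radon transform and back projection, would be applied implicitly).

```python
import torch

def conjugate_gradient(apply_M, b, n_iters=50, eps=1e-10):
    # Solve M v = b for a symmetric positive-definite M given as a mat-vec callable.
    v = torch.zeros_like(b)
    r = b - apply_M(v)
    p = r.clone()
    rs_old = torch.dot(r, r)
    for _ in range(n_iters):
        Mp = apply_M(p)
        alpha = rs_old / (torch.dot(p, Mp) + eps)
        v = v + alpha * p
        r = r - alpha * Mp
        rs_new = torch.dot(r, r)
        if rs_new ** 0.5 < eps:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v

def relaxed_projection(x0, y, A, kappa=0.9, n_iters=50):
    # Eq. (20): x_hat = x0 - kappa * (A^+ A x0 - A^+ y), with A^+ = A^T (A A^T)^{-1}
    # and (A A^T)^{-1} applied through conjugate gradients.
    apply_AAt = lambda v: A @ (A.T @ v)
    pinv = lambda u: A.T @ conjugate_gradient(apply_AAt, u, n_iters)   # computes A^+ u
    return x0 - kappa * (pinv(A @ x0) - pinv(y))
```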
C IMPLEMENTATION DETAILS
In this section, we discuss the choices of the hyperparameters used in all of our experiments for our
algorithm. All experiments are implemented in PyTorch on NVIDIA GPUs (A100 and A40).
For organizational purposes, we tabulate all of the hyperparameters associated with ReSample in Ta-
ble 15. The parameter for the number of times to perform hard data consistency is not included in
the table, as we did not have any explicit notation for it. For experiments across all natural images
Notation   Definition
τ          Hyperparameter for early stopping in hard data consistency
σt         Variance scheduling for the resampling technique
T          Number of DDIM or DDPM steps

Table 15: Summary of the hyperparameters with their respective notations for ReSample.
For experiments across all natural image datasets (LSUN-Bedroom, FFHQ, CelebA-HQ), we used the same hyperparameters, as they empirically gave the best results across all of them.
For T, we used T = 500 DDIM steps. For hard data consistency, we first split T into three even sub-intervals, where the first stage refers to the sub-interval closest to t = T and the third stage refers to the interval closest to t = 0. During the second stage, we performed pixel optimization, whereas in the third stage we performed latent optimization for hard data consistency, as described in Section B. We performed this optimization every 10 iterations of t.
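The following is an illustrative sketch of this three-stage schedule and the skip of 10 steps; the helper name and return convention are ours and are not taken from the released code.

```python
# T = 500 DDIM steps, split into three even stages.
T = 500

def hard_consistency_mode(t: int) -> str:
    """Which hard data consistency (if any) is applied at reverse step t."""
    if t > 2 * T // 3:                          # first stage (closest to t = T): none
        return "none"
    mode = "pixel" if t > T // 3 else "latent"  # second vs. third stage
    return mode if t % 10 == 0 else "none"      # applied every 10 iterations of t
```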
We set τ = 10−4, which seemed to give the best results for noisy inverse problems, with a maximum of 2000 iterations for pixel optimization and 500 for latent optimization (whichever convergence criterion was met first). For the variance hyperparameter σt in the stochastic resample step,
we chose an adaptive schedule of
σt² = γ · (1 − ᾱt−1)/ᾱt · (1 − ᾱt/ᾱt−1),
as discussed in Section A. Generally, we see that γ = 40 returns the best results for experiments on
natural images.
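A minimal sketch of this adaptive schedule is given below, assuming alpha_bar is a 1-D tensor of cumulative products ᾱt indexed by timestep; the indexing convention and function name are illustrative.

```python
import torch

def sigma_t_squared(alpha_bar: torch.Tensor, t: int, gamma: float = 40.0) -> torch.Tensor:
    # σ_t² = γ · (1 − ᾱ_{t−1})/ᾱ_t · (1 − ᾱ_t/ᾱ_{t−1})
    return gamma * (1 - alpha_bar[t - 1]) / alpha_bar[t] * (1 - alpha_bar[t] / alpha_bar[t - 1])
```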
Since LDMs for medical images are not as readily available as those for natural images, we largely had to fine-tune existing models, which we discuss in more detail below.
Backbone Models. For the backbone latent diffusion model, we use the pre-trained models from latent diffusion (Rombach et al., 2022). We select the VQ-4 autoencoder and the FFHQ-LDM and CelebA-LDM as our backbone LDMs. Starting from the pre-trained checkpoints provided by Rombach et al. (2022), we fine-tuned the models on 2000 CT images for 100K iterations with a learning rate of 10−5.
Inference. For T, we used a total of T = 1000 DDIM steps. For hard data consistency, we split T into three sub-intervals: t > 750, 300 < t ≤ 750, and t ≤ 300. During the second stage, we performed pixel optimization using conjugate gradients as discussed previously, with 50 iterations and κ = 0.9. In the third stage, we performed latent optimization with τ set to the estimated noise level, τ = 10−4. We set the skip step size to 10 and γ = 40, with σt the same as in the experiments on natural images.
Latent-DPS. For the Latent-DPS baseline, we use T = 1000 DDIM steps for CT reconstruction and T = 500 DDIM steps for the natural image experiments. Let ζt denote the learning rate. We use ζt = 2.5ᾱt for medical images and ζt = 0.5ᾱt for natural images. Empirically, we observe that our proposed ζt step size schedule gives the best performance and is robust to changes in scale, as previously discussed.
DPS and MCG. For DPS, we use the original DPS codebase provided by Chung et al. (2023a)
and pre-trained models trained on CelebA and FFHQ training sets for natural images. For medical
images, we use the pretrained checkpoint from Chung et al. (2022) on the dataset provided by
Moen et al. (2021). For MCG (Chung et al., 2022), we modified the MCG codebase by deleting
the projection term and tuning the gradient term for running DPS experiments on CT reconstruction.
Otherwise, we directly used the codes provided by Chung et al. (2022) for both natural and medical
image experiments.
DDRM. For DDRM, we follow the original code provided by Kawar et al. (2022) with DDPM models trained on the FFHQ and CelebA training sets, adopted from the repository provided by Dhariwal & Nichol (2021b). We use the default parameters as described by Kawar et al. (2022).
DMPS. We follow the original code from the repository of Meng & Kabashima (2022) with DDPM models trained on the FFHQ and CelebA training sets, adopted from the repository of Dhariwal & Nichol (2021b). We use the default parameters as described by Meng & Kabashima (2022).
PSLD. We follow the original code from the repository of Rout et al. (2023) with the pre-trained LDMs on the CelebA and FFHQ datasets provided by Rombach et al. (2022). We use the default hyperparameters specified by Rout et al. (2023).
ADMM-PnP and Other (Supervised) Baselines. For ADMM-PnP, we use 10 iterations with τ tuned for each inverse problem: τ = 5 for CT reconstruction, τ = 0.1 for linear inverse problems, and τ = 0.075 for nonlinear deblurring. We use the pre-trained model from the original DnCNN repository provided by Zhang et al. (2017). We observe that ADMM-PnP tends to over-smooth the images with more iterations, which causes performance degradation. For FBP-UNet, we trained a UNet that maps FBP images to ground truth images; the network architecture is the same as the one described by Jin et al. (2017).
D ADDITIONAL ANALYSIS ON LATENT-DPS
In this section, we provide further explanation of why Latent-DPS often fails to give accurate reconstructions.
Previously, we claimed that by using Latent-DPS, it is likely that we converge to a local minimum
of the function ∥y − A(D(ẑ0 ))∥22 and hence cannot achieve accurate measurement consistency.
Here, we validate this claim by comparing the measurement consistency loss between Latent-DPS
and ReSample. We observe that ReSample is able to achieve better measurement consistency than
Latent-DPS. This observation validates our motivation of using hard data consistency to improve
reconstruction quality.
Table 16: Comparison of average measurement consistency loss for CT reconstruction between ReSample and Latent-DPS.
We hypothesize that one reason Latent-DPS fails to give accurate reconstructions could be the nonlinearity of the decoder D(·). More specifically, the derivation of DPS provided by Chung et al. (2023a) relies on a linear manifold assumption. In our case, since the forward model can be viewed as A(D(·)), where the decoder D(·) is a highly nonlinear neural network, this linear manifold assumption fails to hold. For example, if two latent vectors z(1) and z(2) lie on the clean data manifold M at t = 0, a linear combination az(1) + bz(2) for some constants a and b may not belong to M, since D(az(1) + bz(2)) may not yield a realistic image.
Thus, in practice, we observe that DPS reconstructions tend to be more blurry, which implies that
the reverse sampling path falls out of this data manifold. We demonstrate this in Figure 19, where
we show that the average of two latent vectors gives a blurry and unrealistic image.
We would like to point out that this reasoning may also explain why our algorithm outperforms DPS
on nonlinear inverse tasks.
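A hedged sketch of the kind of experiment behind Figure 19 is given below: decoding a convex combination of two latent vectors that each decode to realistic images. Here decoder, z1, and z2 are assumed handles for the LDM decoder and two latents, and the weighting is illustrative.

```python
import torch

def decode_average(decoder, z1: torch.Tensor, z2: torch.Tensor, w: float = 0.5) -> torch.Tensor:
    # Decode a convex combination of two latents; with a nonlinear decoder the
    # result is typically blurry or unrealistic, illustrating why the linear
    # manifold assumption breaks down.
    with torch.no_grad():
        return decoder(w * z1 + (1 - w) * z2)
```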
In this section, we discuss how inaccurate estimates of ẑ0(zt) (i.e., the posterior mean via Tweedie's formula) can be one of the reasons why Latent-DPS returns noisy image reconstructions. This is mainly because, for values of t closer to t = T, the estimate of ẑ0(zt) may be inaccurate, leading to images that are noisy at t = 0. Generally, we find that the estimate of ẑ0(zt) is
inaccurate in the early timesteps (e.g., t > 0.5T) and varies considerably as t decreases. This implies that the gradient update may not point in a consistent direction when t is large. We further demonstrate this observation in Figure 20.
Figure 20: Comparison of the prediction of the ground truth signal ẑ0 (zt ) for different values of t.
Left: ẑ0 (zt ) when t = 0.5T . Right: ẑ0 (zt ) when t = 0. This serves to show the estimation error of
the posterior mean for large values of t (i.e. when t is closer to pure noise).
E RELATED WORKS
Deep neural networks have been extensively employed as priors for solving inverse problems (Han &
Ye, 2018; Bora et al., 2017; Zhu et al., 2018; Gupta et al., 2018). Numerous works focus on learning
the mapping between measurements and clean images, which we term supervised methods (Han
& Ye, 2018; Zhu et al., 2018; Wei et al., 2020; Liang et al., 2021). Supervised methods necessitate
training on pairs of measurements and clean images, requiring model retraining for each new task.
On the other hand, another line of research aims at learning the prior distribution of ground truth
images, solving inverse problems at inference time without retraining by using the pre-trained prior
distributions (Bora et al., 2017; Jalal et al., 2021; Hussein et al., 2020; Lempitsky et al., 2018). We
categorize this as unsupervised methods.
Until the advent of diffusion models, unsupervised methods struggled to achieve satisfactory re-
construction quality compared to supervised methods (Jalal et al., 2021). These methods rely on optimization within a constrained space (Bora et al., 2017), and constructing a space that accurately encodes the ground truth data distribution is itself challenging.
However, with the accurate approximation of the prior distribution now provided by diffusion mod-
els, unsupervised methods can outperform supervised methods for solving inverse problems more
efficiently without retraining the model (Song et al., 2021; Jalal et al., 2021).
For unsupervised approaches using diffusion models as priors, the plug-and-play approach has been
widely applied for solving linear inverse problems (Song et al., 2021; Kawar et al., 2022; Wang et al.,
2022). These methods inject the measurements onto the noisy manifold or incorporate an estimate of
the ground truth image into the reverse sampling procedure. We refer to these methods as hard
data consistency approaches, as they directly inject the measurements into the reverse sampling pro-
cess. However, to the best of our knowledge, only a few of these methods extend to nonlinear inverse
problems or even latent diffusion models (Rombach et al., 2022). Other approaches focus on approx-
imating the conditional score (Dhariwal & Nichol, 2021b; Chung et al., 2023a; 2022) under some
mild assumptions using gradient methods. While these methods can achieve excellent performance
and be extended to nonlinear inverse problems, measurement consistency may be compromised, as
demonstrated in our paper. Since these methods apply data consistency only through a gradient in
the reverse sampling process, we refer to these methods as soft data consistency approaches.
Recently, there has been growing interest in modeling the data distribution with diffusion models in a latent space or a constrained domain (Liu et al., 2023; Rombach et al., 2022; Lou & Ermon, 2023b; Fishman et al., 2023). There has also been growing interest in solving inverse problems
using latent diffusion models. In particular, Rout et al. (2023) proposed PSLD, an unsupervised soft
approach that adds a gradient term at each reverse sampling step to solve linear inverse problems
with LDMs. However, some works observe that this approach may suffer from instability due to the estimation of the gradient term (Chung et al., 2023c). Our work proposes an unsupervised hard
approach that involves “resampling” and enforcing hard data consistency for solving general inverse
problems (both linear and nonlinear) using latent diffusion models. By using a hard data consistency
approach, we can obtain much better reconstructions, highlighting the effectiveness of our algorithm.
F PROOFS
Notation. We denote scalars with lower-case letters (e.g., α) and vectors with bold lower-case letters (e.g., x). Recall that in the main body of the paper, z ∈ Rk denotes a sample in the latent space, zt′ denotes an unconditional sample at time step t, ẑ0(zt) denotes a prediction of the ground truth signal z0 at time step t, ẑ0(y) denotes the measurement-consistent sample of ẑ0(zt) obtained via hard data consistency, and ẑt denotes the re-mapped sample from ẑ0(y) onto the data manifold at time step t. We use ẑt as the next sample to resume the reverse diffusion process.
Proposition 1 (Stochastic Encoding). Since the sample ẑt is conditionally independent of y given ẑ0(y), we have that

p(ẑt | ẑ0(y), y) = p(ẑt | ẑ0(y)) = N(√ᾱt ẑ0(y), (1 − ᾱt)I). (21)
Proof. By Tweedie's formula, we have that ẑ0(y) is the estimated mean of the ground truth signal z0. By the forward process of the DDPM formulation (Ho et al., 2020), we also have that

p(ẑt | ẑ0(y)) = N(√ᾱt ẑ0(y), (1 − ᾱt)I).
Then, since y is a measurement of z0 at t = 0, we have p(y | ẑ0(y), ẑt) = p(y | ẑ0(y)). Finally, we get

p(ẑt | ẑ0(y), y) = p(y | ẑt, ẑ0(y)) p(ẑt | ẑ0(y)) p(ẑ0(y)) / p(y, ẑ0(y)) (22)
                 = p(y, ẑ0(y)) p(ẑt | ẑ0(y)) / p(y, ẑ0(y)) (23)
                 = p(ẑt | ẑ0(y)). (24)
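For reference, a one-line sketch of drawing the stochastically encoded sample according to Eq. (21) could look as follows, where z0_y denotes ẑ0(y) and alpha_bar is an assumed tensor of ᾱt values indexed by timestep.

```python
import torch

def stochastic_encode(z0_y: torch.Tensor, alpha_bar: torch.Tensor, t: int) -> torch.Tensor:
    # z̃_t ~ N(√ᾱ_t ẑ_0(y), (1 − ᾱ_t) I), cf. Eq. (21)
    noise = torch.randn_like(z0_y)
    return alpha_bar[t] ** 0.5 * z0_y + (1 - alpha_bar[t]) ** 0.5 * noise
```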
Proposition 2 (Stochastic Resampling). Suppose that p(zt′ | ẑt, ẑ0(y), y) is normally distributed such that p(zt′ | ẑt, ẑ0(y), y) = N(µt, σt²). If we let p(ẑt | ẑ0(y), y) be a prior for µt, then the posterior distribution p(ẑt | zt′, ẑ0(y), y) is given by

p(ẑt | zt′, ẑ0(y), y) = N( (σt² √ᾱt ẑ0(y) + (1 − ᾱt) zt′) / (σt² + (1 − ᾱt)), (σt² (1 − ᾱt)) / (σt² + (1 − ᾱt)) I ). (25)

This is a Gaussian distribution, which can be easily shown using moment-generating functions (Bromiley, 2013).
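A minimal sketch of the corresponding stochastic resampling step, drawing ẑt from the posterior in Eq. (25), could look as follows; the argument names are illustrative and all inputs are assumed to be scalars or tensors of matching shape.

```python
import torch

def stochastic_resample(z0_y: torch.Tensor, zt_prime: torch.Tensor,
                        alpha_bar_t: float, sigma_t_sq: float) -> torch.Tensor:
    # Posterior of Eq. (25): combine the measurement-consistent prediction ẑ_0(y)
    # with the unconditional sample z_t′.
    var_t = 1 - alpha_bar_t
    mean = (sigma_t_sq * alpha_bar_t ** 0.5 * z0_y + var_t * zt_prime) / (sigma_t_sq + var_t)
    var = sigma_t_sq * var_t / (sigma_t_sq + var_t)
    return mean + var ** 0.5 * torch.randn_like(z0_y)
```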
Theorem 1. If ẑ0(y) is measurement-consistent such that y = A(D(ẑ0(y))), i.e., ẑ0 = ẑ0(zt+1) = ẑ0(y), then stochastic resampling is unbiased in the sense that E[ẑt | y] = E[zt′].
Proof. We have that E[ẑt |y] = Ezt′ [Eẑ0 (y) [Eẑt [ẑt |zt′ , ẑ0 (y), y]]].
Let γ = σt² / (σt² + 1 − ᾱt). By using Proposition 2, we have E[ẑt | y] = E_{zt′}[E_{ẑ0}[(γ√ᾱt ẑ0 + (1 − γ) zt′) | ẑ0, zt′, y]].
Since ẑ0 is measurement-consistent such that y = A(D(ẑ0(y))), letting k = −1, we have

ẑ0(y) = ẑ0 = (1/√ᾱt−k) (zt−k′ + (1 − ᾱt−k) ∇ log p(zt−k′)). (30)

Then, we have that

E[ẑt | y] = E_{zt′}[E_{ẑ0}[(γ√ᾱt ẑ0 + (1 − γ) zt′) | ẑ0, zt′, y]] (31)
          = γ √(ᾱt/ᾱt−k) E_{zt′}[E[zt−k′ + (1 − ᾱt−k) ∇ log p(zt−k′) | zt′]] + (1 − γ) E[zt′], (32)

as both zt′ and zt−k′ are unconditional samples and independent of y. Now, we have

E_{zt′}[E[zt−k′ + (1 − ᾱt−k) ∇ log p(zt−k′) | zt′]] = E_{zt−k′}[zt−k′ + (1 − ᾱt−k) ∇ log p(zt−k′)]. (33)

Since zt′ is the unconditional reverse sample of zt−k′, we have E[zt′] = √(ᾱt/ᾱt−k) E[zt−k′], and then

E_{zt−k′}[∇ log p(zt−k′)] = ∫ ∇ log p(zt−k′) p(zt−k′) dzt−k′ (34)
                          = ∫ (∇p(zt−k′) / p(zt−k′)) p(zt−k′) dzt−k′ (35)
                          = ∂(1)/∂zt−k′ = 0. (36)
Finally, we have E[ẑt |y] = γE[zt′ ] + (1 − γ)E[zt′ ] = E[zt′ ].
Lemma 1. Let z̃t and ẑt denote the stochastically encoded and resampled image of ẑ0 (y), respec-
tively. If VAR(zt′ ) > 0, then we have that VAR(ẑt ) < VAR(z̃t ).
Proof. Recall that ẑt and z̃t are both normally distributed, with

VAR(z̃t) = 1 − ᾱt, (37)
VAR(ẑt) = 1 / (1/(1 − ᾱt) + 1/σt²) (38)
        = σt²(1 − ᾱt) / (σt² + (1 − ᾱt)). (39)

For all σt² ≥ 0, we have

σt²(1 − ᾱt) / (σt² + (1 − ᾱt)) < 1 − ᾱt (40)
=⇒ VAR(ẑt) < VAR(z̃t). (41)
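As a quick sanity check of Lemma 1, the following snippet verifies the variance inequality numerically for a few illustrative values of σt² and ᾱt.

```python
# Pure-Python check of Lemma 1 for a few illustrative values.
for sigma_sq in [0.1, 1.0, 10.0, 100.0]:
    for alpha_bar in [0.1, 0.5, 0.9]:
        var_encode = 1 - alpha_bar                                                # VAR(z̃_t)
        var_resample = sigma_sq * (1 - alpha_bar) / (sigma_sq + (1 - alpha_bar))  # VAR(ẑ_t)
        assert var_resample < var_encode
```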
Theorem 2. Let z0 denote a sample from the data distribution and zt be a sample from the noisy perturbed distribution at time t. Given that the score function ∇zt log pzt(zt) is bounded, we have

Cov(z0 | zt) = ((1 − ᾱt)²/ᾱt) ∇²zt log pzt(zt) + ((1 − ᾱt)/ᾱt) I,

where αt ∈ (0, 1) is a decreasing sequence in t.
Proof. Define the scaled variable z̄t = (√ᾱt/(1 − ᾱt)) zt. We want to separate the term involving only z̄t from the term involving only z0 in order to apply Tweedie's formula. Hence, letting

p0(z̄t) = (2π · ᾱt/(1 − ᾱt))^{−d/2} exp( −∥z̄t∥² / (2 ᾱt/(1 − ᾱt)) ),

we obtain

p(z̄t | z0) = p0(z̄t) exp( z̄t⊤ z0 − (ᾱt/(2(1 − ᾱt))) ∥z0∥² ),

which separates the interaction term z̄t⊤ z0 from the ∥z0∥² term.
Let λ(z̄t) = log p(z̄t) − log p0(z̄t). Then, by Tweedie's formula, we have E[z0 | z̄t] = ∇λ(z̄t) and Cov(z0 | z̄t) = ∇²λ(z̄t). Since p0(z̄t) is a Gaussian distribution with mean 0 and variance equal to ᾱt/(1 − ᾱt), we obtain

∇λ(z̄t) = ∇ log p(z̄t) + ((1 − ᾱt)/ᾱt) z̄t.
We observe that since z̄t is a scaled version of zt, we can obtain the distribution of z̄t as

p(z̄t) = ((1 − ᾱt)/√ᾱt)^d · p(((1 − ᾱt)/√ᾱt) · z̄t).

Then, we can apply the chain rule to first compute the gradient with respect to zt and then account for the scaling of z̄t. As a result, we get

∇z̄t log p(z̄t) = ((1 − ᾱt)/√ᾱt) ∇zt log p(zt),

and then

∇λ(z̄t) = E[z0 | zt] = ((1 − ᾱt)/√ᾱt) ∇ log p(zt) + (1/√ᾱt) zt,
which is consistent with the score function used by Chung et al. (2023a). Afterwards, we take the gradient again with respect to z̄t and apply the chain rule once more, which gives

∇²λ(z̄t) = Cov(z0 | zt) = ((1 − ᾱt)²/ᾱt) ∇² log p(zt) + ((1 − ᾱt)/ᾱt) I.

This gives the desired result.