Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST)
{hj.chung, byeongsu.s, dh.ryu, jong.ye}@kaist.ac.kr
∗ Equal contribution
Abstract
Recently, diffusion models have been used to solve various inverse problems in
an unsupervised manner with appropriate modifications to the sampling process.
However, current solvers, which recursively apply a reverse diffusion step
followed by a projection-based measurement consistency step, often produce
suboptimal results. By studying the generative sampling path, we show that
current solvers throw the sample path off the data manifold, and hence the error
accumulates. To address this, we propose an additional correction term inspired
by the manifold constraint, which can be used synergistically with the previous
solvers to keep the iterations close to the manifold. The proposed manifold
constraint is straightforward to implement within a few lines of code, yet boosts
the performance by a surprisingly large margin. With extensive experiments,
we show that our method is superior to the previous methods both theoretically
and empirically, producing promising results in many applications such as image
inpainting, colorization, and sparse-view computed tomography. Code available
here
1 Introduction
Diffusion models have shown impressive performance both as generative models themselves [41, 13],
and also as unsupervised inverse problem solvers [22, 8, 9, 25] that do not require problem-specific
training. Specifically, given a pre-trained unconditional score function (i.e. denoiser), solving the
reverse stochastic differential equation (SDE) numerically would amount to sampling from the data
generating distribution [41]. For many different inverse problems (e.g. super-resolution [8, 9],
inpainting [41, 9], compressed-sensing MRI (CS-MRI) [40, 9], sparse view CT (SV-CT) [40], etc.),
it was shown that simple incorporation of the measurement process produces satisfactory conditional
samples, even when the model was not trained for the specific problem.
Nevertheless, for certain problems (e.g. inpainting), currently used algorithms often produce
unsatisfactory results when implemented naively (e.g. boundary artifacts, as shown in Fig. 1 (b)). The
authors of [32] showed that, in order to produce high-quality reconstructions, one needs to iterate
back and forth between the noising and the denoising steps more than 10 times per iteration. These
iterations are computationally demanding and should be avoided, considering that diffusion models
are slow to sample from even without such iterations. On the other hand, a classic result of Tweedie’s
formula [37, 42] shows that one can perform Bayes optimal denoising in one step, once we know
the gradient of the log density. Extending this result, it was recently shown that one can indeed
perform a single-step denoising with learned score functions for denoising problems from the general
exponential family [28].
In this work, we leverage the denoising result through Tweedie’s formula and show that such
denoised samples can be the key to significantly improving the performance of reconstruction using
diffusion models across arbitrary linear inverse problems, despite the simplicity of the implementation.
Moreover, we theoretically prove that if the score function estimation is globally optimal, the
correction term from the manifold constraint forces the sample path to stay on the plane tangent
to the data manifold¹, so that, combined with the reverse diffusion step, the solution becomes more
stable and accurate.
2 Related Works
2.1 Diffusion Models
Continuous Form For a continuous diffusion process $x(t) \in \mathbb{R}^n$, $t \in [0, 1]$, we set $x(0) \sim
p_0(x) = p_{\text{data}}$, where $p_{\text{data}}$ represents the data distribution of interest, and $x(1) \sim p_1(x)$, with
$p_1(x)$ approximating a spherical Gaussian distribution that contains no information about the data. Here, the
forward noising process is defined by the following Itô stochastic differential equation (SDE) [41]:
$$dx = \bar{f}(x, t)\,dt + \bar{g}(t)\,dw, \tag{1}$$
with $\bar{f}: \mathbb{R}^d \to \mathbb{R}^d$ defining the linear drift function, $\bar{g}(t): \mathbb{R} \to \mathbb{R}$ defining a scalar diffusion
coefficient, and $w \in \mathbb{R}^n$ denoting the standard $n$-dimensional Wiener process. The forward SDE in
(1) is coupled with the following reverse SDE by Anderson’s theorem [1, 41]:
$$dx = [\bar{f}(x, t) - \bar{g}(t)^2 \nabla_x \log p_t(x)]\,dt + \bar{g}(t)\,d\bar{w}, \tag{2}$$
with $dt$ denoting the infinitesimal negative time step, and $\bar{w}$ the standard Wiener process
running backward in time. Note that the reverse SDE defines the generative process through the
score function $\nabla_x \log p_t(x)$, which in practice is approximated by a network $s_\theta(x(t), t)$ trained to match
$\nabla_{x_t} \log p_{0t}(x(t)|x(0))$ by minimizing the following denoising score-matching objective:
$$\min_\theta \mathbb{E}_{t\sim U(\varepsilon,1),\,x(0)\sim p_0(x),\,x(t)\sim p_{0t}(x(t)|x(0))}\left[\|s_\theta(x(t), t) - \nabla_{x_t} \log p_{0t}(x(t)|x(0))\|_2^2\right]. \tag{3}$$
Once the parameter θ∗ for the score function is estimated, one can replace the score function in (2)
with sθ∗ (x(t), t) to solve the reverse SDE [41].
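To make the reverse SDE (2) concrete, the sketch below integrates it with the Euler-Maruyama method in a deliberately simple case where the score is known in closed form, so no trained network is needed. Everything here is an illustrative assumption rather than the paper's setup: one-dimensional data $x(0) \sim \mathcal{N}(0, s_0^2)$, $\bar{f} = 0$ and a constant $\bar{g} = \sigma_{\max}$, which gives $p_t = \mathcal{N}(0, s_0^2 + \sigma_{\max}^2 t)$. The function name `reverse_sde_em` is hypothetical.

```python
import numpy as np

def reverse_sde_em(n_samples=20000, n_steps=2000, sigma_max=10.0, s0=1.0, seed=0):
    """Euler-Maruyama integration of the reverse SDE (2) on a toy problem.

    Hypothetical setup: 1-D data x(0) ~ N(0, s0^2), drift f_bar = 0 and constant
    diffusion g_bar = sigma_max, so p_t = N(0, s0^2 + sigma_max^2 t) and the score
    grad_x log p_t(x) = -x / (s0^2 + sigma_max^2 t) is known exactly.
    """
    rng = np.random.default_rng(seed)
    g2 = sigma_max**2                                   # g_bar(t)^2
    dt = 1.0 / n_steps                                  # positive step size
    # Start from p_1 = N(0, s0^2 + sigma_max^2).
    x = rng.normal(0.0, np.sqrt(s0**2 + g2), size=n_samples)
    for k in range(n_steps, 0, -1):
        t = k * dt
        score = -x / (s0**2 + g2 * t)                   # exact score at time t
        # Reverse SDE with negative time step -dt:
        # x <- x - [f_bar - g_bar^2 * score] * dt + g_bar * sqrt(dt) * z
        x = x + g2 * score * dt + np.sqrt(g2 * dt) * rng.standard_normal(n_samples)
    return x

samples = reverse_sde_em()
print(samples.std())   # close to s0 = 1.0, up to discretization and sampling error
```

In a real model, the closed-form `score` line is where the trained $s_{\theta^*}(x(t), t)$ would be evaluated instead.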
Discrete Form Since $\bar{f}$ is linear in $x$ and $\bar{g}$ depends only on $t$, the forward diffusion step can be implemented with
a simple reparameterization trick [29]. Namely, the general form of the forward diffusion is
$$x_i = a_i x_0 + b_i z, \quad z \sim \mathcal{N}(0, I), \tag{4}$$
¹ We coin our method Manifold Constrained Gradient (MCG).
where we have replaced the continuous index $t \in [0, 1]$ with the discrete index $i \in \mathbb{N}$. On the other
hand, the discrete reverse diffusion step can in general be represented as
$$x_{i-1} = f(x_i, s_{\theta^*}) + g(x_i)\,z, \quad z \sim \mathcal{N}(0, I), \tag{5}$$
where we have replaced the ground truth score function with the trained one. We detail the choice of
$a_i$, $b_i$, $f$, $g$ in Appendix B.
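As a minimal sketch of the forward step (4), the code below draws $x_i = a_i x_0 + b_i z$ and checks its mean and standard deviation. The specific $a_i = \sqrt{\bar\alpha_i}$, $b_i = \sqrt{1 - \bar\alpha_i}$ with a linear $\beta$ schedule is an assumed VP (DDPM-style [18]) choice for illustration; the paper's own choice is in its Appendix B and may differ. The helper names are hypothetical.

```python
import numpy as np

def vp_schedule(n_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Hypothetical VP choice of (a_i, b_i) from a linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)   # a_i, b_i with a_i^2 + b_i^2 = 1

def forward_diffuse(x0, i, a, b, rng):
    """Eq. (4): x_i = a_i x_0 + b_i z with z ~ N(0, I) (reparameterization trick)."""
    z = rng.standard_normal(x0.shape)
    return a[i] * x0 + b[i] * z

rng = np.random.default_rng(0)
a, b = vp_schedule()
x0 = np.full(100_000, 2.0)               # many copies of the scalar x_0 = 2
xi = forward_diffuse(x0, 500, a, b, rng)
print(xi.mean(), a[500] * 2.0)           # empirical vs. analytic mean
print(xi.std(), b[500])                  # empirical vs. analytic std
```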
The main problem of interest in this paper is the inverse problem of retrieving the unknown $x \in \mathbb{R}^n$
from a measurement $y$:
$$y = Hx + \epsilon, \quad y \in \mathbb{R}^m,\ H \in \mathbb{R}^{m\times n}, \tag{6}$$
where $\epsilon \in \mathbb{R}^m$ is the measurement noise. For inverse problems, our goal is therefore to generate
samples from the conditional distribution given the measurement $y$, i.e. $p(x|y)$. Accordingly, the score
function $\nabla_x \log p_t(x)$ in (2) should be replaced by the conditional score $\nabla_x \log p_t(x|y)$. Unfortunately,
this severely restricts the generalization capability of the neural network, since the conditional score
must be retrained whenever the conditions change.
To address this, recent conditional diffusion models [22, 41, 8, 9] utilize the unconditional score
function ∇x log pt (x) but rely on a projection-based measurement constraint to impose the conditions.
Specifically, one can apply the following:
$$x'_{i-1} = f(x_i, s_\theta) + g(x_i)\,z, \quad z \sim \mathcal{N}(0, I), \tag{7}$$
$$x_{i-1} = A x'_{i-1} + b_i, \tag{8}$$
where A, bi are functions of H, y, and x0 . Note that (7) is identical to the unconditional reverse
diffusion step in (5), whereas (8) effectively imposes the condition. Using the projection approach
on Tweedie denoised estimates was proposed in [22]. Also, it was shown in [9] that any general
contraction mapping (e.g. projection onto convex sets, gradient step) may be utilized as (8) to impose
the constraint.
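For inpainting, one common instance of (8) keeps the generated values on the missing pixels and writes a noised copy of the measurement onto the observed ones. The sketch below assumes that $H$ selects the observed pixels, so $A = I - H^T H$ and $b_i = H^T y_{i-1}$ with $y_{i-1} = a_{i-1} y + b_{i-1} z$; this score-SDE-style choice and the name `inpainting_projection` are illustrative assumptions, not necessarily the exact form used by every cited solver.

```python
import numpy as np

def inpainting_projection(x_prime, y, mask, a_prev, b_prev, rng):
    """Sketch of the data-consistency step (8), x_{i-1} = A x'_{i-1} + b_i.

    mask == 1 marks observed pixels and y is the zero-filled measurement H^T y.
    A = I - H^T H keeps x' only where mask == 0; b_i = H^T y_{i-1} writes the
    measurement, noised to level i - 1, where mask == 1.
    """
    y_noisy = a_prev * y + b_prev * rng.standard_normal(y.shape)
    return (1.0 - mask) * x_prime + mask * y_noisy

rng = np.random.default_rng(0)
image = rng.random((8, 8))
mask = (rng.random((8, 8)) < 0.3).astype(float)      # 1 = observed pixel
y = mask * image                                      # zero-filled measurement
x_prime = rng.standard_normal((8, 8))                 # stand-in for the output of (7)
x_next = inpainting_projection(x_prime, y, mask, a_prev=1.0, b_prev=0.0, rng=rng)
```

With `a_prev=1.0, b_prev=0.0` (the last step), the observed pixels are reproduced exactly, which is the "perfect measurement consistency" the projection is meant to enforce.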
Another recent work [25], building on [26], establishes the state of the art (SOTA) in solving noisy
inverse problems with unconditional diffusion models by running the conditional reverse diffusion
process in the spectral domain obtained from a singular value decomposition (SVD) of H, and
leveraging an approximate gradient of the log-likelihood term in the spectral space. The authors show
that feasible solutions can be obtained with as few as 20 diffusion steps.
Prior to the development of diffusion models, Plug-and-Play (PnP) models [47, 53, 44] were used
in a similar fashion by utilizing a general-purpose unconditional denoiser in the place of proximal
mappings in model-based iterative reconstruction methods [5, 3]. Similarly, outside the context of
diffusion models, iterative denoising followed by projection-based data consistency was proposed
in [44]. In this view, diffusion models can be understood as a generative variant of PnP trained with
multiple noise scales.
GAN-based solvers have also been widely explored [4, 10, 20], where the pre-trained generators are tuned at
test time by optimizing over the latent code, the generator parameters, or both jointly.
For Gaussian noise, a classic result known as Tweedie’s formula [37] tells us that one can obtain
the denoised result by computing the posterior expectation:
$$\mathbb{E}[x|\tilde{x}] = \tilde{x} + \sigma^2 \nabla_{\tilde{x}} \log p(\tilde{x}), \tag{9}$$
where the noise is modeled by $\tilde{x} \sim \mathcal{N}(x, \sigma^2 I)$. If we consider a diffusion model in which the forward
step is modeled as $x_i \sim \mathcal{N}(a_i x_0, b_i^2 I)$ (discrete form), Tweedie’s formula can be rewritten as:
$$\mathbb{E}[x_0|x_i] = (x_i + b_i^2 \nabla_{x_i} \log p(x_i))/a_i. \tag{10}$$
Tweedie’s formula is in fact not only relevant to Gaussian denoising in the Bayesian framework, but
has also been shown to be closely related to kernel regression [34]. Moreover, it was shown
that it can be applied to arbitrary exponential-family noise distributions beyond Gaussian [14, 28]. In the
following, we use this key property to develop our algorithm.
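Eq. (10) can be sanity-checked numerically in a fully Gaussian toy case, where the exact score of $p(x_i)$ is available. The sketch assumes a hypothetical 1-D prior $x_0 \sim \mathcal{N}(\mu, \tau^2)$, so $p(x_i) = \mathcal{N}(a_i\mu, a_i^2\tau^2 + b_i^2)$, and compares the Tweedie estimate against a least-squares fit of $x_0$ on $x_i$ from simulated pairs (the posterior mean is linear in $x_i$ in this case).

```python
import numpy as np

def tweedie_x0(xi, score_fn, a_i, b_i):
    """Eq. (10): E[x_0 | x_i] = (x_i + b_i^2 * grad log p(x_i)) / a_i."""
    return (xi + b_i**2 * score_fn(xi)) / a_i

# Hypothetical 1-D prior x_0 ~ N(mu, tau^2); then p(x_i) = N(a_i mu, a_i^2 tau^2 + b_i^2).
mu, tau, a_i, b_i = 0.5, 1.5, 0.6, 0.8

def score_fn(x):
    return -(x - a_i * mu) / (a_i**2 * tau**2 + b_i**2)   # exact score of p(x_i)

rng = np.random.default_rng(0)
x0 = rng.normal(mu, tau, size=200_000)
xi = a_i * x0 + b_i * rng.standard_normal(x0.shape)

# For a Gaussian prior E[x_0 | x_i] is linear in x_i, so a least-squares line
# fitted to (x_i, x_0) pairs is an empirical estimate of the posterior mean.
slope, intercept = np.polyfit(xi, x0, deg=1)
grid = np.linspace(-2.0, 2.0, 5)
print(np.c_[tweedie_x0(grid, score_fn, a_i, b_i), slope * grid + intercept])
```

In a diffusion model, `score_fn` would be the trained $s_\theta(x_i, i)$, and the result is a one-step denoised estimate $\hat{x}_0$ rather than an exact posterior mean.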
3 Conditional Diffusion using Manifold Constraints
Although the original motivation for the measurement constraint step in (8) was to allow the
unconditionally trained score function to be used in the reverse diffusion step in (7), there is room for imposing
additional constraints while still using the unconditionally trained score function.
Specifically, the Bayes rule $p(x|y) = p(y|x)p(x)/p(y)$ leads to
$$\nabla_x \log p(x|y) = \nabla_x \log p(x) + \nabla_x \log p(y|x). \tag{11}$$
Hence, the score function in the reverse diffusion step (7) can be replaced by (11), leading to
$$x'_{i-1} = f(x_i, s_\theta) - \alpha \frac{\partial}{\partial x_i}\|W(y - Hx_i)\|_2^2 + g(x_i)\,z, \quad z \sim \mathcal{N}(0, I) \tag{12}$$
where $\alpha$ and $W$ depend on the noise covariance, if the noise $\epsilon$ in (6) is Gaussian.
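The gradient in (12) has the closed form $\frac{\partial}{\partial x}\|W(y - Hx)\|_2^2 = -2H^T W^T W (y - Hx)$. The sketch below checks this against central finite differences; $H$ and $W$ are random hypothetical stand-ins (in the paper, $W$ would come from the noise covariance). The update in (12) then subtracts $\alpha$ times this gradient from $f(x_i, s_\theta)$.

```python
import numpy as np

def fidelity(x, y, H, W):
    r = W @ (y - H @ x)
    return r @ r                                       # ||W (y - H x)||_2^2

def fidelity_grad(x, y, H, W):
    return -2.0 * H.T @ W.T @ W @ (y - H @ x)          # closed-form gradient

rng = np.random.default_rng(0)
n, m = 6, 3
H = rng.standard_normal((m, n))                        # hypothetical forward operator
W = np.diag(rng.uniform(0.5, 2.0, size=m))             # hypothetical weighting matrix
x, y = rng.standard_normal(n), rng.standard_normal(m)

# Central finite differences along each coordinate direction.
eps = 1e-6
fd = np.array([(fidelity(x + eps * e, y, H, W) - fidelity(x - eps * e, y, H, W)) / (2 * eps)
               for e in np.eye(n)])
print(np.max(np.abs(fd - fidelity_grad(x, y, H, W))))   # tiny (round-off level)
```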
Now, one of the important contributions of this paper is to reveal that the Bayes optimal denoising step
in (10) from the Tweedie’s formula leads to a preferred condition both empirically and theoretically.
Specifically, we define the set constraint for xi , called the manifold constrained gradient (MCG), so
that the gradient of the measurement term stays on the manifold (see Theorem 1):
$$x \in \mathcal{X}_i, \quad \text{where } \mathcal{X}_i = \{x \in \mathbb{R}^n \mid x = (x + b_i^2 s_\theta(x, i))/a_i\} \tag{13}$$
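To deal with the potential deviation from measurement consistency, we again impose the data
consistency step (8). Putting them together, the discrete reverse diffusion under the additional
manifold constraint and the data consistency can be represented by
$$x'_{i-1} = f(x_i, s_\theta) - \alpha \frac{\partial}{\partial x_i}\|W(y - H\hat{x}_0(x_i))\|_2^2 + g(x_i)\,z, \quad z \sim \mathcal{N}(0, I), \tag{14}$$
$$x_{i-1} = A x'_{i-1} + b, \tag{15}$$
where $\hat{x}_0(x_i) := (x_i + b_i^2 s_\theta(x_i, i))/a_i$ is the Tweedie estimate (10) computed with the trained score.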
We illustrate our scheme visually in Fig. 1 (a), specifically for the task of image inpainting. The
additional step leads to a dramatic performance boost, as can be seen in Fig. 1 (b). Note that while the
mapping (10) does not rely on the measurement, our gradient term in (14) incorporates the information
of y so that the gradient of the measurement terms stays on the manifold. In the following, we study
the theoretical properties of the method. Further algorithmic details and adaptations to each problem
that we tackle are presented in Section C.
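The following is a toy end-to-end sketch of (14) and (15) for inpainting, chosen so that it runs without a trained network. Assumptions (not from the paper): the data lie on a hypothetical linear manifold $x_0 = Uc$, $c \sim \mathcal{N}(0, I)$, so the exact score and the Tweedie map are linear; $W = I$, $H = \mathrm{diag}(\text{mask})$; a DDPM-style step $f(x, s) = (x + \beta_i s)/\sqrt{1 - \beta_i}$, $g = \sqrt{\beta_i}$; a linear $\beta$ schedule and a fixed $\alpha$. In a real model, the gradient through $\hat{x}_0$ would be obtained by backpropagating through the score network instead of using the explicit matrix $J$.

```python
import numpy as np

def make_toy_manifold(n=16, l=4, seed=0):
    """Hypothetical linear 'data manifold' M = span(U) with x_0 = U c, c ~ N(0, I_l)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, l)))   # orthonormal basis of M
    return U

def exact_score_and_tweedie(U, a_i, b_i):
    """For this prior, x_i ~ N(0, C) with C = a_i^2 U U^T + b_i^2 I, so the exact score
    is s(x) = -C^{-1} x and the Tweedie map (10) is linear: x0_hat = J x."""
    n = U.shape[0]
    C_inv = np.linalg.inv(a_i**2 * U @ U.T + b_i**2 * np.eye(n))
    J = (np.eye(n) - b_i**2 * C_inv) / a_i
    return -C_inv, J

def mcg_inpaint(y, mask, U, n_steps=200, alpha=0.5, seed=1):
    """Sketch of eqs. (14)-(15) for inpainting with W = I and H = diag(mask)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.05, n_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    a, b = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    H = np.diag(mask)
    x = rng.standard_normal(y.shape)                    # start from pure noise
    for i in range(n_steps - 1, 0, -1):
        S, J = exact_score_and_tweedie(U, a[i], b[i])
        x0_hat = J @ x                                  # Tweedie estimate x0_hat(x_i)
        # d/dx_i ||y - H x0_hat(x_i)||^2 = -2 J^T H^T (y - H x0_hat) (chain rule)
        grad = -2.0 * J.T @ H.T @ (y - H @ x0_hat)
        z = rng.standard_normal(x.shape)
        x_prime = (x + betas[i] * (S @ x)) / np.sqrt(1.0 - betas[i]) \
            - alpha * grad + np.sqrt(betas[i]) * z      # eq. (14)
        # Eq. (15) with A = I - H^T H, b = H^T y_{i-1}: overwrite the observed
        # entries with the measurement noised to level i - 1.
        y_prev = a[i - 1] * y + b[i - 1] * rng.standard_normal(y.shape)
        x = (1.0 - mask) * x_prime + mask * y_prev
    return x

U = make_toy_manifold()
rng = np.random.default_rng(2)
x_true = U @ rng.standard_normal(U.shape[1])
mask = (rng.random(U.shape[0]) < 0.5).astype(float)    # 1 = observed entry
y = mask * x_true                                       # zero-filled measurement
x_rec = mcg_inpaint(y, mask, U)
print(np.abs(x_rec - x_true)[mask == 1].max())          # small: observed entries are kept
```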
We note that the authors of [19] proposed a similar gradient method for the application of temporal
imputation and super-resolution. When combining (14) with (15), one can arrive at a similar gradient
method proposed in [19], and hence our method can be seen as a generalization to arbitrary linear
inverse problems. Furthermore, there is a vast literature on PnP models that utilize pre-trained
denoisers together with the gradient of the log-likelihood to solve inverse problems [30, 48, 11].
Among them, [30] is especially relevant to this work since their method relies on modified Langevin
diffusion, together with Tweedie’s denoising and projections to the measurement subspace.
(a) Geometry of diffusion model (b) MCG correction
Figure 2: In both (a) and (b), the central manifolds represent the data manifold M, encircled by
manifolds of noisy data Mi . The concentration on the manifold of noisy data and the distance from
the clean data manifold are prescribed by Proposition 1. In (a), the backward (resp. forward) step
depicted by blue (resp. red) arrows can be considered as transitions from Mi to Mi−1 (resp. Mi−1
to Mi ). In (b), arrows refer to the directions of conventional projection onto convex sets (POCS) step
(green arrow) and MCG step (red arrow) which can be predicted by Theorem 1.
Recall that the conventional manifold assumption concerns the intrinsic, low-dimensional geometry of
data points. In this work, however, we assume more: the manifold is locally linear. Although this
stronger assumption may narrow the applicability of the theory, the geometric approach may provide
new insights into diffusion models. Under this assumption, the following proposition shows how data
perturbed by noise are distributed in the ambient space, as illustrated in Fig. 2a.
Proposition 1 (Concentration of noisy data). Consider the distribution of noisy data $p_i(x_i) =
\int p(x_i|x)\,p_0(x)\,dx$, $p(x_i|x) \sim \mathcal{N}(a_i x, b_i^2 I)$. Then $p_i(x_i)$ is concentrated on the $(n-1)$-dim manifold
$\mathcal{M}_i := \{y \in \mathbb{R}^n : d(y, a_i\mathcal{M}) = r_i := b_i\sqrt{n - l}\}$. Rigorously, $p_i(B_{\epsilon r_i}(\mathcal{M}_i)) > 1 - \delta$, for some
small $\epsilon, \delta > 0$.
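Proposition 1 can be illustrated numerically when $\mathcal{M}$ is a globally linear $l$-dimensional subspace (a special case of the locally linear assumption; reading $l$ as the dimension of $\mathcal{M}$). Then $a_i\mathcal{M} = \mathcal{M}$, and the distance of $x_i$ to it is the norm of the noise component normal to $\mathcal{M}$, which concentrates around $r_i = b_i\sqrt{n - l}$. All sizes and coefficients below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, a_i, b_i = 2000, 50, 0.7, 0.5
U, _ = np.linalg.qr(rng.standard_normal((n, l)))     # orthonormal basis of a linear M
x0 = U @ rng.standard_normal((l, 500))                # 500 hypothetical clean points on M
xi = a_i * x0 + b_i * rng.standard_normal((n, 500))   # forward step x_i = a_i x_0 + b_i z

# a_i M is the same linear subspace, so the distance is the norm of the normal component.
normal = xi - U @ (U.T @ xi)
dist = np.linalg.norm(normal, axis=0)
r_i = b_i * np.sqrt(n - l)
print(dist.min() / r_i, dist.max() / r_i)             # both close to 1
```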
Remark 1 (Geometric interpretation of the diffusion process). Considering Proposition 1, the mani-
folds of noisy data can be interpreted as interpolating manifolds between the two: the hypersphere,
where pure noise $\mathcal{N}(a_\infty x_0, b_\infty^2 I)$ is concentrated, and the clean data manifold. In this regard, the
diffusion steps are mere transitions from one manifold to another and the diffusion process is a
transport from the data manifold to the hypersphere through interpolating manifolds. See Fig. 2a.
Remark 2. We can infer from the proposition that the score functions are trained only with the data
points concentrated on the noisy data manifolds. Therefore, inaccurate inference might be caused by
application of a score function on points away from the noisy data manifold.
Proposition 2 (score function). Suppose $s_\theta$ is the minimizer of the denoising score matching loss in
(3). Let $Q_i$ be the function that maps $x_i$ to $\hat{x}_0$ for each $i$,
$$Q_i: \mathbb{R}^d \to \mathbb{R}^d, \quad x_i \mapsto \hat{x}_0 := \frac{1}{a_i}\left(x_i + b_i^2 s_\theta(x_i, i)\right).$$
Then $Q_i(x_i) \in \mathcal{M}$ and $J_{Q_i}^2 = J_{Q_i} = J_{Q_i}^T : \mathbb{R}^d \to T_{Q_i(x_i)}\mathcal{M}$. Intuitively, $Q_i$ is locally an
orthogonal projection onto $\mathcal{M}$.
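A small numerical illustration of Proposition 2, under toy assumptions: $\mathcal{M}$ is a linear subspace spanned by an orthonormal $U$, the prior is $\mathcal{N}(0, \text{spread}^2\, UU^T)$ with a large spread along $\mathcal{M}$, and $a_i = 1$ (VE-style), so the exact score is available and $Q_i$ is linear. In this toy, $J_{Q_i} = \frac{\text{spread}^2}{\text{spread}^2 + b_i^2} UU^T$, so it is symmetric, maps into $T\mathcal{M}$, and is only approximately idempotent because the spread is finite.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, l)))
P = U @ U.T                                    # orthogonal projector onto T M (= M here)

a_i, b_i, spread = 1.0, 1.0, 100.0             # VE-style a_i = 1; broad data along M
C = spread**2 * P + b_i**2 * np.eye(n)         # covariance of x_i for x_0 ~ N(0, spread^2 P)

def score(x):
    return -np.linalg.solve(C, x)              # exact score of p_i

def Q(x):
    """Tweedie map of Proposition 2: x_i -> (x_i + b_i^2 s(x_i)) / a_i."""
    return (x + b_i**2 * score(x)) / a_i

J = (np.eye(n) - b_i**2 * np.linalg.inv(C)) / a_i   # Jacobian of Q (Q is linear here)
x = 5.0 * rng.standard_normal(n)
print(np.linalg.norm(Q(x) - P @ Q(x)))          # ~0: Q(x) lies on M
print(np.abs(J - J.T).max(), np.abs(J @ J - J).max(), np.abs(J - P).max())
```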
According to the proposition, the score function only concerns the normal direction of the data
manifold. In other words, the score function cannot discriminate two data points whose difference is
tangent to the manifold. In solving inverse problems, however, we need to discriminate between such data
points to reconstruct the original signal, and this discrimination is achieved through measurement fidelity.
In order to recover the original signal, the measurement plays the role of correcting the tangent component
near the data manifold. Furthermore, in light of Remark 2, diffusion model-based inverse problem
solvers should follow the tangent component. The following theorem shows how existing algorithms
and the proposed method are different in this regard.
Figure 3: Inpainting results on FFHQ (1st, 2nd row) and ImageNet (3rd, 4th row). (a) Measurement,
(b) Ground truth, (c) IAGAN [20] for FFHQ, LaMa [43] for ImageNet, (d) DDRM [25], (e) Score-SDE [41],
(f) RePAINT [32], (g) MCG (Ours). For the 256 × 256 images, the 1st and 3rd rows are masked with a
128 × 128 box, and 92% of the pixels (all RGB channels) in the 2nd and 4th rows are blocked.
Theorem 1 (Manifold constrained gradient). A correction by the manifold constrained gradient does
not leave the data manifold. Formally,
$$\frac{\partial}{\partial x_i}\|W(y - H\hat{x}_0)\|_2^2 = -2 J_{Q_i}^T H^T W^T W (y - H\hat{x}_0) \in T_{\hat{x}_0}\mathcal{M},$$
i.e., the gradient is the projection of the data fidelity term onto $T_{\hat{x}_0}\mathcal{M}$.
This theorem suggests that in diffusion models, the naive measurement fidelity step (without considering
the data manifold) pushes the inference path off the manifolds and might lead to inaccurate
reconstruction (see Section D and Fig. 7 for an illustration). On the other hand, our correction
term from the manifold constraint guides the diffusion to lie on the data manifold, leading to better
reconstruction. Such geometric views are illustrated in Fig. 2b.
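The contrast described above can be checked in the same kind of toy setting as the earlier sketches: a hypothetical linear manifold with an exact Tweedie Jacobian $J$ ($a_i = 1$), a random measurement operator $H$, and $W = I$. The naive gradient of (12), taken at the noisy iterate, has a clear component normal to $\mathcal{M}$, whereas the Theorem 1 gradient, taken through $\hat{x}_0$, has essentially none.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, m = 10, 3, 4
U, _ = np.linalg.qr(rng.standard_normal((n, l)))
P = U @ U.T                                           # projector onto T M
b_i = 1.0                                             # a_i = 1 (VE-style toy)
C = 100.0**2 * P + b_i**2 * np.eye(n)                 # covariance of x_i for a broad prior on M
J = np.eye(n) - b_i**2 * np.linalg.inv(C)             # Jacobian of the Tweedie map Q_i

H = rng.standard_normal((m, n))                       # hypothetical measurement operator
W = np.eye(m)
y = H @ (U @ rng.standard_normal(l))                  # noiseless measurement of a point on M
x_i = U @ rng.standard_normal(l) + b_i * rng.standard_normal(n)   # a noisy iterate
x0_hat = J @ x_i

naive = -2.0 * H.T @ W.T @ W @ (y - H @ x_i)          # gradient in (12), no denoising
mcg = -2.0 * J.T @ H.T @ W.T @ W @ (y - H @ x0_hat)   # Theorem 1 gradient

def normal_norm(v):
    return np.linalg.norm(v - P @ v)                  # size of the component normal to M

print(normal_norm(naive), normal_norm(mcg))           # clearly > 0 vs. ~0
```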
Remark 3. One may be concerned that suboptimal optimization of the denoising score matching loss
may lead to inaccurate inference in the MCG steps. In practice, however, most of the error in denoising
score matching is concentrated near t ≈ 1 [9], and in this region, Tweedie’s formula cannot produce
meaningful images. That is, the score function cannot detect the data manifold. Nonetheless, in this
regime, the magnitudes of the MCGs are small when the denoising score is inaccurate, and hence
the issues arising from suboptimality are minimal. As t → 0, the estimation becomes exact, which
in turn leads to an accurate implementation of the MCG.
5 Experiments
For all tasks, we aim to verify the superiority of our method against other diffusion model-based
approaches, and also against strong supervised learning-based baselines. Further details can be found
in Section F.
Datasets and Implementation For inpainting, we use FFHQ 256×256 [24] and ImageNet
256×256 [12] to validate our method. We utilize pre-trained models from the open-source repository
based on the implementation of ADM (VP-SDE) [13]. We validate the performance on 1000
held-out validation images for both the FFHQ and ImageNet datasets. For the colorization task, we
use FFHQ 256×256 and LSUN-bedroom 256×256 [51]. We use pre-trained score functions from
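| Method | FFHQ Box FID ↓ | FFHQ Box LPIPS ↓ | FFHQ Random FID ↓ | FFHQ Random LPIPS ↓ | FFHQ Extreme FID ↓ | FFHQ Extreme LPIPS ↓ | FFHQ Wide masks FID ↓ | FFHQ Wide masks LPIPS ↓ | ImageNet Box FID ↓ | ImageNet Box LPIPS ↓ | ImageNet Random FID ↓ | ImageNet Random LPIPS ↓ | ImageNet Wide masks FID ↓ | ImageNet Wide masks LPIPS ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCG (ours) | 23.7 | 0.089 | 21.4 | 0.186 | 30.6 | 0.366 | 22.1 | 0.099 | 25.4 | 0.157 | 34.8 | 0.308 | 21.9 | 0.148 |
| Score-SDE [41] | 30.3 | 0.135 | 109.3 | 0.674 | 48.6 | 0.488 | 29.8 | 0.132 | 43.5 | 0.199 | 143.5 | 0.758 | 25.9 | 0.150 |
| RePAINT∗ [32] | 25.7 | 0.093 | 38.1 | 0.240 | 35.9 | 0.398 | 24.2 | 0.108 | 26.1 | 0.156 | 59.3 | 0.387 | 37.0 | 0.205 |
| DDRM [25] | 28.4 | 0.109 | 111.6 | 0.774 | 48.1 | 0.532 | 27.5 | 0.113 | 88.8 | 0.386 | 99.6 | 0.767 | 80.6 | 0.398 |
| LaMa [43] | 27.7 | 0.086 | 188.7 | 0.648 | 61.7 | 0.492 | 23.2 | 0.096 | 26.8 | 0.139 | 134.1 | 0.567 | 20.4 | 0.140 |
| AOT-GAN [52] | 29.2 | 0.108 | 97.2 | 0.514 | 69.5 | 0.452 | 28.3 | 0.106 | 35.3 | 0.163 | 119.6 | 0.583 | 29.8 | 0.161 |
| ICT [49] | 27.3 | 0.103 | 91.3 | 0.445 | 56.7 | 0.425 | 26.9 | 0.104 | 31.9 | 0.148 | 131.4 | 0.584 | 25.4 | 0.148 |
| DSI [35] | 27.9 | 0.096 | 126.4 | 0.601 | 77.5 | 0.463 | 28.3 | 0.102 | 34.5 | 0.155 | 132.9 | 0.549 | 24.3 | 0.154 |
| IAGAN [20] | 26.3 | 0.098 | 41.5 | 0.279 | 56.1 | 0.417 | 23.8 | 0.110 | - | - | - | - | - | - |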
Table 1: Quantitative evaluation (FID, LPIPS) of the inpainting task on FFHQ and ImageNet. ∗: Reimplemented
with our score function. MCG, Score-SDE, RePAINT, and DDRM all share the same
score function and differ only in the inference method. Bold: best, underline: second best.
score-SDE [41] based on VE-SDE. We use 300 validation images to evaluate performance on the
LSUN-bedroom dataset. For experiments with CT, we train our model based on ncsnpp
as a VE-SDE from score-SDE [41] on the 2016 American Association of Physicists in Medicine
(AAPM) grand challenge dataset, and we process the data as in [23]. Specifically, the dataset contains
3839 training images resized to 256×256 resolution. We simulate the CT measurement process with
parallel-beam geometry, with projection angles evenly spaced over 180 degrees. Evaluation is performed
on 421 held-out validation images from the AAPM challenge.
Figure 4: Colorization results on FFHQ / LSUN-bedroom, Sparse view CT reconstruction results on
AAPM.
method outperforms all other methods in terms of both PSNR/LPIPS in LSUN-bedroom, and also
achieves strong performance in the colorization of FFHQ dataset.
CT reconstruction To the best of our knowledge, [40] is the only method that tackles CT re-
construction directly with diffusion models. We compare our method against [40], which we refer
to as score-CT henceforth. We also compare with the best-in-class supervised learning methods,
cGAN [15] and SIN-4c-PRN [50]. As a compressed sensing baseline, FISTA-TV [3] was included,
along with the analytical reconstruction method, filtered backprojection (FBP). We use two standard
metrics, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), for quantitative evaluation. From Table 3, we see that the newly
proposed MCG method outperforms the previous score-CT [40] by a large margin. We can observe
the superiority of MCG over other methods more clearly in Fig. 4, where MCG reconstructs the
measurement with high fidelity and detail. All other methods including the fully supervised baselines
fall behind the proposed method.
reconstruction. When considering gradient steps without Tweedie’s denoising (i.e. keeping the noise
level at the ith step), the performance heavily degrades, especially when implemented without the
projection steps. Here, we see that the proposed denoising step to utilize x̂0 is indeed the key to the
superior performance.
Second, looking at Fig. 5a, we immediately see that the graph of MCG stays in the lowest (best)
LPIPS regime across all NFEs by a large margin, except when the NFE drops below 100. There,
DDRM [25] takes first place, presumably due to the DDIM sampling strategy it adopts. The
performance of RePAINT deteriorates rapidly as we decrease the NFE. Furthermore, we observe that
the LPIPS of score-SDE [41] actually increases (i.e. worsens) as we increase the number of NFEs
from a few hundred to one thousand. This suggests that the inference process that score-SDE takes
(i.e. projection only) is inherently flawed, and cannot be corrected by taking smaller steps. In
Table 4, we list the runtime of all the methods that were used for comparison in the task of inpainting.
Note that the proposed method takes longer to compute than score-SDE despite having the same
NFE. The gap is due to the backpropagation steps that are required for the MCG step, and it
could potentially be reduced by switching from the current PyTorch implementation to a JAX [6]
implementation.
Lastly, we observe the difference in the performance as we vary the values of α. Implementation-wise,
we find that we obtain superior results when normalizing the squared norm by the norm itself
(i.e. $\alpha = \alpha'/\|W(y - H\hat{x}_0)\|$, where $\alpha'$ is a constant). To avoid cluttered notation, we
instead vary the values of $\alpha'$ in Fig. 5b. Inspecting Fig. 5b, we see that α values
within the range [0.1, 1.0] produce satisfactory results. α values that are too low do not fully exploit
the advantages of MCG, and the method collapses to the projection-only method, while α values that
are too high result in exploding gradients, and the reconstruction saturates.
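The normalization $\alpha = \alpha'/\|W(y - H\hat{x}_0)\|$ makes the size of the MCG correction independent of how large the current residual is. A minimal sketch, assuming the norm is treated as a constant (not differentiated) and adding a small `eps` guard against a zero residual, which is not from the paper; the helper name and the identity placeholder for $J$ are hypothetical.

```python
import numpy as np

def normalized_mcg_correction(J, H, W, y, x0_hat, alpha_prime, eps=1e-12):
    """Hypothetical helper: the term -alpha * grad in (14) with
    alpha = alpha' / ||W (y - H x0_hat)||, the norm treated as a constant."""
    r = W @ (y - H @ x0_hat)
    grad = -2.0 * J.T @ H.T @ W.T @ r              # manifold constrained gradient (Theorem 1)
    alpha = alpha_prime / (np.linalg.norm(r) + eps)
    return -alpha * grad

rng = np.random.default_rng(0)
n, m = 8, 5
J = np.eye(n)                                      # placeholder Jacobian for illustration
H, W = rng.standard_normal((m, n)), np.eye(m)
x0_hat, y = rng.standard_normal(n), rng.standard_normal(m)
y_far = H @ x0_hat + 100.0 * (y - H @ x0_hat)      # same residual direction, 100x larger

step = normalized_mcg_correction(J, H, W, y, x0_hat, alpha_prime=0.5)
step_far = normalized_mcg_correction(J, H, W, y_far, x0_hat, alpha_prime=0.5)
print(np.allclose(step, step_far))                 # True: step size ignores residual scale
```

Without the normalization, `step_far` would be 100 times larger than `step`, which is one way to see how large fixed values of α can lead to exploding updates.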
Properties of our method Our proposed method is fully unsupervised and is not trained on solving
a specific inverse problem. For example, our box masks and random masks have very different
forms of erasing the pixel values. Nevertheless, our method generalizes well to such
different measurement conditions, while other methods show a large performance gap between
the different mask shapes. We further note two appealing properties of our method as an inverse
problem solver: 1) the ability to generate multiple solutions given a condition, and 2) the ability to
maintain perfect measurement consistency. The former is often lacking in supervised learning-based
methods [43, 50], and the latter is often not satisfied by some unsupervised GAN-based
solutions [10, 4].
6 Conclusion
In this work, we proposed a general framework that can greatly enhance the performance of the
diffusion model-based solvers for solving inverse problems. We demonstrated several promising
applications (inpainting, colorization, and sparse-view CT reconstruction), and showed that our method can
outperform the current state-of-the-art methods. We analyzed our method theoretically and showed that
MCG prevents the data generation process from falling off the manifold, thereby reducing the errors
that might accumulate at every step. Further, we showed that MCG controls the direction tangent to
the data manifold, whereas the score function controls the direction that is normal, such that the two
components complement each other.
Limitations and Broader Impact The proposed method is inherently stochastic since the diffusion
model is the main workhorse of the algorithm. When the measurement dimension m is pushed to low values,
our method sometimes fails to produce high-quality reconstructions, although it still performs better than
the other methods overall. For extreme cases of inpainting (e.g. half masks) with the ImageNet model, we
often observe artifacts in our reconstruction (e.g. generating perfectly symmetric images), which we
discuss in further detail in Sec. E. We note that our method is slow to sample from, inheriting the
existing limitations of diffusion models. This would likely benefit from leveraging recent solvers
aimed at accelerating the inference speed of diffusion models. In line with the arguments of other
generative model-based inverse problem solvers, our method is a solver that relies heavily on the
underlying diffusion model, and can thus potentially create malicious content such as deepfakes.
Further, the reconstructions could amplify social biases that already exist in the training
dataset.
References
[1] Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their
Applications, 12(3):313–326, 1982.
[2] Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided
image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392,
2019.
[3] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear
inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
[4] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using
generative models. In International Conference on Machine Learning, pages 537–546. PMLR,
2017.
[5] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed opti-
mization and statistical learning via the alternating direction method of multipliers. Foundations
and Trends® in Machine learning, 3(1):1–122, 2011.
[6] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal
Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao
Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
[7] Thorsten M Buzug. Computed tomography. In Springer handbook of medical technology, pages
311–342. Springer, 2011.
[8] Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. ILVR:
Conditioning method for denoising diffusion probabilistic models. In Proceedings of the
IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[9] Hyungjin Chung, Byeongsu Sim, and Jong Chul Ye. Come-Closer-Diffuse-Faster: Acceler-
ating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[10] Giannis Daras, Joseph Dean, Ajil Jalal, and Alexandros G Dimakis. Intermediate layer opti-
mization for inverse problems using deep generative models. In International Conference on
Machine Learning, 2021.
[11] Valentin De Bortoli, Alain Durmus, Marcelo Pereyra, and Ana Fernandez Vidal. Maximum
likelihood estimation of regularization parameters in high-dimensional inverse problems: an
empirical bayesian approach. part ii: Theoretical analysis. SIAM Journal on Imaging Sciences,
13(4):1990–2028, 2020.
[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale
hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern
Recognition, pages 248–255. IEEE, 2009.
[13] Prafulla Dhariwal and Alexander Quinn Nichol. Diffusion models beat GANs on image
synthesis. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances
in Neural Information Processing Systems, 2021.
[14] Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical
Association, 106(496):1602–1614, 2011.
[15] Muhammad Usman Ghani and W Clem Karl. Deep learning-based sinogram completion
for low-dose CT. In 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing
Workshop (IVMSP), pages 1–5. IEEE, 2018.
[16] Richard Gordon, Robert Bender, and Gabor T Herman. Algebraic reconstruction techniques
(ART) for three-dimensional electron microscopy and X-ray photography. Journal of Theoretical
Biology, 29(3):471–481, 1970.
[17] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter.
GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In I. Guyon,
U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,
Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[18] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In
Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.
[19] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J
Fleet. Video diffusion models. arXiv preprint arXiv:2204.03458, 2022.
[20] Shady Abu Hussein, Tom Tirer, and Raja Giryes. Image-adaptive gan based reconstruction. In
Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3121–3129,
2020.
[21] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with
conditional adversarial networks. In Proceedings of the IEEE conference on computer vision
and pattern recognition, pages 1125–1134, 2017.
[22] Zahra Kadkhodaie and Eero Simoncelli. Stochastic solutions for linear inverse problems
using the prior implicit in a denoiser. In Advances in Neural Information Processing Systems,
volume 34, pages 13242–13254. Curran Associates, Inc., 2021.
[23] Eunhee Kang, Junhong Min, and Jong Chul Ye. A deep convolutional neural network using
directional wavelets for low-dose X-ray CT reconstruction. Medical Physics, 44(10):e360–e375,
2017.
[24] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative
adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition, pages 4401–4410, 2019.
[25] Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration
models. In ICLR Workshop on Deep Generative Models for Highly Structured Data, 2022.
[26] Bahjat Kawar, Gregory Vaksman, and Michael Elad. SNIPS: Solving noisy inverse problems
stochastically. Advances in Neural Information Processing Systems, 34:21757–21769, 2021.
[27] Daniil Kazantsev, Edoardo Pasca, Martin J Turner, and Philip J Withers. Ccpi-regularisation
toolkit for computed tomographic image reconstruction with proximal splitting algorithms.
SoftwareX, 9:317–323, 2019.
[28] Kwanyoung Kim and Jong Chul Ye. Noise2Score: Tweedie’s approach to self-supervised image
denoising without clean images. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman
Vaughan, editors, Advances in Neural Information Processing Systems, 2021.
[29] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In 2nd International
Conference on Learning Representations, ICLR, 2014.
[30] Rémi Laumont, Valentin De Bortoli, Andrés Almansa, Julie Delon, Alain Durmus, and Marcelo
Pereyra. Bayesian imaging using plug & play priors: when Langevin meets Tweedie. SIAM
Journal on Imaging Sciences, 15(2):701–737, 2022.
[31] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model
selection. Annals of Statistics, pages 1302–1338, 2000.
[32] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc
Van Gool. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. arXiv preprint
arXiv:2201.09865, 2022.
[33] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley.
Least squares generative adversarial networks. In Proceedings of the IEEE international
conference on computer vision, pages 2794–2802, 2017.
[34] Frank Ong, Peyman Milanfar, and Pascal Getreuer. Local kernels that approximate Bayesian
regularization and proximal operators. IEEE Transactions on Image Processing, 28(6):3007–
3019, 2019.
[35] Jialun Peng, Dong Liu, Songcen Xu, and Houqiang Li. Generating diverse structure for
image inpainting with hierarchical VQ-VAE. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pages 10775–10784, 2021.
[36] Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images
with VQ-VAE-2. Advances in Neural Information Processing Systems, 32, 2019.
[37] Herbert E Robbins. An empirical Bayes approach to statistics. In Breakthroughs in Statistics,
pages 388–394. Springer, 1992.
[38] Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge
University Press, 2019.
[39] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. In 3rd International Conference on Learning Representations, ICLR, 2015.
[40] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medi-
cal imaging with score-based generative models. In International Conference on Learning
Representations, 2022.
[41] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and
Ben Poole. Score-based generative modeling through stochastic differential equations. In 9th
International Conference on Learning Representations, ICLR, 2021.
[42] Charles M Stein. Estimation of the mean of a multivariate normal distribution. The Annals of
Statistics, pages 1135–1151, 1981.
[43] Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha,
Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky.
Resolution-robust large mask inpainting with Fourier convolutions. In Proceedings of the
IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2149–2159, 2022.
[44] Tom Tirer and Raja Giryes. Image restoration by iterative denoising and backward projections.
IEEE Transactions on Image Processing, 28(3):1220–1234, 2018.
[45] dkazanc/TomoBAR. GitHub repository.
[46] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in
Neural Information Processing Systems, 30, 2017.
[47] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priors
for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information
Processing, pages 945–948. IEEE, 2013.
[48] Ana Fernandez Vidal, Valentin De Bortoli, Marcelo Pereyra, and Alain Durmus. Maximum
likelihood estimation of regularization parameters in high-dimensional inverse problems: An
empirical Bayesian approach. Part I: Methodology and experiments. SIAM Journal on Imaging
Sciences, 13(4):1945–1989, 2020.
[49] Ziyu Wan, Jingbo Zhang, Dongdong Chen, and Jing Liao. High-fidelity pluralistic image
completion with transformers. In Proceedings of the IEEE/CVF International Conference on
Computer Vision, pages 4692–4701, 2021.
[50] Haoyu Wei, Florian Schiffers, Tobias Würfl, Daming Shen, Daniel Kim, Aggelos K Katsaggelos,
and Oliver Cossairt. 2-step sparse-view CT reconstruction with a domain-specific perceptual
network. arXiv preprint arXiv:2012.04743, 2020.
[51] Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN:
Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv
preprint arXiv:1506.03365, 2015.
[52] Yanhong Zeng, Jianlong Fu, Hongyang Chao, and Baining Guo. Aggregated contextual
transformations for high-resolution image inpainting. IEEE Transactions on Visualization and
Computer Graphics, 2022.
[53] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior
for image restoration. In Proceedings of the IEEE conference on computer vision and pattern
recognition, pages 3929–3938, 2017.
[54] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea-
sonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE
conference on computer vision and pattern recognition, pages 586–595, 2018.
[55] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image
translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international
conference on computer vision, pages 2223–2232, 2017.
Checklist
1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper’s
contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] We discuss the limitations in (6).
(c) Did you discuss any potential negative societal impacts of your work? [Yes] We discuss
potential negative impacts in (6).
(d) Have you read the ethics review guidelines and ensured that your paper conforms to
them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes] We provide all proofs
of results in supplementary material.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experi-
mental results (either in the supplemental material or as a URL)? [Yes] We include all
code for our experiments in the supplementary material. We will release the code once
the paper is published.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they
were chosen)? [Yes]
(c) Did you report error bars (e.g., with respect to the random seed after running experiments
multiple times)? [No] Due to limited resources, we did not run multiple sets of experiments.
(d) Did you include the total amount of compute and the type of resources used (e.g., type
of GPUs, internal cluster, or cloud provider)? [Yes]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes] We have cited the
original works that released the datasets.
(b) Did you mention the license of the assets? [No] Licenses are standard and can be found
online.
(c) Did you include any new assets either in the supplemental material or as a URL? [Yes]
We include our implementation as the supplementary material. We will release the
code upon publication.
(d) Did you discuss whether and how consent was obtained from people whose data you’re
using/curating? [No] All datasets used in our work are publicly available.
(e) Did you discuss whether the data you are using/curating contains personally identifiable
information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if
applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review
Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount
spent on participant compensation? [N/A]
A Proofs
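Proof. Suppose that the data manifold is an $l$-dimensional linear subspace. By rotation and translation, we can safely assume that $\mathcal{M} = \{x \in \mathbb{R}^n : x_{l+1} = x_{l+2} = \cdots = x_n = 0\}$. Then, we can simply write $d(x, \mathcal{M}) = \sqrt{x_{l+1}^2 + \cdots + x_n^2}$ and $\mathcal{M}_i = \{x \in \mathbb{R}^n : x_{l+1}^2 + \cdots + x_n^2 = r_i^2\}$. For a given point $x' = (x'_1, x'_2, \ldots) \in \mathcal{M}$, we consider $p(x|x') \sim \mathcal{N}(a_i x', b_i^2 I)$ and obtain a concentration inequality that is independent of the choice of $x'$. We need the standard Laurent-Massart bound for a chi-square variable [31]: when $X$ follows a chi-square distribution with $k$ degrees of freedom,

$$P\left[X - k \ge 2\sqrt{kt} + 2t\right] \le e^{-t}, \qquad P\left[X - k \le -2\sqrt{kt}\right] \le e^{-t}.$$

Since $x'_{l+1} = \cdots = x'_n = 0$, the variable $x_{l+1}^2/b_i^2 + \cdots + x_n^2/b_i^2$ follows a chi-square distribution with $n - l$ degrees of freedom. Substituting $t = (n - l)\varepsilon'$ into the above bound, and using $r_i = b_i\sqrt{n - l}$ in the last step,

$$\begin{aligned}
&P\left[-2(n-l)\sqrt{\varepsilon'} \le \frac{x_{l+1}^2}{b_i^2} + \cdots + \frac{x_n^2}{b_i^2} - (n-l) \le 2(n-l)\left(\sqrt{\varepsilon'} + \varepsilon'\right)\right] \\
&= P\left[\sqrt{x_{l+1}^2 + \cdots + x_n^2} \in \left(r_i\sqrt{1 - 2\sqrt{\varepsilon'}},\; r_i\sqrt{1 + 2\sqrt{\varepsilon'} + 2\varepsilon'}\right)\right] \ge 1 - 2e^{-(n-l)\varepsilon'}.
\end{aligned}$$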
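The concentration statement can be checked numerically. The following is a minimal Monte Carlo sketch, assuming a Gaussian perturbation $\mathcal{N}(a_i x', b_i^2 I)$ of a point $x'$ on the linear subspace $\mathcal{M}$; the dimensions $n$, $l$, the coefficients $a_i$, $b_i$, and $\varepsilon'$ are arbitrary illustrative values, not settings from the experiments.

```python
import numpy as np


def concentration_check(n=1000, l=100, a_i=0.8, b_i=0.6, eps=0.01, trials=2000, seed=0):
    """Monte Carlo check of the chi-square concentration used in the proof.

    M is the l-dimensional subspace {x : x_{l+1} = ... = x_n = 0}; samples are drawn
    from p(x | x') = N(a_i x', b_i^2 I) for a fixed x' in M. All parameter values
    are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x_prime = np.zeros(n)
    x_prime[:l] = rng.standard_normal(l)               # an arbitrary point on M
    x = a_i * x_prime + b_i * rng.standard_normal((trials, n))
    dist = np.linalg.norm(x[:, l:], axis=1)            # d(x, M) = sqrt(x_{l+1}^2 + ... + x_n^2)
    r_i = b_i * np.sqrt(n - l)                         # radius of the noisy manifold M_i
    lower = r_i * np.sqrt(1 - 2 * np.sqrt(eps))
    upper = r_i * np.sqrt(1 + 2 * np.sqrt(eps) + 2 * eps)
    empirical = np.mean((dist > lower) & (dist < upper))
    bound = 1 - 2 * np.exp(-(n - l) * eps)             # guaranteed lower bound on the probability
    return empirical, bound


empirical, bound = concentration_check()
print(f"empirical: {empirical:.4f}, bound: {bound:.4f}")
```

Increasing $n - l$ with $\varepsilon'$ fixed tightens the bound exponentially, which is the sense in which noisy samples concentrate near the shell $\mathcal{M}_i$.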
Proposition 2 (score function). Suppose $s_\theta$ is the minimizer of the denoising score matching loss in (3). Let $Q_i$ be the function that maps $x_i$ to $\hat{x}_0$ for each $i$:

$$Q_i : \mathbb{R}^d \to \mathbb{R}^d, \quad x_i \mapsto \hat{x}_0 := \frac{1}{a_i}\left(x_i + b_i^2 s_\theta(x_i, i)\right).$$
Proof. To minimize (3), or equivalently,

$$\int \|s_\theta(x_t, t) - \nabla_{x_t}\log p(x_t|x_0)\|_2^2 \, p(x_t|x)p(x)\,dx\,dx_t\,dt,$$

$$v^T J_{Q_i} u = (v_t + v_n)^T u_t = v_t^T u_t = (u_t + u_n)^T v_t = u^T J_{Q_i} v,$$
Proof.

$$\begin{aligned}
\frac{\partial}{\partial x_i}\|W(y - H\hat{x}_0)\|_2^2 &= -2J_{WHQ_i}^T W(y - H\hat{x}_0) \\
&= -2J_{Q_i}^T H^T W^T W(y - H\hat{x}_0) \\
&= J_{Q_i} d \in T_{Q_i(x_i)}\mathcal{M},
\end{aligned}$$

where $d = -2H^T W^T W(y - H\hat{x}_0)$. The first and second equalities follow from the chain rule, and the last line follows from Proposition 2.
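The chain-rule identity above can be sanity-checked numerically. The sketch below assumes a toy Gaussian prior $x_0 \sim \mathcal{N}(0, \Sigma)$, for which the noisy marginal is $\mathcal{N}(0, a_i^2\Sigma + b_i^2 I)$ and the exact score is available in closed form, so $Q_i$ from Proposition 2 and its Jacobian are explicit. A Gaussian prior is not a low-dimensional manifold, so this only checks the gradient formula and the symmetry of $J_{Q_i}$, not the tangent-space claim; the helper names `make_toy`, `Q`, `loss`, and `mcg_grad` are hypothetical.

```python
import numpy as np


def make_toy(n=6, m=3, a_i=0.7, b_i=0.5, seed=0):
    """Hypothetical toy problem: Gaussian prior, random H, diagonal W."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    Sigma = U @ np.diag(rng.uniform(0.1, 2.0, n)) @ U.T    # prior covariance of x_0
    C = a_i**2 * Sigma + b_i**2 * np.eye(n)                  # covariance of x_i = a_i x_0 + b_i z
    C_inv = np.linalg.inv(C)
    C_inv = 0.5 * (C_inv + C_inv.T)                          # symmetrize against round-off
    H = rng.standard_normal((m, n))                           # linear forward operator
    W = np.diag(rng.uniform(0.5, 1.5, m))                     # weighting matrix
    y = rng.standard_normal(m)                                # measurement
    return dict(a_i=a_i, b_i=b_i, C_inv=C_inv, H=H, W=W, y=y), rng


def Q(x, p):
    """Tweedie denoiser of Proposition 2 with the exact score s(x) = -C^{-1} x."""
    score = -p["C_inv"] @ x
    return (x + p["b_i"] ** 2 * score) / p["a_i"]


def loss(x, p):
    """||W (y - H x_hat_0)||_2^2 as a function of x_i."""
    r = p["W"] @ (p["y"] - p["H"] @ Q(x, p))
    return float(r @ r)


def mcg_grad(x, p):
    """Gradient via the chain rule: J_Q d with d = -2 H^T W^T W (y - H x_hat_0)."""
    J_Q = (np.eye(x.shape[0]) - p["b_i"] ** 2 * p["C_inv"]) / p["a_i"]   # symmetric Jacobian of Q
    d = -2 * p["H"].T @ p["W"].T @ p["W"] @ (p["y"] - p["H"] @ Q(x, p))
    return J_Q @ d, J_Q


params, rng = make_toy()
x_i = rng.standard_normal(6)
grad, J_Q = mcg_grad(x_i, params)

# Central finite differences of the loss, one coordinate at a time
h = 1e-6
fd_grad = np.array([(loss(x_i + h * e, params) - loss(x_i - h * e, params)) / (2 * h)
                    for e in np.eye(6)])
print(np.max(np.abs(grad - fd_grad)))
```

With a trained network, the Jacobian-vector product would instead be obtained by automatic differentiation through $s_\theta$; the closed-form toy only makes the identity easy to verify.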
In Fig. 6, we illustrate how the proposed algorithm benefits from mixing the MCG step with the
conventional POCS step. Because the MCG step pushes points along tangent directions, we expect it
to reduce the deviation from the manifold that is caused by the POCS step.
Figure 6: The advantage of mixing the MCG and the POCS steps over the conventional POCS step.
Each curve represents a manifold of (noisy) data. Green arrows indicate the POCS steps, and red
arrows indicate steps that mix MCG and POCS. Because the mixed step moves along the manifolds, it
mitigates the tendency of the reverse diffusion step (black arrows) to leave the manifolds.
Due to the linearity of the drift and diffusion functions, we can analytically sample from $p(x_i|x_0)$ via the reparameterization trick:

$$x_i = a_i x_0 + b_i z, \quad z \sim \mathcal{N}(0, I). \tag{16}$$

In VP-SDE [18], one defines a linearly increasing noise schedule $\beta_1, \beta_2, \ldots, \beta_N \in (0, 1)$. Further, we define $\alpha_i = 1 - \beta_i$ and $\bar{\alpha}_i = \prod_{j=1}^{i}\alpha_j$. Then, the forward diffusion process can be implemented as

$$x_i = \sqrt{\bar{\alpha}_i}\,x_0 + \sqrt{1 - \bar{\alpha}_i}\,z, \quad z \sim \mathcal{N}(0, I). \tag{17}$$

In VE-SDE [41], one defines a geometrically increasing noise schedule $\sigma_i = \sigma_0\left(\frac{\sigma_N}{\sigma_0}\right)^{\frac{i-1}{N-1}}$. Since the drift function is zero, the forward diffusion simply becomes Brownian motion. Concretely,

$$x_i = x_0 + \sigma_i z, \quad z \sim \mathcal{N}(0, I). \tag{18}$$
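Equations (16) to (18) translate directly into code. The following is a minimal NumPy sketch, assuming 1-indexed steps $i = 1, \ldots, N$; the values `beta_min = 1e-4`, `beta_max = 0.02`, `sigma_0 = 0.01`, and `sigma_N = 50.0` are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np


def vp_schedule(N=1000, beta_min=1e-4, beta_max=0.02):
    """Linearly increasing beta_i, alpha_i = 1 - beta_i, alpha_bar_i = prod_{j<=i} alpha_j."""
    betas = np.linspace(beta_min, beta_max, N)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ve_schedule(N=1000, sigma_0=0.01, sigma_N=50.0):
    """Geometric schedule sigma_i = sigma_0 (sigma_N / sigma_0)^((i - 1) / (N - 1)), i = 1..N."""
    i = np.arange(1, N + 1)
    return sigma_0 * (sigma_N / sigma_0) ** ((i - 1) / (N - 1))


def vp_forward(x0, i, alpha_bars, rng):
    """Eq. (17): x_i = sqrt(alpha_bar_i) x_0 + sqrt(1 - alpha_bar_i) z."""
    ab = alpha_bars[i - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * rng.standard_normal(x0.shape)


def ve_forward(x0, i, sigmas, rng):
    """Eq. (18): x_i = x_0 + sigma_i z (zero drift, i.e. Brownian motion)."""
    return x0 + sigmas[i - 1] * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 256, 256))    # stand-in for an image scaled to [-1, 1]
_, _, alpha_bars = vp_schedule()
sigmas = ve_schedule()
x_vp = vp_forward(x0, 500, alpha_bars, rng)
x_ve = ve_forward(x0, 500, sigmas, rng)
```

In the notation of (16), the VP case corresponds to $a_i = \sqrt{\bar{\alpha}_i}$, $b_i = \sqrt{1 - \bar{\alpha}_i}$, and the VE case to $a_i = 1$, $b_i = \sigma_i$.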
First, for the case of VP-SDE, the reverse diffusion step is implemented by

$$x_{i-1} = \frac{1}{\sqrt{\alpha_i}}\left(x_i - \frac{1 - \alpha_i}{\sqrt{1 - \bar{\alpha}_i}}\, z_\theta(x_i, i)\right) + \tilde{\sigma}_i z, \quad z \sim \mathcal{N}(0, I), \tag{19}$$

where $z_\theta(x_i, i)$ is trained with the epsilon-matching scheme as in [18], and $\tilde{\sigma}_i$ is set to a learnable parameter as in [13]. Note that (19) is written in terms of $z_\theta(x_i, i)$ rather than the score function $s_\theta(x_i, i)$. Using the relation $z_\theta(x_i, i) = -\sqrt{1 - \bar{\alpha}_i}\, s_\theta(x_i, i)$, one can rewrite it as

$$x_{i-1} = \frac{1}{\sqrt{\alpha_i}}\left(x_i + (1 - \alpha_i)\, s_\theta(x_i, i)\right) + \tilde{\sigma}_i z, \quad z \sim \mathcal{N}(0, I). \tag{20}$$

Next, for the VE-SDE, the reverse diffusion step using the Euler-Maruyama solver [38] is given as

$$x_{i-1} = x_i + (\sigma_i^2 - \sigma_{i-1}^2)\, s_\theta(x_i, i) + \sqrt{\sigma_i^2 - \sigma_{i-1}^2}\, z, \quad z \sim \mathcal{N}(0, I). \tag{21}$$
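| Type | $a_i$ | $b_i$ | $f(x_i, s_\theta)$ | $g(i)$ |
|---|---|---|---|---|
| VP-SDE | $\sqrt{\bar{\alpha}_i}$ | $\sqrt{1 - \bar{\alpha}_i}$ | $\frac{1}{\sqrt{\alpha_i}}\left(x_i + (1 - \alpha_i)\, s_\theta(x_i, i)\right)$ | $\tilde{\sigma}_i$ |
| VE-SDE | $1$ | $\sigma_i$ | $x_i + (\sigma_i^2 - \sigma_{i-1}^2)\, s_\theta(x_i, i)$ | $\sqrt{\sigma_i^2 - \sigma_{i-1}^2}$ |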
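The reverse steps (19) to (21), equivalently $f(x_i, s_\theta) + g(i)\,z$ with the entries of the table above, can be written as plain functions. This is a minimal sketch in which the network outputs $z_\theta$ and $s_\theta$ are passed in as arrays (no model is assumed, and the numeric values are illustrative); the usage lines check numerically that (19) and (20) coincide under $z_\theta = -\sqrt{1 - \bar{\alpha}_i}\, s_\theta$.

```python
import numpy as np


def vp_step_eps(x, eps_pred, alpha_i, alpha_bar_i, sigma_tilde_i, z):
    """Eq. (19): VP ancestral step written with the noise predictor z_theta."""
    coef = (1.0 - alpha_i) / np.sqrt(1.0 - alpha_bar_i)
    return (x - coef * eps_pred) / np.sqrt(alpha_i) + sigma_tilde_i * z


def vp_step_score(x, score_pred, alpha_i, sigma_tilde_i, z):
    """Eq. (20): the same step with the score s_theta, i.e. f(x_i, s_theta) + g(i) z."""
    return (x + (1.0 - alpha_i) * score_pred) / np.sqrt(alpha_i) + sigma_tilde_i * z


def ve_step_em(x, score_pred, sigma_i, sigma_prev, z):
    """Eq. (21): Euler-Maruyama step of the reverse VE-SDE (sigma_prev = sigma_{i-1})."""
    dvar = sigma_i**2 - sigma_prev**2
    return x + dvar * score_pred + np.sqrt(dvar) * z


rng = np.random.default_rng(0)
x, s, z = rng.standard_normal((3, 16))                  # x_i, a stand-in for s_theta(x_i, i), noise
alpha_i, alpha_bar_i, sigma_tilde_i = 0.98, 0.5, 0.1    # illustrative values
eps = -np.sqrt(1.0 - alpha_bar_i) * s                   # z_theta = -sqrt(1 - alpha_bar_i) s_theta
x_eps = vp_step_eps(x, eps, alpha_i, alpha_bar_i, sigma_tilde_i, z)
x_score = vp_step_score(x, s, alpha_i, sigma_tilde_i, z)
x_ve = ve_step_em(x, s, sigma_i=1.0, sigma_prev=0.9, z=z)
```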
C Algorithms
Inpainting The forward model for inpainting is given as

$$y = Px + \epsilon, \quad P \in \mathbb{R}^{m \times n}, \tag{22}$$

where $P \in \{0, 1\}^{m \times n}$ is the matrix whose rows are standard coordinate vectors indicating the indices of the measured entries. For the steps in (14) and (15), we choose

$$W = I, \quad A = I - P^T P, \quad b_i = P^T y_i, \quad y_i \sim q(y_i|y) := \mathcal{N}(y_i \,|\, a_i y, b_i^2 I). \tag{23}$$
Specifically, $A$ projects $x'_{i-1}$ onto the orthogonal complement of the measurement subspace: the
measured components are replaced by $y_i$, while the orthogonal components are taken from $x'_{i-1}$.
Note that we use $y_i$ sampled from $q(y_i|y)$ rather than $y$ itself, so that the noise level of the
measurement matches that of the current estimate.
We provide the algorithm used for inpainting in Algorithm 1. The sampler is based on the basic
ancestral sampling (AS) of [18], and the default configuration uses $N = 1000$ and
$\alpha = 1.0/\|y - P\hat{x}_0\|$.
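A minimal NumPy sketch of the inpainting choices in (23), assuming that the steps in (14) and (15) combine them as $x_{i-1} = A x'_{i-1} + b_i$ (our reading of the description above, not a verbatim copy of Algorithm 1). The helper names are hypothetical, the offset $b_i = P^T y_i$ is called `offset` in the code to avoid clashing with the noise scale $b_i$ of (16), and the MCG gradient itself is omitted because it requires the trained score network.

```python
import numpy as np


def sampling_matrix(mask):
    """P in {0,1}^{m x n}: one row per observed pixel, each row a standard basis vector."""
    idx = np.flatnonzero(mask)
    P = np.zeros((idx.size, mask.size))
    P[np.arange(idx.size), idx] = 1.0
    return P


def noisy_measurement(y, a_i, b_i, rng):
    """y_i ~ N(a_i y, b_i^2 I), matching the noise level of the current estimate."""
    return a_i * y + b_i * rng.standard_normal(y.shape)


def inpainting_consistency(x_prime, y_i, P):
    """x_{i-1} = A x'_{i-1} + P^T y_i with A = I - P^T P (W = I)."""
    A = np.eye(P.shape[1]) - P.T @ P
    offset = P.T @ y_i
    return A @ x_prime + offset


def mcg_step_size(y, P, x0_hat, scale=1.0):
    """Default step size alpha = 1.0 / ||y - P x_hat_0||."""
    return scale / np.linalg.norm(y - P @ x0_hat)


rng = np.random.default_rng(0)
n = 64
mask = rng.random(n) < 0.3               # hypothetical: about 30% of pixels observed
P = sampling_matrix(mask)
x_true = rng.standard_normal(n)
y = P @ x_true                            # noiseless measurement
x_prime = rng.standard_normal(n)          # stand-in for the output of the reverse step
y_i = noisy_measurement(y, a_i=0.9, b_i=0.1, rng=rng)
x_next = inpainting_consistency(x_prime, y_i, P)
alpha = mcg_step_size(y, P, x_prime)      # x_prime stands in for x_hat_0 here
```

Forming $A$ as a dense matrix is only for readability; in practice the same update is a masked assignment on the image.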
Figure 7: Comparison of the evolution (i.e., generative path) between score-SDE [41] and our method.
First rows in (a) and (b): evolution of $x_i$; second rows in (a) and (b): evolution of $\hat{x}_0$.
The sampler for colorization is based on the predictor-corrector (PC) sampler of [41] (VE-SDE), and
we apply MCG after every predictor step and every corrector step. We use $N = 2000$ and
$\alpha = 0.1/\|C^T(y - C\hat{x}_0)\|$ as hyper-parameters.
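The colorization step size uses the adjoint $C^T$. As an illustration, the sketch below assumes that $C$ converts an RGB image to grayscale by averaging the three channels; this operator is a hypothetical stand-in, since the exact $C$ is not specified here. The usage lines check adjointness numerically and compute $\alpha = 0.1/\|C^T(y - C\hat{x}_0)\|$.

```python
import numpy as np


def C_op(x):
    """Hypothetical colorization operator: grayscale = mean of the channels of x, shape (3, H, W)."""
    return x.mean(axis=0)


def C_adj(g):
    """Adjoint of C_op: copy g / 3 into each of the three channels."""
    return np.repeat(g[None] / 3.0, 3, axis=0)


def colorization_step_size(y, x0_hat, scale=0.1):
    """alpha = 0.1 / ||C^T (y - C x_hat_0)||."""
    return scale / np.linalg.norm(C_adj(y - C_op(x0_hat)))


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
g = rng.standard_normal((8, 8))
lhs = np.sum(C_op(x) * g)                        # <C x, g>
rhs = np.sum(x * C_adj(g))                       # <x, C^T g>
y = C_op(rng.standard_normal((3, 8, 8)))         # grayscale measurement
x0_hat = rng.standard_normal((3, 8, 8))          # stand-in for the Tweedie estimate
alpha = colorization_step_size(y, x0_hat)
```

Checking $\langle Cx, g\rangle = \langle x, C^T g\rangle$ is a cheap way to catch a wrong adjoint before it silently corrupts the MCG gradient.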
E Limitations
Required compute time for inference All sampling procedures detailed in Appendix C were
performed on a single RTX 3090 GPU. The inpainting algorithm based on ADM [13] takes about
90 seconds (1000 NFE) to reconstruct a single 256×256 image. Our colorization and CT
reconstruction algorithms based on score-SDE [41] take about 600 seconds (4000 NFE) to reconstruct a
single 256×256 image.
Code Availability We will open-source the code used in our experiments upon publication to
facilitate reproducibility.
RePAINT RePAINT [32] proposes to iterate between denoising and noising steps multiple times in
order to better capture the inter-dependency between the known and unknown regions in image
inpainting. We use the same score function and sampler for RePAINT as for the proposed method.
Following the default configuration in [32], we take $N = 200$ (corresponding to $T$ in [32]) and
$U = 10$, where $U$ denotes the number of iterated denoising-noising steps within a single
update index $i$.
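To make the role of $U$ concrete, here is a sketch of the resulting sequence of operations. It is simplified, assuming a jump length of one step (each resampling moves back by a single index), which may differ from the defaults of [32]; the function name is hypothetical. Only denoising steps evaluate the score network, so the number of denoising operations is the NFE of this simplified schedule.

```python
def repaint_schedule(N=200, U=10):
    """Simplified RePaint schedule with jump length 1 (an assumption).

    Returns a list of (operation, index) pairs:
      ("denoise", i): reverse step x_i -> x_{i-1}, merged with the known region (one NFE)
      ("renoise", i): forward step x_{i-1} -> x_i by adding noise (no network call)
    """
    ops = []
    for i in range(N, 0, -1):
        for u in range(U):
            ops.append(("denoise", i))
            if u < U - 1:                      # resample U - 1 times, then move on to i - 1
                ops.append(("renoise", i))
    return ops


ops = repaint_schedule()
n_denoise = sum(op == "denoise" for op, _ in ops)
n_renoise = sum(op == "renoise" for op, _ in ops)
print(n_denoise, n_renoise)   # 2000 1800
```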
⁴ https://github.com/jychoi118/ilvr_adm
⁵ https://github.com/openai/guided-diffusion
DDRM DDRM [25] demonstrates that linear inverse problems can be solved with diffusion models by
performing reverse diffusion sampling in the spectral space given by the singular value
decomposition (SVD) of the forward operator. We use the same score function as for the proposed
method. Following the notation of [25], we choose $\sigma_y = 0$, as we aim to solve noiseless
inverse problems, and $\eta = 0.85$, $\eta_b = 1$. The number of NFEs is set to 20 for DDRM
sampling.
LaMa LaMa uses fast Fourier convolutions in its generator architecture to reconstruct images.
We trained the model from scratch using an adversarial loss with an R1 regularization term
(coefficient 10) and a gradient penalty coefficient of 0.001. The Adam optimizer is used with fixed
learning rates of 0.001 for the generator and 0.0001 for the discriminator. For the FFHQ and
ImageNet datasets, training ran for 500k iterations with a batch size of 8.
AOT-GAN AOT-GAN consists of a deep image generator built from AOT blocks, each of which
aggregates multiple residual branches in parallel. The discriminator has the same architecture as
PatchGAN from [55]. We trained the model from scratch with a learning rate of 0.0001 using the Adam
optimizer with β1 = 0 and β2 = 0.9 for both FFHQ and ImageNet. Training ran for 500k iterations
with a batch size of 8. For the style loss and the perceptual loss, VGG19 [39] pretrained on
ImageNet [12] was used.
ICT Image completion transformer (ICT) consists of two modules: a transformer model that
follows a tokenization procedure to process information in a lower-dimensional space, and a guided
upsampling module that restores the original resolution. The encoded features are sampled from a
probability distribution via Gibbs sampling, so that multimodal reconstructions can be obtained
from the same measurement. For both the FFHQ and ImageNet datasets, we used pretrained models
provided by the authors.
IAGAN Image adaptive GAN (IAGAN) uses a pre-trained generator and adapts it at test time to the
given forward model. Specifically, following compressed sensing using generative models (CSGM) [4],
one initializes the latent vector $z$ with $z^* = \arg\min_z \|y - AG_\theta(z)\|$. Then, the latent
code and the network parameters are jointly optimized through several iterations of
$z^{**}, \theta^* = \arg\min_{z,\theta} \|y - AG_\theta(z)\|$. The final result is obtained by a
forward pass through the generator, followed by a projection onto the measurement subspace. For
tuning the generator, we follow the default configurations of the official codebase. Since the
codebase uses a GAN that generates 1024×1024 images, we downscale the result to 256×256 as a final
post-processing step.
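The two-stage optimization can be illustrated with a toy linear generator $G_\theta(z) = \Theta z$ and plain gradient descent. This is a sketch under clearly stated assumptions: the linear generator, the step sizes, the iteration counts, and the inpainting-style operator $A$ (a row-selection matrix) are all hypothetical, whereas IAGAN itself tunes a deep pretrained GAN. The last line performs the projection onto the measurement subspace, which for row selection replaces the measured entries by $y$.

```python
import numpy as np


def iagan_toy(A, y, Theta0, steps=500, lr_z=0.1, lr_theta=1e-3, seed=0):
    """Two-stage CSGM -> IAGAN sketch with a hypothetical linear generator G(z) = Theta z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(Theta0.shape[1])

    # Stage 1 (CSGM): z* = argmin_z ||y - A G(z)||^2 with the generator frozen
    for _ in range(steps):
        r = y - A @ Theta0 @ z
        z = z + lr_z * 2 * Theta0.T @ A.T @ r          # gradient step on z
    loss_csgm = np.sum((y - A @ Theta0 @ z) ** 2)

    # Stage 2 (IAGAN): jointly update the latent code and the generator weights
    Theta = Theta0.copy()
    for _ in range(steps):
        r = y - A @ Theta @ z
        grad_z = -2 * Theta.T @ A.T @ r
        grad_Theta = -2 * np.outer(A.T @ r, z)           # d/dTheta ||y - A Theta z||^2
        z, Theta = z - lr_z * grad_z, Theta - lr_theta * grad_Theta
    loss_iagan = np.sum((y - A @ Theta @ z) ** 2)
    return Theta @ z, loss_csgm, loss_iagan


rng = np.random.default_rng(1)
n, k, m = 20, 5, 10
Theta0 = rng.standard_normal((n, k)) / np.sqrt(n)       # stand-in for pretrained weights
A = np.eye(n)[rng.choice(n, size=m, replace=False)]     # row-selection forward operator
y = A @ rng.standard_normal(n)                          # measurement of an off-range signal
x_gen, loss_csgm, loss_iagan = iagan_toy(A, y, Theta0)

# Projection onto the measurement subspace: measured entries are replaced by y
x_final = x_gen + A.T @ (y - A @ x_gen)
```

Because the signal lies outside the range of the frozen generator, stage 1 leaves a residual; letting the weights move in stage 2 reduces it further.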
DSI DSI combines a VQ-VAE [46], a structure generator, and a texture generator. These components
were trained separately, and only the structure and texture generators are used at inference. We
trained the model from scratch. During optimization, the structure generator used the linear
warm-up scheduler and square-root decay schedule of [36]. For all models, we used the Adam
optimizer with a learning rate of 0.0001 and β1 = 0.5, together with an exponential moving average
(EMA). Training was done for 500k iterations on both FFHQ and ImageNet.
cINN cINN is an invertible neural network that can take additional conditions as input; in our
case, the condition is the grayscale image. We train the model with the default configurations
advised in https://github.com/VLL-HD/conditional_INNs without modification. The FFHQ model
was trained with a learning rate of 0.0001 for 100 epochs using the Adam optimizer. The LSUN
bedroom model was trained with a learning rate of 0.0001 for 30 epochs.
pix2pix Pix2pix is a variant of the conditional GAN (cGAN) that takes the corrupted image as input.
The model is trained in a supervised fashion, with a loss consisting of a reconstruction loss and
an adversarial loss. For the discriminator architecture, we adopt PatchGAN [21] and use the
LSGAN [33] loss, weighting the adversarial loss by 0.1. As with cINN, the FFHQ model was trained
with a learning rate of 0.0001 for 100 epochs using the Adam optimizer, and the LSUN bedroom
model was trained with the same configuration for 30 epochs.
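A minimal sketch of the generator objective described above, assuming an L1 reconstruction loss (the norm is not stated here) and the least-squares GAN objective of [33] with targets 1 for real and 0 for fake; `d_fake` stands for PatchGAN outputs on generated images and is a hypothetical input.

```python
import numpy as np


def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: real outputs -> 1, fake outputs -> 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)


def lsgan_g_loss(d_fake):
    """LSGAN generator loss: push discriminator outputs on generated images -> 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)


def pix2pix_g_objective(fake, target, d_fake, adv_weight=0.1):
    """Reconstruction loss (assumed L1) plus the LSGAN adversarial loss weighted by 0.1."""
    rec = np.mean(np.abs(fake - target))
    return rec + adv_weight * lsgan_g_loss(d_fake)


fake, target = np.zeros((3, 4, 4)), np.ones((3, 4, 4))   # toy generator output and ground truth
patch_scores = np.zeros((2, 2))                           # toy PatchGAN outputs on the fake image
print(pix2pix_g_objective(fake, target, patch_scores))    # 1.0 + 0.1 * 0.5
```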
F.3 CT reconstruction
Score-CT We use the hyper-parameters as advised in [40] and set η = 0.246, λ = 0.841. The
measurement consistency step is imposed after every corrector-predictor sweep as in the proposed
method.
SIN-4c-PRN Directly using the official implementation⁶ [50], we train the sinogram inpainting
network (SIN) on the AAPM dataset for 200 epochs with a batch size of 8 and a learning rate of
0.0001. We train two separate models for the two numbers of views, 18 and 30.
cGAN We adopt the implementation of cGAN [15] from the SIN-4c-PRN repository⁶. We train two
separate networks for 18-view and 30-view projections with the same configuration: 200 epochs,
a learning rate of 0.0001, and a batch size of 8.
FISTA-TV We perform FISTA-TV [3] reconstruction using TomoBAR [45], together with the
CCPi regularization toolkit [27]. Using the default settings, we adopt the least-squares (LS) data
model and run 300 FISTA iterations per image, with the total variation regularization strength
set to 0.001.
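The structure of FISTA with the least-squares data model can be sketched as follows. To keep the example self-contained, the total-variation proximal step (which has no closed form) is replaced here by soft-thresholding, the closed-form proximal operator of an $\ell_1$ penalty; this is a swapped-in regularizer, and the random matrix standing in for the CT projector and the function name are hypothetical. The usage lines check the standard FISTA guarantee $F(x_k) - F^\star \le 2L\|x_0 - x^\star\|^2/(k+1)^2$ in the unregularized case.

```python
import numpy as np


def fista_l1(A, y, lam=1e-3, n_iter=300):
    """FISTA for 0.5 * ||A x - y||^2 + lam * ||x||_1 (l1 prox swapped in for the TV prox)."""
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the LS gradient
    x = np.zeros(A.shape[1])                   # x_0
    v = x.copy()                               # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        u = v - A.T @ (A @ v - y) / L          # gradient step on the least-squares term
        x_new = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)   # soft-thresholding prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = x_new + ((t - 1.0) / t_new) * (x_new - x)                # momentum extrapolation
        x, t = x_new, t_new
    return x


def ls_objective(A, y, x):
    return 0.5 * np.sum((A @ x - y) ** 2)


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))              # stand-in for a projection matrix
y = rng.standard_normal(50)
k = 300
x_k = fista_l1(A, y, lam=0.0, n_iter=k)        # lam = 0: plain accelerated least squares
x_star, *_ = np.linalg.lstsq(A, y, rcond=None)
L = np.linalg.norm(A, 2) ** 2
gap = ls_objective(A, y, x_k) - ls_objective(A, y, x_star)
bound = 2 * L * np.sum(x_star ** 2) / (k + 1) ** 2
x_sparse = fista_l1(A, y, lam=5.0)
```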
⁶ https://github.com/anonyr7/Sinogram-Inpainting
Figure 9: Inpainting results on FFHQ 256×256 data. (a) Measurement, (b) ground truth, (c)
IAGAN [20], (d) DSI [35], (e) LaMa [43], (f) DDRM [25], (g) score-SDE [41], (h) RePAINT [32],
(i) MCG (ours).
Figure 10: Inpainting results on FFHQ 256×256 data with the LaMa [43] wide mask. (a) Mea-
surement, (b) ground truth, (c) DSI [35], (d) LaMa [43], (e) DDRM [25], (f) score-SDE [41], (g)
RePAINT [32], (h) MCG (ours).
Figure 11: Inpainting results on ImageNet 256×256 data. (a) Measurement, (b) ground truth, (c)
DSI [35], (d) LaMa [43], (e) DDRM [25], (f) score-SDE [41], (g) RePAINT [32], (h) MCG (ours).
Figure 12: Sparse view CT reconstruction results on AAPM 256×256 data. (a) FBP, (b) FISTA-TV [3],
(c) cGAN [15], (d) SIN-4c-PRN [50], (e) Score-CT [41], (f) MCG (Ours), (g) ground truth (GT).
Figure 13: Inpainting results on FFHQ 256×256 data with MCG. (a) Inpainting of a 128×128 box
region; we show three stochastic samples generated with the proposed method. (b) Imputation with
92% of pixels missing.
Figure 14: Inpainting results on LSUN-bedroom 256×256 data with MCG. (a) Inpainting of a 128×128
box region; we show three stochastic samples generated with the proposed method. (b) Imputation
with 92% of pixels missing.
Figure 15: Inpainting results on ImageNet 256×256 data with MCG. (a) Inpainting of a 128×128
box region; we show three stochastic samples generated with the proposed method. (b) Imputation
with 92% of pixels missing.
Figure 16: Colorization results on (left) FFHQ 256×256 dataset, and (right) LSUN-bedroom
256×256 dataset. We show 3 different reconstructions for each measurement that are sampled
with the proposed method.