

CNN-Based Projected Gradient Descent for Consistent CT Image Reconstruction

Harshit Gupta, Kyong Hwan Jin, Ha Q. Nguyen, Michael T. McCann, Member, IEEE, and Michael Unser, Fellow, IEEE

Abstract— We present a new image reconstruction method that replaces the projector in a projected gradient descent (PGD) with a convolutional neural network (CNN). Recently, CNNs trained as image-to-image regressors have been successfully used to solve inverse problems in imaging. However, unlike existing iterative image reconstruction algorithms, these CNN-based approaches usually lack a feedback mechanism to enforce that the reconstructed image is consistent with the measurements. We propose a relaxed version of PGD wherein gradient descent enforces measurement consistency, while a CNN recursively projects the solution closer to the space of desired reconstruction images. We show that this algorithm is guaranteed to converge and, under certain conditions, converges to a local minimum of a non-convex inverse problem. Finally, we propose a simple scheme to train the CNN to act like a projector. Our experiments on sparse-view computed-tomography reconstruction show an improvement over total variation-based regularization, dictionary learning, and a state-of-the-art deep-learning-based direct reconstruction technique.

Index Terms— Deep learning, inverse problems, biomedical image reconstruction, low-dose computed tomography.

Manuscript received February 13, 2018; revised April 24, 2018; accepted April 25, 2018. Date of publication May 3, 2018; date of current version May 31, 2018. This work was supported in part by the European Research Council (H2020-ERC Project GlobalBioIm) under Grant 692726 and in part by the European Union's Horizon 2020 Framework Programme for Research and Innovation (call 2015) under Grant 665667. (Corresponding author: Harshit Gupta.) H. Gupta, K. H. Jin, and M. Unser are with the Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland (e-mail: harshit.gupta.cor@gmail.com). H. Q. Nguyen was with the Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. He is now with the Viettel Research and Development Institute, Hanoi VN-100000, Vietnam. M. T. McCann is with the Center for Biomedical Imaging, Signal Processing Core, and the Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMI.2018.2832656. This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

I. INTRODUCTION

While medical imaging is a fairly mature area, there is recent evidence that it may still be possible to reduce the radiation dose and/or speed up the acquisition process without compromising image quality. This can be accomplished with the help of sophisticated reconstruction algorithms that incorporate some prior knowledge (e.g., sparsity) on the class of underlying images [1]. The reconstruction task is usually formulated as an inverse problem where the image-formation physics are modeled by an operator H : R^M → R^N wait— modeled by an operator H : R^N → R^M (called the forward model). The measurement equation is y = Hx + n ∈ R^M, where x ∈ R^N is the space-domain image that we are interested in recovering and n ∈ R^M is the noise intrinsic to the acquisition process.

In the case of extreme imaging, the number of measurements is reduced as much as possible to decrease either the radiation dose in computed tomography (CT) or the scanning time in MRI. Moreover, the measurements are typically very noisy due to short integration times, which calls for some form of denoising. Indeed, there may be significantly fewer measurements than the number of unknowns (M ≪ N). This gives rise to an ill-posed problem in the sense that there may be an infinity of consistent images that map to the same measurements y. Thus, one challenge of the reconstruction algorithm is to select the best solution among a multitude of potential candidates.

The available reconstruction algorithms can be broadly arranged in three categories (or generations), which represent the continued efforts of the research community to address the aforementioned challenges.

1) Classical Algorithms: Here, the reconstruction is performed directly by applying a suitable linear operator. In the case where H is unitary (as in a simple MRI model), the operator is simply the backprojection (BP) H^T y. In general, the reconstruction operator should approximate a pseudoinverse of H. For example, the filtered backprojection (FBP) for x-ray CT involves applying a linear filter to the measurements and backprojecting them, i.e., H^T Fy, where F : R^M → R^M. Though its expression is usually derived in the continuous domain [2], the filter F can be viewed as an approximate version of (HH^T)^(-1). Classical algorithms are fast and provide excellent results when the number of measurements is large and the noise is small [3]. However, they are not suitable for extreme imaging because they introduce artifacts that are intimately connected to the inversion step.

2) Iterative Algorithms: These algorithms avoid the shortcomings of the classical ones by solving

x* = arg min_{x ∈ R^N} ( E(Hx, y) + λ R(x) ),    (1)

where E : R^M × R^M → R^+ is a data-fidelity term that favors solutions that are consistent with the measurements,
R : R^N → R^+ is a suitable regularizer that encodes prior knowledge about the image x to be reconstructed, and λ ∈ R^+ is a tradeoff parameter. For example, in CT reconstruction, E could be weighted least-squares and R could be an indicator function that enforces non-negativity. Under the assumption that the functionals E and R are convex, the solution of (1) also satisfies

x* = arg min_{x ∈ S_R} E(Hx, y)    (2)

with S_R = {x ∈ R^N : R(x) ≤ τ} for some unique τ that depends on the regularization parameter λ. Therefore, the solution has the best data fidelity among all images in the set S_R, which is implicitly defined by R. This shows that the quality of the reconstruction depends heavily on the prior encoder R. Generally, these priors are either hand-picked (e.g., total variation (TV) or the ℓ1-norm of the wavelet coefficients of the image [1], [4]–[7]) or learned through a dictionary [8]–[10]. However, in either case, they are restricted to well-behaved functionals that can be minimized via a convex routine [11]–[14]. This limits the type of prior knowledge that can be injected into the algorithm.

3) Learning-Based Algorithms: Recently, a surge in using deep learning to solve inverse problems in imaging [15]–[19] has established new state-of-the-art results for tasks such as sparse-view CT reconstruction [16]. Rather than reconstructing the image from the measurements y directly, the most successful strategies have been to train the CNN as a regressor between a rough initial reconstruction Ay, where A : R^M → R^N, and the final, desired reconstruction [16], [17]. This initial reconstruction could be obtained using classical algorithms (e.g., FBP, BP) or by some other linear operation. Once the training is complete, the reconstruction for a new measurement y is given by x* = CNN_θ*(Ay), where CNN_θ : R^N → R^N denotes the CNN as a function and θ* denotes the internal parameters of the CNN after training. These schemes exploit the fact that the structure of images can be learned from representative examples. CNNs are favored because of the way they encode the data in their hidden layers. In this sense, a CNN can be seen as a good prior encoder.

Although the results reported so far are remarkable in terms of image quality, there is still some concern as to whether or not they can be trusted, especially in the context of diagnostic imaging. The main limitation of direct algorithms such as [16] is that they do not provide any guarantee on the worst-case performance. Moreover, even in the case of noiseless (or low-noise) measurements, there is no insurance that the reconstructed image is consistent with the measurements because, unlike for the iterative schemes, there is no feedback mechanism that imposes this consistency.

Fig. 1. (a) Block diagram of projected gradient descent using a CNN as the projector and E as the data-fidelity term. The gradient step promotes consistency with the measurements and the projector forces the solution to belong to the set of desired solutions. If the CNN is only an approximate projector, the scheme may diverge. (b) Block diagram of the proposed relaxed projected gradient descent. The αk's are updated in such a way that the algorithm always converges (see Algorithm 1 for more details).

A. Overview of Proposed Method

In this paper, we present a simple yet effective iterative scheme (see Figure 1), which tries to incorporate the advantages of the existing algorithms and side-steps their disadvantages. Specifically:
• We first propose to learn a CNN that acts as a projector onto a set S, which can be intuitively thought of as the manifold of the data (e.g., biomedical images). In this sense, our CNN encodes the prior knowledge of the data. Its purpose is to map an input image to an output image that is more similar to the training data.
• Given a measurement y, we initialize our reconstruction using a classical algorithm.
• We then iteratively alternate between minimizing the data-fidelity term and projecting the result onto the set S by applying a suitable variant of the projected gradient descent (PGD) which ensures convergence.

Besides the design of the implementation, our contribution is in the proposal of the relaxed form of PGD that is guaranteed to converge and, under certain conditions, can also find a local minimum of a nonconvex inverse problem. Moreover, as we shall see later, this method outperforms existing algorithms on low-dose x-ray CT reconstructions.

B. Related and Prior Work

Deep learning has already shown promising results in image denoising, superresolution, and deconvolution. Recently, it has also been used to solve inverse problems in imaging using limited data [16]–[19], and in compressed sensing [20]. However, as discussed earlier, these regression-based approaches lack a feedback mechanism that could be beneficial in solving inverse problems.

Another usage of deep learning is to complement iterative algorithms. This includes learning a CNN as an unrolled version of the iterative shrinkage-thresholding algorithm (ISTA) [21] or ADMM [22]. In [23], inverse problems involving non-linear forward models are solved by partially learning the gradient descent. In [24], the iterative algorithm is replaced by a recurrent neural network (RNN). Recently, in [25], a cascade of CNNs is used to reconstruct images.
Within this cascade, the data-fidelity is enforced at multiple steps. However, in all of these approaches the training is performed end-to-end, meaning that the network parameters are dependent on the iterative scheme chosen.

These approaches differ from plug-and-play ADMM [26]–[28], where an independent off-the-shelf denoiser or a trained operator is plugged into the iterative scheme of the alternating-direction method of multipliers (ADMM) [14]. ADMM is an iterative optimization technique that alternates between (i) a linear solver that reinforces consistency with respect to the measurements; and (ii) a nonlinear operation that re-injects the prior. The idea of plug-and-play ADMM is to replace (ii), which resembles denoising, with an off-the-shelf denoiser. Plug-and-play ADMM is more general than the optimization framework (1) but still lacks theoretical justifications. In fact, there is little understanding yet of the connection between the use of a given denoiser and the regularization it imposes (though this link has recently been explored in [29]).

In [30], a generative adversarial network (GAN) trained as a projector onto a set has been used with the plug-and-play ADMM. Similarly, in [31], the inverse problem is solved over a set parameterized by a generative model. However, it requires a precise initialization of the parameters. In [32], similarly to us, the projector in PGD is replaced with a neural network. However, the scheme lacks a convergence guarantee and a rigorous theoretical analysis.

Our scheme is similar in spirit to plug-and-play ADMM, but is simpler to analyze. Although our methodology is generic and can be applied in principle to any inverse problem, our experiments here involve sparse-view x-ray CT reconstruction. For a recent overview of the field, see [33]. Current approaches to sparse-view CT reconstruction follow the formulation (1), e.g., using a penalized weighted least-squares data term and a sparsity-promoting regularizer [34], a dictionary-learning-based regularizer [35], or a generalized total variation regularizer [36]. There are also prior works on the direct application of CNNs to CT reconstruction. These methods generally use the CNN to denoise the sinogram [37] or the reconstruction obtained from a standard technique [16], [38]–[40]; as such, they do not perform the reconstruction directly.

C. Roadmap

The paper is organized as follows: In Section II, we discuss the mathematical framework that motivates our approach and justify the use of a projector onto a set as an effective strategy to solve inverse problems. In Section III, we present our algorithm, which is a relaxed version of PGD. It has been modified so as to converge in practical cases where the projection property is only approximate. We discuss in Section IV a novel technique to train the CNN as a projector onto a set, especially when the training data is small. This is followed by experiments (Section V), results and discussions (Sections VI and VII), and conclusions (Section VIII).

II. THEORETICAL FRAMEWORK

Our goal is to use a trained CNN iteratively inside PGD to solve an inverse problem. To understand why this scheme will be effective, we first analyze how using a projector onto a set, combined with gradient descent, can be helpful in solving inverse problems. Properties of PGD using an orthogonal projector onto a convex set are known [41]. Here, we extend these results to any projector onto a nonconvex set. This extension is required because there is no guarantee that the set of desirable reconstruction images is convex. Proofs of all the results in this section can be found in the supplementary material.

A. Notation

We consider the finite-dimensional Hilbert space R^N equipped with the scalar product ⟨·, ·⟩ that induces the ℓ2 norm ‖·‖2. The spectral norm of the matrix H, denoted by ‖H‖2, is equal to its largest singular value. For x ∈ R^N and ε > 0, we denote by Bε(x) the ℓ2-ball centered at x with radius ε, i.e.,

Bε(x) = { z ∈ R^N : ‖z − x‖2 ≤ ε }.

The operator T : R^N → R^N is Lipschitz-continuous with constant L if

‖T(x) − T(z)‖2 ≤ L ‖x − z‖2, ∀x, z ∈ R^N.

It is contractive if it is Lipschitz-continuous with constant L < 1 and non-expansive if L = 1. A fixed point x* of T (if any) satisfies T(x*) = x*.

Given the set S ⊂ R^N, the mapping P_S : R^N → S is called a projector if it satisfies the idempotent property P_S P_S = P_S. It is called an orthogonal projector if

P_S(x) = arg min_{z ∈ S} ‖x − z‖2, ∀x ∈ R^N.

B. Constrained Least Squares

Consider the problem of the reconstruction of the image x ∈ R^N from its noisy measurements y = Hx + n, where H ∈ R^{M×N} is the linear forward model and n ∈ R^M is additive white Gaussian noise. The framework is also applicable to Poisson noise model-based CT via a suitable transformation, as shown in Appendix B.

Our reconstruction incorporates a strong form of prior knowledge about the original image: We assume that x must lie in a set S ⊂ R^N that contains all objects of interest. The proposed way to make the reconstruction consistent with the measurements as well as with the prior knowledge is to solve the constrained least-squares problem

min_{x ∈ S} (1/2) ‖Hx − y‖2².    (3)

The condition x ∈ S in (3) plays the role of a regularizer. If no two points in S have the same measurements and in case y is noiseless, then out of all the points in R^N that are consistent with the measurement y, (3) selects a unique point x* ∈ S. In this way, the ill-posedness of the inverse problem is bypassed. When the measurements are noisy, (3) returns a point x* ∈ S such that y* = Hx* is as close as possible to y. Thus, it also denoises the measurement, where the quantity y* can be regarded as the denoised version of y.
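For illustration, the following minimal NumPy sketch sets up the measurement model y = Hx + n and evaluates the data-fidelity term of (3) on a toy problem. It is only a schematic stand-in: the Radon-transform forward model actually used in the paper (Section V) is replaced here by a small random matrix, and all names and sizes are placeholders.

```python
# Toy setup for the constrained least-squares problem (3): y = Hx + n with M << N.
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 32                                   # unknowns vs. measurements (ill-posed)
H = rng.standard_normal((M, N)) / np.sqrt(M)    # stand-in for the forward model
x_true = rng.standard_normal(N)                 # stand-in for an image in the set S
noise = 0.01 * rng.standard_normal(M)           # additive white Gaussian noise n
y = H @ x_true + noise                          # measurement equation

def data_fidelity(x, H, y):
    """Least-squares data term (1/2)||Hx - y||_2^2 appearing in (3)."""
    r = H @ x - y
    return 0.5 * float(r @ r)

print(data_fidelity(x_true, H, y))              # small but nonzero because of the noise
```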
Note that formulation (3) is similar to (2) for the case when E is least-squares, with the difference that the search space is the data manifold S instead of a set S_R defined by the regularizer. The point x* ∈ S is called a local minimizer of (3) if

∃ε > 0 : ‖Hx* − y‖2 ≤ ‖Hx − y‖2, ∀x ∈ S ∩ Bε(x*).

C. Projected Gradient Descent

When S is a closed convex set, it is well known [41] that a solution of (3) can be found by PGD

x_{k+1} = P_S( x_k − γ H^T H x_k + γ H^T y ),    (4)

where γ is a step size chosen such that γ < 2/‖H^T H‖2. This algorithm combines the orthogonal projection onto S with the gradient descent with respect to the quadratic objective function, also called the Landweber update [42]. PGD [43, Sec. 2.3] is a subclass of the forward-backward splitting [44], [45], which is known in the ℓ1-minimization literature as iterative shrinkage/thresholding algorithms (ISTA) [11], [12], [46].

In our problem, S is presumably non-convex, but we propose to still use the update (4) with some projector P_S that may not be orthogonal. In the rest of this section, we provide sufficient conditions on the projector P_S (not on S itself) under which (4) leads to a local minimizer of (3). Similarly to the convex case, we characterize the local minimizers of (3) by the fixed points of the combined operator

G_γ(x) = P_S( x − γ H^T H x + γ H^T y )    (5)

and then show that some fixed point of that operator must be reached by the iteration x_{k+1} = G_γ(x_k) as k → ∞, regardless of the initial point x0. We first state a sufficient condition for each fixed point of G_γ to become a local minimizer of (3).

Proposition 1: Let γ > 0 and P_S be such that, for all x ∈ R^N,

⟨z − P_S x, x − P_S x⟩ ≤ 0, ∀z ∈ S ∩ Bε(P_S x),    (6)

for some ε > 0. Then, any fixed point of the operator G_γ in (5) is a local minimizer of (3). Furthermore, if (6) is satisfied globally, in the sense that

⟨z − P_S x, x − P_S x⟩ ≤ 0, ∀x ∈ R^N, z ∈ S,    (7)

then any fixed point of G_γ is a solution of (3).

Two remarks are in order. First, (7) is a well-known property of orthogonal projections onto closed convex sets. It actually implies the convexity of S (see Proposition 2). Second, (6) is much more relaxed and easily achievable, for example, as stated in Proposition 3, by orthogonal projections onto unions of closed convex sets. (Special cases are unions of subspaces, which have found some applications in data modeling and clustering [47].)

Proposition 2: If P_S is a projector onto S ⊂ R^N that satisfies (7), then S must be convex.

Proposition 3: If S is a union of a finite number of closed convex sets in R^N, then the orthogonal projector P_S onto S satisfies (6).

Propositions 1-3 suggest that, when S is non-convex, the best we can hope for is to find a local minimizer of (3) through a fixed point of G_γ. Theorem 1 provides a sufficient condition for PGD to converge to a unique fixed point of G_γ.

Theorem 1: Let λmax and λmin be the largest and smallest eigenvalues of H^T H, respectively. If P_S satisfies (6) and is Lipschitz-continuous with constant L < (λmax + λmin)/(λmax − λmin), then, for γ = 2/(λmax + λmin), the sequence {x_k} generated by (4) converges to a local minimizer of (3), regardless of the initialization x0.

It is important to note that the projector P_S can never be contractive since it preserves the distance between any two points on S. Therefore, when H has a nontrivial null space, the condition L < (λmax + λmin)/(λmax − λmin) of Theorem 1 is not feasible. The smallest possible Lipschitz constant of P_S is L = 1, which means that P_S is non-expansive. Even with this condition, it is not guaranteed that the combined operator G_γ has a fixed point. This limitation can be overcome when G_γ is assumed to have a nonempty set of fixed points. Indeed, we state in Theorem 2 that one of them must be reached by iterating the averaged operator α Id + (1 − α)G_γ, where α ∈ (0, 1) and Id is the identity operator. We call this scheme averaged PGD (APGD).

Theorem 2: Let λmax be the largest eigenvalue of H^T H. If P_S satisfies (6) and is a non-expansive operator such that G_γ in (5) has a fixed point for some γ < 2/λmax, then the sequence {x_k} generated by APGD, with

x_{k+1} = (1 − α)x_k + α G_γ(x_k)    (8)

for any α ∈ (0, 1), converges to a local minimizer of (3), regardless of the initialization x0.
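To make the updates (4) and (8) concrete, the following NumPy sketch implements PGD and APGD on the toy problem introduced earlier, with S taken to be the box [0, 1]^N so that the orthogonal projector is trivial. This is an illustrative assumption only; in the proposed method the projector is replaced by a trained CNN.

```python
# Minimal sketch of the PGD update (4) and of the averaged variant APGD (8).
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    """Orthogonal projector onto S = [lo, hi]^N (a convex stand-in for the data manifold)."""
    return np.clip(x, lo, hi)

def pgd(H, y, proj, gamma, n_iter=200, x0=None):
    """Eq. (4): x_{k+1} = P_S(x_k - gamma * H^T (H x_k - y))."""
    x = np.zeros(H.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        x = proj(x - gamma * (H.T @ (H @ x - y)))
    return x

def apgd(H, y, proj, gamma, alpha=0.5, n_iter=200, x0=None):
    """Eq. (8): x_{k+1} = (1 - alpha) x_k + alpha * G_gamma(x_k)."""
    x = np.zeros(H.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        g = proj(x - gamma * (H.T @ (H @ x - y)))   # G_gamma(x_k) of (5)
        x = (1.0 - alpha) * x + alpha * g
    return x

# Toy usage; the step size respects gamma < 2 / lambda_max(H^T H).
rng = np.random.default_rng(0)
H = rng.standard_normal((32, 64)) / np.sqrt(32)
x_true = np.clip(rng.standard_normal(64), 0.0, 1.0)
y = H @ x_true
gamma = 1.0 / np.linalg.norm(H, 2) ** 2
print(np.linalg.norm(H @ pgd(H, y, project_box, gamma) - y))   # measurement residual
```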
III. RELAXATION WITH GUARANTEED CONVERGENCE

Despite their elegance, Theorems 1 and 2 are not directly productive when we construct the projector P_S by training a CNN because it is unclear how to enforce the Lipschitz continuity of P_S on the CNN architecture. Without putting any constraints on the CNN, however, we can still achieve the convergence of the reconstruction sequence by modifying PGD as described in Algorithm 1; we name it relaxed projected gradient descent (RPGD). In Algorithm 1, the projector P_S is replaced by the general nonlinear operator F. We also introduce a sequence {c_k} that governs the rate of convergence of the algorithm and a sequence {α_k} of relaxation parameters that evolves with the algorithm. The convergence of RPGD is guaranteed by Theorem 3. More importantly, if the nonlinear operator F is actually a projector and the relaxation parameters do not go all the way to 0, then RPGD converges to a meaningful point.

Algorithm 1 Relaxed Projected Gradient Descent (RPGD)
Input: H, y, A, nonlinear operator F, step size γ > 0, positive sequence {c_n}_{n≥1}, x0 = Ay ∈ R^N, α0 ∈ (0, 1].
Output: reconstructions {x_k}, relaxation parameters {α_k}.
k ← 0
while not converged do
    z_k = F(x_k − γ H^T H x_k + γ H^T y)
    if k ≥ 1 then
        if ‖z_k − x_k‖2 > c_k ‖z_{k−1} − x_{k−1}‖2 then
            α_k = c_k ( ‖z_{k−1} − x_{k−1}‖2 / ‖z_k − x_k‖2 ) α_{k−1}
        else
            α_k = α_{k−1}
        end if
    end if
    x_{k+1} = (1 − α_k) x_k + α_k z_k
    k ← k + 1
end while

Theorem 3: Let the input sequence {c_k} of Algorithm 1 be asymptotically upper-bounded by C < 1. Then, the following statements hold true for the reconstruction sequence {x_k}:
(i) x_k → x* as k → ∞, for all choices of F;
(ii) if F is continuous and the relaxation parameters {α_k} are lower-bounded by ε > 0, then x* is a fixed point of

G_γ(x) = F( x − γ H^T H x + γ H^T y );    (9)

(iii) if, in addition to (ii), F is indeed a projector onto S that satisfies (6), then x* is a local minimizer of (3).

We prove Theorem 3 in Appendix A. Note that the weakest statement here is (i); it guarantees that RPGD always converges, albeit not necessarily to a fixed point of G_γ. Moreover, the assumption about the continuity of F in (ii) is automatically satisfied when F is a CNN.

In summary, we have described three algorithms: PGD, APGD, and RPGD. PGD is a standard algorithm which, in the event of convergence, finds a local minimum of (3); however, it does not always converge. APGD ensures convergence under the broader set of conditions given in Theorem 2; but, in order to have these properties, both PGD and APGD necessarily need a projector. While we shall train our CNN to act like a projector, it may not exactly fulfill the required conditions. This is the motivation for RPGD, which, unlike PGD and APGD, is guaranteed to converge. It also retains the desirable properties of PGD and APGD: it finds a local minimum of (3), given that the conditions (ii) and (iii) of Theorem 3 are satisfied. Note, however, that when the set S is nonconvex, this local minimum may not be a global minimum. The results of Sections II and III are summarized in Table IV given in the supplementary material.

IV. TRAINING A CNN AS A PROJECTOR

For any point x ∈ S, a projector onto S should satisfy P_S x = x. Moreover, we want that

x = P_S(x̃),    (10)

where x̃ is any perturbed version of x. Given the training set {x_1, . . . , x_Q} of Q points drawn from the set S, we generate the ensemble {{x̃_{1,1}, . . . , x̃_{Q,1}}, . . . , {x̃_{1,N}, . . . , x̃_{Q,N}}} of N × Q perturbed points and train the CNN by minimizing the loss function

J(θ) = Σ_{n=1}^{N} Σ_{q=1}^{Q} ‖ x_q − CNN_θ(x̃_{q,n}) ‖2²,    (11)

where the inner sum over q for a given n is denoted by J_n(θ). The optimization proceeds by stochastic gradient descent for T epochs, where an epoch is defined as one pass through the training data.

It remains to select the perturbations that generate the x̃_{q,n}. Our goal here is to create a diverse set of perturbations so that the CNN does not overfit one specific type. In our experiments, while training for the tth epoch, we chose

x̃_{q,1} = x_q    (12)
x̃_{q,2} = AHx_q    (13)
x̃_{q,3} = CNN_{θ_{t−1}}(x̃_{q,2}),    (14)

where A is a classical linear reconstruction algorithm (FBP in our experiments), and θ_t are the CNN parameters after t epochs. Equations (12), (13), and (14) correspond to no perturbation, a linear perturbation, and a dynamic nonlinear perturbation, respectively. We now comment on each perturbation in detail.

Keeping x̃_{q,1} in the training ensemble will train the CNN with the defining property of the projector: the projector maps a point in the set S onto itself. If the CNN were trained only with (12), it would be an autoencoder [48].

To understand the perturbation x̃_{q,2} in (13), recall that AHx_q is the classical linear reconstruction of x_q from its measurement y = Hx_q. Perturbation (13) is indeed useful because we initialize RPGD with AHx_q. Using only (13) for the training would return the same CNN as in [16].

The perturbation x̃_{q,3} in (14) is the output of the CNN whose parameters θ_t change with every epoch t; thus, it is a nonlinear and dynamic (epoch-dependent) perturbation of x_q. The rationale for using (14) is that it greatly increases the training diversity by allowing the network to see T new perturbations of each training point, without greatly increasing the total training size, since it only requires Q additional gradient computations per epoch. Moreover, (14) is in sync with the iterative scheme of RPGD, where the output of the CNN is processed with a gradient descent and is again fed back into itself.

A. Architecture

Our CNN architecture is the same as in [16], which is a U-net [49] with intrinsic skip connections among its layers and an extrinsic skip connection between the input and the output. The intrinsic skip connections help to eliminate singularities during the training [50]. The extrinsic skip connection makes this network a residual net; i.e., CNN = Id + Unet, where Id denotes the identity operator and Unet : R^N → R^N denotes U-net as a function. Therefore, U-net actually provides the projection error (negative perturbation) that should be added to the input to get the projection.
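Putting the pieces together, the following NumPy sketch is a direct transcription of Algorithm 1 in which the trained CNN is abstracted as an arbitrary callable F. The toy forward model, the pseudoinverse used for A, and the non-negativity projector used as a stand-in for the CNN are illustrative assumptions, not the authors' released implementation.

```python
# Relaxed projected gradient descent (Algorithm 1), with F as a generic callable.
import numpy as np

def rpgd(H, y, A, F, gamma, c=0.99, alpha0=1.0, n_iter=100):
    """RPGD: gradient step for consistency, F toward the set S, adaptive relaxation.

    H: forward model (M x N), y: measurement, A: linear reconstructor (N x M),
    F: nonlinear operator (the trained projector in the paper), gamma: step size,
    c: constant sequence c_k = c < 1, alpha0: initial relaxation parameter.
    Returns the final reconstruction and the sequence of relaxation parameters.
    """
    x = A @ y                                  # x0 = Ay (FBP in the paper)
    alpha = alpha0
    alphas = [alpha]
    z_prev, x_prev = None, None
    for k in range(n_iter):
        z = F(x - gamma * (H.T @ (H @ x - y)))           # projection of the gradient step
        if k >= 1:
            num = c * np.linalg.norm(z_prev - x_prev)
            den = np.linalg.norm(z - x)
            if den > num:                                # residual grew too fast:
                alpha = (num / den) * alpha              # shrink alpha_k to enforce contraction
        z_prev, x_prev = z, x
        x = (1.0 - alpha) * x + alpha * z                # relaxed update
        alphas.append(alpha)
    return x, np.array(alphas)

# Toy usage with a non-negativity projector standing in for the CNN.
rng = np.random.default_rng(0)
H = rng.standard_normal((32, 64)) / np.sqrt(32)
x_true = np.abs(rng.standard_normal(64))
y = H @ x_true
A = np.linalg.pinv(H)                                    # stand-in for FBP
gamma = 1.0 / np.linalg.norm(H, 2) ** 2
x_rec, alphas = rpgd(H, y, A, lambda v: np.maximum(v, 0.0), gamma)
```

Monitoring the returned relaxation parameters mirrors Figure 5(c): they shrink whenever the residuals ‖z_k − x_k‖2 fail to contract, which is precisely what guarantees convergence.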
Residual nets have been shown to be effective for image recognition [51] and for solving inverse problems [16]. While the residual-net architecture does not increase the capacity or the approximation power of the CNN, it does help in learning functions that are close to an identity operator, as is the case in our setting.

B. Sequential Training Strategy

We train the CNN in three stages. In Stage 1, we train it for T1 epochs with respect to the partial-loss function J2 in (11), which only uses the ensemble {x̃_{q,2}} generated by (13). In Stage 2, we add the ensemble {x̃_{q,3}} according to (14) at every epoch and then train the CNN with respect to the loss function J2 + J3; we repeat this procedure for T2 epochs. Finally, in Stage 3, we train the CNN for T3 epochs with all three ensembles {x̃_{q,1}, x̃_{q,2}, x̃_{q,3}} to minimize the original loss function J = J1 + J2 + J3 from (11).

We shall see in Section VII-B that this sequential procedure speeds up the training without compromising the performance. The parameters of Unet are initialized by a normal distribution with a very low variance. Since CNN = Id + Unet, this function acts close to an identity operator in the initial epochs and makes it redundant to use {x̃_{q,1}} for the initial training stages. Therefore, {x̃_{q,1}} is only added at the last stage, when the CNN is no longer close to an identity operator. After training with only {x̃_{q,2}} in Stage 1, x̃_{q,3} will be close to x_q since it is the output of the CNN for the input x̃_{q,2}. This eases the training for {x̃_{q,3}} in the second and third stages.

V. EXPERIMENTS

We validate the proposed method on the challenging case of sparse-view CT reconstruction. Conventionally, CT imaging requires many views to obtain a good-quality reconstruction. We call this scenario full-dose reconstruction. Our main aim in these experiments is to reduce the number of views (or dose) for CT imaging while retaining the quality of full-dose reconstructions. We denote a k-times reduction in views by ×k.

The measurement operator H for our experiments is the Radon transform. It maps an image to the values of its integrals along a known set of lines [2]. In 2D, the measurements are indexed by the angle and offset of each line and arranged in a 2D sinogram. We implemented H and H^T with Matlab's radon and iradon (normalized to satisfy the adjoint property), respectively. The Matlab code for RPGD and the sequential-strategy-based training is made publicly available.¹

¹https://github.com/harshit-gupta-epfl/CNN-RPGD

A. Datasets

We use two datasets for our experiments.

1) Mayo Clinic Dataset: It consists of 500 clinically realistic (512 × 512) CT images from the lower lungs to the lower abdomen of 10 patients. Those were obtained from the Mayo Clinic AAPM Low Dose CT Grand Challenge [52].

2) Rat Brain Dataset: We use a real (1493 px × 720 view × 377 slice) sinogram from a CT scan of a single rat brain. The data acquisition was performed at the Paul Scherrer Institute in Villigen, Switzerland, at the TOMCAT beam line of the Swiss Light Source. During pre-processing, we split this sinogram slice-by-slice and downsampled it to create a dataset of 377 (729 px × 720 view) sinograms. CT images of size (512 × 512) were then generated from these full-dose sinograms (using the FBP, see Section V-C). For the qth z-slice, we denote the corresponding image x_FD^q. For experiments based on this dataset, the first 327 and the last 25 slices are used for training and testing, respectively. This leaves a gap of 25 slices in between the training and testing data.

B. Experimental Setups

We now describe three experimental setups. We use the first dataset for the first experiment and the second for the last two.

1) Experiment 1: We split the Mayo dataset into 475 images from 9 patients for training and 25 images from the remaining patient for testing. We assume these images to be the ground truth. From the qth image x_q, we generated the sparse-view sinogram y_q = Hx_q using several different experimental conditions. Our task is to reconstruct the image from the sinogram.

The sinograms always have 729 offsets per view, but we varied the number of views and the level of measurement noise for different cases. We took 144 views and 45 views, which corresponds to ×5 and ×16 dosage reductions (assuming a full-view sinogram has 720 views). We added Gaussian noise to the sinograms to make the SNR equal to 35, 40, 45, 70, and infinity dB, where we refer to the first three as high measurement noise and the last two as low measurement noise. The SNR of the sinogram y + n is defined as

SNR(y + n, y) = 20 log10( ‖y‖2 / ‖n‖2 ).    (15)

For testing with the low and high measurement noise, we trained the CNNs without noise and at the 40-dB level of noise, respectively (see Section V-D for details).

To make the experiments more realistic and to reduce the inverse crime, the sinograms were generated by slightly perturbing the angles of the views with a zero-mean additive white Gaussian noise (AWGN) of standard deviation 0.05 degrees. This creates a deliberate mismatch between the actual measurement process and the forward model.

2) Experiment 2: We used images x_FD^q from the rat-brain dataset to generate Poisson-noise-corrupted sinograms y_q with 144 views. Just as in Experiment 1, the task is to reconstruct x_FD^q back from y_q. Sinograms were generated with 25, 30, and 35 dB SNR with respect to Hx_FD^q. To achieve this, in (26) and (27), we assume the readout noise to be zero and {b_1, . . . , b_M} = b_0 = 1.66 × 10^5, 5.24 × 10^5, and 1.66 × 10^6, respectively. More details about this process are given in Appendix B. The CNNs were trained at only the 30-dB level of noise. Again, our task is to reconstruct the images from the sinograms.

3) Experiment 3: We downsampled the views of the original (729 × 720) rat-brain sinograms by 5 to obtain sparse-view sinograms of size (729 × 144). For the qth z-slice, we denote the corresponding sparse-view sinogram y_Real^q. Note that, unlike in Experiments 1 and 2, the sinogram was not generated from an image but was obtained experimentally.
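As an aside, the Gaussian measurement noise of Experiment 1 can be synthesized at a prescribed SNR in the sense of (15) by simply rescaling a white Gaussian vector. The sketch below uses random data in place of a real sinogram and is purely illustrative.

```python
# Synthesize additive Gaussian noise so that SNR(y + n, y) in (15) hits a target value.
import numpy as np

def add_noise_at_snr(y, snr_db, rng=None):
    """Return y + n with 20*log10(||y|| / ||n||) equal to snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.standard_normal(y.shape)
    n *= np.linalg.norm(y) / (10.0 ** (snr_db / 20.0)) / np.linalg.norm(n)
    return y + n

rng = np.random.default_rng(0)
y_clean = rng.random((729, 144))            # placeholder sinogram (offsets x views, x5 case)
y_noisy = add_noise_at_snr(y_clean, 40.0, rng)
snr = 20 * np.log10(np.linalg.norm(y_clean) / np.linalg.norm(y_noisy - y_clean))
print(round(snr, 2))                        # ~40.0 dB by construction
```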
C. Comparison Methods

Given the ground truth x, our figure of merit for the reconstructed x* is the regressed SNR, given by

SNR(x*, x) = max_{a,b} SNR(a x* + b, x),    (16)

where the purpose of a and b is to adjust for contrast and offset. We also evaluate the performance using the structural similarity index (SSIM) [53]. We compare five reconstruction methods.

1) FBP: FBP is the classical direct inversion of the Radon transform H, here implemented in Matlab by the iradon command with the Ram-Lak filter and linear interpolation as options.

2) Total-Variation Reconstruction: TV solves

x_TV = arg min_x (1/2) ‖Hx − y‖2² + λ ‖x‖_TV s.t. x ≥ 0,    (17)

where

‖x‖_TV = Σ_{i=1}^{N−1} Σ_{j=1}^{N−1} sqrt( (D_{h;i,j}(x))² + (D_{v;i,j}(x))² ),

with D_{h;i,j}(x) = [x]_{i,j+1} − [x]_{i,j} and D_{v;i,j}(x) = [x]_{i+1,j} − [x]_{i,j}. The optimization is carried out via ADMM [14].

3) Dictionary Learning (DL): DL [35] solves

x_DL = arg min_{x,α} ‖Hx − y‖2² + λ Σ_{j=1}^{J} ( ‖E_j x − D α_j‖2² + ν_j ‖α_j‖_0 ),    (18)

where E_j : R^{N×N} → R^{L²} extracts and vectorizes the jth patch of size (L × L) from the image x, D ∈ R^{L²×256} is the dictionary, α_j is the jth column of α ∈ R^{256×R}, and R = (N − L + 1)². Note that the patches are extracted with a sliding distance of one pixel.

For a given y, the dictionary D is learned from the corresponding ground truth using the procedure described in [54]. The objective (18) is then solved iteratively by first minimizing it with respect to x using gradient descent, as described in [35], and then with respect to α using orthogonal matching pursuit (OMP) [55]. Since D is learned from the testing ground truth itself, the performance that we report here is an upper bound on the one that would be achieved by learning it using the training images.

4) FBPconv: FBPconv [16] is a state-of-the-art deep-learning technique in which a residual CNN with U-net architecture is trained to directly denoise the FBP. It has been shown to outperform other deep-learning-based direct reconstruction methods for sparse-view CT. In our proposed method, we use a CNN with the same architecture as in FBPconv. As a result, in our framework, FBPconv corresponds to training with only the ensemble in (13). In the testing phase, the FBP of the measurements is fed into the trained CNN to output the reconstructed image.

5) RPGD: RPGD is our proposed method. It is described in Algorithm 1. There the nonlinear operator F is the CNN trained as a projector (as discussed in Section IV). For experiments with Poisson noise, we use the slightly modified RPGD described in Appendix B. For all the experiments, FBP is used for the operator A.

D. Training and Selection of Parameters

1) Experiment 1: For TV, the regularization parameter λ is selected via a golden-section search over 20 values so as to maximize the SNR of x_TV with respect to the ground truth. We set the additional penalty parameter inside ADMM (see [14, eq. (2.6)]) equal to λ. The rationale for this heuristic is that it puts the soft-threshold parameter in the same order of magnitude as the image gradients. We set the number of iterations to 100, which was enough to show good empirical convergence.

For DL, the parameters are selected via a parameter sweep, roughly following the approach described in [35, Table 1]. Specifically, the patch size is L = 8. During dictionary learning, the sparsity level is set to 5 and 10. During reconstruction, the sparsity level for OMP is set to 5, 8, 10, 12, 20, and 25, while the tolerance level is taken to be 10, 100, and 1000. This, in effect, is the same as sweeping over ν_j in (18). For each of these 2 × 6 × 3 = 36 parameter settings, λ in (18) is chosen by a golden-section search over 7 values.

As discussed earlier, the CNNs for both the ×5 and ×16 cases are trained separately for high and low measurement noise.

a) Training with noiseless measurements: The training of the projector for RPGD follows the sequential procedure described in Section IV, with the configurations
• ×5, no noise: T1 = 80, T2 = 49, T3 = 5;
• ×16, no noise: T1 = 71, T2 = 41, T3 = 11.

We use the CNN obtained right after the first stage for FBPconv, since during this stage only the training ensemble in (13) is taken into account. We empirically found that the training error J2 converged in the T1 epochs of Stage 1, yielding an optimal performance for FBPconv.

b) Training with 40-dB measurement noise: This includes replacing the ensemble in (13) with {Ay_q}, where y_q = Hx_q + n has a 40-dB SNR with respect to Hx_q. With 20% probability, we also perturb the views of the measurements with an AWGN of 0.05 standard deviation so as to enforce robustness to model mismatch. These CNNs are initialized with the ones obtained after the first stage of the noiseless training and are then trained with the configurations
• ×5, 40-dB noise: T1 = 35, T2 = 49, T3 = 5;
• ×16, 40-dB noise: T1 = 32, T2 = 41, T3 = 11.

Similarly to the previous case, the CNNs obtained after the first and the third training stage are used in FBPconv and RPGD, respectively. For clarity, these variants will be referred to as FBPconv40 and RPGD40.

The learning rate is decreased in a geometric progression from 10^−2 to 10^−3 in Stage 1 and kept at 10^−3 for Stages 2 and 3. Recall that the last two stages contain the ensemble with dynamic perturbation (14), which changes in every epoch. The lower learning rate, therefore, avoids drastic changes in parameters between the epochs. The batch size is fixed to 2.
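To make the three-stage sequential procedure of Section IV-B and the perturbations (12)-(14) concrete, here is a schematic PyTorch sketch. It is an assumption-laden illustration only: the U-net of [16] is replaced by a tiny convolutional net, fbp_like stands in for the operator A(H·), and the learning-rate schedule is simplified; this is not the authors' released Matlab implementation.

```python
# Schematic three-stage sequential training of the CNN projector (Section IV-B).
import torch
import torch.nn as nn
import torch.nn.functional as Fnn

class ResidualProjector(nn.Module):
    """CNN = Id + Unet: the network learns the projection error (Section IV-A)."""
    def __init__(self):
        super().__init__()
        self.unet = nn.Sequential(                      # tiny stand-in for the U-net of [16]
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, x):
        return x + self.unet(x)                         # extrinsic skip connection

def train_stage(cnn, images, fbp_of_measurement, epochs, lr, ensembles):
    """One stage; `ensembles` is a subset of {1, 2, 3} selecting perturbations (12)-(14)."""
    opt = torch.optim.Adam(cnn.parameters(), lr=lr)
    for _ in range(epochs):
        for x in images:                                # x: ground-truth batch (B, 1, H, W)
            x2 = fbp_of_measurement(x)                  # linear perturbation (13): A(Hx)
            with torch.no_grad():
                x3 = cnn(x2)                            # dynamic perturbation (14), regenerated on the fly
            inputs = {1: x, 2: x2, 3: x3}
            loss = sum(((cnn(inputs[n]) - x) ** 2).mean() for n in ensembles)  # cf. loss (11)
            opt.zero_grad(); loss.backward(); opt.step()
    return cnn

cnn = ResidualProjector()
images = [torch.rand(1, 1, 64, 64)]                     # toy "dataset"
fbp_like = lambda x: Fnn.interpolate(Fnn.avg_pool2d(x, 4), scale_factor=4)  # crude stand-in for A(Hx)
cnn = train_stage(cnn, images, fbp_like, epochs=2, lr=1e-2, ensembles={2})        # Stage 1: J2
cnn = train_stage(cnn, images, fbp_like, epochs=2, lr=1e-3, ensembles={2, 3})     # Stage 2: J2 + J3
cnn = train_stage(cnn, images, fbp_like, epochs=1, lr=1e-3, ensembles={1, 2, 3})  # Stage 3: J1 + J2 + J3
```

The per-stage epoch counts and learning rates here are arbitrary; in the actual experiments they correspond to the (T1, T2, T3) configurations and the geometric learning-rate decay listed above.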
The other hyper-parameters follow [16]. For stability, gradients above 10^−2 are clipped and the momentum is set to 0.99. The total training time for the noiseless case is around 21.5 hours on a Titan X GPU (Pascal architecture).

The hyper-parameters for RPGD are chosen as follows: The relaxation parameter α0 is initialized with 1; the sequence {c_k} is set to the constant C = 0.99 for RPGD and C = 0.8 for RPGD40. For each noise level and number of views, the only free parameter γ is swept over 20 values geometrically spaced between 10^−2 and 10^−5. We pick the γ which gives the best average SNR over the 25 test images. Note that, for TV and DL, the value of the optimal λ generally increases as the measurement noise increases; however, no such obvious relation exists for γ. This is mainly because it is the step size of the gradient descent in RPGD and not a regularization parameter. In all experiments, the gradient step is skipped during the first iteration.

On the GPU, one iteration of RPGD takes less than 1 second. The algorithm is stopped when the residual ‖x_{k+1} − x_k‖2 reaches a value less than 1, which is sufficiently small compared to the dynamic range [0, 350] of the image. It takes around 1-2 minutes to reconstruct an image with RPGD.

Fig. 2. Comparison of reconstructions using different methods for the ×16 case in Experiment 1. First column: reconstruction from noiseless measurements of a lung image. Second column: zoomed version of the area marked by the box in the original in the first column. Third and fourth columns: zoomed versions for the cases of 45 and 35 dB, respectively. Fifth to eighth columns: corresponding results for an abdomen image. Seventh and eighth columns correspond to 45 and 40 dB, respectively. (a) Results (∞ dB). (b) Zoom (∞ dB). (c) Zoom (45 dB). (d) Zoom (35 dB). (e) Results (∞ dB). (f) Zoom (∞ dB). (g) Zoom (45 dB). (h) Zoom (40 dB).

TABLE I. Reconstruction results for Experiment 1 with low measurement noise (Gaussian). Gray cells indicate that the method was tuned/trained for the corresponding noise level.
TABLE II. Reconstruction results for Experiment 1 with high measurement noise (Gaussian). Gray cells indicate that the method was tuned/trained for the corresponding noise level.

Fig. 3. Profiles of the high- and low-contrast regions marked in the first and fifth columns of Figure 2 by solid and dashed line segments, respectively. First and second columns: ×16, 45-dB noise case for the lung image. Third and fourth columns: ×16, 40-dB noise case for the abdomen image. (a) High-contrast profile. (b) Low-contrast profile. (c) High-contrast profile. (d) Low-contrast profile.

2) Experiment 2: For this case, the CNNs are trained similarly to the CNN for RPGD40 in Experiment 1. Perturbations (12)-(14) are used, with the replacement of AHx_FD^q in (13) by Ay^q, where y^q has 30 dB of Poisson noise. The x_FD^q and Ay_Real^q are multiplied by a constant so that their maximum pixel value is 480.

The CNN obtained after the first stage is used as FBPconv. While testing, we keep C = 0.4. Other training hyper-parameters and testing parameters of the RPGD are kept the same as for RPGD40 in the ×5 case of Experiment 1.
3) Experiment 3: The CNNs are trained using the perturbations (12)-(14) with two modifications: (i) x_q is replaced with x_FD^q because the actual ground truth was unavailable; and (ii) AHx_q in (13) is replaced with Ay_Real^q because we now have access to the actual sinogram.

All other training hyper-parameters and testing parameters are kept the same as for RPGD in the ×5 case of Experiment 1. Similarly to Experiment 1, the CNN obtained after the first stage of the sequential training is used as the FBPconv.

VI. RESULTS AND DISCUSSIONS

A. Experiment 1

We report in Tables I and II the results for low and high measurement noise, respectively. FBPconv and RPGD are used for low noise, while FBPconv40 and RPGD40 are used for high noise. The reconstruction SNRs and SSIMs are averaged over the 25 test images. The gray cells indicate that the method was optimized for that level of noise. As discussed earlier, adjusting λ for TV and DL indirectly implies tuning for the measurement noise; therefore, all of the cells in these columns are gray. This is different for the learning methods, where tuning for the measurement noise requires retraining.

1) Low Measurement Noise: In the low-noise cases (Table I), the proposed RPGD method outperforms all the others for both ×5 and ×16 reductions in terms of SNR and SSIM indices. FBP performs the worst but is able to retain enough information to be utilized by FBPconv and RPGD. Due to the convexity of the iterative scheme, TV is able to perform well but tends to smooth textures and edges. DL performs worse than TV for the ×16 case but is equivalent to it for the ×5 case.

On one hand, FBPconv outperforms both TV and DL, but it is surpassed by RPGD. This is mainly due to the feedback mechanism in RPGD, which lets RPGD use the information in the given measurements to increase the quality of the reconstruction. In fact, for the ×16, no-noise case, the SNRs of the sinograms of the reconstructed images for TV, FBPconv, and RPGD are around 47 dB, 57 dB, and 62 dB, respectively. This means that reconstruction using RPGD has both better image quality and more reliability, since it is consistent with the given noiseless measurement.

2) High Measurement Noise: In the noisier cases (Table II), RPGD40 yields a better SNR than the other methods in the low-view cases (×16) and is more consistent in performance than the others in the high-view (×5) cases. In terms of the SSIM index, it outperforms all of them. The performances of DL and TV are robust to the noise level, with DL performing better than the others in terms of SNR for the 45-dB, ×5 case. FBPconv40 substantially outperforms DL and TV in the two scenarios with 40-dB measurement noise, over which it was actually trained. For this noise level and the ×5 case, it even performs slightly better than RPGD40, but only in terms of SNR. However, as the level of noise deviates from 40 dB, the performance of FBPconv40 degrades significantly. Surprisingly, its performances in the 45-dB cases are much worse than those in the corresponding 40-dB cases. In fact, its SSIM index for the 45-dB, ×5 case is even worse than FBP. This implies that FBPconv40 is highly sensitive to the difference between the training and testing conditions. By contrast, RPGD40 is more robust to this difference due to its iterative correction. In the ×16 case with 45-dB and 35-dB noise levels, it outperforms FBPconv40 by around 3.5 dB and 6 dB, respectively.

Fig. 4. Reconstruction results for a test slice in Experiment 3. The full-dose image is obtained by taking the FBP of the full-view sinogram. The rest of the reconstructions are obtained from the sparse-view (×5) sinogram. The last column shows the difference between the reconstruction and the full-dose image. (a) Results (∞ dB). (b) Zoom (∞ dB). (c) Diff (∞ dB).

3) Case Study: The reconstructions of lung and abdomen images for the case of ×16 downsampling and noiseless measurements are illustrated in Figure 2 (first and fifth columns). FBP is dominated by line artifacts, while TV and DL satisfactorily remove those but blur the fine structures. FBPconv and RPGD are able to reconstruct these details. The zoomed versions (second and sixth columns) suggest that RPGD is able to reconstruct the fine details better than the other methods. This observation remains the same when the measurement quality degrades. The remaining columns contain the reconstructions for different noise levels. For the abdomen image, it is noticeable that only TV is able to retain the small bone structure marked by an arrow in the zoomed version of the lung image (seventh column). A possible reason for this could be that structures similar to this one were rare in the training set. Increasing the training-data size with suitable images could be a solution.
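For reference, the SNR values reported in Tables I-III are regressed SNRs in the sense of (16); computing them amounts to an affine least-squares fit of the reconstruction to the ground truth. A minimal NumPy sketch (with synthetic data standing in for actual reconstructions):

```python
# Regressed SNR (16): fit a*x_rec + b to the ground truth, then compute the SNR.
import numpy as np

def regressed_snr(x_rec, x_gt):
    """max_{a,b} SNR(a*x_rec + b, x_gt), solved as a linear least-squares fit."""
    u, v = x_rec.ravel(), x_gt.ravel()
    A = np.stack([u, np.ones_like(u)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, v, rcond=None)      # best affine (contrast/offset) fit
    err = a * u + b - v
    return 20.0 * np.log10(np.linalg.norm(v) / np.linalg.norm(err))

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
rec = 0.5 * gt + 0.1 + 0.01 * rng.standard_normal((64, 64))   # scaled, shifted, noisy copy
print(round(regressed_snr(rec, gt), 1))                       # high despite the contrast/offset mismatch
```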
Figure 3 contains the profiles of the high- and low-contrast regions of the reconstructions for the two images. These regions are marked by line segments inside the original images in the first column of Figure 2. The FBP profile is highly noisy, and the TV and DL profiles overly smooth the details. FBPconv40 is able to accommodate the sudden transitions in the high-contrast case. RPGD40 is slightly better in this regard. For the low-contrast case, RPGD40 is able to follow the structures of the original (GT) profile better than the others. A similar analysis holds for the ×5 case (Figure 7, supplementary material).

B. Experiment 2

We show in Table III the regressed SNR and SSIM indices averaged over the 25 reconstructed slices. RPGD outperforms both FBP and FBPconv in terms of SNR and SSIM. Similarly to Experiment 1, its performance is also more robust with respect to noise mismatch. Fig. 9 in the supplementary material compares the reconstructions for a given test slice.

C. Experiment 3

In Figure 4, we show the reconstruction result for one slice for γ = 10^−5. Since the ground truth is unavailable, we show the reconstructions without a quantitative comparison. It can be seen that the proposed method is able to reconstruct images with reasonable perceptual quality.

TABLE III. Reconstruction results for Experiment 2 with Poisson noise and ×5 views reduction. Gray cells indicate that the method was trained for the corresponding noise level.

Fig. 5. Convergence with iteration k of RPGD for the Experiment 1, ×16, no-noise case when C = 0.99. Results are averaged over 25 test images. (a) SNRs of x_k with respect to the ground-truth image. (b) SNRs of Hx_k with respect to the ground-truth sinogram. (c) Evolution of the relaxation parameters α_k. In (a) and (b), the FBP, FBPconv, and TV results are independent of the RPGD iteration k but have been shown for the sake of comparison.

VII. BEHAVIOR OF ALGORITHMS

We now explore the behavior of the proposed method in more detail, including its empirical convergence and the effect of sequential training.

A. Convergence of RPGD

In Figure 5, we show the behavior of RPGD with respect to the iteration number k for Experiment 1. The evolutions of the SNR of the images x_k and of their measurements Hx_k, computed with respect to the ground-truth image and the ground-truth sinogram, are shown in Figures 5(a) and (b), respectively. We give α_k with respect to the iteration k in Figure 5(c). The results are averaged over the 25 test images for the ×16, no-noise case and C = 0.99. RPGD outperforms all the other methods in the context of both image quality and measurement consistency.

Due to the high value of the step size (γ = 2 × 10^−3) and the large difference (Hx_k − y), the initial few iterations have large gradients and result in the instability of the algorithm. The reason is that the CNN is fed with (x_k − γ H^T(Hx_k − y)), which is drastically different from the perturbations on which it was trained. In this situation, α_k decreases steeply and stabilizes the algorithm. At convergence, α_k ≠ 0; therefore, according to Theorem 3, x_100 is a fixed point of (9) where F = CNN.

B. Advantages of Sequential Training

Here, we experimentally verify the advantages of the sequential-training strategy discussed in Section IV-B. Using the setup of Experiment 1, we compare the training time and performance of the CNNs trained with and without this strategy for the ×16 downsampling, no-noise case. For the gold standard (systematic training of the CNN), we train a CNN as a projector with the 3 types of perturbation in every epoch. We use 135 epochs for training, which is roughly equal to T1 + T2 + T3 used during training of the corresponding sequentially trained CNN. This number was sufficient for the convergence of the training error. The reconstruction performance of RPGD using this gold-standard CNN is 26.86 dB, compared to 27.02 dB for RPGD using the sequentially trained CNN. The total training times are 48 and 22 hours, respectively. This demonstrates that the sequential strategy reduces the training time (in this case by more than 50%) while preserving (or even slightly increasing) the reconstruction performance.
GUPTA et al.: CNN-BASED PGD FOR CONSISTENT CT IMAGE RECONSTRUCTION 1451

VIII. CONCLUSION

We have proposed a simple yet effective iterative scheme (RPGD) where one step of enforcing measurement consistency is followed by a CNN that tries to project the solution onto the set of desired reconstruction images. The whole scheme is ensured to be convergent. We also introduced a novel method to train a CNN that acts like a projector using a reasonably small dataset (475 images). For sparse-view CT reconstruction, our method outperforms the previous techniques for both noiseless and noisy measurements.

The proposed framework is generic and can be used to solve a variety of inverse problems, including superresolution, deconvolution, and accelerated MRI. This can bring more robustness and reliability to the current deep-learning-based techniques.

APPENDIX

A. Proof of Theorem 3

(i) Set r_k = (x_{k+1} - x_k). On one hand, it is clear that

    r_k = (1 - \alpha_k) x_k + \alpha_k z_k - x_k = \alpha_k (z_k - x_k).    (19)

On the other hand, from the construction of {\alpha_k},

    \alpha_k \|z_k - x_k\|_2 \le c_k \alpha_{k-1} \|z_{k-1} - x_{k-1}\|_2 \;\Leftrightarrow\; \|r_k\|_2 \le c_k \|r_{k-1}\|_2.    (20)

Iterating (20) gives

    \|r_k\|_2 \le \|r_0\|_2 \prod_{i=1}^{k} c_i, \quad \forall k \ge 1.    (21)

We now show that {x_k} is a Cauchy sequence. Since {c_k} is asymptotically upper-bounded by C < 1, there exists K such that c_k \le C for all k > K. Let m, n be two integers such that m > n > K. By using (21) and the triangle inequality,

    \|x_m - x_n\|_2 \le \sum_{k=n}^{m-1} \|r_k\|_2 \le \|r_0\|_2 \left( \prod_{i=1}^{K} c_i \right) \sum_{k=n-K}^{m-1-K} C^k \le \|r_0\|_2 \left( \prod_{i=1}^{K} c_i \right) \frac{C^{n-K} - C^{m-K}}{1 - C}.    (22)

The last inequality proves that \|x_m - x_n\|_2 \to 0 as m, n \to \infty; in other words, {x_k} is a Cauchy sequence in the complete metric space R^N. As a consequence, {x_k} must converge to some point x* in R^N.

(ii) Assume from now on that {\alpha_k} is lower-bounded by \varepsilon > 0. By definition, {\alpha_k} is also non-increasing and, thus, convergent to \alpha^* > 0. Next, we rewrite the update of x_k in Algorithm 1 as

    x_{k+1} = (1 - \alpha_k) x_k + \alpha_k G_\gamma(x_k),    (23)

where G_\gamma is defined by (9). Taking the limit of both sides of (23) leads to

    x^* = (1 - \alpha^*) x^* + \alpha^* \lim_{k \to \infty} G_\gamma(x_k).    (24)

Moreover, since the nonlinear operator F is continuous, G_\gamma is also continuous. Hence,

    \lim_{k \to \infty} G_\gamma(x_k) = G_\gamma\left( \lim_{k \to \infty} x_k \right) = G_\gamma(x^*).    (25)

By plugging (25) into (24), we get that x* = G_\gamma(x*), which means that x* is a fixed point of the operator G_\gamma.

(iii) Now that F = P_S satisfies (6), we invoke Proposition 1 to infer that x* is a local minimizer of (3), thus completing the proof.

B. RPGD for Poisson Noise in CT

In the case where the CT measurements are corrupted by Poisson noise, the data-fidelity term in (3) should be replaced by weighted least squares [35], [56], [57]. For the sake of completeness, we show a sketch of the derivation. Let x represent the distribution of linear attenuation coefficients of an object and let [Hx]_m represent their line integrals. The mth CT measurement, y_m, is obtained from a Poisson random variable p_m as

    p_m \sim \mathrm{Poisson}\left( b_m e^{-[Hx]_m} + r_m \right)    (26)

    y_m = -\log\left( \frac{p_m}{b_m} \right),    (27)

where b_m is the blank-scan factor and r_m is the readout noise. Since the logarithm is bijective, the negative log-likelihood of y given x is equal to that of p given x. After removing the constants, we use this negative log-likelihood as the data-fidelity term

    E(Hx, y) = \sum_{m=1}^{M} \left( \hat{p}_m - p_m \log \hat{p}_m \right),    (28)

where \hat{p}_m = b_m e^{-[Hx]_m} + r_m is the expected value of p_m. We then perform a quadratic approximation of E with respect to Hx around the point -\ln\left( \frac{p_m - r_m}{b_m} \right) using a Taylor expansion. After ignoring the higher-order terms, this yields

    E(Hx, y) = \sum_{m=1}^{M} \frac{w_m}{2} \left( [Hx]_m - \log \frac{b_m}{p_m - r_m} \right)^2,    (29)

where w_m = \frac{(p_m - r_m)^2}{p_m}.

In the case when the readout noise r_m is insignificant, (29) can be written as

    E(Hx, y) = \sum_{m=1}^{M} \frac{w_m}{2} \left( [Hx]_m - y_m \right)^2    (30)
             = \frac{1}{2} \left\| W^{1/2} H x - W^{1/2} y \right\|^2    (31)
             = \frac{1}{2} \left\| \tilde{H} x - \tilde{y} \right\|^2,    (32)

where W \in R^{M \times M} is a diagonal matrix with [\mathrm{diag}(W)]_m = w_m = p_m, \tilde{H} = W^{1/2} H, and \tilde{y} = W^{1/2} y.

Imposing the data-manifold prior, we get the equivalent of Problem (3) as

    \min_{x \in S} \frac{1}{2} \left\| \tilde{H} x - \tilde{y} \right\|^2.    (33)
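As a concrete illustration of the reweighting in (30)-(33), the weighted quantities amount to a simple rescaling of the rows of H and of the entries of y. The following NumPy sketch assumes negligible readout noise, so that w_m = p_m; the function name and argument layout are illustrative and not part of the original method.

    import numpy as np

    def weighted_system(H, y, p):
        """Form H~ = W^(1/2) H and y~ = W^(1/2) y as in (31)-(32),
        assuming negligible readout noise so that w_m = p_m."""
        w_sqrt = np.sqrt(p)            # entries of W^(1/2), since w_m = p_m
        H_tilde = w_sqrt[:, None] * H  # rescale each row m of H by sqrt(p_m)
        y_tilde = w_sqrt * y           # rescale each measurement y_m by sqrt(p_m)
        return H_tilde, y_tilde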
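For completeness, the relaxed iteration analyzed in Appendix A can be summarized by a short NumPy sketch that applies equally to the original pair (H, y) and to the weighted one from (31)-(33). Here, F is a placeholder for the trained projector CNN, and the update of alpha shown below is one rule that is consistent with the construction used in part (i) of the proof (a non-increasing sequence satisfying (20)); it is a sketch in the spirit of Algorithm 1, not a verbatim transcription of it.

    import numpy as np

    def rpgd(H, y, F, gamma, c, x0, n_iter=100):
        """Sketch of a relaxed projected gradient descent with a CNN projector F."""
        x = x0.copy()
        alpha, prev_res = 1.0, None
        for _ in range(n_iter):
            grad = H.T @ (H @ x - y)          # gradient of 0.5 * ||Hx - y||^2
            z = F(x - gamma * grad)           # projection step performed by the CNN
            res = np.linalg.norm(z - x)       # ||z_k - x_k||_2
            if prev_res is not None and res > c * prev_res:
                alpha *= c * prev_res / res   # keeps alpha_k non-increasing and (20) satisfied
            prev_res = res
            x = (1 - alpha) * x + alpha * z   # relaxed update, cf. (23)
        return x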
Note that all the results discussed in Sections II and III apply to Problem (33). As a consequence, we use Algorithm 1 to solve the problem with the following small change in the gradient step:

    z_k = F\left( x_k - \gamma \tilde{H}^T \tilde{H} x_k + \gamma \tilde{H}^T \tilde{y} \right).    (34)
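Using the two illustrative helpers sketched above (both hypothetical names), this modified step amounts to running the same loop on the weighted quantities, for instance:

    # F: trained projector CNN; gamma, c, x0 as in the sketch of the relaxed iteration above
    H_tilde, y_tilde = weighted_system(H, y, p)
    x_hat = rpgd(H_tilde, y_tilde, F, gamma, c, x0)   # gradient step now matches (34)

The only change relative to the noiseless case is the reweighting of the operator and of the measurements that enter the gradient step.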
ACKNOWLEDGMENT

The authors thank Emmanuel Soubies for his helpful suggestions on training the CNN and Dr. Cynthia McCollough, the Mayo Clinic, the American Association of Physicists in Medicine, and the National Institute of Biomedical Imaging and Bioengineering for the Mayo Clinic dataset. They also thank Dr. Marco Stampanoni, Swiss Light Source, Paul Scherrer Institute, Villigen, Switzerland, for the rat-brain dataset. They also thankfully acknowledge the support of the NVIDIA Corporation in providing the Titan X GPU for this research.
REFERENCES

[1] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, 2007.
[2] A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging (Classics in Applied Mathematics). New York, NY, USA: SIAM, 2001.
[3] X. C. Pan, E. Y. Sidky, and M. Vannier, “Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction?” Inverse Problems, vol. 25, no. 12, p. 123009, 2009.
[4] C. Bouman and K. Sauer, “A generalized Gaussian image model for edge-preserving MAP estimation,” IEEE Trans. Image Process., vol. 2, no. 3, pp. 296–310, Jul. 1993.
[5] P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Deterministic edge-preserving regularization in computed imaging,” IEEE Trans. Image Process., vol. 6, no. 2, pp. 298–311, Feb. 1997.
[6] E. Candès and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Probl., vol. 23, no. 3, pp. 969–985, 2007.
[7] S. Ramani and J. A. Fessler, “Parallel MR image reconstruction using augmented Lagrangian methods,” IEEE Trans. Med. Imag., vol. 30, no. 3, pp. 694–706, Mar. 2011.
[8] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[9] E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall, “Compressed sensing with coherent and redundant dictionaries,” Appl. Comput. Harmon. Anal., vol. 31, no. 1, pp. 59–73, Jul. 2011.
[10] S. Ravishankar, R. R. Nadakuditi, and J. A. Fessler, “Efficient sum of outer products dictionary learning (SOUP-DIL) and its application to inverse problems,” IEEE Trans. Comput. Imag., vol. 3, no. 4, pp. 694–709, Dec. 2017.
[11] M. A. T. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based image restoration,” IEEE Trans. Image Process., vol. 12, no. 8, pp. 906–916, Aug. 2003.
[12] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, Nov. 2004.
[13] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
[14] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011.
[15] M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: A review,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 85–95, Nov. 2017.
[16] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, Sep. 2017.
[17] Y. S. Han, J. Yoo, and J. C. Ye. (2017). “Deep learning with domain adaptation for accelerated projection-reconstruction MR.” [Online]. Available: https://arxiv.org/abs/1703.01135
[18] S. Antholzer, M. Haltmeier, and J. Schwab. (2017). “Deep learning for photoacoustic tomography from sparse data.” [Online]. Available: https://arxiv.org/abs/1704.04587
[19] S. Wang et al., “Accelerating magnetic resonance imaging via deep learning,” in Proc. IEEE Int. Symp. Biomed. Imag. (ISBI), Apr. 2016, pp. 514–517.
[20] A. Mousavi and R. G. Baraniuk. (2017). “Learning to invert: Signal recovery via deep convolutional networks.” [Online]. Available: https://arxiv.org/abs/1701.03891
[21] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 399–406.
[22] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-net for compressive sensing MRI,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2016, pp. 10–18.
[23] J. Adler and O. Öktem. (2017). “Solving ill-posed inverse problems using iterative deep neural networks.” [Online]. Available: https://arxiv.org/abs/1704.04058
[24] P. Putzky and M. Welling. (2017). “Recurrent inference machines for solving inverse problems.” [Online]. Available: https://arxiv.org/abs/1706.04008
[25] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for MR image reconstruction,” in Proc. Int. Conf. Inf. Process. Med. Imag., 2017, pp. 647–658.
[26] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Dec. 2013, pp. 945–948.
[27] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 84–98, Jan. 2017.
[28] S. Sreehari et al., “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Trans. Comput. Imag., vol. 2, no. 4, pp. 408–423, Dec. 2016.
[29] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imag. Sci., vol. 10, no. 4, pp. 1804–1844, 2017.
[30] J. H. R. Chang, C.-L. Li, B. Póczos, B. V. K. V. Kumar, and A. C. Sankaranarayanan. (2017). “One network to solve them all—Solving linear inverse problems using deep projection models.” [Online]. Available: https://arxiv.org/abs/1703.09912
[31] A. Bora, A. Jalal, E. Price, and A. G. Dimakis. (2017). “Compressed sensing using generative models.” [Online]. Available: https://arxiv.org/abs/1703.03208
[32] B. Kelly, T. P. Matthews, and M. A. Anastasio. (2017). “Deep learning-guided image reconstruction from incomplete data.” [Online]. Available: https://arxiv.org/abs/1709.00584
[33] J. Z. Liang, P. J. La Riviere, G. El Fakhri, S. J. Glick, and J. Siewerdsen, “Guest editorial low-dose CT: What has been done, and what challenges remain?” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2409–2416, Dec. 2017.
[34] S. Ramani and J. A. Fessler, “A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction,” IEEE Trans. Med. Imag., vol. 31, no. 3, pp. 677–688, Mar. 2012.
[35] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE Trans. Med. Imag., vol. 31, no. 9, pp. 1682–1697, Sep. 2012.
[36] S. Niu et al., “Sparse-view X-ray CT reconstruction via total generalized variation regularization,” Phys. Med. Biol., vol. 59, no. 12, pp. 2997–3017, 2014.
[37] L. Gjesteby, Q. Yang, Y. Xi, Y. Zhou, J. Zhang, and G. Wang, “Deep learning methods to guide CT image reconstruction and reduce metal artifacts,” Proc. SPIE, vol. 10132, p. 101322W, Mar. 2017.
[38] H. Chen et al., “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Trans. Image Process., vol. 36, no. 12, pp. 2524–2535, Dec. 2017.
[39] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” Med. Phys., vol. 44, no. 10, pp. e360–e375, Oct. 2017.
[40] Y. S. Han, J. Yoo, and J. C. Ye. (2016). “Deep residual learning for compressed sensing CT reconstruction via persistent homology analysis.” [Online]. Available: https://arxiv.org/abs/1611.06391
[41] B. Eicke, “Iteration methods for convexly constrained ill-posed problems in Hilbert space,” Numer. Funct. Anal. Optim., vol. 13, nos. 5–6, pp. 413–429, 1992.
[42] L. Landweber, “An iteration formula for Fredholm integral equations of the first kind,” Amer. J. Math., vol. 73, no. 3, pp. 615–624, Jul. 1951.
[43] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Cambridge, MA, USA: Athena Scientific, 1999.
[44] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul., vol. 4, no. 4, pp. 1168–1200, 2005.
[45] P. L. Combettes and J.-C. Pesquet, Proximal Splitting Methods in Signal Processing. New York, NY, USA: Springer, 2011, pp. 185–212.
[46] J. Bect, L. Blanc-Féraud, G. Aubert, and A. Chambolle, “A ℓ1-unified variational framework for image restoration,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2004, pp. 1–13.
[47] A. Aldroubi and R. Tessera, “On the existence of optimal unions of subspaces for data modeling and clustering,” Found. Comput. Math., vol. 11, no. 3, pp. 363–379, Jun. 2011.
[48] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[49] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234–241.
[50] A. E. Orhan and X. Pitkow. (2017). “Skip connections eliminate singularities.” [Online]. Available: https://arxiv.org/abs/1701.09175
[51] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[52] C. McCollough, “TU-FG-207A-04: Overview of the low dose CT grand challenge,” Med. Phys., vol. 43, no. 6, pp. 3759–3760, 2016.
[53] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
[54] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res., vol. 11, pp. 19–60, Mar. 2010.
[55] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[56] K. Sauer and C. Bouman, “A local update strategy for iterative reconstruction from projections,” IEEE Trans. Signal Process., vol. 41, no. 2, pp. 534–548, Feb. 1993.
[57] I. A. Elbakri and J. A. Fessler, “Statistical image reconstruction for polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imag., vol. 21, no. 2, pp. 89–99, Feb. 2002.
[58] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York, NY, USA: Springer, 2011.