Medical Image Denoising Using Convolutional Denoising Autoencoders

Lovedeep Gondara
Department of Computer Science
Simon Fraser University
lgondara@sfu.ca

arXiv:1608.04667v2 [cs.CV] 18 Sep 2016

Abstract—Image denoising is an important pre-processing step in medical image analysis. Different algorithms have been proposed over the past three decades with varying denoising performance. More recently, deep learning based models have shown great promise, having outperformed conventional methods. These methods are, however, limited by their requirement of large training sample sizes and high computational cost. In this paper we show that, using small sample sizes, denoising autoencoders constructed with convolutional layers can be used for efficient denoising of medical images. Heterogeneous images can be combined to boost sample size for increased denoising performance, and the simplest of networks can reconstruct images with corruption levels so high that noise and signal are not differentiable to the human eye.

Keywords—Image denoising, denoising autoencoder, convolutional autoencoder

I. INTRODUCTION

Medical imaging modalities including X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and ultrasound are susceptible to noise [21]. Reasons vary from the use of different image acquisition techniques to attempts at decreasing a patient's exposure to radiation; as the amount of radiation is decreased, noise increases [1]. Denoising is often required for proper image analysis, both by humans and machines.

Image denoising, being a classical problem in computer vision, has been studied in detail. Various methods exist, ranging from models based on partial differential equations (PDEs) [18], [20], [22], domain transformations such as wavelets [6], DCT [29], and BLS-GSM [19], non-local techniques including NL-means [30], [3], combinations of non-local means and domain transformations such as BM3D [7], and a family of models exploiting sparse coding techniques [17], [9], [15]. All methods share a common goal, expressed as

    z = x + y    (1)

where z is the noisy image produced as the sum of the original image x and some noise y. Most methods try to approximate x using z as closely as possible. In most cases, y is assumed to be generated by a well-defined process.

With recent developments in deep learning [14], [11], [23], [2], [10], results from models based on deep architectures have been promising. Autoencoders have been used for image denoising [24], [25], [28], [5]. They easily outperform conventional denoising methods and are less restrictive in the specification of the noise generative process. Denoising autoencoders constructed using convolutional layers give better image denoising performance through their ability to exploit strong spatial correlations.

In this paper we present empirical evidence that stacked denoising autoencoders built using convolutional layers work well for the small sample sizes typical of medical image databases. This is contrary to the belief that very large training datasets are needed for optimal performance of models based on deep architectures. We also show that these methods can recover signal even when noise levels are very high, at the point where most other denoising methods would fail.

The rest of this paper is organized as follows: the next section discusses related work in image denoising using deep architectures, Section III introduces autoencoders and their variants, Section IV explains our experimental set-up and details our empirical evaluation, and Section V presents our conclusions and directions for future work.

II. RELATED WORK

Although BM3D [7] is considered state-of-the-art in image denoising and is a very well engineered method, Burger et al. [4] showed that a plain multilayer perceptron (MLP) can achieve similar denoising performance.

Denoising autoencoders are a recent addition to the image denoising literature. Used as a building block for deep networks, they were introduced by Vincent et al. [24] as an extension of classic autoencoders. It was shown that denoising autoencoders can be stacked [25] to form a deep network by feeding the output of one denoising autoencoder to the one below it.

Jain et al. [12] proposed image denoising using convolutional neural networks. It was observed that, using a small sample of training images, performance at par with or better than the state-of-the-art based on wavelets and Markov random fields can be achieved. Xie et al. [28] used stacked sparse autoencoders for image denoising and inpainting, performing at par with K-SVD. Agostinelli et al. [1] experimented with adaptive multi-column deep neural networks for image denoising, built using a combination of stacked sparse autoencoders; this system was shown to be robust to different noise types.

III. PRELIMINARIES

A. Autoencoders

An autoencoder is a type of neural network that tries to learn an approximation to the identity function using backpropagation, i.e. given a set of unlabeled training inputs x(1), x(2), ..., x(n), it uses

    z(i) = x(i)    (2)

An autoencoder first takes an input x ∈ [0, 1]^d and maps (encodes) it to a hidden representation y ∈ [0, 1]^d′ using a deterministic mapping, such as

    y = s(Wx + b)    (3)

1) Denoising Autoencoders: A denoising autoencoder is a stochastic extension of the classic autoencoder [24]; that is, we force the model to learn a reconstruction of the input given its noisy version. A stochastic corruption process randomly sets some of the inputs to zero, forcing the denoising autoencoder to predict missing (corrupted) values for randomly selected subsets of missing patterns. The basic architecture of a denoising autoencoder is shown in Fig. 2.

Output from the layer below is fed to the current layer and training is done layer-wise.
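The mappings above can be sketched in a few lines of NumPy. The sketch below is illustrative only: the dimensions, random weights, and the choice of a sigmoid for the nonlinearity s are assumptions for the example, not the architecture evaluated later.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, p=0.3, rng=rng):
    """Stochastic corruption as in Eq.-style masking noise: randomly set a
    fraction p of the inputs to zero, keeping each entry with prob. 1 - p."""
    mask = rng.random(x.shape) >= p
    return x * mask

def encode(x, W, b):
    """Deterministic mapping y = s(Wx + b), Eq. (3), with sigmoid s."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def decode(y, Wp, bp):
    """Map the hidden representation back to a reconstruction in [0, 1]^d."""
    return 1.0 / (1.0 + np.exp(-(Wp @ y + bp)))

d, dh = 64, 16                       # illustrative input / hidden sizes
x = rng.random(d)                    # a toy input in [0, 1]^d
W = rng.normal(0, 0.1, (dh, d)); b = np.zeros(dh)
Wp = rng.normal(0, 0.1, (d, dh)); bp = np.zeros(d)

x_tilde = corrupt(x)                 # noisy version fed to the encoder
x_hat = decode(encode(x_tilde, W, b), Wp, bp)
```

Training would then minimize a reconstruction loss between x_hat and the clean target x of Eq. (2); stacking feeds one autoencoder's output to the next, with layer-wise training as described above.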
IV. EVALUATION

A. Data

We used two datasets: the mini-MIAS database of mammograms (MMM) [13] and a dental radiography database (DX) [26]. MMM has 322 images of 1024 × 1024 resolution and DX has 400 cephalometric X-ray images collected from 400 patients with a resolution of 1935 × 2400. Random images from both datasets are shown in Fig. 4.
C. Empirical evaluation

For baseline comparison, images corrupted with the lowest noise level (µ = 0, σ = 1, p = 0.1) were used. To keep a similar training sample size, we used 300 images from each dataset, leaving 22 for testing in MMM and 100 in DX.

Using a batch size of 10 and 100 epochs, denoising results are presented in Fig. 6 and Table II.

Fig. 7. Training and validation loss from 100 epochs using a batch size of 10
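The corruption applied for the baseline can be sketched as below. The paper reports the noise setting as (µ, σ, p); treating p as the fraction of corrupted pixels is our reading for this illustration, not a detail spelled out in the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def corrupt_gaussian(img, mu=0.0, sigma=1.0, p=0.1, rng=rng):
    """Add N(mu, sigma) noise to a randomly chosen fraction p of pixels,
    then clip back to the valid [0, 1] intensity range."""
    noisy = img.copy()
    mask = rng.random(img.shape) < p          # select ~p of the pixels
    noisy[mask] += rng.normal(mu, sigma, mask.sum())
    return np.clip(noisy, 0.0, 1.0)

img = rng.random((64, 64))    # stand-in for a rescaled medical image
z = corrupt_gaussian(img)     # z = x + y at the lowest noise setting
```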
Fig. 6. Denoising results on both datasets: top row shows real images, second row the noisy versions (µ = 0, σ = 1, p = 0.1), third row images denoised using CNN DAE, and fourth row results of applying a median filter
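The median-filter baseline can be sketched with plain NumPy; the 3 × 3 window and edge padding below are assumptions for the example, as the filter settings used in the experiments are not stated here.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter: stack the nine shifted views of an edge-padded
    image and take the per-pixel median across them."""
    padded = np.pad(img, 1, mode="edge")
    shifts = [padded[i:i + img.shape[0], j:j + img.shape[1]]
              for i in range(3) for j in range(3)]
    return np.median(np.stack(shifts), axis=0)

# A single salt pixel in a flat image is removed entirely.
img = np.zeros((5, 5)); img[2, 2] = 1.0
assert median_filter3(img).max() == 0.0
```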
TABLE II. MEAN SSIM SCORES FOR TEST IMAGES FROM MMM AND DX DATASETS

Image type      MMM     DX
Noisy           0.45    0.62
CNN DAE         0.81    0.88
Median filter   0.73    0.86
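SSIM scores like those in Table II can be computed with a short sketch. The variant below evaluates the standard SSIM comparison over a single global window; the mean SSIM reported in the tables averages local windows, so a windowed implementation (e.g. scikit-image's) would give somewhat different values.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM: combined luminance/contrast/structure comparison
    computed over the whole image, with the usual stabilizing constants."""
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
assert abs(ssim_global(img, img) - 1.0) < 1e-9   # identical images score 1
```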
Results show increased denoising performance using this simple architecture on small datasets over the use of the median filter, which is most often used for this type of noise. The model converged nicely for the given noise levels and sample size, as shown in Fig. 7. It can be seen that even with 50 epochs, halving the training time, we would have obtained similar results.

To test whether an increased sample size obtained by combining heterogeneous data sources has an impact on denoising performance, we combined both datasets, with 721 images for training and 100 for testing. Denoising results on three randomly chosen test images from the combined dataset are shown in Fig. 8 and Table III. Table III shows that CNN DAE performs better than NL-means and the median filter; increasing the sample size marginally enhanced denoising performance.

Fig. 8. Denoising performance of CNN DAE on the combined dataset: top row shows real images, second row the noisy versions with minimal noise, third row denoising results of NL-means, fourth row results of the median filter, fifth row results of CNN DAE using the smaller dataset (300 training samples), and sixth row results of CNN DAE on the larger combined dataset.

TABLE III. COMPARING MEAN SSIM SCORES USING DIFFERENT DENOISING FILTERS

Image type      SSIM
Noisy           0.63
NL means        0.62
Median filter   0.80
CNN DAE(a)      0.89
CNN DAE(b)      0.90

CNN DAE(a) is the denoising performance using the smaller dataset; CNN DAE(b) is the performance on the same images using the combined dataset.

To test the limits of the CNN DAE's denoising performance, we evaluated images corrupted with increasing noise levels. It can be seen that as the noise level increases, this simple network has trouble reconstructing the original signal. However, even when the image is not visible to the human eye, the network is successful in partially recovering the real images. Using a more complex, deeper model, or increasing the number of training samples and epochs, might help.

Performance of CNN DAE was also tested on images corrupted using Poisson noise with p = 0.2, λ = 1 and λ = 5. Denoising results are shown in Fig. 10.
results are shown in Fig. 10.
Fig. 10. CNN DAE performance on Poisson corrupted images. Top row
shows images corrupted with p = 0.2, λ = 1 with second row showing
denoised results using CNN DAE. Third and fourth rows show noisy and
denoised images corrupted with p = 0.2, λ = 5.
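The Poisson corruption can be sketched as follows. The paper gives only (p, λ); treating p as the corrupted-pixel fraction and rescaling the draws by λ is our illustrative reading, not a detail stated in the text.

```python
import numpy as np

rng = np.random.default_rng(7)

def corrupt_poisson(img, lam=1.0, p=0.2, rng=rng):
    """Add rescaled Poisson(lam) draws to a randomly chosen fraction p of
    pixels, clipping back to the valid [0, 1] intensity range."""
    noisy = img.copy()
    mask = rng.random(img.shape) < p
    noisy[mask] += rng.poisson(lam, mask.sum()) / max(lam, 1.0)
    return np.clip(noisy, 0.0, 1.0)

img = rng.random((64, 64))
z1 = corrupt_poisson(img, lam=1.0)   # lighter corruption
z5 = corrupt_poisson(img, lam=5.0)   # heavier corruption
```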