
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2574984, IEEE Transactions on Image Processing.

Demosaicing based on Directional Difference Regression and Efficient Regression Priors

Jiqing Wu, Radu Timofte, Member, IEEE, and Luc Van Gool, Fellow, IEEE

Abstract—Color demosaicing is a key image processing step aiming to reconstruct the missing pixels from a recorded raw image. On the one hand, numerous interpolation methods focusing on spatial-spectral correlations have been proved very efficient, whereas they yield a poor image quality and strong visible artifacts. On the other hand, optimization strategies such as learned simultaneous sparse coding (LSSC) and sparsity and adaptive PCA (SAPCA) based algorithms were shown to greatly improve image quality compared to that delivered by interpolation methods, but unfortunately are computationally heavy. In this paper we propose 'efficient regression priors (ERP)' as a novel, fast post-processing algorithm that learns the regression priors offline from training data. We also propose an independent efficient demosaicing algorithm based on directional difference regression (DDR), and introduce its enhanced version based on fused regression (FR). We achieve an image quality comparable to that of state-of-the-art methods for three benchmarks, while being order(s) of magnitude faster.

Index Terms—Demosaicing, Color filter array, Super-resolution, Image Enhancement, Linear Regression.

[Scatter plot: average PSNR (dB) versus running time (s) on IMAX for BILINEAR, AHD, HQL, AP, LPAICI, DLMMSE, GBTF, MSG, PCSD, LSSC, LDINAT, MLRI, DDR, FR and their +ERP variants.]
Fig. 2: Our proposed methods (DDR, FR and ERP) provide the best average demosaicing quality with low time complexity, on the IMAX dataset. Details are given in Section V.

J. Wu, R. Timofte, and L. Van Gool are with the Department of Information Technology and Electrical Engineering, ETH Zurich, Switzerland, e-mail: {jwu,radu.timofte,vangool}@vision.ee.ethz.ch. L. Van Gool is also with ESAT, KU Leuven, Belgium. Codes available at: http://www.vision.ee.ethz.ch/~timofter/

I. INTRODUCTION

FOR reasons of cost, most digital cameras are based on a single image sensor equipped with a color filter array (CFA). The Bayer pattern filter [1], as shown in Fig. 1, is the most frequently used CFA. Other patterns are discussed in [2]. The study of demosaicing algorithms for the Bayer pattern, aiming at recovering the missing color bands at each pixel, has a long history (see [3], [4]). We can broadly fit them into two categories: interpolation- and optimization-based methods.

Fig. 1: Scheme of demosaicing.

Initially, interpolation-based methods were developed. Among those, nearest neighbor, bilinear or bicubic methods are the simplest as they interpolate within the R, G, and B channels independently. Later on, researchers started to exploit the spatial-spectral correlations that exist between the RGB channels.

HQL. Malvar et al. [5] introduce high-quality linear interpolation (HQL). HQL is a gradient-corrected bilinear interpolation method, with a gain parameter to weight the gradient correction term. In other words, Malvar et al. first apply bilinear interpolation to compute the lost G values at R/B locations, then correct them by using the spatial gradients of R/B. A similar strategy is applied for the interpolation of the missing R/B values.

DLMMSE. Zhang and Wu [6] develop the directional linear minimum mean-square error estimation (DLMMSE) technique. DLMMSE builds on the assumption that differencing the G and R/B channels amounts to low-pass filtering, given their strong correlation. The results are typically referred to as 'primary difference signals' or 'PDS'. In particular, DLMMSE adaptively estimates the missing G values in both horizontal and vertical directions, and then optimally fuses them. Finally, the R/B channels are computed, guided by the reconstructed G channel and the PDS.

LPAICI. Paliy et al. [7] propose spatially adaptive color filter array interpolation. They employ local polynomial approximation (LPA) (Katkovnik et al. [8]) and the paradigm of intersection of confidence intervals (ICI) (Katkovnik et al. [9]). ICI serves to determine the scales of LPA. LPAICI aims to filter the directional differences obtained by the Hamilton and Adam algorithm [10].

PCSD. Wu and Zhang [11] present a primary-consistent soft-decision method (PCSD). PCSD computes several estimations of the RGB channels via primary-consistent interpolation under different assumptions on edge and texture directions. Here, the primary-consistent interpolation indicates that all three primary components of a color are interpolated in the same direction. The final step is to test the assumptions and select the best, through an optimal statistical decision or inference process.
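The gradient-corrected bilinear idea behind HQL is compact enough to sketch. The following toy NumPy snippet (hypothetical data and layout; the `gain` value is illustrative, not Malvar et al.'s published coefficient) estimates a missing G value at an R pixel by bilinear averaging plus a correction from the R-channel Laplacian:

```python
import numpy as np

def hql_green_at_red(raw, i, j, gain=0.5):
    """Estimate the missing G value at an R pixel (i, j) of a Bayer raw
    image, HQL-style: bilinear average of the four G neighbors, corrected
    by the Laplacian of the co-located R samples. `gain` is illustrative."""
    g_bilinear = (raw[i-1, j] + raw[i+1, j] + raw[i, j-1] + raw[i, j+1]) / 4.0
    # Gradient (Laplacian) correction computed from the R samples only.
    r_laplacian = raw[i, j] - (raw[i-2, j] + raw[i+2, j]
                               + raw[i, j-2] + raw[i, j+2]) / 4.0
    return g_bilinear + gain * r_laplacian

# Toy raw mosaic: constant G plane (100) and constant R plane (50), so the
# correction term vanishes and the estimate recovers G exactly.
raw = np.zeros((7, 7))
raw[::2, ::2] = 50     # R samples (even rows/cols in this toy layout)
raw[0::2, 1::2] = 100  # G samples
raw[1::2, 0::2] = 100  # G samples
raw[1::2, 1::2] = 75   # B samples
print(hql_green_at_red(raw, 2, 2))  # -> 100.0
```

On real images the correction term is what restores the high frequencies that plain bilinear interpolation blurs away.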

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

GBTF & MSG. Pekkucuksen and Altunbasak propose the gradient-based threshold-free (GBTF) method [12] and an improved version, the multiscale gradients-based (MSG) [13] color filter array interpolation. GBTF addresses certain limitations of DLMMSE by introducing gradients of color differences to compute weights for the west, east, north and south directions. MSG further applies multiscale color gradients to adaptively combine color estimates from different directions.

MLRI. Incorporating the idea from GBTF, Kiku et al. [14] propose minimized-Laplacian residue interpolation (MLRI). They estimate the tentative pixel values by minimizing the Laplacian energies of the residuals.

AVSC. Zhang et al. [15] propose a robust color demosaicing method with adaptation to varying spectral correlations (AVSC). AVSC is a hybrid approach which combines an existing color demosaicing algorithm such as DLMMSE [6] with an adaptive intraband interpolation.

LDINAT. Zhang et al. [16] derive a color demosaicing method by local directional interpolation and nonlocal adaptive thresholding (LDINAT) and exploit the non-local image redundancy to enhance the local color results.

Besides interpolation-based methods, the demosaicing problem is also tackled with optimization-based methods.

AP. For optimization, Gunturk et al. [17] iteratively exploit inter-channel correlation in an alternating-projections scheme (AP). After initial estimation, intermediate results are projected onto two constraint sets, which are determined by the observed data and prior information on spectral correlation.

AHD. Hirakawa et al. [18] propose an adaptive homogeneity-directed demosaicing algorithm (AHD). AHD employs metric neighborhood modeling and filter bank interpolation in order to determine the interpolation direction and cancel aliasing, followed by artifact reduction iterations.

LSSC. Mairal et al. [19] derive a learned simultaneous sparse coding method (LSSC) for both denoising and demosaicing. Essentially, they unify two steps, dictionary learning adapted to sparse signal description and exploiting the self-similarities of images, into LSSC.

SAPCA. Last but not least, Gao et al. [20] propose the sparsity and adaptive principal component analysis (PCA) based algorithm (SAPCA) by solving a minimization problem, i.e. by minimizing an l1 function that contains sparsity and PCA terms.

We observe that most methods do not perform consistently on the IMAX and Kodak datasets (see Fig. 7), which are the two most commonly used datasets for testing demosaicing algorithms. When they perform well on Kodak, they tend to be less convincing on IMAX. Of course, part of the reason is that the study of Kodak has a longer history than that of IMAX, and the images in IMAX seem to be more challenging to reconstruct. LSSC and SAPCA report the best performances on the Kodak dataset and SAPCA substantially outperforms all other methods on the IMAX dataset. Yet, both methods come with a high computational cost.

In this paper, we propose an efficient post-processing step that can be combined with all aforementioned demosaicing methods, and boost their performance. Of particular interest is its combination with the fastest ones, as this leads to state-of-the-art performance at high speed. On top of that, we also propose modifications that go beyond sheer post-processing and that further improve the results.

Our post-processing step is coined 'efficient regression priors method' (ERP). For a given demosaicing method, ERP learns offline linear regressors for the residuals between demosaiced training images and the ground truth, and then applies them to the output of the demosaicing method at runtime. ERP is inspired by the adjusted anchored neighborhood regression (A+) [21], [22], a state-of-the-art method in image super-resolution. Farsiu et al. [23] were among the first to observe the connection between super-resolution and demosaicing. ERP as a sheer post-processing step has already been introduced in our previous paper [24]. Here we add two further refined versions for fast demosaicing, one based on directional difference regression (DDR) and the other on fused regression (FR). DDR and FR integrate MLRI and ERP beyond simply post-processing the demosaiced images. Motivated by MLRI, we fully explore the correlation between channels by training directional differences. As a result, our methods reduce the color artifacts and achieve state-of-the-art performance comparable to that of LSSC/SAPCA, but at running times that are order(s) of magnitude lower (see Fig. 2).

Our paper is organized as follows. Section II briefly reviews MLRI and A+, as both underlie our methods. Section III introduces our proposed post-processing method, ERP. Section IV further introduces our novel demosaicing methods DDR and FR. In Section V, we discuss the choices of parameters and the experimental results. Finally, we conclude the paper in Section VI.

II. REVIEW OF MLRI AND A+

This section briefly reviews the two major sources of inspiration for our proposed methods: the MLRI demosaicing method [14] and the A+ super-resolution method [21].

A. Minimized-Laplacian Residue Interpolation (MLRI)

The MLRI method of Kiku et al. [14] is mainly motivated by the GBTF method of Pekkucuksen et al. [12]. MLRI includes two stages (see Fig. 3). Let G_{x,y} and R_{x,y} denote the raw values at position (x, y) for the green and red channels, resp.

First stage.¹ MLRI estimates the missing G values at locations with R information, as well as the R values at locations with G information, through linear interpolation. Assuming the raw value G_{i,j} or R_{i,j} is missing, we have

  G^H_{i,j} = (G_{i,j-1} + G_{i,j+1})/2,   R^H_{i,j} = (R_{i,j-1} + R_{i,j+1})/2.   (1)

Next, the horizontal Laplacians of the tentative R and G estimations are computed with the 1D filter

  F_{1D} = [-1  0  2  0  -1].   (2)

¹ Here we only discuss the estimation of the G values at R positions in the horizontal direction; the G values at B positions are handled similarly.
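The first stage (Eqs. (1) and (2)) can be sketched on a single toy Bayer row; the values below are hypothetical and only the horizontal direction is shown:

```python
import numpy as np

# One Bayer row "R G R G ...": R samples at even columns; the missing
# samples (odd columns) are zeroed here as placeholders.
row = np.array([10., 0., 14., 0., 18., 0., 22.])

# Eq. (1): horizontal linear interpolation of the missing samples.
interp = row.copy()
interp[1:-1:2] = (row[0:-2:2] + row[2::2]) / 2.0

# Eq. (2): the 1D Laplacian filter F1D = [-1 0 2 0 -1].
F1D = np.array([-1., 0., 2., 0., -1.])
laplacian = np.convolve(interp, F1D, mode='same')

print(interp)  # [10. 12. 14. 16. 18. 20. 22.]
# The interior Laplacian responses are zero: a linear ramp has no
# curvature, so only edges/texture would produce a residue signal.
```

This is exactly why the Laplacian is a useful guide signal: it is blind to smooth gradients and reacts only where the interpolation is likely to fail.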


Fig. 3: G channel interpolation of MLRI.

MLRI uses a modified version of guided image filters (GIF) [25] to obtain intermediate G values, meaning that F_{1D}R^H is treated as the guided Laplacian for F_{1D}G^H, so that the dilation coefficient a_{i,j} is obtained,

  a_{i,j} = ( (1/|ω|) Σ_{(m,n)∈ω_{i,j}} (F_{1D}R^H)_{m,n} (F_{1D}G^H)_{m,n} ) / (σ²_{i,j} + ε),   (3)

where ω_{i,j} is a local image patch centered at pixel (i, j), |ω| is the number of pixels in ω_{i,j}, σ²_{i,j} is the variance of F_{1D}R^H in ω_{i,j}, and ε is a regularization parameter. The translation coefficient b_{i,j} is obtained as follows,

  b_{i,j} = Ḡ^H_{i,j} - a_{i,j} R̄^H_{i,j},   (4)

where Ḡ^H_{i,j} and R̄^H_{i,j} are the mean values of G^H and R^H in ω_{i,j}. The intermediate G value Ǧ^H_{i,j} is further determined,

  Ǧ^H_{i,j} = (1/|ω|) Σ_{(k,l)∈ω_{i,j}} (a_{k,l} R^H_{i,j} + b_{k,l}).   (5)

Under the assumption that the residues vary linearly in a small area, the smoothed residues ∆^H_g are estimated by linear interpolation,

  ∆^H_g(i, j) = (G_{i,j-1} - Ǧ^H_{i,j-1})/2 + (G_{i,j+1} - Ǧ^H_{i,j+1})/2.   (6)

Correspondingly, the horizontally enhanced G values at the R locations are acquired by adding the tentative values Ǧ^H and the interpolated residuals ∆^H_g. To get the other enhanced R, B values at different positions MLRI applies the same modified GIF.

Second stage. It starts with computing the tentative horizontal/vertical (h/v) color differences (G-R, G-B) ∆̃^{H,V}_{g,r/b},

  ∆̃^{H,V}_{g,r/b}(i, j) = G̃^{H,V}_{i,j} - R_{i,j},  if G is interpolated at R,
                          G̃^{H,V}_{i,j} - B_{i,j},  if G is interpolated at B,
                          G_{i,j} - R̃^{H,V}_{i,j},  if R is interpolated,
                          G_{i,j} - B̃^{H,V}_{i,j},  if B is interpolated,   (7)

where G̃^{H,V}_{i,j}, R̃^{H,V}_{i,j} and B̃^{H,V}_{i,j} are the above enhanced horizontal/vertical values. Then the color differences ∆̃_{g,r/b} are weighted and improved as

  ∆̃_{g,r/b}(i, j) = { ω_s F_G ∆̃^V_{g,r/b}(i-4:i, j) + ω_n F_G ∆̃^V_{g,r/b}(i:i+4, j)
                      + ω_w ∆̃^H_{g,r/b}(i, j-4:j) F_G^T + ω_e ∆̃^H_{g,r/b}(i, j:j+4) F_G^T } / ω_t,   (8)

where F_G is the Gaussian weighted averaging filter

  F_G = [0.56  0.35  0.08  0.01  0],   (9)

ω_{n,s,e,w} are computed from color difference gradients, and ω_t is the sum of ω_{n,s,e,w}. Eventually, the G values at R locations are obtained by

  G̃_{i,j} = R_{i,j} + ∆̃_{g,r/b}(i, j).   (10)

A similar derivation holds for the G values at B locations.

As to the R channel, MLRI computes the Laplacian of the R and G values with the 2D filter

  F_{2D} = [  0  0 -1  0  0
              0  0  0  0  0
             -1  0  4  0 -1
              0  0  0  0  0
              0  0 -1  0  0 ].   (11)

Again, the modified GIF is applied. The R channel is guided by the G̃_{i,j} values. In the end, the output R values are enhanced by smoothing the residues as Eq. (6) indicates. The B channel goes through exactly the same process.

B. Adjusted Anchored Neighborhood Regression (A+)

A+, proposed by Timofte et al. [21], derives from and greatly enhances the performance of Anchored Neighborhood Regression (ANR) [26] for image super-resolution tasks. The algorithm contains two important stages:

Offline Stage. A+ uses Zeyde et al.'s algorithm [27] as a starting point, which trains a sparse dictionary from millions of low resolution (LR) patches collected from 91 training images [28]. To begin with, the LR images in the YCbCr color space are scaled up to the size of the output high-resolution (HR) images by bicubic interpolation. In the next step, the upscaled LR image y_l is filtered by the first- and second-order gradients, and features {p̃^k_l}_k corresponding to LR patches of size 3 × 3 are collected accordingly. A+ projects them onto a low-dimensional subspace by PCA, discarding 0.1% of the energy. When it comes to the training, K-SVD [29], an iterative method that alternates between sparse coding of the examples and updating the dictionary atoms, is applied to solve the following optimization problem

  D_l, {q^k} = argmin_{D_l, {q^k}} Σ_k ||p^k_l - D_l q^k||²  s.t.  ||q^k||_0 ≤ L  ∀k,   (12)

where {p^k_l}_k are the training LR feature vectors, q^k are the coefficient vectors and D_l is the LR training dictionary.
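The per-window linear fit behind Eqs. (3)-(5) is easiest to see in 1D. The sketch below uses toy data and plain box (mean) filters rather than the paper's exact windows and guide signals; `eps` plays the role of the regularization parameter ε:

```python
import numpy as np

def gif_1d(guide, target, radius=2, eps=1e-3):
    """Toy 1D guided-filter step in the spirit of Eqs. (3)-(5): fit a
    per-window linear model target ~ a * guide + b over sliding windows
    of size 2*radius+1, then average the per-window models (Eq. (5))."""
    w = 2 * radius + 1
    box = lambda x: np.convolve(x, np.ones(w) / w, mode='same')
    mean_g, mean_t = box(guide), box(target)
    # Slope a (Eq. (3) analogue): covariance over regularized variance.
    cov = box(guide * target) - mean_g * mean_t
    var = box(guide * guide) - mean_g ** 2
    a = cov / (var + eps)
    # Offset b (Eq. (4)), then the averaged output (Eq. (5)).
    b = mean_t - a * mean_g
    return box(a) * guide + box(b)

guide = np.linspace(0.0, 1.0, 11)
target = 2.0 * guide + 0.5          # exactly linear in the guide
out = gif_1d(guide, target)
# Away from the borders the linear model is recovered almost exactly.
print(np.allclose(out[4:7], target[4:7], atol=1e-2))  # True
```

The design point worth noting is the ε in the denominator: without it, flat windows (zero variance) would make the slope explode, which is why both GIF and MLRI carry a regularization parameter.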


Fig. 4: Scheme of ERP.

The training process of A+ goes through 20 iterations of the K-SVD algorithm, with 1024 atoms in the dictionary, allocating L = 3 atoms per coefficient vector.

Instead of optimizing the reconstruction of high resolution (HR) patches at runtime, A+ uses offline trained anchored regressors to directly obtain them. More specifically, A+ uses the atoms of the trained dictionary D_l as anchors for the surrounding neighbourhood and the corresponding LR to HR patch regressor. A+ collects 5 million pairs of corresponding LR features and HR patches from a scaled pyramid of the 91 training images. For each anchor point (LR atom), A+ retrieves the nearest n = 2048 training samples. Due to the l2-norm used in Eq. (12), the distance between the atom and its neighbors is also Euclidean, and all of the 5 million candidates are normalized by the l2-norm. Then, for an arbitrary input LR feature y, A+ solves

  min_δ { ||y - S_{l,y} δ||² + λ||δ||² },   (13)

where S_{l,y} is the matrix of the 2048 nearest neighbors anchored to the atom d_y and λ is set to be 0.1. 'Nearest' is measured by correlation. The algebraic solution of Eq. (13) is

  δ = P_{l,y} y,   P_{l,y} = (S^T_{l,y} S_{l,y} + λI)^{-1} S^T_{l,y},   (14)

where I is the unit matrix. As to the HR training images, the first thing is to remove the low frequencies by subtracting the upscaled corresponding LR image. Then, A+ collects 5 million such 3 × 3 HR patches, each corresponding to an LR patch. The HR patch values are further normalized by the corresponding l2-norm of the LR patch features. The anchored regressor P_y corresponding to the atom d_y is precomputed offline,

  P_y = S_{h,y} P_{l,y} = S_{h,y} (S^T_{l,y} S_{l,y} + λI)^{-1} S^T_{l,y},   (15)

where S_{h,y} contains the 2048 HR patches corresponding to the LR features in S_{l,y}.

Online Stage. During this stage, the testing LR image is (as done in the offline stage) firstly scaled up to the target size by bicubic interpolation, and the first- and second-order horizontal and vertical finite differences are calculated. After extracting the LR patch features (PCA projected), A+ searches the atom d_j in D_l with the highest correlation to each input LR feature y_j, and the residual HR patch x_j without low frequencies is obtained by multiplying the regressor P_j anchored to d_j with y_j,

  x_j = P_j y_j.   (16)

Subsequently the low frequencies are added. The HR patches are combined by averaging in the overlapping areas to complete the output HR image.

III. EFFICIENT REGRESSION PRIORS (ERP)

Our ERP method is inspired by the A+ method introduced for image super-resolution. As a post-processing step, ERP has two major strengths. Firstly, it is capable of improving the results of many demosaicing methods; especially MLRI+ERP combines low time complexity and good performance. Secondly, ERP trains a dictionary and regressors offline and thus allows for low computational times during testing. In the following we describe how ERP (see Fig. 4) is derived and used for the post-processing of demosaiced images. ERP goes through two stages, just as A+ does.

Offline stage. ERP is trained using 100 high quality images collected from the Internet for post-processing the results of a selected demosaicing method. The demosaicing method reconstructs the LR image. In the CPCA (Collection + PCA) step (see Fig. 4), the first- and second-order finite differences of the G channel are extracted for the LR training images, in both the h/v directions,

  F_{1h} = [1  -1] = F^T_{1v},   F_{2h} = [1  -2  1]/2 = F^T_{2v},   (17)

so that we can keep information on edges and mosaic artifacts, and train a dictionary adapted to a specific demosaicing method. Small 3 × 3 regions at the same position of the filtered G channel are collected and concatenated to form one vector (feature), along with PCA dimensionality reduction with 99.9% preserved energy. After repeating the process for the R and B channels, the input vector v^k_l is eventually formed.
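The anchored regressors of Eqs. (13)-(15) (and the analogous Eqs. (19)-(20) later) are a closed-form ridge regression that can be precomputed per atom. A minimal sketch with toy sizes (the paper uses 2048 neighbors and PCA-reduced feature dimensions; the random data here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, not the paper's values.
n_neighbors, lr_dim, hr_dim, lam = 40, 8, 12, 0.1

S_l = rng.standard_normal((lr_dim, n_neighbors))  # LR neighborhood (columns)
S_h = rng.standard_normal((hr_dim, n_neighbors))  # corresponding HR patches

# Eq. (15): P = S_h (S_l^T S_l + lam I)^{-1} S_l^T, precomputed offline.
P = S_h @ np.linalg.solve(S_l.T @ S_l + lam * np.eye(n_neighbors), S_l.T)

# Online (Eq. (16) analogue): one matrix-vector product per input feature.
y = rng.standard_normal(lr_dim)
x = P @ y
print(P.shape, x.shape)  # (12, 8) (12,)
```

This is the source of the speed advantage: all of the optimization cost is paid offline, and the online step per patch is a single matrix multiplication.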


The input vector v^k_l consists of the three features at the same position of the RGB channels; this process is called the CPCA step in Fig. 4.

Later, ERP applies the K-SVD [29] method as in [21], [26], [27] to train an LR dictionary W_l with 4096 atoms:

  W_l = argmin_{W_l, {c^k}} Σ_k ||v^k_l - W_l c^k||²  s.t.  ||c^k||_0 ≤ N  ∀k,   (18)

where {c^k}_k is the set of coefficient vectors. The training process goes through 20 iterations of the K-SVD algorithm, allocating N = 3 atoms per coefficient vector. Here, the choice of 3 atoms and 20 iterations is based on A+, which shows good performance on the super-resolution task.

Assuming that the atoms are sparsely embedded in a manifold, it is natural to use the input vectors {v^k_l}_k for densely sampling the manifold. Moreover, not only the input vectors {v^k_l}_k but also the LR vectors collected from the scaled pyramids of the LR training images can serve to better approximate the manifold. Here, the overall size scaling factors of the pyramid layers are of the form 0.98^p with levels p = 0, ..., 11. Thus, ERP selects 2048 nearest neighbors anchored to an atom from 5 million region/vector candidates, all of which are normalized by the l2-norm. 'Nearest' is measured by correlation. In the following step, ERP computes {Q_{l,i}}_{i=1,...,4096},

  Q_{l,i} = (N^T_{l,i} N_{l,i} + λI)^{-1} N^T_{l,i},   (19)

where N_{l,i} is the matrix of the 2048 nearest neighbors anchored to the atom w_i and λ is set to be 0.1. As to the ground truth images, the first thing needed is to remove the low frequencies by subtracting the demosaiced LR image. Then, ERP collects 5 million high resolution (HR) patches without low frequencies corresponding to the previously collected LR candidate vectors. The HR candidates are further normalized by the corresponding l2-norm of the LR candidates. Finally, the anchored regressor Q_i corresponding to the atom w_i is precomputed offline,

  Q_i = N_{h,i} Q_{l,i} = N_{h,i} (N^T_{l,i} N_{l,i} + λI)^{-1} N^T_{l,i},   (20)

where N_{h,i} contains the 2048 HR patches corresponding to the 2048 nearest neighbors in N_{l,i}.

Online stage. The same demosaicing method is applied first at test time. Among the studied demosaicing methods, we consider MLRI to be the best match for ERP because of its low time complexity and good performance. MLRI can be used to independently interpolate the RGB channels before applying ERP (see Fig. 4), or to interpolate the G channel first and then guide the R/B channels with the ERP updated G channel. No matter what the case may be, ERP searches the nearest neighbor atom w_j in W_l for a vector v_j of the input image with highest correlation; the output patch y_j is computed by multiplying the regressor Q_j anchored to w_j with v_j, which is indicated by the brown arrow confluencing with the black arrow in Fig. 4,

  y_j = Q_j v_j = N_{h,j} Q_{l,j} v_j,   (21)

where Q_{l,j} v_j is the algebraic solution of

  min_x { ||v_j - N_{l,j} x||² + λ||x||² }.   (22)

After adding y_j to the low frequencies (the input demosaiced image) as well as averaging the overlapping areas, the small patches are integrated into a complete output image.

IV. DIRECTIONAL DIFFERENCE REGRESSION (DDR) AND FUSED REGRESSION (FR)

In this section, we make a couple of observations on the MLRI and ERP methods and introduce our proposed independent demosaicing methods, DDR and FR.

A. Observations

MLRI computes the enhanced h/v differences (G-R, G-B) with the modified guided image filter and residual interpolation, which leads to a rather inaccurate estimation. To improve the h/v differences we follow the idea of regressor training. As described previously, ERP maps LR features into HR patches without low frequencies. We implement a similar idea and map inaccurate color differences into accurate color differences without low frequencies by offline trained regressors. MLRI also uses the Laplacian filtered G channel to guide the reconstruction of the R/B channels. The set of first- and second-order finite differences highlights edge and blob-like profiles in the intensity patterns. Therefore we use them both in our methods.

ERP as an enhancement step can be applied to different demosaicing methods. This said, its performance depends on that choice. If we use bilinear interpolation as the starting point, the final performance is less impressive than that of ERP starting from a state-of-the-art method. This brings the question whether we can improve beyond the combination of ERP with any of the demosaicing methods that we described in the introduction. In what follows, we present two novel methods, both making use of ERP, but also improving on the initial demosaicing.

B. Directional Difference Regression (DDR)

Our proposed DDR method has three steps (see Fig. 5). Due to space constraints, we discuss the horizontal case of the three channels. Let R_{x,y}, G_{x,y}, B_{x,y} be the raw input image values at position (x, y):

(i) Without relying on sophisticated methods as the starting point, assume the raw values R_{i,j}, G_{i,j}, B_{i,j} are missing; based on Eq. (1) we use the simplest linear interpolation to obtain the tentative values R̄^H_{i,j}, Ḡ^H_{i,j}, B̄^H_{i,j} horizontally. Then the tentative horizontal color differences (G-R, G-B) ∆̄^H_{g,r/b} are computed as

  ∆̄^H_{g,r/b}(i, j) = Ḡ^H_{i,j} - R_{i,j},  if G is interpolated at R,
                      Ḡ^H_{i,j} - B_{i,j},  if G is interpolated at B,
                      G_{i,j} - B̄^H_{i,j},  if B is interpolated,
                      G_{i,j} - R̄^H_{i,j},  if R is interpolated.   (23)

In the CPCA step (see Fig. 5), instead of applying gradient filters in the h/v directions as ERP does, we filter ∆̄^H_{g,r/b} by the horizontal F_{1h}, F_{2h} (Eq. (17)), collect 3 × 3 regions at the same position of the filtered ∆̄^H_{g,r/b} and concatenate them to one vector (horizontal feature) respectively, along with PCA reduction. The training images go through the same process, and we use the K-SVD method to iteratively compute the LR color (region) difference dictionary with 4096 atoms. Furthermore, we collect 5 million LR (region) differences from the scaled pyramids of the LR training images.
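Step (i) can be sketched on a single Bayer row; the data below are hypothetical and only the horizontal G-R difference of Eq. (23) is formed:

```python
import numpy as np

# One Bayer row "R G R G ...": R at even columns, G at odd columns.
row = np.array([40., 90., 44., 94., 48., 98., 52.])

R_bar = row.copy()                       # tentative R at G positions, Eq. (1)
R_bar[1:-1:2] = (row[0:-2:2] + row[2::2]) / 2.0
G_bar = row.copy()                       # tentative G at R positions, Eq. (1)
G_bar[2:-1:2] = (row[1:-2:2] + row[3::2]) / 2.0

# Eq. (23): at R positions use the interpolated G, at G positions the
# interpolated R; either way a dense G-R difference row results.
diff = np.where(np.arange(row.size) % 2 == 0, G_bar - row, row - R_bar)
print(diff)  # interior entries are all 48; the edge samples lack a neighbor
```

On this toy row the two channels differ by a constant, so the tentative difference is already exact; DDR's regressors exist precisely because real image content (edges, texture) makes these tentative differences inaccurate.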


Fig. 5: Scheme of DDR.

We select the 2048 LR differences most correlated to an atom as its anchored neighbors. Correspondingly, we collect 5 million HR differences without low frequencies. Then, we compute the directional difference regressor P_d anchored to an atom d,

  P_d = S_{h,d} (S^T_{l,d} S_{l,d} + λI)^{-1} S^T_{l,d},   (24)

where S_{h,d} is the matrix of the 2048 HR differences corresponding to the LR differences in S_{l,d} and λ is 0.1. Finally, a tentative color difference in the input image is improved by the offline computed regressors as follows,

  y_{h,j} = λ_1 P_j y_{l,j},   (25)

where P_j is the regressor anchored to the atom with highest correlation to the LR difference y_{l,j} and λ_1 is the regressor correction parameter. By adding the HR differences to the channel-shared low frequencies we have the enhanced color differences ∆̂^H_{g,r/b}.

(ii) Now we come to the stage of color difference updating. Based on MLRI, we compute the weights w_{n,s,e,w} and the sum of the weights w_t = w_n + w_s + w_e + w_w, where

  w_e = 1 / (Σ_{a=i-1}^{i+1} Σ_{b=j}^{j+2} D^H_{a,b})²,   w_w = 1 / (Σ_{a=i-1}^{i+1} Σ_{b=j-2}^{j} D^H_{a,b})²,
  w_s = 1 / (Σ_{a=i}^{i+2} Σ_{b=j-1}^{j+1} D^V_{a,b})²,   w_n = 1 / (Σ_{a=i-2}^{i} Σ_{b=j-1}^{j+1} D^V_{a,b})²,   (26)

where D^H_{i,j} and D^V_{i,j} are given by

  D^H_{i,j} = ||∆̂^H_{g,r/b}(i, j+1) - ∆̂^H_{g,r/b}(i, j-1)||,
  D^V_{i,j} = ||∆̂^V_{g,r/b}(i+1, j) - ∆̂^V_{g,r/b}(i-1, j)||.   (27)

We observe that MLRI uses the neighbor color differences weighted by a Gaussian filter (Eq. (9)) to alleviate the error of the color differences in the center, with the neighbor size fixed to 4. Instead of a Gaussian filter, we prefer a simpler averaging filter to weigh the neighbor color differences,

  F_k = [1/k  1/k  ...  1/k]  (k entries).   (28)

Finally, the color difference is updated as follows,

  ∆_{g,r/b}(i, j) = { w_s F_k ∗ ∆̂^V_{g,r/b}(i-k+1:i, j) + w_n F_k ∗ ∆̂^V_{g,r/b}(i:i+k-1, j)
                     + w_w ∆̂^H_{g,r/b}(i, j-k+1:j) ∗ F^T_k + w_e ∆̂^H_{g,r/b}(i, j:j+k-1) ∗ F^T_k } / w_t,   (29)

where ∆̂^V_{g,r/b} is the vertically enhanced color difference. By adding the ground truth R/B values and ∆_{g,r/b} we obtain the updated G values.

(iii) When it comes to the R channel, we apply the Laplacian filter (Eq. (11)) to obtain the tentative R values R^1, as well as the horizontal and vertical first-order difference filters F_h, F_v,

  F_h = [-1  0  0  0  1] = F^T_v,   (30)

to get the h/v R values R^2 and R^3. The above processes yield residues for the raw values of the R channel. For k = 1, 2, 3 consider

  ε^k(i, j) = R_{i,j} - R^k_{i,j},  if R_{i,j} is the raw value,
              0,  otherwise.   (31)

After applying the bilinear filter

  F_b = [ 0.25  0.5  0.25
          0.5   1    0.5
          0.25  0.5  0.25 ],   (32)

we have the enhanced estimations of the R values,

  R̃^k = λ_2 F_b ∗ ε^k + R^k,  for k ∈ {1, 2, 3},   (33)

where λ_2 is the residue correction parameter. Thus, we have the Laplacian updated R value R̃^1. With the help of the previous weights w_{n,s,e,w}, we obtain the gradient updated color difference for R̃^2 and R̃^3,

  ∆_r(i, j) = { w_n ∆̃^3_r(i-1, j) + w_s ∆̃^3_r(i+1, j) + w_e ∆̃^2_r(i, j-1) + w_w ∆̃^2_r(i, j+1) } / w_t,   (34)

where

  ∆̃^k_r(i, j) = G_{i,j} - R̃^k_{i,j}  for k = 2, 3.   (35)

Then, calculating the ground truth G minus ∆_r, we have the updated R values R̃^{2,3}. The final R value is obtained by simply averaging R̃^1, R̃^2 and R̃^3. For the B values we follow the same process.
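The directional weighting of Eqs. (26)-(27) can be sketched as follows. The toy map below is hypothetical; `eps` is an added safeguard for flat regions (not part of the paper's formulas), and `np.roll` wraps at the borders, which is ignored in this toy setting:

```python
import numpy as np

def directional_weights(delta, i, j, eps=1e-8):
    """Sketch of Eqs. (26)-(27): gradients of the enhanced color
    difference map `delta`, turned into inverse-square directional
    weights for the estimates at pixel (i, j)."""
    # Eq. (27): horizontal/vertical central differences of delta.
    DH = np.abs(np.roll(delta, -1, axis=1) - np.roll(delta, 1, axis=1))
    DV = np.abs(np.roll(delta, -1, axis=0) - np.roll(delta, 1, axis=0))
    # Eq. (26): sum each gradient over a directional 3x3 region, then
    # take the inverse square, so smooth directions dominate.
    w_e = 1.0 / (DH[i-1:i+2, j:j+3].sum() + eps) ** 2
    w_w = 1.0 / (DH[i-1:i+2, j-2:j+1].sum() + eps) ** 2
    w_s = 1.0 / (DV[i:i+3, j-1:j+2].sum() + eps) ** 2
    w_n = 1.0 / (DV[i-2:i+1, j-1:j+2].sum() + eps) ** 2
    return w_n, w_s, w_e, w_w

# A vertical edge in the difference map: horizontal gradients are large,
# so the east/west estimates get far less weight than north/south.
delta = np.tile(np.array([0., 0., 0., 0., 5., 5., 5., 5.]), (8, 1))
w_n, w_s, w_e, w_w = directional_weights(delta, 4, 3)
print(w_n > w_e and w_s > w_e)  # True
```

Squaring the summed gradients makes the weighting strongly selective: estimates fused across an edge are suppressed far more aggressively than with a linear inverse.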


[Four-panel plot: PSNR (dB) of DDR and FR, each panel sweeping one parameter.]
Fig. 6: PSNR versus number of training LR/HR differences, atoms/regressors, training images, and region size for the IMAX dataset.
dataset.
by averaging R̃1 , R̃3 and R̃3 . For the B values we follow the
same process.

C. Fused Regression (FR)


As reported in [24], the PSNR performance significantly
improves after the ERP post-processing step. Since 50% of the
ground truth pixels are available in the G channel compared
to only 25% for the R/B channels, the G channel is easier
to enhance than the R/B channels. This means ERP works
especially well on the G channel. This observation motivates us to feed the ERP-enhanced G channel into our DDR method, thus deriving our Fused Regression (FR) method.
We train the regressors for the directional differences and for
the MLRI demosaiced G value in the same training stage.
In other words, besides the directional difference dictionaries and regressors which are trained in step (i) of DDR, we also train the LR dictionary and regressors for the MLRI demosaiced G values, according to the Offline stage of ERP. After applying the ERP step to the G values of an input image demosaiced by MLRI, we obtain another updated G value at the online stage.

a) IMAX dataset   b) Kodak dataset
Fig. 7: The IMAX and Kodak datasets.
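At the online stage the fusion itself reduces to a pixel-wise average of the two updated G channels; a minimal sketch (the array names are our assumptions):

```python
import numpy as np

def fuse_green(g_ddr, g_erp):
    """Fused Regression (FR) green fusion: pixel-wise average of the
    DDR-updated G channel and the ERP-enhanced G channel of the MLRI
    demosaiced image."""
    g1 = np.asarray(g_ddr, dtype=float)
    g2 = np.asarray(g_erp, dtype=float)
    return 0.5 * (g1 + g2)
```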
By simply averaging the two versions of updated G values we obtain the enhanced G values of FR. Our experiments will show that the quality of the guided R/B channels is highly related to that of the G channel: the better the recovery of the G channel, the better the R/B channel restoration. This is another crucial reason underlying the idea of fused regression. The running time of FR is only marginally increased with respect to that of the DDR method, as also shown in the following experimental section.

V. EXPERIMENTS

In this section we describe and discuss the datasets and the setup used to validate the parameters of our methods and to experimentally compare with the state-of-the-art demosaicing methods. The results are analyzed together with the limitations of our methods and future directions of improvement.

A. Datasets

Kodak. The Kodak dataset contains 24 images of size 512x768 pixels and photographic quality, involving a variety of subjects in many locations under different lighting conditions. The images were either created by Kodak's professional photographers or selected from the winners of the Kodak International Newspaper Snapshot Awards (KINSA). This choice of images ensures the high fidelity of the Kodak benchmark; the set also has valuable artistic merit. Another important factor about Kodak is that the images contain a large amount of constant-intensity regions. Moreover, Kodak has a long history of use for testing by researchers. Therefore, the PSNR performances on Kodak are generally good, above 40dB on average.

IMAX. Besides the Kodak dataset, we also test our methods on another standard dataset, IMAX, which is also widely used for the validation of demosaicing methods. IMAX contains 18 images of size 500x500 pixels and exhibits more color gradations than the Kodak images. IMAX is a newer dataset and is generally considered more challenging. In fact, the reported PSNR performances on IMAX are considerably worse than those on Kodak, usually lower than 37dB on average. More importantly, the hue and saturation conditions of IMAX images are closer to the images acquired by current digital cameras.

RW. Despite the high fidelity and artistic merit of Kodak and IMAX, they are not representative enough of the images taken by ordinary people, because the color and composition of those images are biased toward artistic taste. Therefore, as a dataset


complementary to the standard benchmarks, we also selected 500 real world (RW) color images with RGB channels as HR images, using the Google search engine. Dozens of keywords (such as nature, landscape, people, city) yielded images from daily life. We made sure that all the categories contain a similar number of images. Then we added a Bayer pattern mask on them to obtain the LR images.

As a result, we not only consider images of high visual quality, as those in Kodak and IMAX, but also the products of everyday photography. Whereas regions with slowly varying color intensities tend to show good performance with interpolation methods, reconstructing high-quality outputs from 'busy' images is more difficult. Therefore, we also focus on images which are highly textured and have a rich color gamut.

B. Experimental setup

DDR and FR. The Kodak images have relatively mild intensity shifts, while IMAX images are richer in detail and high frequencies and have, on average, a smaller number of neighboring pixels with similar color intensities. Thus, for our methods we set the neighbor size (Eq. (28)) to 1 for IMAX and 4 for Kodak. Due to the rich high frequencies of the IMAX images, it is difficult for IMAX to benefit from linear regression. Therefore, we set the regressor correction parameter λ1 (Eq. (25)) to 1 for IMAX and 1.5 for Kodak. IMAX and RW share the same parameters, since RW images also show obvious intensity shifts. As to the residual correction parameter, we optimized it on several arbitrarily selected training images, and we fix λ2 = 1.2 (Eq. (33)) for all datasets.

ERP. The Kodak dataset has been used for decades and most of the state-of-the-art methods have already achieved good performances (∼40dB PSNR), so there is not much room left for ERP enhancement. In order to make the final results comparable, we multiply y_j of Eq. (21) by the regressor correction parameter γ1 = 0.5 for all compared methods. For the same reason, we use a small residual correction parameter γ2 = 0.5 for all methods. On IMAX the average PSNR results achieved by the compared methods are less impressive than on Kodak, and are lower than 37dB. This is why we set γ1 and γ2 to the larger value of 1.5 in the case of the IMAX dataset. We refer to our previous work [24] for more ERP experiments.

Compared methods. We compare our DDR and FR methods to BILINEAR, HQL [5], AHD [18], AP [17], PCSD [11], DLMMSE [6], LPAICI [7], GBTF [12], MSG [13], LDINAT [16], MLRI [14], and LSSC [19]. Unfortunately, neither the code nor the output images of SAPCA [20] and AVSC [15] are available to us, so we cannot reproduce their results. We refer to the introduction, Section I, for brief descriptions of the methods.

Default settings. In all our experiments, if not stated otherwise, we use the following default parameters for the DDR, FR, and ERP methods: 4096 atoms/regressors, 2048 nearest neighbors for learning each anchored regressor, 3 × 3 region size, 100 training images, and 5 million training candidates/regions. We keep the same 100 high-quality training images for all the above demosaicing methods during the experiments on the three datasets. With this training set we ensure the relevance of our training dictionary and its anchored regressors.

Performance measures. In order to evaluate the performance of the demosaicing methods, we employ the standard Peak-Signal-to-Noise-Ratio (PSNR), the Structural Similarity Index (SSIM) [30], the Zipper Effect Ratio (ZER) [31], and the runtime at test time. All the compared methods, along with our proposed methods, share the same testing environment: Intel(R) Core(TM) i7-930 @ 2.80GHz with 8 GB RAM. PSNR quantitatively measures the fidelity of the restoration relative to the ground truth, SSIM measures the structural similarity to the ground truth, and ZER the ratio of pixels affected by the zipper effect or edge blurring.

C. Parameters

The main parameters that influence the PSNR performance of our proposed DDR and FR demosaicing methods are evaluated in Fig. 6 on the IMAX dataset.² Besides the 100 shared training images, we collect 4 other sets. The results achieved with the 5 training sets are between 37.01dB and 37.17dB for DDR and between 37.41dB and 37.49dB for FR on IMAX. On the Kodak dataset, the performances are 41.06dB to 41.10dB for DDR and 41.01dB to 41.08dB for FR. The variance is very small; a training set of 100 images offers stable performance. We report the mean performance over these 5 training sets for all curves in Fig. 6.

Number of differences. The performance changes only slightly when increasing the collected (region) differences from 1 to 5 million. For example, the mean PSNR of DDR varies from 37.07dB to 37.13dB, less than 0.06dB. We believe that after collecting 1 million differences, the results are sufficiently relevant.

Number of atoms/regressors. As shown in Fig. 6, the PSNR performance of our methods improves with the number of atoms/regressors. The more atoms we train, the better we can approximate the input LR differences with anchored regressors. However, the improvements gained by increasing the number of atoms diminish: the gain from 16 to 4096 atoms is much more pronounced than from 4096 to 8192. There is a speed/quality trade-off.

Number of training images. When we increase the number of training images from 20 to 100, as shown in Fig. 6, the mean PSNR grows only slightly. Here, we ignore the unstable starting point at 20 for DDR; we believe that 20 images form a rather small training pool that lacks statistical significance, and we therefore fix the training pool to 100 images by default in all our experiments. The results confirm that a training set of 100 images is representative and large enough to collect millions of LR/HR differences.

Size of the region. As to the size of the region, we witness a clear quality drop when going up to 7 × 7 regions, not to mention the increase in runtime. For 5 × 5 regions the DDR and FR methods behave differently. Clearly, 3 × 3 is the most cost-effective choice for our methods.

² On the Kodak and RW datasets the performances show a similar pattern.
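For reference, the PSNR figures reported in the experiments follow the standard definition for 8-bit images; a sketch (the `shave` parameter mirrors the boundary cropping applied before measuring, and the names are ours):

```python
import numpy as np

def psnr(reference, estimate, shave=0, peak=255.0):
    """PSNR in dB between a ground-truth image and a demosaiced estimate.

    `shave` crops a border of the given width before measuring, as done
    in the paper to rule out boundary effects."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    if shave > 0:
        ref = ref[shave:-shave, shave:-shave]
        est = est[shave:-shave, shave:-shave]
    mse = np.mean((ref - est) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```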


Method | IMAX: R G B All Time(s) | Kodak: R G B All Time(s) | RW: R G B All Time(s)
BILINEAR | 31.72 35.41 31.27 32.36 0.02 | 29.30 33.19 29.24 30.22 0.04 | 26.85 30.99 27.16 27.92 0.02
HQL | 34.02 37.57 33.03 34.46 0.02 | 34.85 39.08 34.74 35.80 0.02 | 30.80 34.98 30.93 31.84 0.01
AHD | 33.06 37.00 32.17 33.53 38.96 | 37.02 39.67 37.33 37.79 59.54 | 31.51 35.20 31.61 32.36 19.39
AP | 32.85 34.92 32.01 33.06 0.46 | 38.03 41.56 38.64 39.11 0.74 | 31.72 34.79 31.78 32.49 0.25
PCSD | 34.66 38.12 33.46 34.94 1.21 | 38.07 40.55 38.30 38.78 1.45 | 32.72 36.12 32.85 33.55 0.92
DLMMSE | 34.06 38.00 33.04 34.49 27.13 | 39.19 42.63 39.58 40.12 42.50 | 32.81 36.88 32.89 33.72 12.84
LPAICI | 34.40 37.87 33.30 34.73 0.61 | 39.68 43.04 39.95 40.56 1.00 | 33.01 36.80 33.02 33.86 0.33
GBTF | 34.05 37.37 33.09 34.42 10.60 | 39.69 43.36 40.03 40.64 16.42 | 33.11 37.03 33.15 33.99 5.28
MSG | 34.44 37.68 33.40 34.76 9.65 | 40.09 43.81 40.35 41.02 14.89 | 33.50 37.37 33.51 34.36 4.73
LDINAT | 36.28 39.76 34.39 36.21 1523.63 | 37.02 39.47 37.15 37.72 2418.24 | 32.84 36.33 32.95 33.67 717.23
MLRI | 36.62 40.03 35.43 36.80 0.54 | 39.33 42.92 39.63 40.24 0.74 | 33.76 37.44 33.98 34.66 0.26
LSSC | 35.90 38.69 34.64 36.05 453.58 | 40.60 44.42 40.74 41.52 707.59 | 33.93 37.56 33.91 34.76 258.71
DDR (ours) | 37.12 40.34 35.63 37.17 6.31 | 40.18 43.92 40.39 41.10 9.60 | 34.12 37.53 34.26 34.95 3.17
FR (ours) | 37.50 41.01 35.82 37.49 10.75 | 40.19 43.85 40.34 41.07 16.35 | 34.29 37.85 34.42 35.13 5.20

TABLE I: PSNR performance comparison of 14 demosaicing methods on 3 datasets. The best two results are in bold.

Method | IMAX: R G B All | Kodak: R G B All | RW: R G B All
BILINEAR | 36.36 40.05 35.14 36.66 | 32.26 35.65 32.00 33.01 | 32.72 36.54 33.08 33.74
HQL | 36.62 40.13 35.19 36.78 | 36.70 40.50 36.52 37.54 | 33.37 37.55 33.47 34.33
AHD | 34.53 39.27 33.35 34.96 | 37.46 40.09 37.66 38.19 | 31.94 36.39 32.01 32.91
AP | 36.86 40.28 35.19 36.89 | 38.82 42.65 39.07 39.80 | 33.20 37.41 33.25 34.15
PCSD | 37.19 40.40 35.59 37.21 | 38.55 41.21 38.61 39.24 | 33.74 37.62 33.82 34.65
DLMMSE | 37.23 40.53 35.57 37.23 | 39.57 42.88 39.77 40.41 | 33.98 38.02 34.00 34.88
LPAICI | 37.29 40.66 35.55 37.26 | 39.94 43.18 40.01 40.73 | 33.97 37.96 33.94 34.85
GBTF | 37.26 40.63 35.62 37.28 | 39.96 43.46 40.11 40.81 | 34.06 38.09 34.06 34.95
MSG | 37.28 40.63 35.60 37.28 | 40.23 43.83 40.32 41.08 | 34.14 38.20 34.14 35.03
LDINAT | 37.50 40.54 35.42 37.24 | 37.71 40.23 37.65 38.34 | 33.81 37.50 33.74 34.62
MLRI | 37.81 40.93 36.08 37.72 | 39.60 43.17 39.80 40.47 | 34.24 38.07 34.37 35.16
LSSC | 37.52 40.74 35.90 37.53 | 40.68 44.41 40.68 41.54 | 34.49 38.47 34.54 35.40
DDR (ours) | 38.00 40.76 36.20 37.82 | 40.25 43.91 40.40 41.13 | 34.46 38.29 34.54 35.35
FR (ours) | 38.15 41.03 36.26 37.95 | 40.27 43.93 40.38 41.14 | 34.57 38.42 34.66 35.47
(The ERP post-processing step adds approximately the same runtime for every method: +8.35 s on IMAX, +12.74 s on Kodak, and +4.01 s on RW.)

TABLE II: PSNR performance comparison after ERP post-processing on 3 datasets. The best two results are in bold.
D. Results

In order to rule out boundary effects, we shave off 2, 4, 6, and 8 boundary pixels for all the methods. The compared methods are stable at the boundary; only for DLMMSE do we need to cut 6 boundary pixels to reach good, stable performance.

PSNR. In Tables I and II we report the best PSNR results from the above discussed 4 candidates for 14 demosaicing methods. The best two results are in bold. Our DDR and FR methods are the best on both the IMAX and RW datasets. For instance, the mean PSNR of DDR is 37.17dB on IMAX, 1.12dB higher than the state-of-the-art method LSSC, while FR reaches an improvement of 1.34dB over LSSC. At the same time (see Fig. 2), DDR is almost 80 times faster than LSSC and 250 times faster than LDINAT. The ranking is preserved on the 500 images of the RW dataset. Our proposed methods are among the fastest demosaicing methods, while performing significantly better.

SSIM and ZER. Complementary to the PSNR results, we report ZER [31] (computed in the CIELAB color space) and SSIM [30] results on each image of the IMAX and Kodak datasets (see Tables III and IV). The listed results correspond to the images in Fig. 7, from left to right and top to bottom, for each dataset. Fig. 9 shows that the relative ZER ranking of the top 5 methods, together with FR + ERP, stays the same for a large range of ZER threshold values. The ZER values in Tables III and IV are obtained by fixing the threshold to 2.5. The results for the different measures confirm the top performance achieved by the proposed DDR/FR and ERP.

ERP post-processing. By applying the ERP post-processing step we significantly boost the performance of most methods, especially on IMAX, as shown in Table II, at the cost of some extra seconds. The improvements on the IMAX dataset vary from +0.46dB to +4.36dB (see Table II). Broadly speaking, the worse the initial demosaicing method, the larger the improvement. As to our post-processing of MLRI, its PSNR is almost 1dB better than that of the original MLRI, and it outperforms all the other compared methods on IMAX, even when they are equipped with ERP. DDR with ERP achieves 37.82dB, while 37.88dB is the best result to date, reported for SAPCA. FR even slightly improves over SAPCA by achieving 37.95dB. FR with the ERP step takes less than 20 seconds, while SAPCA costs 20 minutes. To sum up, all our proposed methods (MLRI + ERP and DDR/FR + ERP) achieve very good results, comparable to SAPCA on IMAX, while being significantly faster than SAPCA (by about 2 orders of magnitude).

When it comes to the Kodak dataset, the improvement of our post-processing methods is less impressive. It varies from +2.79dB for BILINEAR to as low as +0.02dB for LSSC. This is mainly due to the fact that on the Kodak dataset many methods have already achieved impressive results above 40dB, and hence there may not be much space left for further improvement. Still, our ERP methods achieve 40.47dB/40.63dB on the 24 Kodak images, 0.23dB/0.39dB higher than MLRI.

Also note that, if we apply the ERP regression training method multiple times, we repeatedly benefit, as further confirmed by the results presented in [24].

Visual Comparison. For visual quality assessment we show a couple of image results of the top methods in Fig. 8.


Fig. 8: Visual assessment of demosaicing results with and without ERP post-processing. Columns, left to right: ground truth, ERP, FR (ours), DDR (ours), LSSC, MLRI, MSG.


Image | PSNR: DDR (ours), FR (ours), LSSC, MLRI, LDINAT | ZER (same order) | SSIM (same order)
1: PSNR 29.5787 29.8766 28.3874 29.3128 29.0508 | ZER 0.2504 0.2512 0.2885 0.2702 0.2518 | SSIM 0.9707 0.9724 0.9648 0.9696 0.9683
2: PSNR 39.3412 39.6974 38.2949 38.8945 38.7399 | ZER 0.0540 0.0532 0.0631 0.0654 0.0544 | SSIM 0.9850 0.9856 0.9870 0.9844 0.9858
3: PSNR 40.1342 40.4334 39.4959 39.9989 39.4450 | ZER 0.0549 0.0532 0.0660 0.0683 0.0564 | SSIM 0.9889 0.9894 0.9882 0.9891 0.9876
4: PSNR 39.6395 40.1709 38.0073 39.6490 38.8778 | ZER 0.0290 0.0263 0.0429 0.0348 0.0399 | SSIM 0.9984 0.9986 0.9977 0.9984 0.9980
5: PSNR 41.0852 41.2874 40.1714 40.6218 40.7822 | ZER 0.0158 0.0152 0.0220 0.0208 0.0182 | SSIM 0.9983 0.9984 0.9979 0.9981 0.9981
6: PSNR 39.1238 39.4262 38.3310 39.0300 38.6947 | ZER 0.0512 0.0514 0.0495 0.0564 0.0478 | SSIM 0.9826 0.9829 0.9825 0.9815 0.9827
7: PSNR 39.3334 39.6756 38.6516 39.1818 38.8972 | ZER 0.0478 0.0483 0.0589 0.0575 0.0455 | SSIM 0.9856 0.9860 0.9866 0.9844 0.9855
8: PSNR 34.5391 35.3247 32.2477 35.2574 33.4692 | ZER 0.1547 0.1411 0.1936 0.1749 0.1699 | SSIM 0.9886 0.9902 0.9838 0.9906 0.9856
9: PSNR 33.7582 34.2723 32.4733 33.2218 32.8558 | ZER 0.2353 0.2292 0.2704 0.2640 0.2163 | SSIM 0.9843 0.9859 0.9804 0.9830 0.9804
10: PSNR 35.7958 36.2427 34.7990 36.0956 34.9374 | ZER 0.1027 0.1029 0.1124 0.1149 0.1204 | SSIM 0.9735 0.9750 0.9719 0.9751 0.9716
11: PSNR 35.3552 35.6413 34.4012 35.2228 35.0137 | ZER 0.1246 0.1253 0.1369 0.1355 0.1232 | SSIM 0.9624 0.9638 0.9614 0.9613 0.9621
12: PSNR 34.5080 34.5717 33.1745 33.8563 32.6291 | ZER 0.1417 0.1440 0.1728 0.1681 0.1970 | SSIM 0.9622 0.9628 0.9567 0.9605 0.9567
13: PSNR 38.9513 39.2534 37.1949 37.8776 36.0160 | ZER 0.0426 0.0424 0.0751 0.0564 0.0848 | SSIM 0.9628 0.9616 0.9606 0.9606 0.9600
14: PSNR 34.8557 35.1883 32.4814 34.4732 34.1576 | ZER 0.1102 0.1087 0.1483 0.1213 0.1151 | SSIM 0.9743 0.9753 0.9674 0.9725 0.9720
15: PSNR 38.9375 39.4944 37.8871 38.6092 37.8008 | ZER 0.0770 0.0715 0.0941 0.0891 0.0734 | SSIM 0.9705 0.9722 0.9692 0.9692 0.9666
16: PSNR 37.3275 36.8685 39.5520 36.8480 36.0186 | ZER 0.0890 0.1081 0.0309 0.1107 0.1361 | SSIM 0.9446 0.9410 0.9616 0.9418 0.9342
17: PSNR 39.2806 39.4434 36.4009 37.5119 37.4848 | ZER 0.0498 0.0519 0.0611 0.0636 0.0692 | SSIM 0.9130 0.9151 0.9199 0.9141 0.9089
18: PSNR 37.5248 38.0197 37.0091 36.7090 36.8613 | ZER 0.0739 0.0712 0.0878 0.0931 0.0778 | SSIM 0.9914 0.9921 0.9910 0.9902 0.9906
AVG: PSNR 37.1705 37.4938 36.0534 36.7984 36.2073 | ZER 0.0947 0.0942 0.1097 0.1092 0.1054 | SSIM 0.9743 0.9749 0.9738 0.9736 0.9719

TABLE III: IMAX per image results. The best is in bold.

Image | PSNR: DDR (ours), FR (ours), LSSC, MLRI, LDINAT | ZER (same order) | SSIM (same order)
1: PSNR 40.0882 39.7031 41.3972 39.2617 34.4912 | ZER 0.0274 0.0354 0.0157 0.0479 0.2114 | SSIM 0.9946 0.9938 0.9964 0.9929 0.9766
2: PSNR 41.4867 41.6328 42.2242 40.9048 39.8064 | ZER 0.0145 0.0143 0.0156 0.0187 0.0399 | SSIM 0.9995 0.9995 0.9997 0.9994 0.9990
3: PSNR 44.0570 44.1495 44.4702 42.8711 41.8840 | ZER 0.0119 0.0116 0.0119 0.0163 0.0324 | SSIM 0.9969 0.9969 0.9960 0.9963 0.9954
4: PSNR 41.4366 41.5841 42.5509 40.4821 39.7177 | ZER 0.0158 0.0156 0.0161 0.0231 0.0514 | SSIM 0.9986 0.9986 0.9988 0.9979 0.9969
5: PSNR 39.3056 39.3367 39.7253 37.8291 35.8183 | ZER 0.0479 0.0483 0.0449 0.0682 0.1485 | SSIM 0.9953 0.9953 0.9957 0.9931 0.9878
6: PSNR 41.4361 41.3260 41.8365 40.6487 35.8244 | ZER 0.0164 0.0173 0.0140 0.0236 0.1281 | SSIM 0.9963 0.9963 0.9970 0.9955 0.9840
7: PSNR 43.6870 43.8461 44.1721 42.4423 41.5526 | ZER 0.0147 0.0140 0.0136 0.0206 0.0360 | SSIM 0.9979 0.9980 0.9978 0.9972 0.9967
8: PSNR 37.5114 37.3933 37.6831 36.7108 32.6177 | ZER 0.0511 0.0546 0.0517 0.0690 0.2027 | SSIM 0.9887 0.9881 0.9874 0.9863 0.9716
9: PSNR 43.8016 43.7787 43.9857 43.1668 40.5876 | ZER 0.0073 0.0071 0.0078 0.0096 0.0347 | SSIM 0.9792 0.9782 0.9764 0.9761 0.9686
10: PSNR 43.3092 43.3516 43.3361 42.5853 40.8827 | ZER 0.0085 0.0083 0.0089 0.0109 0.0292 | SSIM 0.9882 0.9880 0.9870 0.9867 0.9813
11: PSNR 41.3479 41.2523 41.6387 40.5579 36.9755 | ZER 0.0207 0.0228 0.0200 0.0285 0.0909 | SSIM 0.9856 0.9854 0.9801 0.9838 0.9750
12: PSNR 44.6544 44.7106 45.0720 43.8345 41.0818 | ZER 0.0074 0.0073 0.0066 0.0096 0.0295 | SSIM 0.9979 0.9979 0.9981 0.9973 0.9936
13: PSNR 36.1702 35.7880 36.6171 35.2809 30.7352 | ZER 0.0885 0.1048 0.0767 0.1262 0.3267 | SSIM 0.9904 0.9894 0.9915 0.9880 0.9688
14: PSNR 38.2368 38.3700 39.0632 37.0632 36.2735 | ZER 0.0477 0.0485 0.0492 0.0606 0.1190 | SSIM 0.9919 0.9916 0.9915 0.9898 0.9805
15: PSNR 40.1363 40.1631 41.6871 39.1055 38.5148 | ZER 0.0276 0.0289 0.0245 0.0374 0.0635 | SSIM 0.9905 0.9905 0.9908 0.9879 0.9869
16: PSNR 44.7720 44.7449 45.0422 44.2519 39.0930 | ZER 0.0050 0.0055 0.0045 0.0077 0.0744 | SSIM 0.9931 0.9932 0.9934 0.9924 0.9779
17: PSNR 42.4419 42.4173 42.2897 41.8726 39.0506 | ZER 0.0097 0.0102 0.0102 0.0131 0.0480 | SSIM 0.9838 0.9836 0.9815 0.9819 0.9745
18: PSNR 38.1835 38.0551 38.3631 37.4112 34.7181 | ZER 0.0488 0.0510 0.0458 0.0623 0.1342 | SSIM 0.9805 0.9801 0.9797 0.9775 0.9675
19: PSNR 42.0490 41.9158 42.4187 41.2460 37.5344 | ZER 0.0112 0.0130 0.0113 0.0184 0.1041 | SSIM 0.9857 0.9856 0.9850 0.9840 0.9778
20: PSNR 42.3627 42.3495 42.2987 41.4251 39.3560 | ZER 0.0158 0.0165 0.0165 0.0219 0.0542 | SSIM 0.9317 0.9327 0.8686 0.9194 0.9350
21: PSNR 40.5918 40.3939 40.7610 39.7414 35.9479 | ZER 0.0257 0.0294 0.0228 0.0385 0.1315 | SSIM 0.9923 0.9921 0.9923 0.9909 0.9826
22: PSNR 39.4463 39.5640 39.4511 38.8388 37.1633 | ZER 0.0329 0.0329 0.0352 0.0412 0.0856 | SSIM 0.9867 0.9868 0.9865 0.9847 0.9816
23: PSNR 44.0747 44.2184 44.5398 43.2718 42.3016 | ZER 0.0101 0.0097 0.0097 0.0124 0.0226 | SSIM 0.9961 0.9961 0.9956 0.9953 0.9943
24: PSNR 35.7649 35.6556 35.9736 34.9547 33.2356 | ZER 0.0564 0.0590 0.0581 0.0672 0.1238 | SSIM 0.9800 0.9790 0.9721 0.9771 0.9733
AVG: PSNR 41.0980 41.0708 41.5249 40.2399 37.7152 | ZER 0.0260 0.0278 0.0246 0.0355 0.0968 | SSIM 0.9884 0.9882 0.9850 0.9863 0.9803

TABLE IV: Kodak per image results. The best is in bold.

For example, in the image ‘Flower’ we clearly observe false color artifacts near the red petals for the other methods. Our DDR method accomplishes good improvements over the compared methods, and FR makes further progress, showing a natural transition near the red petals. The comparison on the image ‘T-shirt’ also confirms the experimental results. First of all, the zippering effects on the LSSC demosaiced images are quite obvious near the edges of the color stains. In contrast, both our methods produce output images very close to the ground truth, and one can barely observe any zippering effects. Finally, the ‘Sail’ image from the RW dataset demonstrates the effectiveness of our methods. The false black colors in the yellow balls generated by LSSC and other methods are only weakly visible for FR and DDR. In conclusion, DDR and FR indeed provide natural-looking images close to the ground truth, while the other methods exhibit stronger color artifacts. Moreover, the


[Plot: Zipper Effect Ratio (ZER) as a function of the threshold for FR + ERP, FR, DDR, MLRI, LDINAT, and LSSC]
Fig. 9: Zipper Effect Ratio (ZER) vs. threshold on IMAX.

visual performance is consistent with the numerical PSNR results presented in Tables I and II. Last but not least, the visual artifacts are generally alleviated by the ERP post-processing.

E. Limitations and future work

Self-similarities. Our DDR and FR methods rely on trained priors and do not exploit the self-similarities and the particular content of the input image. LSSC does exploit the self-similarities and is capable of achieving 0.4dB better PSNR performance than our methods on the Kodak dataset, but not on the IMAX and RW datasets. We believe this to be caused by the particularities of the Kodak dataset with respect to the other datasets, such as larger flat regions and larger images, which better suit the LSSC method. The 16th IMAX image is the only one in that dataset where LSSC achieves better demosaicing results than our methods. It differs from the other IMAX images by its highly regular texture content, a perfect fit for LSSC. The use of self-similarities is a direction for further performance improvement of our methods.

Design choices. All our methods (ERP, DDR, FR) follow closely the settings of the A+ super-resolution method [21]. The effect of the patch features and the training procedure on the overall performance is unclear. If we were to train regressors specific to the different offsets with respect to the underlying mosaic pattern, further improvements are to be expected. As shown in [24], cascading the demosaicing methods such that each cascade stage starts from the demosaicing result of the previous stage is another direction for future research.

Time complexity. The proposed methods are highly parallelizable, but the time complexity depends linearly on the number of regressors and anchors in the dictionary. However, the use of a better, sublinear data search structure instead of the current linear search is rather straightforward and can lower the computation time [32].

VI. CONCLUSIONS

We propose³ a novel fast demosaicing method based on directional difference regression (DDR), where the regressors are learned offline on training data, and its enhanced version based on fused regression (FR), along with an efficient regression priors (ERP) post-processing step. We keep the time complexity of the online stage limited and shift the learning and the bulk of the computations to the offline stage. Thus, at testing time we achieve running times order(s) of magnitude lower than the state-of-the-art LSSC, LDINAT, and SAPCA methods. Moreover, the experimental results on various datasets prove competitive performance. Last but not least, the performance of the proposed DDR and FR methods, and of any other demosaicing method, can be further improved by applying our ERP post-processing method.

ACKNOWLEDGMENT

This work was supported by the ERC Advanced Grant VarCity (#273940).

³ Codes available at http://www.vision.ee.ethz.ch/∼timofter/

REFERENCES

[1] B. E. Bayer, "Color imaging array," Jul. 20 1976, US Patent 3,971,065.
[2] C. Bai, J. Li, Z. Lin, J. Yu, and Y.-W. Chen, "Penrose demosaicking," Image Processing, IEEE Transactions on, vol. 24, no. 5, pp. 1672–1684, 2015.
[3] X. Li, B. Gunturk, and L. Zhang, "Image demosaicing: A systematic survey," in Electronic Imaging 2008. International Society for Optics and Photonics, 2008, pp. 68221J–68221J.
[4] D. Menon and G. Calvagno, "Color image demosaicking: an overview," Signal Processing: Image Communication, vol. 26, no. 8, pp. 518–533, 2011.
[5] H. S. Malvar, L.-W. He, and R. Cutler, "High-quality linear interpolation for demosaicing of bayer-patterned color images," in Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP'04). IEEE International Conference on, vol. 3. IEEE, 2004, pp. iii–485.
[6] L. Zhang and X. Wu, "Color demosaicking via directional linear minimum mean square-error estimation," Image Processing, IEEE Transactions on, vol. 14, no. 12, pp. 2167–2178, 2005.
[7] D. Paliy, V. Katkovnik, R. Bilcu, S. Alenius, and K. Egiazarian, "Spatially adaptive color filter array interpolation for noiseless and noisy data," International Journal of Imaging Systems and Technology, vol. 17, no. 3, pp. 105–122, 2007.
[8] V. Katkovnik, K. Egiazarian, and J. Astola, "Local approximation techniques in signal and image processing." SPIE Bellingham, 2006.
[9] V. Katkovnik, K. Egiazarian, and J. Astola, "Adaptive window size image de-noising based on intersection of confidence intervals (ICI) rule," Journal of Mathematical Imaging and Vision, vol. 16, no. 3, pp. 223–235, 2002.
[10] J. E. Adams Jr and J. F. Hamilton Jr, "Adaptive color plane interpolation in single sensor color electronic camera," May 13 1997, US Patent 5,629,734.
[11] X. Wu and N. Zhang, "Primary-consistent soft-decision color demosaicking for digital cameras (patent pending)," Image Processing, IEEE Transactions on, vol. 13, no. 9, pp. 1263–1274, 2004.
[12] I. Pekkucuksen and Y. Altunbasak, "Gradient based threshold free color filter array interpolation," in Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010, pp. 137–140.
[13] I. Pekkucuksen and Y. Altunbasak, "Multiscale gradients-based color filter array interpolation," Image Processing, IEEE Transactions on, vol. 22, no. 1, pp. 157–165, 2013.
[14] D. Kiku, Y. Monno, M. Tanaka, and M. Okutomi, "Minimized-laplacian residual interpolation for color image demosaicking," in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014, pp. 90230L–90230L.
[15] F. Zhang, X. Wu, X. Yang, W. Zhang, and L. Zhang, "Robust color demosaicking with adaptation to varying spectral correlations," Image Processing, IEEE Transactions on, vol. 18, no. 12, pp. 2706–2717, 2009.
[16] L. Zhang, X. Wu, A. Buades, and X. Li, "Color demosaicking by local directional interpolation and nonlocal adaptive thresholding," Journal of Electronic Imaging, vol. 20, no. 2, pp. 023016–023016, 2011.

[17] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau, "Color plane interpolation using alternating projections," Image Processing, IEEE Transactions on, vol. 11, no. 9, pp. 997–1013, 2002. 2, 8
[18] K. Hirakawa and T. W. Parks, "Adaptive homogeneity-directed demosaicing algorithm," Image Processing, IEEE Transactions on, vol. 14, no. 3, pp. 360–369, 2005. 2, 8
[19] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 2272–2279. 2, 8
[20] D. Gao, X. Wu, G. Shi, and L. Zhang, "Color demosaicking with an image formation model and adaptive PCA," Journal of Visual Communication and Image Representation, vol. 23, no. 7, pp. 1019–1030, 2012. 2, 8
[21] R. Timofte, V. De Smet, and L. Van Gool, "A+: Adjusted anchored neighborhood regression for fast super-resolution," in IEEE Asian Conference on Computer Vision (ACCV), 2014. 2, 3, 5, 12
[22] R. Rothe, R. Timofte, and L. Van Gool, "Efficient regression priors for reducing image compression artifacts," in Image Processing (ICIP), 2015 IEEE International Conference on, Sept 2015, pp. 1543–1547. 2
[23] S. Farsiu, M. Elad, and P. Milanfar, "Multiframe demosaicing and super-resolution of color images," Image Processing, IEEE Transactions on, vol. 15, no. 1, pp. 141–159, 2006. 2
[24] J. Wu, R. Timofte, and L. Van Gool, "Efficient regression priors for post-processing demosaiced images," in Image Processing (ICIP), 2015 IEEE International Conference on, Sept 2015, pp. 3495–3499. 2, 7, 8, 9, 12
[25] K. He, J. Sun, and X. Tang, "Guided image filtering," in Computer Vision–ECCV 2010. Springer, 2010, pp. 1–14. 3
[26] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 1920–1927. 3, 5
[27] R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse-representations," in Curves and Surfaces. Springer, 2012, pp. 711–730. 3, 5
[28] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," Image Processing, IEEE Transactions on, vol. 19, no. 11, pp. 2861–2873, 2010. 3
[29] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," Signal Processing, IEEE Transactions on, vol. 54, no. 11, pp. 4311–4322, 2006. 3, 5
[30] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," Image Processing, IEEE Transactions on, vol. 13, no. 4, pp. 600–612, April 2004. 8, 9
[31] W. Lu and Y.-P. Tan, "Color filter array demosaicking: new method and performance measures," Image Processing, IEEE Transactions on, vol. 12, no. 10, pp. 1194–1210, 2003. 8, 9
[32] R. Timofte, R. Rothe, and L. Van Gool, "Seven ways to improve example-based single image super resolution," in CVPR 2016, 2016. 12

Radu Timofte obtained the PhD degree in Electrical Engineering at the KU Leuven, Belgium, in 2013, the M.Sc. at the Univ. of Eastern Finland in 2007, and the Dipl.-Ing. at the Technical Univ. of Iasi, Romania, in 2006. He worked as a researcher for the Univ. of Joensuu (2006-2007), the Czech Technical Univ. (2008), and the KU Leuven (2008-2013). Since 2013 he has been a postdoc at ETH Zurich, Switzerland, in the lab of prof. Luc Van Gool. He serves as a reviewer for major journals such as PAMI, TIP, IJCV, TNNLS, TKDE, PRL, and T-ITS, and conferences such as CVPR, ICCV, and ECCV. He received best paper awards at ICPR 2012, at the CVVT workshop from ECCV 2012, and at the ChaLearn workshop from ICCV 2015. His current research interests include sparse and collaborative representations, image restoration and enhancement, and multi-view object class recognition.

Luc Van Gool obtained the master's and PhD degrees in Electrical Engineering at the KU Leuven, Belgium, in 1982 and 1991, respectively. He is a full professor for Computer Vision at both KU Leuven and ETH Zurich. With his two research labs, he focuses on object recognition, tracking and gesture analysis, and 3D acquisition and modeling. Luc Van Gool was a program chair of ICCV 2005, general chair of ICCV 2011, and acted as GC of ECCV 2014. He is an editor-in-chief of Foundations and Trends in Computer Graphics and Vision. He is also a co-founder of the spin-off companies Eyetronics, GeoAutomation, kooaba, procedural, eSaturnus, upicto, Fashwell, Merantix, Spectando, and Parquery. He received several best paper awards, incl. at ICCV 1998, CVPR 2007, ACCV 2007, ICRA 2009, BMVC 2011, and ICPR 2012.

Jiqing Wu received the B.S. degree in mechanical engineering from Shanghai Maritime University, China, in 2006, the B.S. degree from TU Darmstadt, Germany, in 2012, and the M.Sc. degree from ETH Zurich, Switzerland, in 2015, in mathematics. He is currently pursuing a PhD degree under the supervision of prof. Luc Van Gool, in his lab at ETH Zurich. His research interests mainly concern image demosaicing, image restoration, and early vision.

