DVDnet
ABSTRACT

In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. Previous neural-network-based approaches to video denoising have been unsuccessful, as their performance cannot compete with that of patch-based methods. Our approach, however, outperforms other patch-based competitors with significantly lower computing times. In contrast to other existing neural network denoisers, our algorithm exhibits several desirable properties, such as a small memory footprint and the ability to handle a wide range of noise levels with a single network model. The combination of its denoising performance and lower computational load makes this algorithm attractive for practical denoising applications. We compare our method with different state-of-the-art algorithms, both visually and with respect to objective quality metrics. The experiments show that our algorithm compares favorably to other state-of-the-art methods. Video examples, code and models are publicly available at https://github.com/m-tassano/dvdnet.

Index Terms— video denoising, CNN, residual learning, neural networks, image restoration

1. INTRODUCTION

We introduce a network for Deep Video Denoising: DVDnet. The algorithm compares favorably to other state-of-the-art methods, while featuring fast running times. The outputs of our algorithm present remarkable temporal coherence, very low flickering, strong noise reduction, and accurate detail preservation.

Compared to image denoising, video denoising remains a largely underexplored domain. Recently, new image denoising methods based on deep learning techniques have drawn considerable attention due to their outstanding performance. Schmidt and Roth proposed in [1] the cascade of shrinkage fields method, which unifies the random-field-based model and half-quadratic optimization into a single learning framework. Based on this method, Chen and Pock proposed in [2] a trainable nonlinear reaction diffusion model, which can be expressed as a feed-forward deep network by concatenating a fixed number of gradient descent inference steps. Methods such as these two attain denoising performances comparable to those of well-known algorithms such as BM3D [3] or non-local Bayes (NLB [4]). However, their performance is restricted to specific forms of prior, and many hand-tuned parameters are involved in the training process. In [5], a multi-layer perceptron was successfully applied to image denoising. Nevertheless, a significant drawback of all these algorithms is that a specific model must be trained for each noise level.

Another popular approach involves the use of convolutional neural networks (CNNs), e.g. RBDN [6], DnCNN [7], and FFDNet [8]. Their performance compares favorably to other state-of-the-art image denoising algorithms, both quantitatively and visually. These methods are composed of a succession of convolutional layers with nonlinear activation functions in between them. This type of architecture has been applied to the problem of joint denoising and demosaicing of RGB and raw images by Gharbi et al. in [9]. Contrary to other deep learning denoising methods, one remarkable feature of these CNN-based methods is their ability to denoise several levels of noise with only one trained model. Proposed by Zhang et al. in [7], DnCNN is an end-to-end trainable deep CNN for image denoising. It is able to denoise different noise levels (e.g. with standard deviation σ ∈ [0, 55]) with only one trained model. One of its main features is that it implements residual learning [10], i.e. it estimates the noise present in the input image rather than the denoised image. In a follow-up paper [8], Zhang et al. proposed FFDNet, which builds upon the work done for DnCNN.

As for video denoising, the method proposed by Chen et al. in [11] is one of the few to approach this problem with neural networks—recurrent neural networks in their case. However, their algorithm only works on grayscale images and does not achieve satisfactory results, probably due to the difficulties associated with training recurrent neural networks [12]. Vogels et al. proposed in [13] an architecture based on kernel-prediction networks.
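The residual-learning idea used by DnCNN [7] can be sketched in a few lines: the network outputs an estimate of the noise, and the clean image is recovered by subtraction. The sketch below is illustrative, not the paper's implementation; `noise_estimator` is a stand-in for a trained CNN.

```python
import numpy as np

def residual_denoise(noisy, noise_estimator):
    """Residual learning as in DnCNN [7]: the network predicts the noise
    component, and the denoised image is the noisy input minus that
    estimate. `noise_estimator` is a placeholder for a trained CNN."""
    return noisy - noise_estimator(noisy)

# Toy usage: if the estimator recovered the noise exactly,
# subtraction would return the clean image.
clean = np.full((4, 4), 100.0)
noise = np.random.default_rng(0).normal(0.0, 25.0, size=(4, 4))
noisy = clean + noise
denoised = residual_denoise(noisy, lambda x: noise)  # oracle estimator
```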
Authorized licensed use limited to: Indian Institute of Technology (ISM) Dhanbad. Downloaded on August 06,2024 at 14:30:38 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Simplified architecture of our method.
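The spatial loss of Eq. (3) below is a mean squared error, summed over the m_s training pairs and normalized by 1/(2 m_s). A minimal NumPy sketch (the function name and array layout are assumptions, not the paper's code):

```python
import numpy as np

def spatial_loss(denoised, clean):
    """L_spa of Eq. (3): squared L2 distances between denoised outputs
    I_hat_j and ground-truth frames I_j, summed over the m_s training
    pairs and normalized by 1/(2 m_s)."""
    m_s = len(denoised)
    return sum(float(np.sum((d - c) ** 2))
               for d, c in zip(denoised, clean)) / (2 * m_s)
```

With a single pair differing by 1 in one pixel, the loss is 1/(2 · 1) = 0.5.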
$$\mathcal{L}_{\mathrm{spa}}(\theta_{\mathrm{spa}}) \;=\; \frac{1}{2 m_s} \sum_{j=1}^{m_s} \left\lVert \hat{I}_j\!\left(\tilde{I}_j;\, \theta_{\mathrm{spa}}\right) - I_j \right\rVert^2, \qquad (3)$$

where θ_spa is the collection of all learnable parameters.

As for the temporal denoiser, the training dataset consists of input-output pairs

$$\mathcal{P}_j^t \;=\; \left\{ \left( \left({}^{w}\hat{I}_j^{\,t-T}, \dots, \hat{I}_j^{\,t}, \dots, {}^{w}\hat{I}_j^{\,t+T}\right),\ M_j \right),\ I_j^t \right\}_{j=0}^{m_t},$$

4. RESULTS

Two different test sets were used for benchmarking our method: the DAVIS-test test set, and Set8, which is composed of 4 color sequences from the Derf's Test Media collection¹ and 4 color sequences captured with a GoPro camera. The DAVIS set contains 30 color sequences of resolution

¹ https://media.xiph.org/video/derf
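The structure of the temporal training pairs P_j^t above can be sketched as follows: the 2T motion-compensated (warped) neighbors around the central frame, a noise map M_j, and the clean target I_j^t. All names here are illustrative, and the warping step is assumed to have been performed beforehand.

```python
import numpy as np

def temporal_pair(warped_before, center, warped_after, noise_map, clean):
    """One input-output pair of the temporal training set: the 2T warped
    neighboring frames and the (unwarped) central frame, stacked along a
    new axis, together with a noise map, paired with the clean frame."""
    frames = np.stack(list(warped_before) + [center] + list(warped_after))
    return (frames, noise_map), clean

# Toy usage with T = 1 and 2x2 frames:
(inp, M), target = temporal_pair(
    [np.zeros((2, 2))], np.ones((2, 2)), [np.zeros((2, 2))],
    noise_map=np.full((2, 2), 25.0), clean=np.ones((2, 2)))
```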
(b) Noisy σ = 50 (c) V-BM4D (d) VNLB (e) Neat Video (f) DVDnet (ours)
Fig. 3. Comparison of results. Left to right: noisy frame (PSNRseq = 14.15dB), output by V-BM4D (PSNRseq = 24.91dB),
output by VNLB (PSNRseq = 26.34dB), output by Neat Video (PSNRseq = 23.11dB), output by DVDnet (PSNRseq =
26.62dB). Note the clarity of the denoised text, and the lack of low-frequency residual noise and chroma noise for DVDnet.
Best viewed in digital format.
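The PSNRseq figures in the caption are per-sequence PSNR values. A sketch for 8-bit-range frames follows; the convention of averaging per-frame PSNRs over the sequence is an assumption here, not taken from the paper.

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """PSNR (dB) between a reference frame and its estimate."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_seq(refs, ests, peak=255.0):
    """Sequence PSNR, assumed here to be the mean of per-frame PSNRs."""
    return float(np.mean([psnr(r, e, peak) for r, e in zip(refs, ests)]))
```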
6. REFERENCES

[1] U. Schmidt and S. Roth, "Shrinkage fields for effective image restoration," in Proc. CVPR, 2014, pp. 2774–2781.

[2] Y. Chen and T. Pock, "Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration," IEEE Trans. PAMI, vol. 39, no. 6, pp. 1256–1272, 2017.

[3] K. Dabov, A. Foi, and V. Katkovnik, "Image denoising by sparse 3D transformation-domain collaborative filtering," IEEE Trans. IP, vol. 16, no. 8, pp. 1–16, 2007.

[4] M. Lebrun, A. Buades, and J.-M. Morel, "A Nonlocal Bayesian Image Denoising Algorithm," SIAM Journal IS, vol. 6, no. 3, pp. 1665–1688, 2013.

[5] H. C. Burger, C. J. Schuler, and S. Harmeling, "Image denoising: Can plain neural networks compete with BM3D?," in Proc. CVPR, 2012, pp. 2392–2399.

[6] V. Santhanam, V. I. Morariu, and L. S. Davis, "Generalized Deep Image to Image Regression," in Proc. CVPR, 2016.

[7] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Trans. IP, vol. 26, no. 7, pp. 3142–3155, 2017.

[8] K. Zhang, W. Zuo, and L. Zhang, "FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising," IEEE Trans. IP, vol. 27, no. 9, pp. 4608–4622, 2018.

[9] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, "Deep joint demosaicking and denoising," ACM Trans. Graphics, vol. 35, no. 6, pp. 1–12, 2016.

[10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. CVPR, 2016, pp. 770–778.

[11] X. Chen, L. Song, and X. Yang, "Deep RNNs for video denoising," in Applications of Digital Image Processing XXXIX. International Society for Optics and Photonics, 2016, vol. 9971, p. 99711T.

[12] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in Proc. ICML, 2013, pp. 1310–1318.

[13] T. Vogels, F. Rousselle, B. McWilliams, G. Röthlin, A. Harvill, D. Adler, M. Meyer, and J. Novák, "Denoising with kernel prediction and asymmetric loss functions," ACM Trans. Graphics, vol. 37, no. 4, p. 124, 2018.

[14] A. C. Kokaram, Motion Picture Restoration, Ph.D. thesis, University of Cambridge, 1993.

[15] M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian, "Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms," IEEE Trans. IP, vol. 21, no. 9, pp. 3952–3966, 2012.

[16] P. Arias and J.-M. Morel, "Video denoising via empirical Bayesian estimation of space-time patches," Journal of Mathematical Imaging and Vision, vol. 60, no. 1, pp. 70–93, 2018.

[17] A. Buades and J.-L. Lisani, "Patch-Based Video Denoising With Optical Flow Estimation," IEEE Trans. IP, vol. 25, no. 6, pp. 2573–2586, 2016.

[18] M. Tassano, J. Delon, and T. Veit, "An Analysis and Implementation of the FFDNet Image Denoising Method," IPOL, vol. 9, pp. 1–25, 2019.

[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Proc. NIPS, 2012, pp. 1–9.

[20] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proc. ICML, 2015, pp. 448–456.

[21] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in Proc. CVPR, 2016, pp. 1874–1883.

[22] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, "Waterloo Exploration Database: New Challenges for Image Quality Assessment Models," IEEE Trans. IP, vol. 26, no. 2, pp. 1004–1016, 2017.

[23] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, "DeepFlow: Large displacement optical flow with deep matching," in Proc. IEEE ICCV, Sydney, Australia, Dec. 2013.

[24] A. Khoreva, A. Rohrbach, and B. Schiele, "Video object segmentation with language referring expressions," in Proc. ACCV, 2018.

[25] D. P. Kingma and J. L. Ba, "ADAM: A Method for Stochastic Optimization," in Proc. ICLR, 2015, pp. 1–15.

[26] ABSoft, "Neat Video," https://www.neatvideo.com, 1999–2019.