
Designing a Practical Degradation Model for Deep Blind Image Super-Resolution

Kai Zhang1  Jingyun Liang1  Luc Van Gool1,2  Radu Timofte1
1 Computer Vision Lab, ETH Zurich, Switzerland  2 KU Leuven, Belgium
{kai.zhang, jinliang, vangool, timofter}@vision.ee.ethz.ch
https://github.com/cszn/BSRGAN

Abstract

It is widely acknowledged that single image super-resolution (SISR) methods would not perform well if the assumed degradation model deviates from those in real images. Although several degradation models take additional factors into consideration, such as blur, they are still not effective enough to cover the diverse degradations of real images. To address this issue, this paper proposes to design a more complex but practical degradation model that consists of randomly shuffled blur, downsampling and noise degradations. Specifically, the blur is approximated by two convolutions with isotropic and anisotropic Gaussian kernels; the downsampling is randomly chosen from nearest, bilinear and bicubic interpolations; the noise is synthesized by adding Gaussian noise with different noise levels, adopting JPEG compression with different quality factors, and generating processed camera sensor noise via reverse-forward camera image signal processing (ISP) pipeline model and RAW image noise model. To verify the effectiveness of the new degradation model, we have trained a deep blind ESRGAN super-resolver and then applied it to super-resolve both synthetic and real images with diverse degradations. The experimental results demonstrate that the new degradation model can help to significantly improve the practicability of deep super-resolvers, thus providing a powerful alternative solution for real SISR applications.

1. Introduction

Single image super-resolution (SISR), which aims to reconstruct the natural and sharp detailed high-resolution (HR) counterpart x from a low-resolution (LR) image y [10, 47], has recently drawn significant attention due to its high practical value. With the advance of deep neural networks (DNNs), there is a dramatic upsurge of using feed-forward DNNs for fast and effective SISR [17, 23, 25, 27, 49, 61]. This paper contributes to this strand.

Whereas SISR methods map an LR image onto an HR counterpart, degradation models define how to map an HR image to an LR one. Two representative degradation models are bicubic degradation [46] and traditional degradation [28, 45]. The former generates an LR image via bicubic interpolation. The latter can be mathematically modeled by

y = (x ⊗ k)↓s + n.  (1)

It assumes the LR image is obtained by first convolving the HR image with a Gaussian kernel (or point spread function) k [12] to get a blurry image x ⊗ k, followed by a downsampling operation ↓s with scale factor s and an addition of white Gaussian noise n with standard deviation σ. Specifically, the bicubic degradation can be viewed as a special case of traditional degradation as it can be approximated by setting a proper kernel with zero noise [3, 52]. The degradation model is generally characterized by several factors such as blur kernel and noise level. Depending on whether these factors are known beforehand or not, DNNs-based SISR methods can be broadly divided into non-blind methods and blind ones.
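Eq. (1) is compact enough to sketch directly. The following minimal numpy/scipy sketch synthesizes an LR image under the traditional degradation model; the kernel size, blur width and noise level are illustrative choices rather than the paper's settings, and the helper names are ours, not from the released BSRGAN code.

```python
# A minimal sketch of the traditional degradation model of Eq. (1):
# y = (x ⊗ k)↓s + n, for a float RGB image x in [0, 1].
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=7, sigma=1.6):
    """Isotropic Gaussian blur kernel k, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def degrade(x, s=2, blur_sigma=1.6, noise_sigma=5.0 / 255.0):
    """Apply y = (x ⊗ k)↓s + n to x of shape (H, W, C)."""
    k = gaussian_kernel(sigma=blur_sigma)
    blurred = np.stack([convolve(x[..., c], k, mode='reflect')
                        for c in range(x.shape[-1])], axis=-1)   # x ⊗ k
    y = blurred[::s, ::s]                                        # ↓s
    y = y + np.random.normal(0.0, noise_sigma, y.shape)          # + n (AWGN)
    return np.clip(y, 0.0, 1.0)

x = np.random.rand(64, 64, 3)     # stand-in HR image
print(degrade(x, s=2).shape)      # (32, 32, 3)
```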
Early non-blind SISR methods were mainly designed for bicubic degradations [10]. Although significant improvements on the PSNR [27, 61] and perceptual quality [24, 49] have been achieved, such methods usually do not perform well on real images. It is worth noting that this also holds for deep models trained with a generative adversarial loss. The reason is that blur kernels play a vital role for the success of SISR methods [12] and a bicubic kernel is too simple. To remedy this, some works use a more complex degradation model which involves a blur kernel and additive white Gaussian noise (AWGN), and a non-blind network that takes the blur kernel and noise level as conditional inputs [3, 58]. Compared to methods based on bicubic degradation, these tend to be more applicable. Yet, they need an accurate estimation of the kernel and the noise level; otherwise the performance deteriorates seriously [12]. Meanwhile, only a few methods are specially designed for the kernel estimation of SISR [3]. As a further step, some blind methods propose to fuse the kernel estimation into the network design [16, 31]. But such methods still fail to produce visually pleasant results for most real images, such as JPEG compressed ones. Along another line of blind SISR work with unpaired LR/HR training data, the kernel and the noise are first extracted from the LR images and then used to synthesize LR images from the HR images for paired training [20]. Notably, without kernel estimation, the blind model still has a promising performance. On the other hand, it is difficult to collect accurate blur kernels and noise models from real images. From the above discussion, we draw two conclusions. Firstly, the degradation model is of vital importance to DNNs-based SISR methods and a more practical degradation model is worth studying. Secondly, no existing blind SISR models are readily applicable to super-resolve real images suffering from different degradation types. Hence, we see two main challenges: the first is to design a more practical SISR degradation model for real images, and the second is to learn an effective deep blind model that can work well for most real images. In this paper, we attempt to solve these two challenges.

For the first challenge, we argue that blur, downsampling and noise are the three key factors that contribute to the degradation of real images. Rather than utilizing Gaussian kernel induced blur, bicubic downsampling, and simple noise models, we propose to expand each of these factors to more practical ones. Specifically, the blur is achieved by two convolutions with an isotropic Gaussian kernel and an anisotropic Gaussian kernel; the downsampling is more general and includes commonly-used downscaling operators such as bilinear and bicubic interpolations; the noise is modeled by AWGN with different noise levels, JPEG compression noise with different quality factors, and processed camera sensor noise obtained by applying a reverse-forward camera image signal processing (ISP) pipeline model and a RAW image noise model. Furthermore, instead of using the commonly-used blur/downsampling/noise-addition pipeline, we perform randomly shuffled degradations to synthesize LR images. As a result, our new degradation model involves several more adjustable parameters and aims to cover the degradation space of real images.

For the second challenge, we train a deep model based on the new degradation model in an end-to-end supervised manner. Given an HR image, we can synthesize different realistic LR images by setting different parameters for the degradation model. As such, an unlimited number of paired LR/HR training data can be generated for training. Especially noteworthy is that such training data do not suffer from the misalignment issue. By further taking advantage of the powerful expressiveness and advanced training of DNNs, the deep blind model is expected to produce visually pleasant results for real LR images.

The contributions of this paper are:

1) A practical SISR degradation model for real images is designed. It considers more complex degradations for blur, downsampling and noise and, more importantly, involves a degradation shuffle strategy.

2) With synthetic training data generated using our degradation model, a blind SISR model is trained. It performs well on real images under diverse degradations.

3) To the best of our knowledge, this is the first work to adopt a new hand-designed degradation model for general blind image super-resolution.

4) Our work highlights the importance of accurate degradation modeling for practical applications of DNNs-based SISR methods.

2. Related Work

Since this paper focuses on designing a practical degradation model to train a deep blind DNN model, we will next give a brief overview on related degradation models and deep blind SISR methods.

2.1. Degradation Models

As mentioned in the introduction, existing DNNs-based SISR methods are generally based on bicubic downsampling [23, 44] and traditional degradations [26, 37, 54, 59, 60], or some simple variants [11, 41, 53, 56, 58]. It can be found that existing complex SISR degradation models usually consist of a sequence of blur, downsampling and noise addition. For mathematical convenience, the noise is usually assumed to be AWGN, which rarely matches the noise distribution of real images. Indeed, the noise could also stem from camera sensor noise and JPEG compression noise, which are usually signal-dependent and non-uniform [42]. Regardless of whether the blur is accurately modeled or not, the noise mismatch suffices to cause a performance drop when super-resolvers are applied to real images. In other words, existing degradation models are wanting when it comes to the complexity of real image degradations. Some works do not consider an explicit degradation model [29, 51]. Instead, they use training data to learn the LR-to-HR mapping, which only works for the degradations defined by the training images.

2.2. Deep Blind SISR Methods

Significant achievements resulted from the design and training of deep non-blind SISR networks. This said, applying them for blind SISR is a non-trivial issue. It should be noted that blind SISR methods are mainly deployed for real SISR applications. To that end, different research directions have been tried.

The first direction is to initially estimate the degradation parameters for a given LR image, and then apply a non-blind method to obtain the HR result. Bell-Kligler et al. [3] propose to estimate the blur kernel via an internal-GAN method before applying the non-blind ZSSR [45] and SRMD [58] methods. Yet, non-blind SISR methods are usually sensitive to errors in the blur kernel, producing over-sharp or over-smooth results.

To remedy this, a second direction aims to jointly estimate the blur kernel and the HR image. Gu et al. [16] propose an iterative correction scheme to alternately improve the blur kernel and HR result. Cornillere et al. [8] propose an optimization procedure for joint blur kernel and HR image estimation by minimizing the error predicted by a trained kernel discriminator. Luo et al. [31] propose a deep alternating network that consists of a kernel estimator module and an HR image restorer module. While promising, these methods do not fully take noise into consideration and thus tend to suffer from inaccurate kernel estimation for noisy real images. As a matter of fact, the presence of noise would aggravate the ill-posedness, especially when the noise type is unknown and complex, and the noise level is high.

A third direction is to learn a supervised model with captured real LR/HR pairs. Cai et al. [7] and Wei et al. [50] separately established a SISR dataset with paired LR/HR camera images. Collecting abundant well-aligned training data is cumbersome however, and the learned models are constrained to the LR domain defined by the captured LR images.

Considering the fact that real LR images rarely come with the ground-truth HR, the fourth direction aims at learning with unpaired training data [48]. Yuan et al. [51] propose a cycle-in-cycle framework to first map the noisy and blurry LR input to a clean one and then super-resolve the intermediate LR image via a pre-trained model. Lugmayr et al. [29] propose to learn a deep degradation mapping by employing a cycle consistency loss and then generate LR/HR pairs for supervised training. Following a similar framework, Ji et al. [20] propose to estimate various blur kernels and extract different noise maps from LR images, and then apply the traditional degradation model to synthesize different LR images. Notably, [20] was the winner of the NTIRE 2020 real-world super-resolution challenge [30], which demonstrates the importance of accurate degradation modeling. Although applying this method to training data corrupted by a more complex degradation seems to be straightforward, it would also reduce the accuracy of blur kernel and noise estimation, which in turn results in unreliable synthetic LR images.

As discussed above, existing deep blind SISR methods are mostly trained on ideal degradation settings or specific degradation spaces defined by the LR training data. As a result, there is still a mismatch between the assumed degradation model and the real image degradation model. Furthermore, to the best of our knowledge, no existing deep blind SISR model can be readily applied for general real image super-resolution. Therefore, it is worthwhile to design a practical degradation model to train deep blind SISR models for real applications. Note that, although denoising and deblurring are related to noisy and blurry image super-resolution, most super-resolution methods tackle the blur, noise and super-resolution in a unified rather than a cascaded framework (see, e.g., [11, 12, 20, 28, 29, 30, 43, 45, 51, 52, 56, 58]).

3. A Practical Degradation Model

Before providing our new practical SISR degradation model, it is useful to mention the following facts on the bicubic and traditional degradation models:

1. According to the traditional degradation model, there are three key factors, i.e., blur, downsampling and noise, that affect the degradations of real images.

2. Since both LR and HR images could be noisy and blurry, it is not necessary to adopt the blur/downsampling/noise-addition pipeline as in the traditional degradation model to generate LR images.

3. The blur kernel space of the traditional degradation model should vary across scales, making it in practice tricky to determine for very large scale factors.

4. While the bicubic degradation is rarely suitable for real LR images, it can be used for data augmentation and is indeed a good choice for clean and sharp image super-resolution.

Inspired by the first fact, a direct way to improve the practicability of degradation models is to make the degradation space of the three key factors as large and realistic as possible. Based on the second fact, we then further expand the degradation space by adopting a random shuffle strategy for the three key factors. Like that, an LR image could also be a noisy, downsampled and blurred version of the HR image. To tackle the third fact, one may take advantage of the analytical calculation of the kernel for a large scale factor from a small one. Alternatively, according to the fourth fact, for a large scale factor, one can apply a bicubic (or bilinear) downscaling before the degradation with scale factor 2. Without loss of generality, this paper focuses on designing the degradation model for the widely-used scale factors 2 and 4.

In the following, we will detail the degradation model for the following aspects: blur, downsampling, noise, and random shuffle strategy.

3.1. Blur

Blur is a common image degradation. We propose to model the blur from both the HR space and the LR space. On the one hand, in the traditional SISR degradation model [28, 45], the HR image is first blurred by a convolution with a blur kernel. This HR blur actually aims to prevent aliasing and preserve more spatial information after the subsequent downsampling. On the other hand, the real LR image could be blurry, and thus it is feasible to model such blur in the LR space. By further considering that Gaussian kernels suffice for the SISR task, we perform two Gaussian blur operations, i.e., B_iso with isotropic Gaussian kernels and B_aniso with anisotropic Gaussian kernels [3, 43, 58]. Note that the HR image or LR image could be blurred by the two blur operations (see Sec. 3.4 for more details). By doing so, the degradation space of blur can be greatly expanded.

For the blur kernel setting, the size is uniformly sampled from {7×7, 9×9, ..., 21×21}; the isotropic Gaussian kernel samples the kernel width uniformly from [0.1, 2.4] and [0.1, 2.8] for scale factors 2 and 4, respectively, while the anisotropic Gaussian kernel samples the rotation angle uniformly from [0, π] and the length of each axis for scale factors 2 and 4 uniformly from [0.5, 6] and [0.5, 8], respectively. Reflection padding is adopted to ensure the spatial size of the blurred output stays the same. Since the isotropic Gaussian kernel with width 0.1 corresponds to the delta (identity) kernel, we can always apply the two blur operations.
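The kernel sampling just described can be sketched as follows. The parameter ranges follow the text; whether the sampled axis lengths act as standard deviations or variances of the covariance matrix is our assumption, and the official implementation may differ. The helper names are ours.

```python
# A sketch of sampling the two blur kernels of Sec. 3.1 with numpy.
import numpy as np

def anisotropic_gaussian_kernel(size, theta, l1, l2):
    """Gaussian kernel with rotation angle theta; l1, l2 act as per-axis variances."""
    V = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])        # rotation matrix
    Sigma = V @ np.diag([l1, l2]) @ V.T                    # covariance matrix
    inv = np.linalg.inv(Sigma)
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    pos = np.stack([xx, yy], axis=-1)
    k = np.exp(-0.5 * np.einsum('...i,ij,...j->...', pos, inv, pos))
    return k / k.sum()

def sample_kernel(sf=4, isotropic=None):
    """Draw a random kernel; isotropic=None picks the kernel type at random."""
    if isotropic is None:
        isotropic = np.random.rand() < 0.5
    size = np.random.choice(np.arange(7, 22, 2))           # {7x7, 9x9, ..., 21x21}
    if isotropic:
        width = np.random.uniform(0.1, 2.8 if sf == 4 else 2.4)
        return anisotropic_gaussian_kernel(size, 0.0, width ** 2, width ** 2)
    theta = np.random.uniform(0, np.pi)                    # rotation angle
    l1, l2 = np.random.uniform(0.5, 8 if sf == 4 else 6, size=2)  # axis lengths
    return anisotropic_gaussian_kernel(size, theta, l1, l2)

k = sample_kernel(sf=4)
print(k.shape, round(k.sum(), 6))   # e.g. (11, 11) 1.0
```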
3.2. Downsampling

In order to downsample the HR image, perhaps the most direct way is nearest neighbor interpolation. Yet, the resulting LR image will have a misalignment of 0.5×(s − 1) pixels towards the upper-left corner [52]. As a remedy, we shift a centered 21×21 isotropic Gaussian kernel by 0.5×(s − 1) pixels via a 2D linear grid interpolation method [28], and apply it for convolution before the nearest neighbour downsampling. The Gaussian kernel width is randomly chosen from [0.1, 0.6×s]. We denote such a downsampling by D_s^nearest. In addition, we also adopt the bilinear and bicubic downsampling methods, denoted by D_s^bilinear and D_s^bicubic, respectively. Furthermore, a down-up-sampling method D_s^down-up (= D_down^(s/a) followed by D_up^a), which first downsamples the image with a scale factor s/a and then upscales with a scale factor a, is also adopted. Here the interpolation methods are randomly chosen from bilinear and bicubic interpolations, and a is sampled from [1/2, s]. Clearly, the above four downsampling methods have a blurring step in the HR space, while D_s^down-up can introduce upscaling-induced blur in the LR space when a is smaller than 1. We do not include such kinds of blur in Sec. 3.1 since they are coupled in the downsampling process. We uniformly sample from these four downsampling methods to downscale the HR image.
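A rough sketch of the four downsampling operators follows, assuming OpenCV and scipy. The paper's shifted 21×21 Gaussian kernel for D_s^nearest is approximated here by an anti-aliasing blur followed by a bilinear sub-pixel shift; the shift direction and the exact resizing conventions are illustrative, not the official implementation.

```python
# A sketch of the four downsampling operators of Sec. 3.2.
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def d_nearest(x, s):
    width = np.random.uniform(0.1, 0.6 * s)                # kernel width range
    x = gaussian_filter(x, sigma=(width, width, 0))        # anti-aliasing blur
    x = shift(x, (0.5 * (s - 1), 0.5 * (s - 1), 0), order=1)  # compensate misalignment
    return x[::s, ::s]                                     # nearest decimation

def d_interp(x, s, flag):                                  # D_s^bilinear / D_s^bicubic
    h, w = x.shape[:2]
    return cv2.resize(x, (w // s, h // s), interpolation=flag)

def d_down_up(x, s):                                       # D_s^down-up
    a = np.random.uniform(0.5, s)                          # a in [1/2, s]
    h, w = x.shape[:2]
    flags = [cv2.INTER_LINEAR, cv2.INTER_CUBIC]
    x = cv2.resize(x, (max(1, int(w * a / s)), max(1, int(h * a / s))),
                   interpolation=np.random.choice(flags))  # down by factor s/a
    return cv2.resize(x, (w // s, h // s),
                      interpolation=np.random.choice(flags))  # rescale to the LR size

def downsample(x, s=2):
    """Uniformly pick one of the four downsampling operators."""
    op = np.random.randint(4)
    if op == 0:
        return d_nearest(x, s)
    if op == 1:
        return d_interp(x, s, cv2.INTER_LINEAR)
    if op == 2:
        return d_interp(x, s, cv2.INTER_CUBIC)
    return d_down_up(x, s)
```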

3.3. Noise

Noise is ubiquitous in real images as it can be caused by different sources. Apart from the widely-used Gaussian noise, our new degradation model also considers JPEG compression noise and camera sensor noise. We next detail the three noise types.

Gaussian noise N_G. The Gaussian noise assumption is the most conservative choice when there is no information about the noise [40]. To synthesize Gaussian noise, the three-dimensional (3D) zero-mean Gaussian noise model N(0, Σ) [39] with covariance matrix Σ is adopted. Such a noise model has two special cases: when Σ = σ²I, where I is the identity matrix, it turns into the widely-used channel-independent additive white Gaussian noise (AWGN) model; when Σ = σ²1, where 1 is a 3×3 matrix with all elements equal to one, it turns into the widely-used gray-scale AWGN model. In our new degradation model, we always add Gaussian noise for data synthesis. In particular, the probabilities of applying the general case and the two special cases are set to 0.2, 0.4 and 0.4, respectively. As for σ, it is uniformly sampled from {1/255, 2/255, ..., 25/255}.
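A sketch of this noise model is given below. The text does not specify how the general covariance Σ is drawn, so the sketch uses a random positive semi-definite matrix scaled to have trace 3σ²; this is our assumption, not the paper's recipe.

```python
# A sketch of the 3D Gaussian noise model N(0, Σ) of Sec. 3.3, mixing
# the general case and the two special cases with probabilities 0.2/0.4/0.4.
import numpy as np

def add_gaussian_noise(x):
    """x: float RGB image in [0, 1] of shape (H, W, 3)."""
    sigma = np.random.choice(np.arange(1, 26)) / 255.0     # σ in {1..25}/255
    p = np.random.rand()
    if p < 0.2:                        # general case: random PSD covariance
        A = np.random.rand(3, 3)      # (assumed sampling scheme)
        Sigma = 3.0 * sigma ** 2 * (A @ A.T) / np.trace(A @ A.T)
    elif p < 0.6:                      # channel-independent AWGN: Σ = σ²I
        Sigma = sigma ** 2 * np.eye(3)
    else:                              # gray-scale AWGN: Σ = σ²1 (all ones)
        Sigma = sigma ** 2 * np.ones((3, 3))
    n = np.random.multivariate_normal(np.zeros(3), Sigma, x.shape[:2])
    return np.clip(x + n, 0.0, 1.0)
```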

JPEG compression noise N_JPEG. JPEG is the most widely-used image compression standard for bandwidth and storage reduction. Yet, it introduces annoying 8×8 blocking artifacts/noise, especially in the case of high compression. The degree of compression is determined by the quality factor, which is an integer in the range [0, 100]; quality factor 0 means lower quality and higher compression, and vice versa. If the quality factor is larger than 90, no obvious artifacts are introduced. In our new degradation model, the JPEG quality factor is uniformly chosen from [30, 95]. Since JPEG is the most popular digital image format, we apply two JPEG compression steps with probabilities 0.75 and 1, respectively. In particular, the latter one is used as the final degradation step.
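JPEG compression noise can be synthesized exactly by a round trip through an in-memory JPEG encoder, for instance with OpenCV as sketched below. In the shuffled pipeline this step would be applied once with probability 0.75 inside the sequence and once unconditionally as the final step.

```python
# A sketch of the JPEG compression degradation N_JPEG of Sec. 3.3.
import cv2
import numpy as np

def add_jpeg_noise(x):
    """x: float RGB image in [0, 1], shape (H, W, 3); returns the same format."""
    quality = np.random.randint(30, 96)                    # quality factor in [30, 95]
    img = (np.clip(x, 0, 1) * 255.0).round().astype(np.uint8)
    ok, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, int(quality)])
    assert ok
    return cv2.imdecode(buf, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
```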
Processed camera sensor noise N_S. In modern digital cameras, the output image is obtained by passing the raw sensor data through the image signal processing (ISP) pipeline. In practice, if the ISP pipeline does not perform a denoising step, the processed sensor noise would deteriorate the output image by introducing non-Gaussian noise [42]. To synthesize this kind of noise, we first get the raw image from an RGB image via the reverse ISP pipeline, and then reconstruct the noisy RGB image via the forward pipeline after adding noise to the synthetic raw image. The raw image noise model is borrowed from [6]. According to the Adobe Digital Negative (DNG) Specification [1], our forward ISP pipeline consists of demosaicing, exposure compensation, white balance, camera-to-XYZ (D50) color space conversion, XYZ (D50)-to-linear-RGB color space conversion, tone mapping and gamma correction. For demosaicing, the method in [34], which is the same as Matlab's demosaic function, is adopted. For exposure compensation, the global scaling is chosen from [2^−0.1, 2^0.3]. For white balance, the red gain and blue gain are uniformly chosen from [1.2, 2.4]. For camera-to-XYZ (D50) color space conversion, the 3×3 color correction matrix is a random weighted combination of ForwardMatrix1 and ForwardMatrix2 from the metadata of raw image files. For the tone mapping, we manually select the best-fitting tone curve from [14] for each camera based on paired raw image files and the RGB output. We use five digital cameras, including the Canon EOS 5D Mark III and IV cameras and the Huawei P20, P30 and Honor V8 cameras, to establish our ISP pipeline pool. Note that the tone curve and forward color correction matrix do not necessarily come from the same camera. Since tone mapping is not reversible and would result in a color shift issue, one should apply the reverse-forward tone mapping for the HR image. We apply this noise synthesis step with a probability of 0.25.
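The full reverse-forward ISP pipeline is too involved to reproduce here, but its noise-injection core can be caricatured as signal-dependent shot/read noise added in (approximately) linear intensity space, following the raw noise model of [6]. Everything in the sketch below, including the plain gamma stand-in for the tone-mapping/gamma steps and the noise-scale ranges, is an illustrative assumption rather than the paper's pipeline.

```python
# A much-simplified stand-in for the processed camera sensor noise N_S.
import numpy as np

def add_sensor_like_noise(x, gamma=2.2):
    """x: float RGB image in [0, 1]; gamma is a crude stand-in for tone mapping."""
    lin = np.clip(x, 0, 1) ** gamma                        # crude reverse of gamma
    shot = np.exp(np.random.uniform(np.log(1e-4), np.log(1e-2)))   # shot-noise scale
    read = np.exp(np.random.uniform(np.log(1e-6), np.log(1e-4)))   # read-noise scale
    var = shot * lin + read                                # signal-dependent variance
    lin = lin + np.random.normal(0.0, 1.0, lin.shape) * np.sqrt(var)
    return np.clip(lin, 0, 1) ** (1.0 / gamma)             # re-apply gamma
```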
Figure 1. Schematic illustration of the proposed degradation model for scale factor 2. For an HR image, the randomly shuffled degradation sequence {B_iso, B_aniso, D_2, N_G, N_JPEG, N_S} is first performed; then a JPEG compression degradation N_JPEG is applied to save the LR image into JPEG format. The downscaling operation with scale factor 2, i.e., D_2, is uniformly chosen from {D_2^nearest, D_2^bilinear, D_2^bicubic, D_2^down-up}.

3.4. Random Shuffle

Though simple and mathematically convenient, the traditional degradation model can hardly cover the degradation space of real LR images. On the one hand, the real LR image could also be a noisy, blurry, downsampled, and JPEG compressed version of the HR image. On the other hand, the degradation model which assumes the LR image is a bicubicly downsampled, blurry and noisy version of the HR image can also be used for SISR [16, 59]. Hence, an LR image can be degraded by blur, downsampling, and noise in different orders. We thus propose a random shuffle strategy for the new degradation model. Specifically, the degradation sequence {B_iso, B_aniso, D_s, N_G, N_JPEG, N_S} is randomly shuffled, where D_s represents the downsampling operation with scale factor s, which is randomly chosen from {D_s^nearest, D_s^bilinear, D_s^bicubic, D_s^down-up}. In particular, the sequence of D_down^(s/a) and D_up^a for D_s^down-up can insert other degradations in between. Note that a similar idea of a random shuffle strategy was proposed in [9]; however, it is designed for image classification and object detection, and could instead be used to augment HR images.

With the random shuffle strategy, the degradation space can be expanded substantially. Firstly, other degradation models, such as the bicubic and traditional degradation models and the ones proposed in [16, 59], are special cases of ours. Secondly, the blur degradation space is enlarged by different arrangements of the two blur operations and one of the four downsampling methods. Thirdly, the noise characteristics could be changed by the blur and downsampling, thus expanding the degradation space. For example, the downsampling can reduce the noise strength and make the noise (e.g., processed camera sensor noise and JPEG compression noise) less signal-dependent, whereas D_up^a (a < 1) can make the signal-independent Gaussian noise signal-dependent. Such kinds of noise could exist in real images.

Fig. 1 illustrates the proposed degradation model. For an HR image, we can generate different LR images with a wide range of degradations by shuffling the degradation operations and setting different degradation parameters. As mentioned in Sec. 3, for scale factor 4, we additionally apply a bilinear or bicubic downscaling before the degradation for scale factor 2 with a probability of 0.25.
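Putting the pieces together, the shuffle strategy itself is only a few lines. The sketch below reuses the helpers from the earlier sketches (sample_kernel, downsample, add_gaussian_noise, add_jpeg_noise, add_sensor_like_noise); maybe() is a hypothetical helper for the probabilistic steps, and the probabilities follow the text.

```python
# A sketch of the random shuffle strategy of Sec. 3.4.
import random
import numpy as np
from scipy.ndimage import convolve

def convolve_rgb(x, k):
    """Apply a 2D kernel channel-wise with reflection padding."""
    return np.stack([convolve(x[..., c], k, mode='reflect')
                     for c in range(x.shape[-1])], axis=-1)

def maybe(fn, p):
    """Apply fn with probability p, otherwise pass the image through."""
    return lambda x: fn(x) if random.random() < p else x

def degrade_shuffled(hr, s=2):
    # For scale factor 4 the paper additionally applies a bilinear/bicubic
    # downscaling by 2 with probability 0.25 before this pipeline (omitted).
    ops = [
        lambda x: convolve_rgb(x, sample_kernel(s, isotropic=True)),   # B_iso
        lambda x: convolve_rgb(x, sample_kernel(s, isotropic=False)),  # B_aniso
        lambda x: downsample(x, s),             # D_s: one of the four operators
        add_gaussian_noise,                     # N_G, always applied
        maybe(add_jpeg_noise, p=0.75),          # first N_JPEG step
        maybe(add_sensor_like_noise, p=0.25),   # N_S
    ]
    random.shuffle(ops)                         # the degradation shuffle
    for op in ops:
        hr = op(hr)
    return add_jpeg_noise(hr)                   # final JPEG step (save as JPEG)
```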

4. Discussion

It is necessary to add some discussion to further understand the proposed new degradation model. Firstly, the degradation model is mainly designed to synthesize degraded LR images. Its most direct application is to train a deep blind super-resolver with paired LR/HR images. In particular, the degradation model can be performed on a large dataset of HR images to produce unlimited, perfectly aligned training images, which do not suffer from the limited-data issue of laboriously collected paired data and the misalignment issue of unpaired training data. Secondly, the degradation model tends to be unsuited to model a given degraded LR image, as it involves too many degradation parameters and also adopts a random shuffle strategy. Thirdly, the degradation model can produce some degradation cases that rarely happen in real-world scenarios, but this can still be expected to improve the generalization ability of the trained deep blind super-resolver. Fourthly, a DNN with large capacity has the ability to handle different degradations via a single model (see, e.g., [55]). It is worth noting that, even if the super-resolver loses some performance on the unrealistic bicubic downsampling, it is still a preferred choice for real SISR. Fifthly, one can conveniently modify the degradation model by changing the degradation parameter settings and adding more reasonable degradation types (e.g., speckle noise and unaligned double JPEG compression [21]) to improve the practicability for certain applications.

5. Deep Blind SISR Model Training

The novelty of this paper lies in the new degradation model and in the fact that existing network structures such as ESRGAN [49] can be borrowed to train a deep blind model. For the sake of showing the advantage of the proposed degradation model, we adopt the widely-used ESRGAN network and train it with the synthetic LR/HR paired images produced by the new degradation model. Following ESRGAN, we first train a PSNR-oriented BSRNet model and then train the perceptual quality-oriented BSRGAN model. Since the PSNR-oriented BSRNet model tends to produce oversmoothed results due to the pixel-wise average problem [24], the perceptual quality-oriented model is preferred for real applications [5]. Thus, unless otherwise specified, we focus more on the BSRGAN model.

Compared to ESRGAN, BSRGAN is modified in several ways. First, we use a slightly different HR image dataset, which includes DIV2K [2], Flick2K [27, 46], WED [33] and 2,000 face images from FFHQ [22], to capture the image prior. The reason is that the goal of BSRGAN is to solve the problem of general-purpose blind image super-resolution, and apart from the degradation prior, an image prior could also contribute to the success of a super-resolver. We also remove blurry images based on the variance of the Laplacian of an image. Secondly, BSRGAN uses a larger LR patch size of 72×72. The reason is that our degradation model can produce severely degraded LR images, and a larger patch can enable deep models to capture more information for better restoration. Thirdly, we train BSRGAN by minimizing a weighted combination of L1 loss, VGG perceptual loss and spectral norm-based least square PatchGAN loss [19] with weights 1, 1 and 0.1, respectively. In particular, the VGG perceptual loss is operated on the fourth convolution before the fourth rather than the fifth maxpooling layer of the pre-trained 19-layer VGG model, as it is more stable in preventing color shift issues. We train BSRGAN with Adam, using a fixed learning rate of 1×10⁻⁵ and a batch size of 48.
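The training objective described above can be sketched in PyTorch as follows. The VGG-19 feature cut-off index for "the fourth convolution before the fourth maxpooling layer" is our assumption, and netG and netD are placeholders for the ESRGAN generator and a spectral-norm PatchGAN discriminator; this is a sketch of the loss combination, not the official training script.

```python
# A PyTorch sketch of the BSRGAN objective: 1*L1 + 1*perceptual + 0.1*LSGAN.
import torch
import torch.nn as nn
import torchvision

# VGG-19 features up to conv4_4 (index 25), i.e. before the 4th maxpool
# (layer index is an assumption); ImageNet input normalization omitted.
vgg = torchvision.models.vgg19(pretrained=True).features[:26].eval()
for p in vgg.parameters():
    p.requires_grad = False

l1 = nn.L1Loss()
mse = nn.MSELoss()   # least-squares GAN loss

def generator_loss(netG, netD, lr, hr):
    sr = netG(lr)
    loss_pix = l1(sr, hr)                                   # pixel L1 loss
    loss_percep = l1(vgg(sr), vgg(hr))                      # VGG perceptual loss
    pred_fake = netD(sr)                                    # PatchGAN logits
    loss_gan = mse(pred_fake, torch.ones_like(pred_fake))   # LSGAN, target 1
    return 1.0 * loss_pix + 1.0 * loss_percep + 0.1 * loss_gan

# Optimizer per the text: Adam with a fixed learning rate of 1e-5, batch size 48.
```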
6. Experimental Results

6.1. Testing Datasets

Existing blind SISR methods are generally evaluated on specifically designed synthetic data and only very few real images. For example, IKC [16] is evaluated on blurred, bicubicly downsampled synthetic LR images and two real images; KernelGAN [3] is evaluated on the synthetic DIV2KRK dataset and two real images. As a result, to the best of our knowledge, a real LR image dataset with diverse blur and noise degradations is still lacking.

In order to pave the way for the evaluation of blind SISR methods, we establish two datasets: the synthetic DIV2K4D dataset, which contains four subdatasets with a total of 400 images generated from the 100 DIV2K validation images with four different degradation types, and the real RealSRSet, which consists of 20 real images either downloaded from the internet or directly chosen from existing testing datasets [18, 35, 36, 57]. Specifically, the four degradation types for DIV2K4D include 1) type I: the commonly-used bicubic degradation; 2) type II: anisotropic Gaussian blur with nearest downsampling by a scale factor of 4; 3) type III: anisotropic Gaussian blur with nearest downsampling by a scale factor of 2, subsequent bicubic downsampling by another scale factor of 2, and final JPEG compression with quality factors uniformly sampled from [41, 90]; and 4) type IV: our proposed degradation model. Note that the subdataset with degradation type II and the images downsampled by a scale factor of 2 for the subdataset with degradation type III are directly borrowed from the DIV2KRK dataset [3]. Some example images from the two datasets are shown in Fig. 2, from which we can see the LR images are corrupted by diverse blur and noise degradations. We argue that a general-purpose blind super-resolver should achieve a good overall performance on the two datasets.

Figure 2. Some example images from the DIV2K4D and RealSRSet datasets: (a) examples from DIV2K4D, (b) examples from RealSRSet. From top to bottom of (a), we show example images generated by the degradation types II, III and IV.

6.2. Compared Methods

We compare the proposed BSRNet and BSRGAN with RRDB [49], IKC [16], ESRGAN [49], FSSR-DPED [13], FSSR-JPEG [13], RealSR-DPED [20] and RealSR-JPEG [20]. Specifically, RRDB and ESRGAN are trained on bicubic degradation; IKC is a blind model trained with different isotropic Gaussian kernels; FSSR-DPED and RealSR-DPED are trained to maximize the performance on the blurry and noisy DPED dataset; FSSR-JPEG is trained for JPEG image super-resolution; RealSR-JPEG is a recently released and unpublished model on GitHub. Note that, since our novelty lies in the degradation model, and RRDB, ESRGAN, FSSR-DPED, FSSR-JPEG, RealSR-DPED and RealSR-JPEG use the same network architecture as ours, we did not re-train the other models for comparison.
Table 1. The PSNR and LPIPS results of different methods on the DIV2K4D dataset. The best and second best results are highlighted in red and blue, respectively. The PSNR results are calculated on the Y channel of YCbCr space.

Degradation Type | Metric | RRDB | IKC | ESRGAN | FSSR-DPED | FSSR-JPEG | RealSR-DPED | RealSR-JPEG | BSRNet (Ours) | BSRGAN (Ours)
Type I (Bicubic) | PSNR  | 30.89 | 29.95 | 28.16 | 24.55 | 22.71 | 21.72 | 27.35 | 29.07 | 27.30
Type I (Bicubic) | LPIPS | 0.254 | 0.263 | 0.115 | 0.240 | 0.364 | 0.312 | 0.213 | 0.331 | 0.236
Type II          | PSNR  | 25.66 | 27.35 | 25.56 | 25.81 | 25.33 | 26.29 | 25.36 | 27.76 | 26.26
Type II          | LPIPS | 0.542 | 0.392 | 0.526 | 0.460 | 0.399 | 0.263 | 0.479 | 0.397 | 0.284
Type III         | PSNR  | 26.70 | 26.72 | 26.21 | 25.83 | 23.25 | 22.82 | 26.72 | 27.59 | 26.28
Type III         | LPIPS | 0.517 | 0.504 | 0.436 | 0.392 | 0.376 | 0.379 | 0.360 | 0.419 | 0.284
Type IV          | PSNR  | 24.03 | 24.01 | 23.68 | 23.62 | 22.40 | 22.97 | 23.85 | 25.67 | 24.58
Type IV          | LPIPS | 0.659 | 0.641 | 0.599 | 0.589 | 0.597 | 0.528 | 0.589 | 0.506 | 0.361

Figure 3. Results of different methods on super-resolving an LR image from the DIV2K4D dataset with scale factor 4: (a) LR (×4), (b) IKC [16], (c) FSSR-JPEG [13], (d) RealSR-JPEG [20], (e) BSRNet (Ours), (f) BSRGAN (Ours). The testing image is synthesized by our proposed degradation (i.e., degradation type IV). PSNR↑/LPIPS↓ for (b)-(f): 23.51/0.601, 23.21/0.353, 23.46/0.504, 25.48/0.353, 24.65/0.233.

6.3. Experiments on the DIV2K4D Dataset

The PSNR and LPIPS (learned perceptual image patch similarity) results of different methods on the DIV2K4D dataset are shown in Table 1. Note that LPIPS is used to measure perceptual quality; a lower LPIPS value means the super-resolved image is more perceptually similar to the ground-truth. We draw several conclusions from Table 1. Firstly, as expected, RRDB and ESRGAN perform well for bicubic degradation but not for the non-bicubic degradations, as they are trained with the simplified bicubic degradation. It is worth noting that, even trained with GAN, ESRGAN can slightly improve the LPIPS values over RRDB on degradation types II-IV. Secondly, FSSR-DPED, FSSR-JPEG, RealSR-DPED and RealSR-JPEG outperform RRDB and ESRGAN in terms of LPIPS, since they consider a more practical degradation. Thirdly, for degradation type II, IKC obtains promising PSNR results while RealSR-DPED achieves the best LPIPS result, as they are trained on a similar degradation; for degradation types III and IV, they suffer a severe performance drop. Fourthly, our proposed BSRNet achieves the best overall PSNR results, while BSRGAN yields the best overall LPIPS results.

Fig. 3 shows the results of different methods on super-resolving an LR image from the DIV2K4D dataset. It can be seen that IKC and RealSR-JPEG fail to remove the noise and to recover sharp edges. On the other hand, FSSR-JPEG can produce sharp images but also introduces some artifacts. In comparison, our BSRNet and BSRGAN produce better visual results than the other methods.
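For reference, the evaluation protocol of Table 1 can be sketched as follows: PSNR computed on the Y channel of YCbCr (ITU-R BT.601 coefficients assumed, as in Matlab's rgb2ycbcr), with LPIPS available through the open-source lpips package (the backbone choice is our assumption).

```python
# A sketch of the Table 1 evaluation protocol: PSNR on the Y channel.
import numpy as np

def psnr_y(img1, img2):
    """img1, img2: float RGB images in [0, 1]; PSNR on the luma (Y) channel."""
    def rgb_to_y(x):   # BT.601 full-range-to-studio-swing luma (assumed)
        return (65.481 * x[..., 0] + 128.553 * x[..., 1]
                + 24.966 * x[..., 2] + 16.0) / 255.0
    mse = np.mean((rgb_to_y(img1) - rgb_to_y(img2)) ** 2)
    return 10.0 * np.log10(1.0 / mse)

# LPIPS via the `lpips` package, e.g.:
#   import lpips, torch
#   lpips_fn = lpips.LPIPS(net='alex')   # backbone choice is an assumption
#   d = lpips_fn(sr_tensor, hr_tensor)   # NCHW tensors scaled to [-1, 1]
```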

Figure 4. Results of different methods on super-resolving real images from RealSRSet with scale factor 4: (a) LR (×4), (b) ESRGAN [49], (c) FSSR-JPEG [13], (d) RealSR-DPED [20], (e) RealSR-JPEG [20], (f) BSRGAN (Ours). The LR images from top to bottom in each row are "Building", "Chip", and "Oldphoto2", respectively. NIQE↓/NRQM↑/PI↓ for (b)-(f): "Building": 4.47/3.15/5.65, 4.19/7.08/3.55, 3.12/6.81/3.15, 3.89/4.39/4.75, 4.52/5.79/4.36; "Chip": 5.85/4.66/5.59, 4.16/7.98/3.09, 4.64/6.56/4.04, 6.95/4.32/6.31, 5.07/7.44/3.82; "Oldphoto2": 7.10/3.92/6.59, 5.31/6.26/4.52, 6.39/6.83/4.78, 4.45/7.14/3.65, 5.83/5.99/4.92. Please zoom in for better view.

6.4. Experiments on the RealSRSet Dataset

Since the ground-truth for the RealSRSet dataset is not available, we adopt the no-reference image quality assessment (IQA) metrics NIQE [38], NRQM [32] and PI [4] for quantitative evaluation.

Table 2. The no-reference NIQE [38], NRQM [32] and PI [4] results of different methods on the RealSRSet dataset. The best and second best results are highlighted in red and blue, respectively. Note that all the methods use the same network architecture.

Metric | ESRGAN | FSSR-DPED | FSSR-JPEG | RealSR-DPED | RealSR-JPEG | BSRGAN (Ours)
NIQE↓  | 4.95 | 4.86 | 4.04 | 4.58 | 3.99 | 5.60
NRQM↑  | 6.02 | 6.28 | 6.88 | 6.59 | 6.23 | 6.17
PI↓    | 4.47 | 4.29 | 3.58 | 3.99 | 4.29 | 4.72

As one can see from Table 2, BSRGAN fails to show promising quantitative results. Yet, as shown in Fig. 4, BSRGAN produces much better visual results than the other methods. For example, BSRGAN can remove the unknown processed camera sensor noise for "Building" and the unknown complex noise for "Oldphoto2", while also producing sharp edges and fine details. In contrast, FSSR-JPEG, RealSR-DPED and RealSR-JPEG produce some high-frequency artifacts but have better quantitative results than BSRGAN. Such inconsistencies indicate that these no-reference IQA metrics do not always match perceptual visual quality [30], and that the IQA metrics should be updated along with new SISR methods [15]. We further argue that the IQA metrics for SISR should also be updated with new image degradation types, which we leave for future work. We note that our BSRGAN tends to produce 'bubble' artifacts in texture regions, which may be solved by a new loss function or more training data with diverse textures.

7. Conclusions

In this paper, we have designed a new degradation model to train a deep blind super-resolution model. Specifically, by making each of the degradation factors, i.e., blur, downsampling and noise, more intricate and practical, and also by introducing a random shuffle strategy, the new degradation model can cover a wide range of degradations found in real-world scenarios. Based on the synthetic data generated by the new degradation model, we have trained a deep blind model for general image super-resolution. Experiments on synthetic and real image datasets have shown that the deep blind model performs favorably on images corrupted by diverse degradations. We believe that existing deep super-resolution networks can benefit from our new degradation model to enhance their usefulness in practice. As a result, this work provides a way towards solving blind super-resolution for real applications.

Acknowledgments: This work was partly supported by the ETH Zürich Fund (OK), a Huawei Technologies Oy (Finland) project, and an Amazon AWS grant.
References

[1] Adobe. Digital negative specification, Version 1.5.00, 2019.
[2] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In CVPR Workshops, pages 126–135, 2017.
[3] Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. Blind super-resolution kernel estimation using an internal-GAN. In NeurIPS, pages 284–293, 2019.
[4] Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 PIRM challenge on perceptual image super-resolution. In ECCV Workshops, 2018.
[5] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In CVPR, pages 6228–6237, 2018.
[6] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T. Barron. Unprocessing images for learned raw denoising. In CVPR, pages 11036–11045, 2019.
[7] Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In ICCV, pages 3086–3095, 2019.
[8] Victor Cornillere, Abdelaziz Djelouah, Wang Yifan, Olga Sorkine-Hornung, and Christopher Schroers. Blind image super-resolution with spatially variant degradations. ACM TOG, 38(6):1–13, 2019.
[9] Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V. Le. RandAugment: Practical automated data augmentation with a reduced search space. In CVPR Workshops, pages 702–703, 2020.
[10] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In ECCV, pages 184–199, 2014.
[11] Weisheng Dong, Lei Zhang, Guangming Shi, and Xin Li. Nonlocally centralized sparse representation for image restoration. IEEE TIP, 22(4):1620–1630, 2013.
[12] Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, and Anat Levin. Accurate blur models vs. image priors in single image super-resolution. In ICCV, pages 2832–2839, 2013.
[13] Manuel Fritsche, Shuhang Gu, and Radu Timofte. Frequency separation for real-world super-resolution. In ICCV Workshops, pages 3599–3608, 2019.
[14] Michael D. Grossberg and Shree K. Nayar. What is the space of camera response functions? In CVPR, pages II–602, 2003.
[15] Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, and Chao Dong. PIPAL: A large-scale image quality assessment dataset for perceptual image restoration. In ECCV, 2020.
[16] Jinjin Gu, Hannan Lu, Wangmeng Zuo, and Chao Dong. Blind super-resolution with iterative kernel correction. In CVPR, pages 1604–1613, 2019.
[17] Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. Lightweight image super-resolution with information multi-distillation network. In ICME, pages 2024–2032, 2019.
[18] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. DSLR-quality photos on mobile devices with deep convolutional networks. In ICCV, pages 3277–3285, 2017.
[19] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In CVPR, pages 1125–1134, 2017.
[20] Xiaozhong Ji, Yun Cao, Ying Tai, Chengjie Wang, Jilin Li, and Feiyue Huang. Real-world super-resolution via kernel estimation and noise injection. In CVPR Workshops, pages 466–467, 2020.
[21] Jiaxi Jiang, Kai Zhang, and Radu Timofte. Towards flexible blind JPEG artifacts removal. In ICCV, 2021.
[22] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
[23] Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, pages 624–632, 2017.
[24] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, pages 4681–4690, 2017.
[25] Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, and Radu Timofte. Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling. In ICCV, 2021.
[26] Jingyun Liang, Kai Zhang, Shuhang Gu, Luc Van Gool, and Radu Timofte. Flow-based kernel prior with application to blind super-resolution. In CVPR, pages 10601–10610, 2021.
[27] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPR Workshops, pages 136–144, 2017.
[28] Ce Liu and Deqing Sun. On Bayesian adaptive video super resolution. IEEE TPAMI, 36(2):346–360, 2013.
[29] Andreas Lugmayr, Martin Danelljan, and Radu Timofte. Unsupervised learning for real-world super-resolution. In ICCV Workshops, pages 3408–3416, 2019.
[30] Andreas Lugmayr, Martin Danelljan, and Radu Timofte. NTIRE 2020 challenge on real-world image super-resolution: Methods and results. In CVPR Workshops, pages 494–495, 2020.
[31] Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, and Tieniu Tan. Unfolding the alternating optimization for blind super resolution. In NeurIPS, volume 33, 2020.
[32] Chao Ma, Chih-Yuan Yang, Xiaokang Yang, and Ming-Hsuan Yang. Learning a no-reference quality metric for single-image super-resolution. CVIU, 158:1–16, 2017.
[33] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang. Waterloo Exploration Database: New challenges for image quality assessment models. IEEE TIP, 26(2):1004–1016, 2017.
[34] Henrique S. Malvar, Li-wei He, and Ross Cutler. High-quality linear interpolation for demosaicing of Bayer-patterned color images. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages iii–485, 2004.
[35] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, volume 2, pages 416–423, 2001.
[36] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76(20):21811–21838, 2017.
[37] Tomer Michaeli and Michal Irani. Nonparametric blind super-resolution. In ICCV, pages 945–952, 2013.
[38] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Making a "completely blind" image quality analyzer. IEEE SPL, 20(3):209–212, 2012.
[39] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, pages 1683–1691, 2016.
[40] Sangwoo Park, Erchin Serpedin, and Khalid Qaraqe. Gaussian assumption: The least favorable but the most useful [lecture notes]. IEEE SPM, 30(3):183–186, 2013.
[41] Tomer Peleg and Michael Elad. A statistical prediction model based on sparse representations for single image super-resolution. IEEE TIP, 23(6):2569–2582, 2014.
[42] Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. In CVPR, pages 1586–1595, 2017.
[43] Gernot Riegler, Samuel Schulter, Matthias Ruther, and Horst Bischof. Conditioned regression models for non-blind single image super-resolution. In ICCV, pages 522–530, 2015.
[44] Mehdi S. M. Sajjadi, Bernhard Schölkopf, and Michael Hirsch. EnhanceNet: Single image super-resolution through automated texture synthesis. In ICCV, pages 4501–4510, 2017.
[45] Assaf Shocher, Nadav Cohen, and Michal Irani. "Zero-shot" super-resolution using deep internal learning. In ICCV, pages 3118–3126, 2018.
[46] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. NTIRE 2017 challenge on single image super-resolution: Methods and results. In CVPR Workshops, pages 114–125, 2017.
[47] Radu Timofte, Vincent De Smet, and Luc Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In ACCV, pages 111–126, 2014.
[48] Longguang Wang, Yingqian Wang, Xiaoyu Dong, Qingyu Xu, Jungang Yang, Wei An, and Yulan Guo. Unsupervised degradation representation learning for blind super-resolution. In CVPR, pages 10581–10590, 2021.
[49] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In ECCV Workshops, 2018.
[50] Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, Wangmeng Zuo, et al. AIM 2020 challenge on real image super-resolution: Methods and results. In ECCV Workshops, 2020.
[51] Yuan Yuan, Siyuan Liu, Jiawei Zhang, Yongbing Zhang, Chao Dong, and Liang Lin. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In CVPR Workshops, pages 701–710, 2018.
[52] Kai Zhang, Luc Van Gool, and Radu Timofte. Deep unfolding network for image super-resolution. In CVPR, pages 3217–3226, 2020.
[53] Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, and Radu Timofte. Plug-and-play image restoration with deep denoiser prior. IEEE TPAMI, 2021.
[54] Kai Zhang, Xiaoyu Zhou, Hongzhi Zhang, and Wangmeng Zuo. Revisiting single image super-resolution under internet environment: Blur kernels and reconstruction algorithms. In Pacific Rim Conference on Multimedia, pages 677–687, 2015.
[55] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE TIP, pages 3142–3155, 2017.
[56] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, pages 3929–3938, 2017.
[57] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE TIP, 27(9):4608–4622, 2018.
[58] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single convolutional super-resolution network for multiple degradations. In CVPR, pages 3262–3271, 2018.
[59] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Deep plug-and-play super-resolution for arbitrary blur kernels. In CVPR, pages 1671–1681, 2019.
[60] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, pages 286–301, 2018.
[61] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In ICCV, pages 2472–2481, 2018.
