
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 29, 2020

PMHLD: Patch Map-Based Hybrid Learning DehazeNet for Single Image Haze Removal

Wei-Ting Chen, Student Member, IEEE, Hao-Yu Fang, Jian-Jiun Ding, Senior Member, IEEE, and Sy-Yen Kuo, Fellow, IEEE

Abstract— Images captured in a hazy environment usually suffer from bad visibility and missing information. Over many years, learning-based and handcrafted prior-based dehazing algorithms have been rigorously developed. However, both types of algorithm exhibit some weaknesses in terms of haze removal performance. Therefore, in this work, we propose the patch-map-based hybrid learning DehazeNet, which integrates the two strategies by using a hybrid learning technique involving the patch map and a bi-attentive generative adversarial network. In this method, the reasons limiting the performance of the dark channel prior (DCP) are analyzed. A new feature called the patch map is defined for selecting the patch size adaptively. Using this map, the limitations of the DCP (e.g., color distortion and failure to recover images involving white scenes) can be addressed efficiently. In addition, to further enhance haze removal performance, a patch-map-based DCP has been embedded into the network, and this module has been trained simultaneously with the atmospheric light generator, the patch map selection module, and the refinement module. This combination of traditional and learning-based methods efficiently improves the haze removal performance of the network. Experimental results show that the proposed method achieves better reconstruction results compared to other state-of-the-art haze removal algorithms.

Index Terms— Haze removal, end-to-end hybrid learning system, dark channel prior, patch map, bi-attentive generative adversarial network.

I. INTRODUCTION

THE presence of haze in an image might affect the performance of a network in object detection as well as human vision. Therefore, haze removal, the process of recovering a clean image from a hazy one, is an important issue in image processing. According to Narasimhan and Nayar [1], haze formation can be modeled as

  I(x) = J(x) t(x) + A (1 − t(x)),   (1)

where I(x) is the hazy image captured by the camera, J(x) is the haze-free image, A is the atmospheric light, and t(x) is the transmission map, which can be expressed as t(x) = e^(−βd(x)), where β is the scattering coefficient and d(x) is the path length from the sensor to the object. Then, to acquire the haze-free scene, (1) can be reformulated as

  J(x) = (I(x) − A) / t(x) + A.   (2)

In (2), the transmission map t(x) and the atmospheric light A are evidently crucial for dehazing. However, this problem is ill-posed, since it is difficult to estimate t(x) and A based on the information of a single image. Several studies based on handcrafted priors and learning techniques have focused on obtaining solutions for these two crucial variables.

Dehazing algorithms based on handcrafted priors [2]–[6] have been proposed by observing the differences between hazy and haze-free images. Tarel and Hautiere [2] estimated the atmospheric veil by using the bilateral filter. He et al. [3] proposed the dark channel prior (DCP) to estimate the transmission map of a hazy image. Zhu et al. [4] developed the color attenuation prior, which can predict the transmission map efficiently. Berman et al. [5] proposed the non-local transmission map obtained from the haze-line property. Recently, with the development of learning techniques, many learning-based haze and smoke removal algorithms [7]–[12] have been proposed. Tang et al. [7] proposed a novel dehazing algorithm based on random forest regression to predict transmission values. Cai et al. [8] developed a convolutional neural network (CNN)-based transmission prediction network called the DehazeNet. Ren et al. [9] utilized the multi-scale CNN (MSCNN) to predict the coarse- and fine-scale transmission values. Li et al. [10] integrated two variables of the haze formation model into a new parameter K to reduce the reconstruction errors. Li et al. [13] proposed to fuse multi-level transmission maps based on the depth.

Although handcrafted prior-based and learning-based dehazing algorithms have been well developed for a long period, both strategies still exhibit some weaknesses in comparison to each other. Traditional handcrafted prior algorithms, which predict haze-free images based on priors observed by humans, have excellent performance.

Manuscript received October 10, 2019; revised March 9, 2020 and April 22, 2020; accepted April 23, 2020. Date of publication May 14, 2020; date of current version July 6, 2020. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 108-2221-E-002-072-MY3 and Grant MOST 108-2638-E-002-002-MY2. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Emanuele Salerno. (Corresponding author: Sy-Yen Kuo.)
Wei-Ting Chen is with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: f05943089@ntu.edu.tw).
Hao-Yu Fang, Jian-Jiun Ding, and Sy-Yen Kuo are with the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: danielfang60609@gmail.com; jjding@ntu.edu.tw; sykuo@ntu.edu.tw).
Digital Object Identifier 10.1109/TIP.2020.2993407
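As a concrete illustration of the model above, the haze formation in (1) and its inversion in (2) can be sketched in a few lines of NumPy. The function names, the scalar atmospheric light, and the clipping of t(x) away from zero are illustrative choices, not part of the paper:

```python
import numpy as np

def synthesize_haze(J, depth, A=0.9, beta=1.0):
    """Eq. (1): I(x) = J(x) t(x) + A (1 - t(x)), with t(x) = exp(-beta d(x))."""
    t = np.exp(-beta * depth)
    return J * t + A * (1.0 - t), t

def dehaze(I, t, A=0.9, t_min=0.1):
    """Eq. (2): J(x) = (I(x) - A) / t(x) + A.

    t is clipped at t_min, a common guard against amplifying noise
    where the haze is dense (t close to zero)."""
    return (I - A) / np.maximum(t, t_min) + A
```

Whenever t(x) stays above the clipping threshold, dehaze() inverts synthesize_haze() exactly, which is why the two unknowns t(x) and A constitute the whole estimation problem.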
1057-7149 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: WUHAN UNIVERSITY OF TECHNOLOGY. Downloaded on January 20,2024 at 08:12:47 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Dehazing results: The hazy input image is shown in (a), while (b) shows the corresponding ground truth image. The dehazing results obtained by the traditional DCP, the proposed patch-map-based DCP, and the proposed patch-map-based hybrid learning DehazeNet (PMHLD) are shown in (c), (d), and (e), respectively.

Fig. 2. Comparison of the performance of the proposed patch-map-based hybrid learning DehazeNet method with other state-of-the-art dehazing algorithms, in terms of the CIEDE2000 and the SSIM, by applying them to 1000 images from the RESIDE [14] dataset.

However, they may lead to a loss in color fidelity in some specific scenarios (e.g., the DCP may show color distortion in images containing white and bright scenes). By contrast, the learning-based methods rarely fail, but the visual results may have some limitations, since the learned features may not be bound to haze-related features.

Therefore, in this work, we propose an end-to-end architecture called the patch-map-based hybrid learning DehazeNet (PMHLD), which combines the two types of strategies. It leverages the merits of both and compensates for their weaknesses. In the proposed method, the DCP and generative adversarial networks (GANs) are adopted as the backbone. First, because the patch size affects the performance of the DCP significantly, a new feature called the patch map is defined for selecting the patch size for each pixel adaptively. However, the patch map is a complex feature that is difficult to train. Therefore, the bi-attentive patch map selection network (BAPMS-Net) is proposed to adaptively determine the patch size at each pixel based on a bi-attentive discriminator and a new activation function called the patch map ReLU (PMReLU). By using the patch map, the transmission map can be estimated with high accuracy and the color distortion can be addressed efficiently.

Second, to further improve the recovered result, the PMHLD embeds the patch-map-based DCP into the dehazing network by formulating the dark channel layer with a learnable patch map. Using this layer, the dehazing network can train the BAPMS-Net, the atmospheric light estimation network, and the refinement network jointly. Thus, in this architecture, the recovery process can be carried out with the patch-map-based DCP and the learning-based atmospheric light estimation.

Figure 1 shows an example of the dehazed images obtained from both quantitative and qualitative analyses carried out by applying the traditional DCP, the proposed patch-map-based DCP, and the proposed PMHLD method to a hazy image. Clearly, by using the patch-map-based DCP and the hybrid learning strategy, the dehazing performance can be enhanced significantly. In Fig. 2, we compare the performance of the proposed method with other dehazing algorithms in terms of the structural similarity (SSIM, where a higher value corresponds to a better result) and the CIEDE2000 metric (related to color distortion, where a lower value corresponds to a better result) on the well-known RESIDE dataset [14]. Fig. 2 shows that the proposed algorithm achieves outstanding performance compared to other state-of-the-art methods.

To the best of our knowledge, the proposed architecture is the first haze removal algorithm wherein the patch map has been adopted and the learning-based and prior-based methods have been integrated into a single network. Experimental results show that the proposed method can not only solve the color distortion problem but also improve the quality of the haze removal results significantly. The following are the highlights of this work:

1) An end-to-end dehazing system, called PMHLD, has been designed, which integrates the patch-map-based DCP into the network. By leveraging the patch map, the entire network can be trained jointly with the estimated atmospheric light, the patch map, and the refinement network, and the images can be recovered. Experimental results show that the proposed PMHLD technique achieves a better performance in recovering images compared to the original PMS-Net.

2) To address the problem of color distortion in the DCP, the reasons for the failure of the DCP in certain scenarios and for the poor performance owing to the use of a fixed patch size have been investigated. A new feature called the patch map has been proposed. The patch size can be selected adaptively for each pixel using this new feature.


3) To predict the patch map efficiently, a novel patch map selection network based on the bi-attentive GAN and a new activation function, called the patch map ReLU, has been proposed. With these two modules, the prediction of the patch map can be improved efficiently.

4) To achieve hybrid learning, the DCP has been embedded into the learning process, and a patch-map-based dark channel layer with a learnable patch map has been proposed. By using this layer, both the traditional and the patch-map-based dark channel operations can be integrated into the network learning. To the best of our knowledge, this is the first work toward the development of a trainable dark channel layer.

The remainder of the paper is organized as follows. Section II presents a review of several conventional haze removal models and learning techniques. Section III provides the technical details of the proposed algorithm. Section IV discusses the experimental results. Finally, Section V provides a conclusion.

II. RELATED WORK

A. Haze Removal Based on Handcrafted Priors

Handcrafted prior methods are usually based on statistical analysis and observation. He et al. [3] proposed that the dark channel of a haze-free image of a natural scene is usually close to zero. The DCP is expressed as

  J^dark(x) = min_{y∈Ω(x)} ( min_{k∈{r,g,b}} J^k(y) ) ≈ 0,   (3)

where J^k(·) is the intensity in color channel k and Ω(x) is a local patch of fixed size centered at x. Based on (1) and the DCP, the transmission map t(x) can be estimated by

  min_{y∈Ω(x)} min_{k∈{r,g,b}} (I^k(y)/A^k) = t(x) min_{y∈Ω(x)} min_{k∈{r,g,b}} (J^k(y)/A^k) + 1 − t(x) ≈ 1 − t(x).   (4)

Therefore,

  t(x) = 1 − ω min_{y∈Ω(x)} min_{k∈{r,g,b}} (I^k(y)/A^k),   (5)

where ω is a constant used for recovering the haze-free image with high accuracy.

Among other handcrafted-prior methods, Tarel and Hautiere [2] computed the atmospheric veil in a hazy image based on a bilateral filter and white balance techniques. Fattal [15], [16] recovered hazy images using the albedo of the scene and the color-lines. Zhu et al. [4] established the color attenuation prior, and Berman et al. [5] proposed the concept of the haze-line to calculate the transmission map. Meng et al. [17] applied the boundary constraint to remove haze. Chen et al. [18] developed the gradient residual minimization method to reduce visual artifacts. Zhang et al. [19] applied the maximum reflectance prior to achieve nighttime image dehazing.

B. Haze Removal Based on Learning Strategies

With the development of learning techniques, several dehazing algorithms based on synthetic data have been proposed. Tang et al. [7] computed the transmission value for every patch using the random forest. Cai et al. [8] predicted the transmission value for each patch based on an end-to-end system called the DehazeNet. Ren et al. [9] proposed the MSCNN to predict the transmission map accurately. Li et al. [10] reformulated the haze formation model to combine the atmospheric light information and the transmission map for haze-free image recovery. Ren et al. [20] proposed a CNN-based haze removal algorithm for videos. Zhang and Patel [21] introduced the haze formation model and also utilized GANs to achieve clear results. In CVPR 2019 [22], an algorithm based on the patch map concept was proposed to effectively address the color distortion problem in haze removal.

C. Generative Adversarial Networks

The architecture of the GAN was proposed by Goodfellow et al. [23], in which the generator and the discriminator are trained simultaneously to increase the performance of both networks. This architecture has been widely adopted in a variety of image processing tasks, such as super-resolution [24], rain or haze removal [25], image inpainting [26], image translation [27], [28], and text-to-image synthesis [29]–[31]. Due to the merits of this architecture, we adopt the GAN as the backbone of our proposed network, because it is able to learn the patch map and recover images efficiently.

III. PROPOSED METHOD

This section describes the proposed PMHLD network. A flowchart of the network is given in Fig. 3. The proposed network is based on two types of dehazing strategies, namely, the patch-map-based DCP and the hybrid-learning DehazeNet. Its architecture can be divided into two parts, namely, the haze-free image generator and the haze-free image discriminator. In the generator part, the transmission map and the atmospheric light are the two crucial variables that need to be estimated. For predicting the transmission map, the DCP was selected as the backbone, because it achieves an outstanding performance compared to the other handcrafted-prior-based dehazing methods. However, the results recovered using the DCP may exhibit a severe color distortion problem. Therefore, the reasons causing this color distortion need to be analyzed. To address this problem, a new feature called the patch map has been proposed. To generate the patch map accurately, a bi-attentive patch map selection network has been designed. For the atmospheric light estimation, which is another important variable for dehazing, an estimation network has been developed to predict it precisely. For the haze-free discriminator part, because we want the reconstructed results to be as close as possible to the clear images, a discriminator architecture has been adopted. The abovementioned techniques are described in detail in the following subsections.
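The handcrafted side of the method, Eqs. (3)–(5) above, can be sketched as follows. A scalar atmospheric light, the commonly used ω = 0.95, and a naive nested-loop minimum filter are simplifying assumptions made here for clarity:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Eq. (3): per-pixel minimum over the RGB channels, then over a
    patch x patch window Omega(x) (edge-padded)."""
    H, W, _ = img.shape
    channel_min = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(channel_min, pad, mode='edge')
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def transmission(img, A, patch=15, omega=0.95):
    """Eq. (5): t(x) = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / A, patch)
```

For a uniformly bright (white) region equal to A, the dark channel of I/A is 1 and t(x) collapses to 1 − ω, which is exactly the kind of misestimation analyzed later in Section III-A.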


Fig. 3. The flowchart of the proposed haze removal algorithm.

Fig. 4. White scenes may have a dark channel far from zero. (The 1st column is the input; the 2nd and 3rd columns are the dark channels estimated with patch sizes 15 and 120, respectively.)

A. Problems of the Dark Channel Prior

The traditional DCP can acquire a haze-free result by estimating the transmission map accurately. However, its performance may be limited in some scenarios [3] (e.g., oversaturation and color distortion in images containing white or bright scenes). The main cause of these problems is the fixed patch size. In the traditional DCP, the minimum of the local patch over the three color channels is assumed to be zero. However, this assumption may not hold in certain scenarios and hence might deteriorate the recovered image quality. Using the original DCP from (3), t(x) can be estimated from (5). However, when the patch size is fixed to a smaller value, (3) may be nonzero and the transmission map t(x) estimated from (5) would not be valid. This error may often occur in regions with high intensity. The examples in Fig. 4 illustrate this phenomenon in detail, from which it can be observed that for a smaller patch size, the assumption that the dark channel is zero may be invalid. In this case, instead of using (5), the real transmission map t(x) should be

  t(x) = [1 − min_{y∈Ω(x)} min_{k∈{r,g,b}} (I^k(y)/A^k)] / [1 − min_{y∈Ω(x)} min_{k∈{r,g,b}} (J^k(y)/A^k)].   (6)

Nevertheless, in (6), since J^k(y) is usually unknown, the transmission value cannot be estimated in the haze removal process. Based on the examples in Fig. 4, one can notice that the value of some pixels in the denominator of (6) may usually be less than 1. This leads the value of t(x) from (6) to be usually larger than that obtained from (5). For the bright sky region in an image, the value of the dark channel is always far from zero. This makes the denominator of (6) much smaller than 1, so the value of t(x) estimated from (6) is much larger than that estimated from (5) for the sky region. Thus, the transmission map extracted from (5) may usually be underestimated in these regions if a small patch size is selected. A small patch size produces a detailed transmission map, but it may lead to a nonzero dark channel value. In contrast, a larger patch size can generate a coarse transmission map, which can make the dark channel zero, but the recovered results may exhibit halo artifacts [3] and the computation time may increase dramatically. To address this problem, selecting the patch size adaptively at the local level is essential to achieve high-quality recovered results.

B. Patch Map

In this subsection, we describe the process of generating a patch map (PM) for the learning process. In the training phase, the input hazy image is first recovered using different patch sizes according to (7) and (8):

  t_i(x) = 1 − ω min_{y∈Ω_i(x)} min_{k∈{r,g,b}} (I^k(y)/A^k),  i = 1, …, n,   (7)

  J_i(x) = (I(x) − A) / t_i(x) + A,  k ∈ {r, g, b},  i = 1, …, n,   (8)

where I(x) and A are the hazy image and the atmospheric light, and t_i(x) and J_i(x) are the transmission maps and the haze-free images estimated by the DCP with the different patch sizes i, respectively. Then, the error function between the dehazing results and the ground truth is computed by

  E_i(x) = |J_gt(x) − J_i(x)|,  i = 1, …, n,   (9)

where E_i(x) is the error function and J_gt(x) is the ground truth. The error functions are calculated for every patch size i at every pixel. In the next step, the ground-truth value of the patch map is defined by searching for the patch size that minimizes the error function at each pixel:

  PM(x) = k, where k = arg min_k E_k(x),  k = 1, …, n,   (10)

where PM(x) is the ground-truth patch map and k is the patch map value at location x.
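The selection rule in (7)–(10) amounts to a per-pixel arg-min over DCP reconstructions. A minimal sketch, assuming a scalar atmospheric light, a toy set of candidate patch sizes (the paper searches sizes up to n = 120), and an absolute error summed over the color channels:

```python
import numpy as np

def dark_channel(img, patch):
    """Channel-wise minimum followed by a patch x patch window minimum."""
    H, W, _ = img.shape
    channel_min = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(channel_min, pad, mode='edge')
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def ground_truth_patch_map(I, J_gt, A, sizes=(3, 15, 61),
                           omega=0.95, t_min=0.1):
    """Eqs. (7)-(10): pick, per pixel, the patch size whose DCP
    reconstruction is closest to the ground truth J_gt."""
    errors = []
    for p in sizes:
        t = 1.0 - omega * dark_channel(I / A, p)               # Eq. (7)
        J_p = (I - A) / np.maximum(t, t_min)[..., None] + A    # Eq. (8)
        errors.append(np.abs(J_gt - J_p).sum(axis=2))          # Eq. (9)
    best = np.argmin(np.stack(errors), axis=0)                 # Eq. (10)
    return np.asarray(sizes)[best]
```

The result is a per-pixel map of patch sizes, which is exactly the supervision target that the BAPMS-Net described below is trained to predict.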

Fig. 5. The patch map generated by the method in sub-section III-B. (Upper row: input hazy images; lower row: corresponding patch maps.)

The maximal patch size n was set to 120. From this operation, the ground-truth patch map of all images can be computed for the training procedure.

Some examples of the ground-truth patch maps are shown in Fig. 5. Notice that, in general, in the white, gray, bright, and sky regions, the patch size is preferred to be larger to fit the DCP. In the dark regions, or regions with only one or two color components, the patch size is preferred to be smaller. In the bright regions, the patch size may vary, because it should be adjusted properly to attain the value of zero. For regions with a large depth difference, the patch size may be small. Moreover, regions with similar colors do not always have the same patch size, as the patch map changes according to the neighboring pixels (see the image of the cabinet in the 3rd column of Fig. 5). Therefore, from the analysis above, it can be understood that the patch map is a complicated feature that is difficult to calculate.

C. Bi-Attentive Patch Map Selection Net

To predict the patch map accurately, the BAPMS-Net has been proposed to select the patch size adaptively. The design of this network is shown in Fig. 6. The network is composed of two subnetworks, namely, the patch map generator and the patch map discriminator. In the patch map generator, an attention module and an encoder-decoder architecture have been designed to improve the performance of the patch map generation. Moreover, to improve the performance of the training process and of the network, a novel activation function called the patch map ReLU (PMReLU) has been proposed based on the analysis of the data distribution.

1) Attention Module: Inspired by the works in [32]–[35], which utilized the attention map to focus on the important regions in the network, an attention module (called Attention_G) has been adopted in this work to enhance the performance of the patch map generation, as in Fig. 8(a). The idea of Attention_G is as follows: in the traditional DCP, the patch selection is sensitive to the recovered image quality in regions having a high intensity difference and depth difference. For the high-intensity part, as mentioned in Section III-A, the color distortion problem is caused by an erroneous selection of the patch size. In addition, a region with a large depth difference may suffer from halo artifacts [3] if the wrong patch size is selected. Thus, by using a combination of the intensity and the depth difference information, the attention map can allow the network to focus on important regions.

2) Patch Map Generator: The architecture of the patch map generator can be divided into an encoder and a decoder part. At the beginning of the network, the input image is multiplied by Attention_G and convolved with a 3×3 kernel to project the information into a higher-dimensional space. The result is then passed to the proposed multiscale U-Module to extract the image features. The structure of the multiscale U-Module consists of two different blocks, namely, the Multiscale-W-ResBlock (MWRB) and the Multiscale-Deconvolution Block (MDB), as shown in Fig. 7. In the case of the MWRB, the Wide-ResNet (WRN) [36] is applied as the backbone. In each block, the multiscale technique is used to extract information at different scales. This idea was inspired by the work in [37] and [9], which utilized the multilevel strategy to achieve better performance for feature extraction. In the proposed network, we apply this strategy to generate the patch map, since we believe that these extracted features are haze-related. To further improve the capability of extracting features at different levels, the pyramid connection style has been utilized. To be more specific, in the first block, three convolutional kernels of sizes 5×5, 3×3, and 1×1 are connected in parallel, since a large amount of information at different scales can be preserved. In the second and third blocks, convolutional kernels of sizes (3×3, 1×1) and (1×1), respectively, are applied. Moreover, to preserve the contextual information from the previous layer, the multilevel pooling (multipooling) is proposed:

  l = ||_{e∈s} C_e^k(x),   (11)

where x represents the features determined from the previous MWRB layer, C_e^k(·) denotes the stride convolution operation with kernel size e and dilation level k, || denotes the concatenation operation, and s is the scale range for the stride convolution, where s ∈ {2, 3, 5}. In this work, k has been set to 2. Using the multipooling operation, the extracted features can be preserved at different receptive sizes and scales.

For the other part of the multiscale U-Module, the Multiscale-Deconvolution Block (MDB) is applied to combine the information from the MWRB. This is because the deconvolution operation can help the network to reconstruct the shape information of the input data [38], and the patch size is related to this shape information. In the rest of the patch map generator, the outputs from the MWRB and the MDB are concatenated. This information is fed into the next MDB and the decoder in the higher layer. The decoder is composed of the global convolutional network module (GCN) [39] and the boundary refinement module (BR) [40] to preserve the high-resolution data and the edge detail. The feature map is then fed into the upscale layer, and the densely connected style [40], [41] is adopted to merge the extracted information.

3) Patch Map ReLU: In the training process, the activation function is important, since it can greatly enhance the convergence rate and improve the performance of the network. For example, in a previous study [8], the BReLU was proposed

Fig. 6. Overview of the proposed BAPMS-Net.

Fig. 7. Structure of the proposed Multiscale-W-ResBlock and the Multiscale-Deconvolution Block.

Fig. 8. Generation of (a) Attention_G and (b) Attention_D (i.e., the bi-attentive map introduced in subsection III-C). The depth difference corresponds to the value of the gradient of the depth intensity in the input image.

Fig. 9. Architecture of the proposed Patch Map ReLU.

to predict the transmission map based on the properties of the transmission map. In general, the candidate activation functions are usually the Sigmoid or the ReLU. However, these activation functions may have the problems of vanishing gradient (i.e., zero gradient) and overflowing response in the last layer. In addition, these functions may not match the distribution of the data in some specific scenarios. Thus, in the design of the BAPMS-Net, to accelerate the convergence rate and improve the network performance, the patch map ReLU (PMReLU) has been proposed by analyzing the data distribution. In the learning process, the PMReLU is defined in (12), and its visual representation is given in Fig. 9.

  F(x) = γx,                  if 0 ≤ x ≤ p1,
         α(x − p1) + pmax,    if p1 ≤ x ≤ th1,
         αx,                  if th0 ≤ x ≤ 0,
         β(x − th1) + β0,     if x > th1,
         β(x − th0) + α·th0,  if x < th0,   (12)

where

  p1 = pmax / γ,  β0 = α(th1 − p1) + pmax.   (13)
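Using the parameter values given later in this section (pmax = 120, α = 10⁻⁵, β = 1, γ = 2, th0 = −120, th1 = 240), the piecewise definition in (12)–(13) can be sketched directly:

```python
import numpy as np

def pm_relu(x, pmax=120.0, alpha=1e-5, beta=1.0, gamma=2.0,
            th0=-120.0, th1=240.0):
    """Eq. (12), with p1 and beta0 from Eq. (13)."""
    p1 = pmax / gamma
    beta0 = alpha * (th1 - p1) + pmax
    x = np.asarray(x, dtype=float)
    return np.select(
        [x < th0,                  # far below range: steep slope beta
         (x >= th0) & (x <= 0),    # gentle slope alpha toward 0
         (x > 0) & (x <= p1),      # steep slope gamma, spreading outputs
         (x > p1) & (x <= th1),    # near-flat plateau around pmax
         x > th1],                 # far above range: steep slope beta
        [beta * (x - th0) + alpha * th0,
         alpha * x,
         gamma * x,
         alpha * (x - p1) + pmax,
         beta * (x - th1) + beta0])
```

The choice β0 = α(th1 − p1) + pmax makes the function continuous at th1, and since every slope is positive the function is strictly increasing, which avoids the zero-gradient problem mentioned above.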


Fig. 10. Statistical analysis of the distribution of the optimal patch map value in our training dataset.

Here, F(x) is the output of the PMReLU, pmax is the maximum patch size, th0 and th1 are thresholds, and α, β, and γ are the three gradient values of the activation function. The relation between the three gradients is α ≪ β < γ. In this work, pmax = 120, and α, β, and γ are 10⁻⁵, 1, and 2, respectively. The function in (12) was derived from the analysis of the patch map shown in Fig. 10, which presents the histogram over our training set, obtained by applying the method of subsection III-B to 2400 hazy images from the RESIDE dataset [14]. The figure shows that the distribution of the patch size generally concentrates at the highest and the lowest values. Inspired by this statistical analysis, the PMReLU was designed as given by (12). For the first section of the activation function (i.e., F(x) = γx), we hope that the higher gradient γ can help outputs between 0 and pmax shift easily toward 0 and 120. For the second section (i.e., F(x) = α(x − p1) + pmax) and the third section (i.e., F(x) = αx), the concepts of the Leaky ReLU [42] and the BReLU [8] have been adapted. Instead of clipping the value, as in the BReLU, we assume that a small slope can help the output shift to the target range while preserving some information that would otherwise be clipped by the BReLU. For the last two sections (i.e., F(x) = β(x − th1) + β0 and F(x) = β(x − th0) + α·th0), we consider that predicted values beyond these two ranges are far from the target range. In this work, the two thresholds th0 and th1 are set to −120 and 240, respectively. They are designed as indicators so that patch sizes far beyond the upper and lower limits shift to the target range rapidly during training. In the proposed PMReLU, the parameters α, β, and γ are also proportional to the gradients of the different parts.

4) Bi-Attentive Patch Map Discriminator: The purpose of using a discriminator is to help the generator learn real results by validating the output image. Recently, several GAN-based image processing techniques [32], [33], [35] have used a local discriminator or an attentive discriminator to achieve better performance. With the local discriminator, the network can concentrate on local regions, which may be fake frequently. However, since the patch map is a complicated feature, inspired by these works, the bi-attentive map (i.e., Attention_D in Fig. 8(b)) has been proposed to further enhance the performance of the discriminator. In using this map, the hazy image and the predicted patch map are concatenated, and this concatenated information is multiplied by the attention map of the generator (i.e., Attention_G) and by the error map computed from the difference between the estimated patch map and the ground truth. Although the information of attentive regions can enhance the performance of the network, it may not always be effective. Meanwhile, a fake region may occur in a region on which the network does not focus. Thus, for the proposed discriminator, the bi-attentive map strategy can address this problem during the training process, because the error map can be updated at each epoch. Based on the updated error map and the original attention map, the fake region will be differentiated thoroughly, and the performance of the discriminator can thus be improved. The input of the bi-attentive patch map discriminator is the predicted patch map multiplied by the bi-attentive map. The detail of the discriminator is shown in Fig. 6. There are two reasons for applying the attention map used in the generator (i.e., Attention_G) as the other attentive information. First, we believe that the region to which the generator pays attention should also be important in the discrimination process. Second, there are few errors in the predicted patch map in the final training iterations, which implies that the input of the discriminator may vanish, because the entire error map is close to zero. This may degrade the performance of the discriminator. Therefore, the application of Attention_G can address this problem, as this mechanism can prevent the input of the discriminator from becoming zero.

5) Loss Function: As shown in Fig. 6, there are two losses in our patch map generator: the loss of the patch map and the loss of the attention map. The loss function of the generator in the BAPMS-Net can be expressed as

  L_G_BAPMS-Net = L_PM + λ1·L_ATT + λ2·L_Adv_BAPMS-Net,   (14)

where L_G_BAPMS-Net is the generative loss of the BAPMS-Net, L_PM denotes the mean squared error (MSE) between the predicted patch map and the ground-truth patch map, L_ATT is the L1-norm error between the predicted attention map and the ground-truth attention map, and L_Adv_BAPMS-Net represents the adversarial loss (the Wasserstein loss [43] is adopted here). λ1 and λ2 are set to 10000 and 2.5, respectively. Moreover, the loss of the discriminator is defined as

  L_D_BAPMS-Net = L_W_Real + L_W_Fake + 10·L_GP,   (15)

where L_D_BAPMS-Net is the loss function for training the discriminator, which was proposed in a previous study [43]. L_W_Real and L_W_Fake are the Wasserstein losses [43] of the real and fake patch maps, respectively, passing through the discrimina-
information, the discriminator can focus more on the regions tor. Moreover, LG P is the gradient penalty loss [43].
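The piecewise PMReLU described above can be sketched as follows. Note that the knee p1 between the γ- and α-sections is not given a numeric value in this excerpt; the sketch assumes p1 = pmax/γ (= 60) so that the first two sections meet continuously at pmax, and the offset of the section beyond th1 is likewise chosen for continuity. This is an illustrative NumPy version, not the authors' TensorFlow implementation.

```python
import numpy as np

# Assumed constants from the text: three gradients (alpha << beta < gamma),
# the maximum patch size, and the two thresholds.
ALPHA, BETA, GAMMA = 1e-5, 1.0, 2.0
PMAX, TH0, TH1 = 120.0, -120.0, 240.0
P1 = PMAX / GAMMA  # assumed knee so that gamma * P1 == PMAX

def pmrelu(x):
    """Piecewise-linear activation steering outputs toward [0, pmax]."""
    x = np.asarray(x, dtype=np.float64)
    return np.select(
        [x <= TH0,    # far below the target range: pull back with slope beta
         x <= 0.0,    # slightly below zero: small leak alpha (Leaky-ReLU-like)
         x <= P1,     # steep section with slope gamma toward 0 or pmax
         x <= TH1],   # near-flat plateau around pmax (slope alpha)
        [BETA * (x - TH0) + ALPHA * TH0,
         ALPHA * x,
         GAMMA * x,
         ALPHA * (x - P1) + PMAX],
        default=BETA * (x - TH1) + ALPHA * (TH1 - P1) + PMAX)
```

Inputs inside [0, p1] are stretched toward the extremes by γ (e.g., pmrelu(60.0) gives 120.0), while stray predictions far outside [th0, th1] are pulled back with unit slope β instead of being clipped as in BReLU.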

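The loss bookkeeping in (14)–(15) above (and in the analogous hybrid losses (18)–(19) defined later in Section III-D) reduces to fixed-weight sums. A minimal sketch, with plain floats standing in for the tensor-valued terms that the networks and the Wasserstein critic [43] would produce:

```python
# Weights from the text: lambda1 = 10000, lambda2 = 2.5, lambda3 = 1e-4.
LAMBDA1, LAMBDA2, LAMBDA3 = 10000.0, 2.5, 1e-4

def generator_loss_bapms(l_pm, l_att, l_adv):
    # (14): patch-map MSE + weighted attention L1 error + adversarial term
    return l_pm + LAMBDA1 * l_att + LAMBDA2 * l_adv

def discriminator_loss(l_w_real, l_w_fake, l_gp):
    # (15) and (19): Wasserstein terms plus gradient penalty with weight 10
    return l_w_real + l_w_fake + 10.0 * l_gp

def generator_loss_hybrid(l_g_bapms, l_atm, l_refine, l_adv_hybrid):
    # (18): adds the atmospheric-light and refinement MSEs to the BAPMS loss
    return (l_g_bapms + LAMBDA3 * l_atm + LAMBDA3 * l_refine
            + LAMBDA2 * l_adv_hybrid)
```

The large λ1 compensates for the small magnitude of the attention-map L1 error, and the same discriminator formula serves both the patch-map and the haze-free image discriminators.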
Authorized licensed use limited to: WUHAN UNIVERSITY OF TECHNOLOGY. Downloaded on January 20,2024 at 08:12:47 UTC from IEEE Xplore. Restrictions apply.
6780 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 29, 2020

D. Hybrid Learning Dehaze-Network


As mentioned above, both the traditional handcrafted-prior strategy and the learning-based strategy for haze removal have their advantages and disadvantages. Therefore, a novel hybrid learning dehaze network (shown in Fig. 3) has been proposed to integrate the merits of these two types of approaches. As seen from Fig. 3, the hazy image is fed into the networks for transmission estimation and atmospheric light estimation. For transmission estimation, the patch map is computed by the BAPMS-Net (illustrated in Section III-C). Using the estimated patch map and the atmospheric light information, a coarse transmission map is calculated by the patch-map-based DCP layer. However, because the patch-map-based strategy may generate the block-artifact problem [3], a network to refine the coarse transmission map has been proposed. Using the estimated atmospheric light and the fine transmission map, the haze formation model is applied to generate the haze-free image. The recovered result and the hazy image are then concatenated and validated by the haze-free image discriminator to ensure that the recovered image looks like a real one.

Fig. 11. Architecture of the patch-map-based dark channel layer.

1) Patch-Map-Based Dark Channel Layer: To perform end-to-end learning in the hybrid learning DehazeNet, we need to reformulate the patch-map-based DCP as a trainable layer. Therefore, a patch-map-based dark channel layer has been proposed, as shown in Fig. 11. This layer is extended with a learnable patch map. Its inputs are the hazy image and the patch map. First, the following minimum operation is performed:

C(i, j, k, l) = min_{0<τ≤l, 0<η≤l} I(i + τ, j + η, k),   (16)

where I is the input image; i, j, and k are the indices of each axis, respectively; l is the index of the patch size axis; and C(i, j, k, l) is the result of the minimum operation.

In the next step, the patch map is projected to the patch map box B(i, j, l) as follows:

B(i, j, l) = 1 if P(i, j) = l, and B(i, j, l) = 0 otherwise,   (17)

where P(i, j) is the patch map predicted by the BAPMS-Net. Thereafter, the minimum filter is applied to C(i, j, k, l) along the color channel axis. This information is multiplied with B(i, j, l), and the coarse transmission map is computed by summing the values along the patch size axis. Note that the patch-map-based DCP can also be used in the fixed patch size scenario if the patch map is a constant. With this reformulation, the patch-map-based DCP can be integrated into the hybrid learning dehaze-network.

2) Atmospheric Light Estimation Module: The atmospheric light estimation module is used to predict the air-light intensity for haze removal. Based on previous works on haze removal [3], [4], [8], [9], [13], [21], [22], [44], we assume that the atmospheric light map A is homogeneous. In other words, the estimated A has the same dimensions as the input image, and each pixel has the same value. In a few works [3], [4], [8], [9], [22], the atmospheric light was estimated by sorting the values of the dark channel or the pixel intensities. However, this type of strategy may generate an unpleasant result, especially for images containing sunlight. In another work [21], the atmospheric light was predicted based on the U-Net [45]. However, this method may cause instability during the training process because the atmospheric value in the ground truth is a constant, whereas the U-Net generates different values at various locations. Therefore, in this work, we apply the architecture of VGG-16 [46] as the backbone for atmospheric light estimation. Moreover, to further improve the performance of this module, the convolution operation in the original model has been replaced with the multilevel pooling operation (11) to extract multilevel information. As mentioned above, the predicted atmospheric light value is extended to a matrix with the same dimensions as the input, and the image is recovered by (2).

3) Refined Network: For a patch-based algorithm, a blocking artifact [3] may sometimes be generated. To address this problem, a refined network has been designed. Its architecture is similar to that of the patch map generator mentioned in Section III-A but with less depth. Its input is the coarse transmission map. The network is trained by minimizing the difference between the output transmission map and the ground-truth transmission. With this refined network, the block-artifact problem is well addressed.

4) Haze-Free Image Discriminator: To obtain an accurate estimation of the transmission map and the atmospheric light, we use the haze-free image discriminator. The structure of the discriminator (see Fig. 3) is similar to that of the discriminator used in BAPMS-Net. However, its input is the recovered image. By using the haze-free image discriminator, the recovered result can be differentiated with greater accuracy.

5) Loss Function: The entire loss function of the hybrid DehazeNet can be expressed as follows:

L_G_Hybrid = L_G_BAPMS-Net + λ3·L_Atm + λ3·L_Refine + λ2·L_Adv_Hybrid,   (18)
L_D_Hybrid = L_W_Real + L_W_Fake + 10·L_GP,   (19)

where L_G_Hybrid and L_D_Hybrid are the generative and discriminative losses, respectively, of the hybrid learning Dehaze-Net. L_Adv_Hybrid represents the adversarial loss (the Wasserstein loss [43] has been adopted here). L_G_BAPMS-Net is the generative loss of the BAPMS-Net defined in (14). L_Atm is the atmospheric light loss, which is defined as the MSE between the predicted atmospheric light and the ground truth. L_Refine is the loss of the refined network, which represents the MSE between the predicted transmission and the ground-truth transmission. L_W_Real and L_W_Fake are the Wasserstein

CHEN et al.: PMHLD: PATCH MAP-BASED HYBRID LEARNING DehazeNet FOR SINGLE IMAGE HAZE REMOVAL 6781

losses [43] of the real and fake haze-free images, respectively, after passing through the discriminator. L_GP is the gradient penalty loss proposed in a previous work [43]. λ2 and λ3 are constants that have been set to 2.5 and 10^−4, respectively. Similar to (15), L_D_Hybrid is the loss function for training the discriminator, which was proposed in a previous work [43].

Fig. 12. Visual comparison of the dehazing results obtained by applying the proposed PMHLD algorithm and the other state-of-the-art algorithms to a synthetic dataset.
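Putting the pieces of Section III-D together, the patch-map-based dark channel layer of (16)–(17) and the recovery step (2) can be sketched as below. This is a plain NumPy illustration with explicit loops and a simplified window (anchored at the pixel rather than the exact 0 < τ, η ≤ l offsets); the actual trainable layer evaluates all candidate sizes in parallel on the GPU.

```python
import numpy as np

def patch_map_dark_channel(image, patch_map):
    """Per-pixel dark channel where each pixel uses its own patch size.

    Equivalent to building C(i, j, k, l) in (16) for every candidate
    size l, projecting the patch map to the one-hot box B(i, j, l) of
    (17), multiplying, and summing along the patch-size axis."""
    h, w, _ = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            l = int(patch_map[i, j])            # this pixel's patch size
            window = image[i:min(i + l, h), j:min(j + l, w), :]
            out[i, j] = window.min()            # min over window and color
    return out

def recover(image, airlight, transmission, t_min=0.1):
    """Haze formation model (2): J = (I - A) / t + A.

    `airlight` may be a scalar; broadcasting tiles it to a homogeneous
    map, matching the assumption of the atmospheric light module."""
    t = np.maximum(transmission, t_min)[..., None]
    return (image - airlight) / t + airlight
```

A constant patch map reduces this layer to the fixed-patch-size DCP, which is exactly the special case noted in Section III-D.1.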

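The gradient penalty term L_GP used in (15) and (19) follows the WGAN-GP recipe [43]: the critic's gradient norm is penalized toward 1 at random interpolates between real and fake samples. A hedged sketch with a hypothetical `critic_grad` callback standing in for automatic differentiation of the actual discriminator:

```python
import numpy as np

def gradient_penalty(critic_grad, real, fake, rng):
    """WGAN-GP penalty [43]: mean of (||grad critic(x_hat)|| - 1)^2,
    where x_hat = eps * real + (1 - eps) * fake with eps ~ U(0, 1).

    `critic_grad(x)` must return d critic / d input for a batch x; in
    the paper this gradient comes from the TensorFlow graph."""
    eps = rng.uniform(size=(real.shape[0],) + (1,) * (real.ndim - 1))
    interp = eps * real + (1.0 - eps) * fake     # random interpolates
    grads = critic_grad(interp)
    norms = np.sqrt((grads.reshape(len(grads), -1) ** 2).sum(axis=1))
    return float(((norms - 1.0) ** 2).mean())
```

For a toy linear critic f(x) = w·x the gradient is w everywhere, so the penalty is simply (||w|| − 1)², independent of the sampled interpolates.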
IV. EXPERIMENTAL RESULTS

This section describes the experimental results that illustrate the performance of the proposed dehazing algorithm. The code for the algorithm can be found on our website.¹

¹The source code can be downloaded from: https://github.com/weitingchen83/Dehazing-PMHLD-Patch-Map-Based-Hybrid-Learning-DehazeNet-for-Single-Image-Haze-Removal-TIP-2020

Fig. 13. Comparison of different activation functions in the training progress.

A. Dataset

In this work, a popular large-scale hazy image dataset called RESIDE [14] has been adopted. In the training process,


we pick 2400 images from the indoor training set (ITS) and the outdoor training set (OTS) in RESIDE. We use these images to train the BAPMS-Net and the hybrid DehazeNet in PMHLD. For the ablation study, 400 images from the ITS and the OTS have been randomly chosen as the test dataset (called Test A). For quantitative evaluation, the synthetic objective testing set (SOTS), which consists of 1000 indoor and outdoor images, has been used (Test B). We have ensured that none of the test images were used in the training process.

B. Training Detail and Model Complexity Analysis

For the training process, each submodule (i.e., the BAPMS-Net, the attention modules, and the atmospheric light estimation) was pre-trained separately. Following this, all modules were trained together in the fine-tuning process. The Adam optimizer [55] was used with a learning rate of 10^−4. In each epoch, the generator was trained with two iterations and the two discriminators were trained with five iterations each, with a training batch size of 4. In each epoch, 10% of the images were cut as the validation set. The PMHLD network was implemented on TensorFlow and run on a workstation with a 3.7 GHz CPU, 64 GB RAM, and Nvidia Titan XP GPUs.

In terms of complexity, the proposed PMHLD network consists of 4.94 × 10^7 parameters and 2 × 10^8 FLOPs. The average time required for recovering one image of size 480×640 is 0.076 second. A comparison between the PMHLD network and the other state-of-the-art dehazing networks is shown in Table I. These results have been obtained using the same workstation as mentioned above and the code released by the authors. The results show that the complexity of the proposed method is comparable.

TABLE I
COMPARISON OF PMHLD AND STATE-OF-THE-ART MODELS IN TERMS OF THE NUMBER OF PARAMETERS, FLOPS, AND RUNTIME. NOTE THAT THE RESULT OF DCPDN IS INVESTIGATED BASED ON 512×512 IMAGES BECAUSE THIS METHOD REQUIRES FIXED-SIZE INPUT IMAGES

TABLE II
QUANTITATIVE EVALUATION RESULTS, IN TERMS OF THE FOUR ASSESSMENT METRICS, OBTAINED BY APPLYING THE PMHLD NETWORK AND THE OTHER STATE-OF-THE-ART METHODS ON TEST B. THE SYMBOL * DENOTES THE GAN-BASED METHODS

C. Quantitative Analysis on the Synthetic Dataset

Table II shows the quantitative comparison of the results obtained by applying the proposed PMHLD algorithm and the other state-of-the-art networks to the Test B images. In this experiment, 20 existing dehazing algorithms were applied to the test images for comparison. All these results were obtained by using the codes provided by the authors of the respective networks. Among the different networks, 10 of them have been published in the past two years (i.e., 2018–2020). Further, four assessment metrics, namely, the MSE, the SSIM, the peak signal-to-noise ratio (PSNR), and the CIEDE2000 color difference, were used to evaluate the performance of these networks. The MSE, PSNR, and SSIM metrics are commonly used for measuring image quality. CIEDE2000 presents the color difference between the ground truth and the recovered result; a low value of CIEDE2000 indicates low color distortion in the recovered result.

As shown in Table II, our proposed method exhibits the best performance compared to the other state-of-the-art algorithms in terms of all four metrics. The experimental results indicate that the prior-based methods tend to have a severe color distortion problem as compared to the learning-based methods. Note that the methods in the upper part of the table are prior-based, while the others are learning-based. Moreover, from the results of CIEDE2000, it can be observed that our proposed algorithm exhibits outstanding color preservation (an at least 27.8% smaller value of CIEDE2000) compared to the other methods. That is, the proposed method can not only remove the haze efficiently but can also retain the color information. Therefore, the proposed idea of the patch map and the combination of the prior- and learning-based methods are effective for image dehazing. As compared to the traditional DCP, the proposed PMHLD network exhibits an improved performance (with an MSE smaller by 74%, an SSIM larger by 8.1%, a 37.8% higher PSNR, and a 54% lower CIEDE2000 color difference). In addition, as compared to the GAN-based methods, including the conditional GAN (cGAN) [51], EPDN [49], Cycle (i.e.,


CycleDehaze) [56], and the DCPDN [21], our method can achieve superior performance in terms of all the metrics, as the proposed method integrates both the learning- and prior-based strategies into the GAN architecture.

Fig. 14. Comparison of the results obtained by applying the proposed PMHLD algorithm and other state-of-the-art dehazing algorithms to real-world images.

Figure 12 shows a visual comparison of the dehazed images from the SOTS dataset, from which one can see that the proposed method exhibits excellent dehazing performance and does not suffer from the color distortion problem.

D. Dehazed Results on Real World Images

Figure 14 shows a collection of several hazy images of the real world, which have been used for evaluation in previous works on dehazing. The proposed PMHLD network and the other state-of-the-art haze removal algorithms have been applied to these images, and the results are compared in the figure. From the figure, it can be observed that the proposed PMHLD network achieves a better performance on real-world hazy images as compared to the other networks. It can be clearly seen that although the proposed model has been trained on a synthetic dataset, the robustness of the model still holds, and the haze can be efficiently removed without damaging the image quality. The results recovered by the prior-based methods (i.e., Meng, CEP, and DCP) may suffer from a severe color distortion problem (see the 1st, the 3rd, and the 4th rows for the results corresponding to these algorithms) even if they can remove the haze clearly. In the case of the learning-based methods, the DCPDN sometimes exhibits the overexposure problem, and the color fidelity may be lost (see the 1st, the 6th, the 9th, and the 11th rows). The cGAN network shows good performance but suffers from the color distortion problem (see the 1st, the 2nd, the 3rd, and

the 10th rows). Unlike the proximal Dehaze-Net (PDN) and the DCPL networks, the proposed method exhibits a superior performance in color fidelity and visibility.

Fig. 15. Comparison of the dehazing results for white and bright scenes. The 1st column shows the input images, the 2nd column shows the results recovered by the conventional fixed patch size DCP, and the 3rd column shows the results recovered by the proposed PMHLD method. The 4th and 5th columns show the enlarged views of the white and bright portions of the dehazing results in the 2nd and the 3rd columns, respectively.

However, visual comparison is subjective because we do not have the ground truth to calculate the image quality. Recently, reference-less image quality metrics, namely, the perceptual index (PI) [62] and the fog aware density evaluator (FADE) [57], have been introduced for assessing the performance of dehazing [49] quantitatively. These metrics estimate the visual quality and the fog density of an image without the ground truth. Table III provides the PI and FADE values obtained by applying the different dehazing algorithms to the 10 real-world images shown in Fig. 14, to compare their performance objectively. As shown in the table, the proposed PMHLD algorithm achieves the best results compared to the other state-of-the-art methods. (Note that the lower the values of PI and FADE, the better the perceptual quality and the lesser the fog in the reconstructed images.)

Figure 15 shows the results obtained by applying the traditional DCP and the proposed PMHLD algorithm to hazy images which contain white, bright, and sky scenes. From these results, we can see that the DCP with a fixed patch size may exhibit a severe color distortion problem (e.g., the red bounding boxes shown in the 2nd and 4th columns). However, with the proposed method, this problem is efficiently avoided and the quality of the recovered images is improved (see the blue bounding boxes in the 3rd and 5th columns).

Fig. 16. Comparison of the learned latent statistical regularities of the proposed PMHLD algorithm, DCP, and PDN on the haze-free images.

E. Ablation Study

In our proposed PMHLD network, several modules have been proposed for improving its recovery performance. In this subsection, we perform an ablation study to verify the effectiveness of the proposed modules.

The ablation study includes two parts: the first part includes verification of the improvement of the proposed modules in the generator, while the second part includes verification of the modules of the discriminator. The generator consists of five modules: 1) the patch map generator without the multiscale U-module and the pyramid connected style (Module A), 2) the patch map generator with the multiscale U-module but without the pyramid connected style (Module B), 3) Module B with the pyramid connected style (Module C) (note that this is the original architecture of PMS-Net [22]), 4) Module C with the PMReLU activation function (Module D), and 5) Module D with Attention_G (Module E). Meanwhile, the other four modules are used to verify the improvement of the discriminator. They are 6) Module E with the discriminator but without the bi-attentive mechanism (Module F), 7) Module F and the discriminator with only Attention_G (Module G), 8) Module F and the discriminator with only the error map (Module H), and 9) Module F with the bi-attentive mechanism (i.e., all of the proposed techniques are adopted (ALL)). The experiment has been performed on the Test A images, and the analysis results are shown in Table IV.

The table shows that for the patch map generator, using the multiscale U-module and the pyramid connected style is effective in enhancing the generation of the patch map. Moreover, the PMReLU and Attention_G can effectively reduce the


Fig. 17. Results of the images recovered by applying (b) DCP [3], (c) PDN [59], and (d) the proposed PMHLD to the haze-free image shown in (a).

TABLE III
QUANTITATIVE EVALUATION OF THE DEHAZING RESULTS, IN TERMS OF THE PI [62] AND FADE [57], FOR REAL-WORLD IMAGES

TABLE IV
QUANTITATIVE MSE EVALUATION FOR THE ABLATION STUDY ON TEST A. NOTE THAT MODULE C HAS THE SAME ARCHITECTURE AS THAT APPLIED IN OUR PREVIOUS WORK (PMS-NET [22])

MSE by 21.8%. A comparison of the results using the conventional activation functions, including BReLU and ReLU, and the proposed PMReLU is shown in Fig. 13. The results show that the proposed PMReLU converges much faster. PMReLU and Attention_G can efficiently improve the generation of the patch map. From the above analysis, in comparison to the original PMS-Net (Module C), the proposed modules in this work exhibit a considerable improvement in the performance of the patch-map estimation (corresponding to a 37% enhancement in MSE).

In the case of the patch-map-based hybrid learning DehazeNet structure, the performances of the traditional DCP, the patch-map-based DCP (i.e., PMS-Net), and the proposed PMHLD architecture (patch-map-based DCP + hybrid learning DehazeNet) are compared in Table V. From Table V, it can be seen that, as compared to the traditional DCP, the patch-map strategy provides an improved image quality. In addition, this strategy can efficiently get rid of the color distortion problem (i.e., its CIEDE2000 value is almost half of the corresponding value obtained from the traditional DCP), because it can simultaneously learn the patch map, the transmission map, and the atmospheric light distribution. Moreover, because the proposed algorithm integrates the merits of the prior-based algorithm and the learning-based strategy, it can efficiently remove the haze from images while retaining color fidelity.

Fig. 18. Relation between the average recovered image quality and the selection of the maximum patch size.

To prove the effectiveness of the proposed atmospheric light estimation module, a comparison of the results obtained using different methods for air-light prediction is given in Table VI. In this experiment, 500 images were selected from both the ITS and the OTS datasets. Three different atmospheric light estimation strategies are compared, and we ensure that none of the test data overlap with the training data. The results in Table VI show that the proposed atmospheric light estimation module exhibits superior performance in both the recovered image quality and the color fidelity as compared to the other methods.

F. Analysis on Learned Latent Statistical Regularities

For single image dehazing, verifying the learned latent statistical regularities is essential, as it can reveal whether the recovered results will be over-dehazed or not. Therefore,


an experiment is conducted to prove the superior performance of the proposed PMHLD network on haze-free images. In this experiment, two dark-channel-related methods, namely, the vanilla DCP [3] and the PDN [59], were selected to test the performance on clean images. In the case of the DCP and the PMHLD networks, the prior statistics on haze-free images are investigated (i.e., the dark channel value = 0). In the case of the PDN [59], the dark-channel-like statistical priors were calculated, i.e., 1 − t ≈ 0.

Similar to the experiments carried out in [47], to investigate the learned statistical regularities, 500 clear images were tested. The histograms and the accumulations of the dark channel value and 1 − t are presented in Figs. 16(a) and 16(b). From the figure, it can be seen that the proposed network can learn the statistical regularity better than the other DCP-based methods. Further, a visual comparison of two examples is shown in Fig. 17. The results recovered by the vanilla DCP and PDN algorithms tend to be over-dehazed and exhibit some color distortion. In contrast, the proposed PMHLD network avoids the over-dehazing problem.

TABLE V
QUANTITATIVE ANALYSIS FOR THE EFFECTIVENESS OF THE PROPOSED PATCH MAP AND THE HYBRID LEARNING STRATEGY ON TEST B. NOTE THAT THE PATCH-MAP-BASED DCP IS THE ORIGINAL VERSION OF THE PMS-NET [22]

TABLE VI
QUANTITATIVE ANALYSIS FOR THE EFFECTIVENESS OF THE PROPOSED ATMOSPHERIC LIGHT ESTIMATION MODULE

Fig. 20. Recovered results by selecting the patch size as (b) 120 and (c) 300, where the corresponding ground truth is shown in (a).

G. Analysis on Maximal Patch Size Selection

In this section, the analysis on the selection of the maximum patch size is presented. 500 images were randomly selected from the RESIDE dataset [14] and were resized to different scales, i.e., 480 × 640, 240 × 320, 120 × 160, and 60 × 80. Their corresponding patch maps were calculated, and the dehazed images were recovered from the hazy images using the process mentioned in Subsection III-B. A plot showing the relation between the selection of the maximum patch size and the corresponding image quality is given in Fig. 18. From the figure, it is observed that larger input images require larger maximum patch sizes to achieve a higher recovered image quality. The recovered image quality improves with an increase in the maximal patch size. However, this trend saturates when the maximum patch size exceeds 120.

Fig. 19. The relation between the average time required for image processing and the selection of the maximum patch size.

The variation of the computation time versus the maximum patch size is presented in Fig. 19. In this experiment, we calculated the time consumption only in the patch-map-based dark channel layer because the BAPMS-Net has the same architecture for the same input image size. It can be seen that the time required for processing increases significantly with the maximum patch size. In this work, we chose a maximum patch size of 120 since it balances the trade-off between efficiency and effectiveness.

H. Limitation

In Fig. 20, the limitation of the proposed PMHLD is presented. Notice that the recovered results using the default maximum patch size of 120 may suffer from the over-exposure problem for regions that have a large area with high intensity (see the window and the neighboring region in Fig. 20(b)). If we apply an even larger upper bound of the patch size, the over-exposure problem can be avoided (see Fig. 20(c)). However, the computation time is increased (see the comparison in Figs. 18 and 19). There is thus a trade-off between well addressing the over-exposed regions and the computation time.

V. CONCLUSION

In this work, a novel dehazing algorithm called the PMHLD network, which combines the handcrafted-prior-based and the learning-based techniques, has been proposed. First, the weaknesses of the DCP have been analyzed, and a new feature called the patch map has been developed to improve the dehazing process. To generate the patch map accurately,


a network called the BAPMS-Net has been designed. More- [17] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, “Efficient image
over, the patch-related features, an attention map, and a new dehazing with boundary constraint and contextual regularization,” in
Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 617–624.
activation function called the PMReLU have been adopted for [18] C. Chen, M. N. Do, and J. Wang, “Robust image and video dehazing
the patch map generator. For the discriminator, a bi-attentive with visual artifact suppression via gradient residual minimization,” in
mechanism has been proposed to make the information on the Proc. Eur. Conf. Comput. Vis., 2016, pp. 576–591.
[19] J. Zhang, Y. Cao, S. Fang, Y. Kang, and C. W. Chen, “Fast haze removal
erroneous area more attentive. To further improve the dehazing for nighttime image using maximum reflectance prior,” in Proc. IEEE
performance, an end-to-end haze removal architecture has Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7418–7426.
been designed to learn the patch map, the atmospheric light, [20] W. Ren et al., “Deep video dehazing with semantic segmentation,” IEEE
Trans. Image Process., vol. 28, no. 4, pp. 1895–1908, Apr. 2019.
and the transmission map simultaneously by proposing the [21] H. Zhang and V. M. Patel, “Densely connected pyramid dehazing
trainable dark channel layer. Experimental results show that network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.,
the proposed dehazing system achieves a high recovered image Jun. 2018, pp. 3194–3203.
quality in both the synthetic and the real-world datasets. [22] W.-T. Chen, J.-J. Ding, and S.-Y. Kuo, “PMS-net: Robust haze removal
based on patch map for single images,” in Proc. IEEE/CVF Conf.
Moreover, the proposed model does not lead to color distortion Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11681–11689.
in images. By employing the proposed techniques of the patch [23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
map, end-to-end haze removal architecture, and bi-attentive S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in
Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
GAN, high quality dehazing results can be achieved. [24] C. Ledig et al., “Photo-realistic single image super-resolution using
a generative adversarial network,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jul. 2017, pp. 4681–4690.
ACKNOWLEDGMENT [25] H. Zhang and V. M. Patel, “Density-aware single image de-raining using
The authors are grateful to the National Center for a multi-stream dense network,” in Proc. IEEE/CVF Conf. Comput. Vis.
Pattern Recognit., Jun. 2018, pp. 695–704.
High-performance Computing for computer time and facilities. [26] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson,
and M. N. Do, “Semantic image inpainting with deep generative
models,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
R EFERENCES Jul. 2017, pp. 5485–5493.
[1] S. G. Narasimhan and S. K. Nayar, "Chromatic framework for vision in bad weather," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1, 2000, pp. 598–605.
[2] J.-P. Tarel and N. Hautiere, "Fast visibility restoration from a single color or gray level image," in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep. 2009, pp. 2201–2208.
[3] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2341–2353, Dec. 2011.
[4] Q. Zhu, J. Mai, and L. Shao, "A fast single image haze removal algorithm using color attenuation prior," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3522–3533, Nov. 2015.
[5] D. Berman, T. Treibitz, and S. Avidan, "Non-local image dehazing," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1674–1682.
[6] F. Yuan and H. Huang, "Image haze removal via reference retrieval and scene prior," IEEE Trans. Image Process., vol. 27, no. 9, pp. 4395–4409, Sep. 2018.
[7] K. Tang, J. Yang, and J. Wang, "Investigating haze-relevant features in a learning framework for image dehazing," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2995–3000.
[8] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, "DehazeNet: An end-to-end system for single image haze removal," IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016.
[9] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, "Single image dehazing via multi-scale convolutional neural networks," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 154–169.
[10] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, "AOD-Net: All-in-one dehazing network," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4770–4778.
[11] Q. Liu, X. Gao, L. He, and W. Lu, "Single image dehazing with depth-aware non-local total variation regularization," IEEE Trans. Image Process., vol. 27, no. 10, pp. 5178–5191, Oct. 2018.
[12] W.-T. Chen, S.-Y. Yuan, G.-C. Tsai, H.-C. Wang, and S.-Y. Kuo, "Color channel-based smoke removal algorithm using machine learning for static images," in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Oct. 2018, pp. 2855–2859.
[13] Y. Li et al., "LAP-Net: Level-aware progressive network for image dehazing," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3276–3285.
[14] B. Li et al., "Benchmarking single-image dehazing and beyond," IEEE Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019.
[15] R. Fattal, "Single image dehazing," ACM Trans. Graph., vol. 27, no. 3, p. 72, 2008.
[16] R. Fattal, "Dehazing using color-lines," ACM Trans. Graph., vol. 34, no. 1, p. 13, 2014.
[23] I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[24] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4681–4690.
[25] H. Zhang and V. M. Patel, "Density-aware single image de-raining using a multi-stream dense network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 695–704.
[26] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do, "Semantic image inpainting with deep generative models," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5485–5493.
[27] T. Xu et al., "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1316–1324.
[28] L. Wang, V. Sindagi, and V. Patel, "High-quality facial photo-sketch synthesis using multi-adversarial networks," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 83–90.
[29] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, "Generative image inpainting with contextual attention," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 5505–5514.
[30] S. Reed et al., "Generative adversarial text to image synthesis," in Proc. 33rd Int. Conf. Mach. Learn. (PMLR), vol. 48, Jun. 2016, pp. 1060–1069.
[31] Z. Zhang, Y. Xie, and L. Yang, "Photographic text-to-image synthesis with a hierarchically-nested adversarial network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6199–6208.
[32] K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra, "DRAW: A recurrent neural network for image generation," in Proc. 32nd Int. Conf. Mach. Learn. (PMLR), vol. 37, Jul. 2015, pp. 1462–1471.
[33] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, "Recurrent models of visual attention," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2204–2212.
[34] S. You, R. T. Tan, R. Kawakami, Y. Mukaigawa, and K. Ikeuchi, "Adherent raindrop modeling, detection and removal in video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1721–1733, Sep. 2016.
[35] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, "Attentive generative adversarial network for raindrop removal from a single image," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2482–2491.
[36] S. Zagoruyko and N. Komodakis, "Wide residual networks," in Proc. Brit. Mach. Vis. Conf., 2016, pp. 1–15.
[37] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 4278–4284.
[38] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1520–1528.
[39] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, "Large kernel matters—Improve semantic segmentation by global convolutional network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4353–4361.
[40] D. Liu et al., "Densely connected large kernel convolutional network for semantic membrane segmentation in microscopy images," in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Oct. 2018, pp. 2461–2465.


[41] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[42] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, vol. 30, 2013, p. 3.
[43] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of Wasserstein GANs," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5767–5777.
[44] Y. Liu, J. Pan, J. Ren, and Z. Su, "Learning deep priors for image dehazing," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 2492–2500.
[45] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent., 2015, pp. 234–241.
[46] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[47] J. Zhang and D. Tao, "FAMED-Net: A fast and accurate multi-scale end-to-end dehazing network," IEEE Trans. Image Process., vol. 29, pp. 72–84, 2020.
[48] J. Zhang, Y. Cao, Y. Wang, C. Wen, and C. W. Chen, "Fully point-wise convolutional neural network for modeling statistical regularities in natural images," in Proc. ACM Multimedia Conf. (MM), 2018, pp. 984–992.
[49] Y. Qu, Y. Chen, J. Huang, and Y. Xie, "Enhanced Pix2pix dehazing network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8160–8168.
[50] K. Mei, A. Jiang, J. Li, and M. Wang, "Progressive feature fusion network for realistic image dehazing," in Proc. Asian Conf. Comput. Vis. (ACCV), 2018, pp. 203–215.
[51] R. Li, J. Pan, Z. Li, and J. Tang, "Single image dehazing via conditional generative adversarial network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8202–8211.
[52] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, "FFA-Net: Feature fusion attention network for single image dehazing," 2019, arXiv:1911.07559. [Online]. Available: http://arxiv.org/abs/1911.07559
[53] A. Golts, D. Freedman, and M. Elad, "Unsupervised single image dehazing using dark channel prior loss," IEEE Trans. Image Process., vol. 29, pp. 2692–2701, 2020.
[54] Z. Xu, X. Yang, X. Li, and X. Sun, "Strong baseline for single image dehazing with deep features and instance normalization," in Proc. BMVC, 2018, p. 5.
[55] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[56] D. Engin, A. Genc, and H. K. Ekenel, "Cycle-Dehaze: Enhanced CycleGAN for single image dehazing," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2018, pp. 825–833.
[57] L. K. Choi, J. You, and A. C. Bovik, "Referenceless prediction of perceptual fog density and perceptual image defogging," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3888–3901, Nov. 2015.
[58] T. M. Bui and W. Kim, "Single image dehazing using color ellipsoid prior," IEEE Trans. Image Process., vol. 27, no. 2, pp. 999–1009, Feb. 2018.
[59] D. Yang and J. Sun, "Proximal Dehaze-Net: A prior learning-based deep network for single image dehazing," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 702–717.
[60] W. Ren et al., "Gated fusion network for single image dehazing," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3253–3261.
[61] S. Santra, R. Mondal, and B. Chanda, "Learning a patch quality comparator for single image dehazing," IEEE Trans. Image Process., vol. 27, no. 9, pp. 4598–4607, Sep. 2018.
[62] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, "The 2018 PIRM challenge on perceptual image super-resolution," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 334–355.

Wei-Ting Chen (Student Member, IEEE) received the B.S. degree in electrical and computer engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2016. He is currently pursuing the Ph.D. degree in electronic engineering with National Taiwan University. His research interests include computer vision, digital image processing, machine learning, and neural networks. His previous research on image dehazing and desmoking was published at CVPR and ICIP.

Hao-Yu Fang received the B.S. degree from National Chiao Tung University and the M.S. degree from National Taiwan University. His research interests include computer vision, machine learning, and neural networks.

Jian-Jiun Ding (Senior Member, IEEE) was born in Taiwan in 1973. He received the Ph.D. degree from National Taiwan University (NTU), Taipei, Taiwan, in 2001. In 2006, he became an Assistant Professor with the Department of Electrical Engineering and the Graduate Institute of Communication Engineering (GICE), NTU. He was promoted to Associate Professor in 2012 and to Professor in 2017. His current research areas include time-frequency analysis, linear canonical transforms, wavelet transforms, image processing, image compression, integer transforms, pattern recognition, face recognition, and machine learning.

Sy-Yen Kuo (Fellow, IEEE) received the B.S. degree in electrical engineering from National Taiwan University in 1979, the M.S. degree in electrical and computer engineering from the University of California at Santa Barbara in 1982, and the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign (UIUC) in 1987. He was the Dean of the College of Electrical Engineering and Computer Science, NTU, from 2012 to 2015, and the Chairman of the Department of Electrical Engineering, NTU, from 2001 to 2004. He was a faculty member with the Department of Electrical and Computer Engineering, The University of Arizona, from 1988 to 1991, and an Engineer with Fairchild Semiconductor and Silvar-Lisco, both in California, from 1982 to 1984. He is currently a Distinguished Professor with the Department of Electrical Engineering, National Taiwan University (NTU), Taiwan. He has published 450 articles in journals and conferences, and also holds 22 U.S. patents, 23 Taiwan patents, and 15 patents from other countries. His current research interests include dependable and secure systems, Internet of Things, and image processing.

Dr. Kuo was a member of the IEEE Computer Society Board of Governors from 2017 to 2019. He received the Distinguished Academic Achievement Alumni Award from the UIUC Department of Computer Science in 2019, and the Distinguished Research Award and the Distinguished Research Fellow Award from the Ministry of Science and Technology in Taiwan. He was also a recipient of the Best Paper Awards at the 1996 International Symposium on Software Reliability Engineering and the 1986 IEEE/ACM Design Automation Conference, and the US National Science Foundation's Research Initiation Award in 1989. He is the Vice President of the IEEE Computer Society in 2020.
