
This article has been accepted for publication in IEEE Access but has not been fully edited; content may change prior to final publication. DOI: 10.1109/ACCESS.2020.3003914

Semantic Segmentation of Remote Sensing Images Using Transfer Learning and Deep Convolutional Neural Network with Dense Connection
BINGE CUI, XIN CHEN, AND YAN LU
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Corresponding author: Yan Lu (luyan@sdust.edu.cn)
This work was supported by the National Key R&D Program of China (2017YFC1405600) and the National Natural Science Foundation of China (NSFC) (41406200, 41706105).

ABSTRACT Semantic segmentation is an important approach in remote sensing image analysis. However, when segmenting multiple objects from remote sensing images with insufficient labeled data and imbalanced data classes, the performance of current semantic segmentation models is often unsatisfactory. In this paper, we address this problem with transfer learning and a novel deep convolutional neural network with dense connections. We designed a UNet-based deep convolutional neural network, called TL-DenseUNet, for the semantic segmentation of remote sensing images. The proposed TL-DenseUNet contains two subnetworks: the encoder subnetwork uses a transferred DenseNet pretrained on three-band ImageNet images to extract multilevel semantic features, and the decoder subnetwork adopts dense connections to fuse the multiscale information in each layer, which strengthens the expressive capability of the features. We carried out comprehensive experiments on remote sensing image datasets with 11 classes of ground objects. The experimental results demonstrate that both transfer learning and dense connections are effective for the multiobject semantic segmentation of remote sensing images with insufficient labeled data and imbalanced data classes. Compared with several other state-of-the-art models, TL-DenseUNet improves the kappa coefficient by more than 0.0752 and achieves more accurate segmentation results.

INDEX TERMS dense connection, transfer learning, remote sensing image, multiscale feature fusion,
semantic segmentation, UNet.

I. INTRODUCTION
With the rapid development of remote sensing technology, a massive number of remote sensing images are becoming available every day [1]. Semantic segmentation, which aims at the pixel-level classification of images, has become an urgent need [2]. Semantic segmentation is one of the fundamental ways to analyze remote sensing images. This approach can easily and quickly obtain the land cover information of an area of interest, thereby providing data support for applications such as precision agriculture, desertification detection, traffic supervision, urban planning, and land resource management [3]–[9].

In recent years, deep convolutional neural networks have achieved great success in many fields and have proven their excellent performance in many applications [10]. This trend has also attracted many researchers to apply deep convolutional neural networks to the field of remote sensing image semantic segmentation [11]–[13]. The fully convolutional neural network (FCN) [14] and its variants have exhibited excellent segmentation abilities. Sherrah [15] used an FCN-based network without any downsampling to semantically segment high-resolution aerial images. That method used the dilated convolution of DeepLab [16], which can maintain the full resolution of the images in each layer of the network and make better use of image features. Compared with the original FCN, this method has no downsampling layer, and its segmentation accuracy is higher. Bittner et al. [17] proposed the Fused-FCN4s model consisting of three parallel


FCN4s networks. Three-band (R, G, B), PAN (panchromatic) and nDSM (normalized digital surface model) images were used as inputs to the parallel networks to extract features from high-resolution remote sensing images. Chen et al. [18] proposed a symmetrical FCN model, including the symmetrical normal-shortcut FCN (SNFCN) and the symmetrical dense-shortcut FCN (SDFCN) with shortcut connections. This structure outperformed the traditional methods, and its symmetrical encoder and decoder solve the problem that the structure of the decoder is usually simpler and shallower than that of the encoder.

Although the various FCN-based methods mentioned above have achieved remarkable performance in the field of remote sensing image segmentation, their recognition capabilities rely heavily on large-scale datasets [19], since there are millions of parameters in the network that need to be trained. For remote sensing images with insufficient labeled data, previous studies have mainly focused on data augmentation [20] or on designing relatively uncomplicated networks to avoid overfitting [21]. However, recent studies have indicated that the deeper the network is, the better its performance [22]. Unfortunately, as the number of neural network layers increases, the vanishing gradients problem may emerge. Thus, insufficient labeled data and the vanishing gradients problem are the main obstacles to training deep convolutional neural networks for remote sensing image segmentation.

To address the first problem, transfer learning, as a deep learning strategy, provides an effective way to train a large network with limited data without overfitting. Yosinski et al. [23] experimentally quantified the generality versus specificity of neurons in each layer of a deep convolutional neural network and verified that transferring features even from distant tasks yields better performance than using random features. In addition, many recent studies [24]–[27] have demonstrated that deep convolutional networks pretrained on large natural image datasets such as ImageNet [28] can be transferred to other datasets with insufficient labeled data and perform better than other deep learning methods. Marmanis et al. [29] exploited a pretrained convolutional neural network based on ImageNet to extract an initial set of representations and then transferred it to a supervised convolutional neural network classifier. Their best result on the UC Merced Land Use benchmark improved the overall accuracy (OA) from 83.1% to 92.4%, indicating that transferring representations from different fields may also be well suited for remote sensing image classification tasks. ImageNet is widely used as the source dataset in transfer learning because of its large amount of labeled data.

To solve the second problem, He et al. [30] proposed ResNet with its typical residual connection, which allows the gradient to propagate flexibly through bypassing paths. Huang et al. [31] proposed DenseNet, which utilizes dense connections to cope with the vanishing gradients problem. Through dense connection, each convolutional layer receives the feature maps of all previous layers as input and transmits its own feature maps to all subsequent layers, which encourages feature reuse and constructs direct connections among all layers. In addition, DenseNet has fewer parameters than ResNet: a DenseNet with only 0.8 M training parameters can match the performance of a 1001-layer ResNet with 10.2 M parameters.

Inspired by the transfer learning strategy and the dense connection approach, we designed a novel end-to-end UNet-based deep convolutional neural network called TL-DenseUNet for the semantic segmentation of remote sensing images with insufficient labeled data and imbalanced data classes. TL-DenseUNet focuses on two aspects. First, it uses a transferred DenseNet-121, pretrained on ImageNet images (1000 classes), to extract the multiscale semantic features of ground objects from remote sensing images. The transferred parameters provide prior knowledge to accurately identify multiple objects in remote sensing images without overfitting. Second, dense connections are used in the decoder subnetwork to fuse the multiscale semantic features, which enhances feature reuse and information flow. Our main contributions are as follows:

(1) A UNet-based deep convolutional neural network is proposed that performs much better in segmenting multiple objects from remote sensing images with insufficient labeled data and imbalanced data classes.

(2) A transferred DenseNet-121 pretrained on ImageNet is applied in the encoder subnetwork for the first time, where it plays a guiding role in the multiscale feature extraction of remote sensing images.

(3) A novel multiscale fusion module with dense connections is designed in the decoder subnetwork, which can effectively fuse the multiscale semantic features and enhance the recognition of ground objects in remote sensing images.

The remainder of this article is organized as follows: related works are discussed in Section II. Section III introduces the details of the proposed TL-DenseUNet. Section IV describes the experimental data and results. Section V presents the discussion. Finally, we summarize our work in Section VI.

II. RELATED WORKS
In this section, we briefly describe the modern structure of semantic segmentation and two deep learning techniques: transfer learning and dense connection.

A. MODERN STRUCTURE FOR SEMANTIC SEGMENTATION
With breakthroughs in the computational power of graphics processing units (GPUs) and the development of big data, considerable progress in deep convolutional neural networks has occurred in recent years. Semantic segmentation is a successful application of this approach and has been utilized to solve pixelwise classification problems. Long et al. [14] proposed a semantic segmentation technique that replaced the fully connected layers with convolutional layers to enable

end-to-end training, and utilized deconvolution [32] layers to predict high-resolution masks from coarse feature maps. In addition, to strengthen the segmentation performance, skip connections between pooling [33] layers were used to fuse the semantic features and appearance features obtained by the network (FCN-8s, FCN-16s and FCN-32s). The FCN combining the features from the final three layers (FCN-8s) resembles an incomplete encoder-decoder structure. UNet [34] used a symmetric and complete encoder-decoder structure for biomedical image segmentation, comprising a contracting path and a similar expanding path in which the pooling layers were replaced by upsampling layers. For precise localization, high-resolution features from each layer in the contracting path were combined with upsampled outputs from the corresponding expanding path through long skip connections. This elegant architecture yielded outstanding performance with very few images. SegNet [35], proposed by Badrinarayanan et al., established a typical deep convolutional encoder-decoder structure. The encoder is responsible for object classification, and the corresponding decoder reconstructs the encoded features to the same size as the original input. In particular, the decoder uses the pooling indices memorized from the corresponding encoder to perform upsampling, which produces a sparse feature map and can be trained effectively. The encoder-decoder structure is the most popular structure for semantic segmentation. RefineNet [36] and global convolutional networks (GCNs) [37], which are based on this structure, have both achieved state-of-the-art performance.

B. TRANSFER LEARNING
Traditional deep convolutional neural networks require large amounts of labeled data for training to achieve optimal performance. The idea of transfer learning is to apply the knowledge learned from a related source task with large amounts of training data to a target task with comparatively insufficient training data [38]; this helps gradient propagation during training and reduces the limitations that the data place on network performance. Creating labeled data is expensive, so optimally leveraging an existing dataset is key [39]. Some low-level features, such as the edges and shapes of objects, are broadly relevant and can be shared by transferring parameters, so the model does not need to learn them from scratch as an ordinary network would. Hence, extracting abstract and sophisticated high-level features becomes the optimization goal of the target task. The most common strategy for transfer learning is to fine-tune a pretrained network model on a target dataset [40]. Girshick et al. [25] pretrained a convolutional neural network on ImageNet and then fine-tuned all the network parameters for a target task (detection) where data are insufficient. Long et al. [41] fine-tuned only the parameters of the last few layers and confirmed that specific features from these layers, tailored to the original task, cannot effectively bridge the domain discrepancy. Sharif Razavian et al. [10] used a pretrained model to extract features for a support vector machine (SVM) to solve different classification tasks.

C. DENSE CONNECTION
Recent studies have demonstrated the importance of using features from shallow layers to directly optimize features from deep layers, especially for very deep convolutional neural networks. ResNet [30] establishes an identity connection by adding an additional path beyond the normal path between two neurons, which allows the gradient to propagate directly through the bypass. Moreover, the identity function and residual function are combined through summation. The residual connection between the front layer and the back layer effectively alleviates the problems of vanishing gradients and model degradation as the number of network layers increases. DenseNet [31] offers another typical approach, called dense connection, which mainly consists of dense blocks.

FIGURE 1. A basic nonlinear transformation module (a) in a dense block and a typical dense block (b) in densely connected convolutional networks.

Figure 1 shows a basic nonlinear transformation module (a) in a dense block and a typical dense block (b). In contrast to residual connections, dense connections combine features by concatenation. In each dense block, all the layers are directly connected to each other, ensuring maximum information flow between layers. Therefore, the lth layer receives the outputs of the preceding l - 1 layers, x_0, ..., x_{l-1}, as input:

    x_l = H_l([x_0, ..., x_{l-1}])    (1)

where [x_0, ..., x_{l-1}] refers to the concatenation of the outputs from layers 0, ..., l - 1 and H_l is the nonlinear transformation module shown in Figure 1 (a); the multiple inputs of H_l are combined by concatenation before the transformation is applied.

An L-layer model thus produces L(L+1)/2 connections instead of only L, as in a traditional structure, which strengthens the information flow among layers; for example, a dense block with L = 4 layers contains 10 direct connections.
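To make Eq. (1) concrete, the following is a minimal Keras sketch of a dense block; the layer count, growth rate and input shape are illustrative choices of ours, not values taken from the paper.

```python
# Minimal sketch of a dense block: each H_l is BN -> ReLU -> 3x3 Conv,
# and its input is the concatenation of all preceding feature maps (Eq. (1)).
import tensorflow as tf
from tensorflow.keras import layers

def h_l(x, growth_rate):
    # Nonlinear transformation module H_l producing `growth_rate` new maps.
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.Conv2D(growth_rate, 3, padding="same")(x)

def dense_block(x, num_layers=4, growth_rate=32):
    features = [x]                       # x_0
    for _ in range(num_layers):
        # x_l = H_l([x_0, ..., x_{l-1}])
        inp = features[0] if len(features) == 1 else layers.Concatenate()(features)
        features.append(h_l(inp, growth_rate))
    return layers.Concatenate()(features)

inputs = layers.Input(shape=(64, 64, 64))
outputs = dense_block(inputs)            # 64 + 4 * 32 = 192 output channels
model = tf.keras.Model(inputs, outputs)
```

With num_layers = 4, this block realizes exactly the 4(4+1)/2 = 10 direct connections counted above.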

FIGURE 2. The structure of DenseNet-121.

In addition, feature reuse means that DenseNet requires fewer parameters than a traditional convolutional neural network, because there is no need to relearn redundant features obtained from earlier layers; they are simply passed along via concatenation. The direct connections among all the layers improve gradient propagation during training and alleviate the vanishing gradients problem. Additionally, each layer can obtain the gradient directly from the loss function and the original input, which represents a kind of implicit deep supervision and helps train deeper networks.

Figure 2 shows a classic version of DenseNet with dense connections, DenseNet-121, which mainly consists of four parts (Dense1, Dense2, Dense3 and Dense4). Each part has a 2 × 2 average pooling operation to reduce the resolution of the feature maps. Moreover, there is also a 1 × 1 convolutional operation in the last three Dense parts to reduce the number of feature maps. Inside the Dense parts, dense connections are constructed from any layer to all subsequent layers after the pooling layers.

III. METHODS
This section presents the proposed TL-DenseUNet. First, the network architecture of TL-DenseUNet is introduced in Section III-A. Then, the transfer learning strategy for multiscale feature extraction in TL-DenseUNet is described in Section III-B. Finally, the multiscale fusion module we designed is presented in detail in Section III-C.

A. THE NETWORK ARCHITECTURE OF TL-DENSEUNET
The TL-DenseUNet proposed in this paper for remote sensing image semantic segmentation is based on UNet [34]; it is an end-to-end network with two symmetric encoder-decoder subnetworks. The input of TL-DenseUNet is a remote sensing image, and the output is a categorical segmentation map.

As shown in Figure 3, the transferred DenseNet-121 is employed in the encoder subnetwork to extract the multiscale features of remote sensing images. The detailed transfer learning strategy can be found in Section III-B. Note that the transferred DenseNet-121 has its fully connected layer removed to ensure end-to-end training and avoid the loss of spatial information.

The decoder subnetwork is responsible for fusing the object features extracted by the encoder subnetwork and mainly consists of five multiscale fusion (MF) modules. A skip connection is utilized between each MF module and the corresponding Dense module in the encoder subnetwork. To improve information flow and feature reuse, dense connections among the MF modules are designed to ensure the precise fusion of multiscale semantic features from different levels. Details about the MF module can be found in Section III-C.

Finally, a softmax function is used to calculate the classification probability distribution and derive the semantic segmentation map.

B. TRANSFER LEARNING STRATEGY IN TL-DENSEUNET
As mentioned previously, the transfer learning strategy is leveraged to train our deep convolutional neural network. As presented at the top of Figure 4, DenseNet-121 is pretrained on the ImageNet dataset with three bands (R, G, B). To make it fit our target segmentation task, we remove the last fully connected (FC) layer and treat the rest of DenseNet-121 as a multiscale feature extractor. Note that, to transfer the model to segment n-band remote sensing images, we adjust the number of channels of the first convolution kernel in the original model from three to n. The four Dense modules of TL-DenseUNet are initialized with the transferred parameters, while the Conv 7 × 7 kernel and the decoder subnetwork are randomly initialized by an initialization function.

The transfer learning strategy mainly includes two stages: fine-tuning part of the network and then fine-tuning the entire network. First, as shown in the middle of Figure 4, we freeze the transferred parameters, which means that they are not updated in this stage, and fine-tune the randomly initialized parameters in the Conv 7 × 7 and the decoder subnetwork using the target training data for some epochs. Then, as shown at the bottom of Figure 4, we use the target training data to fine-tune the entire network. All the parameters of the network are trained together to achieve better performance on the target segmentation task.
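As a rough illustration of this two-stage strategy, the sketch below loads an ImageNet-pretrained DenseNet-121 in Keras, copies its parameters into a four-band encoder while leaving the first 7 × 7 convolution randomly initialized (as described above), freezes the transferred weights for an initial phase, and then unfreezes everything. The `build_decoder` placeholder and the `train_ds`/`val_ds` datasets are hypothetical stand-ins, not the authors' code; the epoch counts and learning rates follow Section IV-B.

```python
# Sketch of the two-stage transfer learning strategy (our reconstruction).
import tensorflow as tf
from tensorflow.keras import layers

n_bands, n_classes = 4, 11

# 3-band DenseNet-121 pretrained on ImageNet, FC top removed.
src = tf.keras.applications.DenseNet121(include_top=False, weights="imagenet",
                                        input_shape=(256, 256, 3))
# Same architecture, but with an n-band stem and random initialization.
enc = tf.keras.applications.DenseNet121(include_top=False, weights=None,
                                        input_shape=(256, 256, n_bands))

# Transfer parameters layer by layer; the first 7x7 convolution has a
# 3-vs-4 input-channel mismatch, so it keeps its random initialization,
# matching the paper's treatment of the Conv 7x7 kernel.
for s, e in zip(src.layers, enc.layers):
    if isinstance(s, layers.Conv2D) and s.kernel.shape[2] == 3:
        continue
    e.set_weights(s.get_weights())

def build_decoder(x):
    # Hypothetical placeholder decoder, NOT the paper's MF modules.
    x = layers.UpSampling2D(32, interpolation="bilinear")(x)
    return layers.Conv2D(n_classes, 1, activation="softmax")(x)

inputs = layers.Input((256, 256, n_bands))
model = tf.keras.Model(inputs, build_decoder(enc(inputs)))

# Stage 1: freeze only the transferred parameters; the randomly
# initialized 7x7 stem convolution and the decoder remain trainable.
for layer in enc.layers:
    if not (isinstance(layer, layers.Conv2D) and layer.kernel.shape[2] == n_bands):
        layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(3e-4),
              loss="categorical_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=10)  # assumed tf.data pipelines

# Stage 2: unfreeze everything and fine-tune the entire network.
for layer in enc.layers:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=50)
```

Note that the model must be recompiled after changing the trainable flags for the change to take effect in training.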
C. MULTISCALE FUSION MODULE
The multiscale fusion (MF) module we designed aims to fuse the local information extracted from the corresponding encoder layer with the semantic information derived from the previous decoder layers. As shown in Figure 5 (a), the decoder subnetwork consists of five MF modules. To strengthen multiscale feature reuse and improve information flow, dense connections are introduced among the MF modules.

FIGURE 3. Overview of TL-DenseUNet for the semantic segmentation of remote sensing images.

FIGURE 4. Transfer learning in TL-DenseUNet.

This is the first time that dense connections have been applied in the decoder of a UNet-like network.

As shown in Figure 5 (b), each MF module consists of two Conv 1 × 1 operations, one Conv 3 × 3 operation, two concatenation operations and multiple UpSample operations. The Conv 1 × 1 is applied to reduce the number of input feature maps, an approach that has proven effective for dimension reduction [42]. The Conv 3 × 3 is introduced to capture context information. Note that each Conv in the MF module represents three consecutive operations: batch normalization (BN) [43], convolution (Conv) and a rectified linear unit (ReLU) [44].

The two concatenation operations are responsible for multiscale feature fusion. Following the structure of UNet, a skip connection is utilized in the first concatenation operation to fuse the output of the corresponding Dense module with the output of the previous MF module, which first undergoes an UpSample operation. Then, following the idea of dense connection, the second concatenation operation fuses the multiscale features from all preceding MF modules. However, concatenation does not work when the sizes of the feature maps differ, so we must make sure that the feature maps to be fused have the same spatial resolution. Therefore, the feature maps from all preceding MF modules first undergo an UpSample operation that enlarges the low-resolution feature maps to the largest size among the feature maps to be fused. Finally, these feature maps are concatenated together as the output of the current MF module and transferred to all subsequent MF modules. Note that the MF4 module has only one input, from the encoder subnetwork, and therefore has no fusion parts.

FIGURE 5. Dense connections among MF modules (a) and the architecture of each MF module (b).
IV. EXPERIMENTS AND EVALUATION from three to four. The experiment was performed on a Linux
In this section, we briefly exhibit the test dataset and imple- platform with an NVIDIA P100 GPU (16 GB RAM). The
mentation details. Then, we evaluate the performance of the Adaptive Moment Estimation (Adam) [47] algorithm was
proposed TL-DenseUNet model in semantic segmentation of used as the optimization algorithm to minimize training loss

TABLE 1. Quantitative scores (%) from the semantic segmentation of remote sensing images. P: precision; R: recall; F: F1 score; I: IoU. Class columns: PF = paddy field, IL = irrigated land, DC = dry cropland, AF = arbor forest, IN = industrial land, UR = urban residential, RR = rural residential, RV = river, PD = pond, TL = traffic land, OT = other.

        Method         PF     IL     DC     AF     IN     UR     RR     RV     PD     TL     OT
P(%)    UNet           35.67  62.56   3.68  96.71  13.39  55.06   9.51  31.96  47.79  22.58  61.69
        SegNet         15.31  71.34   4.63  92.92  34.45  68.65  13.59  50.09  54.96  26.69  62.32
        DeepResUNet     1.09  75.46   5.05  89.33  43.53  60.46   5.79  56.73  59.09  27.78  61.74
        RefineNet      88.68  83.83   8.19  79.39  18.01  80.41  19.29  56.87  60.18  41.11  68.48
        TL-DenseUNet   90.43  82.68  12.81  90.78  52.48  57.42  35.32  79.12  69.77  45.57  70.52
R(%)    UNet            1.69  60.36   3.49  58.93   1.06  62.32   1.77  23.79  61.54   0.01  74.83
        SegNet          1.42  48.39   4.77  57.82  46.02  63.11  14.45  26.53  58.49   6.93  83.09
        DeepResUNet     0.05  25.65   7.65  70.60  38.24  66.07   1.59  36.47  60.03  37.21  87.05
        RefineNet       0.03  74.71  25.39   3.56  50.46  45.01  37.17  47.19  42.37  45.01  80.53
        TL-DenseUNet   22.76  80.11  32.60  77.49  47.95  78.77  35.19  49.55  61.04  60.70  83.12
F(%)    UNet            3.24  61.43   3.58  73.23   1.96  58.45   2.12  23.97  37.30   0.01  67.62
        SegNet          2.59  57.67   4.70  71.28  39.40  65.76   9.27  27.36  43.07  11.01  70.98
        DeepResUNet     0.10  38.29   6.08  78.87  39.59  62.89   2.47  36.39  43.83  31.81  71.12
        RefineNet       0.05  79.01  15.39   6.82  45.73  67.39  22.49  52.95  55.62  42.10  73.66
        TL-DenseUNet   36.37  81.37  18.39  83.61  50.40  73.52  24.18  60.64  57.94  52.06  74.99
I(%)    UNet            1.64  46.34   1.57  63.77   1.12  45.61   1.06  15.45  23.81   0.01  55.08
        SegNet          1.31  40.52   2.41  55.38  33.32  49.67   5.03  17.67  27.97   5.82  57.03
        DeepResUNet     0.05  23.68   3.14  65.11  33.17  47.56   1.26  25.67  28.44  21.91  57.13
        RefineNet       0.03  69.29  13.60   6.53  32.65  52.79  17.31  37.61  39.72  28.41  59.87
        TL-DenseUNet   21.23  69.60  15.31  70.84  35.35  61.41  18.80  43.56  40.89  35.19  61.68

During training, we first froze the parameters of DenseNet-121 and trained the remaining, randomly initialized parameters for ten epochs with an initial learning rate of 0.0003. Then, the entire model was trained for fifty epochs with an initial learning rate of 0.0001 and a weight decay of 0.00001. Due to the limited GPU memory, the batch size during training was set to eight.

To quantitatively evaluate the performance of the proposed TL-DenseUNet in segmenting remote sensing images, seven traditional metrics were applied: precision, recall, F1 score, IoU, overall accuracy (OA), kappa coefficient (Kappa) and MIoU.
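For reference, the aggregate metrics OA, Kappa and MIoU can be computed from a confusion matrix as in the short sketch below; this is our implementation of the standard definitions, not code from the paper.

```python
import numpy as np

def scores(y_true, y_pred, n_classes=11):
    """OA, kappa and MIoU from flattened integer label maps
    (standard definitions; our implementation)."""
    cm = np.bincount(n_classes * y_true.ravel() + y_pred.ravel(),
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    total = cm.sum()
    oa = np.trace(cm) / total
    # Kappa: agreement corrected for the chance agreement p_e.
    pe = (cm.sum(0) * cm.sum(1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    # Per-class IoU = TP / (TP + FP + FN); MIoU is the mean over classes.
    tp = np.diag(cm)
    iou = tp / (cm.sum(0) + cm.sum(1) - tp)
    return oa, kappa, np.nanmean(iou)
```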
C. RESULTS AND COMPARISONS
To evaluate the effectiveness of TL-DenseUNet, we selected several state-of-the-art models for comparison: UNet [34], SegNet [35], DeepResUNet [22], and RefineNet [36]. All models were tested on the same test images in the same experimental environment. Note that, as described above, we fine-tuned the randomly initialized parameters in TL-DenseUNet's Conv 7 × 7 and decoder subnetwork for ten epochs; hence, to be fair, all the other models were trained for ten more epochs than TL-DenseUNet. The quantitative scores obtained are shown in Table 1, and the best values of each metric are shown in bold.

As shown in Table 1, TL-DenseUNet largely outperformed the other advanced semantic segmentation models on most metrics. Among these models, UNet displayed the worst performance, followed by SegNet, implying that simple models have difficulty segmenting remote sensing images with insufficient and imbalanced labeled data. DeepResUNet performed better than those two methods, but some of its metrics, such as the IoU for the paddy field class, were still very low. The performance of RefineNet was second only to that of TL-DenseUNet; however, it performed very poorly on some classes with relatively limited data. Benefiting from the transferred DenseNet-121 and the MF modules, TL-DenseUNet achieved relatively satisfactory performance in both precision and recall and obtained the best F1 score and IoU. For the classes with more data, such as irrigated land, the F1 score of TL-DenseUNet (81.37%) exceeded those of DeepResUNet (38.29%), UNet (61.43%), RefineNet (79.01%), and SegNet (57.67%), demonstrating the effectiveness of the MF module in enhancing the recognition of ground objects in remote sensing images. For classes with limited data, such as arbor forest, TL-DenseUNet's F1 score improved by at least 4.74% and its IoU by at least 5.73%, indicating that the transferred DenseNet-121 improved multiscale feature extraction from remote sensing images. These findings demonstrate the superiority of TL-DenseUNet in the semantic segmentation of remote sensing images with insufficient and imbalanced labeled data.

TABLE 2. OA, Kappa and MIoU obtained from the semantic segmentation of remote sensing images.

Method         OA(%)  Kappa   MIoU(%)
UNet           61.41  0.3992  23.22
SegNet         61.76  0.4038  26.92
DeepResUNet    62.05  0.4113  27.92
RefineNet      67.54  0.4917  32.53
TL-DenseUNet   72.01  0.5669  43.08

Table 2 reports the OA, kappa coefficient and MIoU obtained by the five models.

FIGURE 8. Visual comparisons between TL-DenseUNet (ours) and other models. The first row shows the overall results of the test image and the last three rows show three randomly selected areas from the overall results. (a) Image. (b) Ground Truth. (c) UNet. (d) SegNet. (e) DeepResUNet. (f) RefineNet. (g) TL-DenseUNet.

As can be seen from the results in Table 2, TL-DenseUNet was still the best among all the models. UNet and SegNet had the worst performance, which further confirms that simple models are not suitable for the segmentation of remote sensing images with insufficient and imbalanced labeled data. DeepResUNet and RefineNet achieved better results than those two methods, but their results were still not satisfactory. DeepResUNet was originally proposed to extract buildings from remote sensing images, which makes it ill-suited to segmenting complex remote sensing scenes. TL-DenseUNet achieved the best OA, kappa coefficient and MIoU, indicating that it has the best performance in segmenting remote sensing images with insufficient and imbalanced labeled data.

Figure 8 shows the overall visual segmentation results of the five models for one test image. As can be seen from these figures, SegNet, UNet, DeepResUNet, and RefineNet had difficulty segmenting ground objects with limited data, such as ponds, rivers, and arbor forests. These models rarely identified accurately a paddy field with a small pond located in it, and they tended to misclassify paddy fields as ponds because the two classes closely resemble each other. Moreover, these models also often misclassified rivers as ponds (shown in the third row of Figure 8) because these two objects are highly similar. In contrast, TL-DenseUNet performed relatively better than the other models. With the proposed method, major parts of the paddy fields could be extracted, and paddy fields were rarely misclassified as ponds. Moreover, the extraction of the main bodies of arbor forests and the edges of rivers was also improved, indicating that the transferred DenseNet-121 and the MF module help achieve better performance than the other state-of-the-art methods.

For further comparison, the semantic segmentation results for some of the classes in the test images are shown in Figure 9. As can be seen, TL-DenseUNet achieved the best performance in extracting multiple objects from remote sensing images, while the other models produced more false positives (red) and false negatives (green) in the semantic segmentation of each ground object. DeepResUNet displayed the worst performance because too many false negatives (green) appeared in its segmentation result for irrigated land. SegNet and UNet improved the segmentation quality of irrigated land but still generated more incomplete and inaccurate segmentation results for arbor forests than the proposed method; moreover, they also did not distinguish well between rivers and ponds. RefineNet yielded relatively good performance in segmenting irrigated land, but most of its false negatives (green) appeared in the segmentation of arbor forests, indicating that many arbor forests were not accurately identified. Clearly, these four models did not achieve satisfactory results for the segmentation of urban residential areas. TL-DenseUNet not only had fewer false positives and false negatives in the segmentation of rivers, arbor forests and ponds, which have limited data, but was also able to extract irrigated land and urban residential areas more accurately from remote sensing images.

FIGURE 9. Visual comparisons of the segmentation results of the five models. The white, red and green areas represent true positive, false positive and false negative predictions, respectively. (a) Ground Truth. (b) UNet. (c) SegNet. (d) DeepResUNet. (e) RefineNet. (f) TL-DenseUNet.

These facts further verify that the transferred DenseNet-121 used in the encoder subnetwork and the MF module designed in the decoder subnetwork help TL-DenseUNet perform better than the other state-of-the-art models.

D. COMPARISONS OF DIFFERENT TRANSFER LEARNING STRATEGIES
Figure 10 shows the loss and accuracy curves for the different transfer learning strategies during the training of TL-DenseUNet.

FIGURE 10. Loss (a) and accuracy (b) curves of each epoch obtained by TL-DenseUNet while using different transfer learning strategies.

As shown in Figure 10, freezing DenseNet-121 throughout the training process and fine-tuning only the other parts of the model (Strategy-1) yields relatively high loss and low accuracy. This result demonstrates that never updating some of the parameters limits the ability of the model to extract multilevel semantic features from the target data. Fine-tuning the entire network from the start (Strategy-2) achieves better performance than Strategy-1. The loss curve for the strategy of freezing the parameters of DenseNet-121 for ten epochs and then fine-tuning the entire model for fifty epochs (Strategy-3) is lower than those of the other two strategies. Moreover, as shown in Table 3, Strategy-3 achieves the best OA, kappa coefficient and MIoU of the three strategies. These facts highlight the effectiveness of the transfer learning strategy used in our experiment.

TABLE 3. OA, Kappa and MIoU obtained by TL-DenseUNet while using different transfer learning strategies.

Strategy     OA(%)  Kappa   MIoU(%)
Strategy-1   70.93  0.5512  41.09
Strategy-2   71.79  0.5607  42.11
Strategy-3   72.01  0.5669  43.08

V. DISCUSSION

A. EFFECTS OF THE TRANSFERRED DENSENET-121 AND DENSE CONNECTION
The superior performance of TL-DenseUNet is mainly related to two strategies: the transferred DenseNet-121 (TFD) used in the encoder subnetwork and the dense connection (DC) used in the MF module. Benefiting from these two strategies, all the evaluation metrics were greatly improved compared with those of traditional methods. To demonstrate that both strategies improve the semantic segmentation of remote sensing images, we compared the accuracies of different variants of TL-DenseUNet. Note that we used TL-DenseUNet without the transferred DenseNet-121 (TFD) and dense connection (DC) as our baseline. The experimental setup was the same as before. All the quantitative evaluation metrics are shown in Table 4.

TABLE 4. Comparison of the accuracies among the different variants of TL-DenseUNet; the best values are in bold.

Method         OA(%)  Kappa   MIoU(%)
Baseline       65.03  0.4585  29.11
Only TFD       69.25  0.5271  36.23
Only DC        66.91  0.4791  31.26
TL-DenseUNet   72.01  0.5669  43.08

As shown in Table 4, when the transferred DenseNet-121 (TFD) was added to the designed model for training, the performance was significantly improved: the OA, kappa coefficient and MIoU increased by 4.22%, 0.0686 and 7.12%, implying that reusing parameters pretrained on natural images can greatly improve model performance even though the bands of the target remote sensing images are different. This is probably because, compared with training from scratch, initializing TL-DenseUNet with the pretrained parameters provides guidance for model convergence. When TL-DenseUNet added only the dense connection (DC) in the MF module, the OA, kappa coefficient and MIoU increased by 1.88%, 0.0206 and 2.15%. Evidently, reusing the semantic information obtained from the previous decoder layers guides the feature reconstruction of remote sensing images, which ensures the full use of features to generate more accurate segmentation maps. When both were added, the OA, kappa coefficient and MIoU improved by 6.98%, 0.1084 and 13.97%, indicating that using the two strategies simultaneously can further improve model performance.

B. MODEL COMPLEXITY
For further analysis, we compared the number of parameters, training time and inference time with those of UNet, SegNet, DeepResUNet, and RefineNet. The time required to load the pretrained parameters was excluded from the training and inference time. The dataset used in the training stage was the same as before, and the size of the images used in the inference stage was 512 × 512.

TABLE 5. Comparison of model complexity.

Method         Number of Parameters (M)  Training Time (Seconds/Epoch)  Inference Time (ms/Image)
UNet           31.03                     1335                           75
SegNet         29.46                     1796                           111
DeepResUNet    2.79                      1815                           78
RefineNet      49.25                     3075                           189
TL-DenseUNet   13.19                     2369                           170

As shown in Table 5, TL-DenseUNet has fewer parameters than most of the models, except for DeepResUNet. DeepResUNet has the fewest parameters because it is a lightweight model; however, its performance was much poorer than that of our model. TL-DenseUNet follows the structure of UNet, but it adopts dense connections in both the encoder and decoder subnetworks, which makes it possible to generate relatively few feature maps in each layer. This is the reason why TL-DenseUNet has fewer parameters. RefineNet has the most parameters and the longest training and inference times, which may be caused by its complex structure. UNet and SegNet require relatively short training and inference times because they use simple convolution and pooling operations, which lead to simple gradient flow. TL-DenseUNet takes longer to train and run inference than UNet because the dense blocks used in each layer of the encoder subnetwork require more calculations. Moreover, the dense connections inside TL-DenseUNet complicate gradient propagation, which may have a negative effect on model training. This is an aspect for us to improve in the future.
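The quantities in Table 5 can be reproduced in principle with a measurement routine like the one below; it is a sketch of the procedure under our own warm-up and repetition choices, not the authors' benchmarking code.

```python
import time
import numpy as np
import tensorflow as tf

def complexity(model, image_size=512, n_bands=4, warmup=5, runs=50):
    # Parameter count in millions, as reported in Table 5.
    params_m = model.count_params() / 1e6
    x = np.random.rand(1, image_size, image_size, n_bands).astype("float32")
    for _ in range(warmup):                  # exclude graph build / warm-up
        model.predict(x, verbose=0)
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(x, verbose=0)
    ms_per_image = 1000 * (time.perf_counter() - start) / runs
    return params_m, ms_per_image
```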

VI. CONCLUSIONS
In this paper, a novel UNet-based deep convolutional neural network, called TL-DenseUNet, was proposed to segment multiple objects from remote sensing images with insufficient labeled data and imbalanced data classes. TL-DenseUNet adopts a transferred DenseNet-121 and multiple MF modules to enhance model performance. Experiments were carried out on a remote sensing image dataset with 11 classes. Both the visual and quantitative experimental results demonstrated that the transfer learning strategy deals effectively with the problem of insufficient and imbalanced samples and that the MF modules we designed enhance feature reuse and information flow. Moreover, our work verified that transferring network parameters from three-band natural images to multiband remote sensing images is also effective. However, the overall performance of the proposed TL-DenseUNet is still not fully satisfactory: ground objects with similar spectra, such as rivers and ponds, remain prone to misclassification. In the future, we will explore an unsupervised transfer learning method, which can leverage large amounts of unlabeled remote sensing images and reduce labeling costs, to achieve more accurate semantic segmentation.

ACKNOWLEDGMENT
The authors would like to thank all reviewers and editors for their comments on this paper. The authors would also like to thank the Information Science Department of the National Natural Science Foundation of China (NSFC) for providing the test dataset.

REFERENCES
[1] Y. Xu, L. Wu, Z. Xie, and Z. Chen, "Building extraction in very high resolution remote sensing imagery using deep learning and guided filters," Remote Sensing, vol. 10, no. 1, p. 144, 2018.
[2] G. Cheng, F. Zhu, S. Xiang, Y. Wang, and C. Pan, "Accurate urban road centerline extraction from VHR imagery via multiscale segmentation and tensor voting," Neurocomputing, vol. 205, pp. 407–420, 2016.
[3] X. X. Zhu, D. Tuia, L. Mou, G.-S. Xia, L. Zhang, F. Xu, and F. Fraundorfer, "Deep learning in remote sensing: A comprehensive review and list of resources," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017.
[4] Y. Liu, B. Fan, L. Wang, J. Bai, S. Xiang, and C. Pan, "Semantic labeling in very high resolution images via a self-cascaded convolutional neural network," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 78–95, 2018.
[5] L. Matikainen and K. Karila, "Segment-based land cover mapping of a suburban area – comparison of high-resolution remotely sensed datasets using classification trees and test field points," Remote Sensing, vol. 3, no. 8, pp. 1777–1804, 2011.
[6] J. V. Solórzano, J. A. Meave, J. A. Gallardo-Cruz, E. J. González, and J. L. Hernández-Stefanoni, "Predicting old-growth tropical forest attributes from very high resolution (VHR)-derived surface metrics," International Journal of Remote Sensing, vol. 38, no. 2, pp. 492–513, 2017.
[7] T. Shi, Q. Xu, Z. Zou, and Z. Shi, "Automatic raft labeling for remote sensing images via dual-scale homogeneous convolutional neural network," Remote Sensing, vol. 10, no. 7, p. 1130, 2018.
[8] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, "Deep learning in remote sensing applications: A meta-analysis and review," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 152, pp. 166–177, 2019.
[9] L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22–40, 2016.
[10] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 806–813.
[11] P. Liu, X. Liu, M. Liu, Q. Shi, J. Yang, X. Xu, and Y. Zhang, "Building footprint extraction from high-resolution images via spatial residual inception convolutional neural network," Remote Sensing, vol. 11, no. 7, p. 830, 2019.
[12] Q. Shi, X. Liu, and X. Li, "Road detection from remote sensing images by generative adversarial networks," IEEE Access, vol. 6, pp. 25486–25494, 2017.
[13] M. Lan, Y. Zhang, L. Zhang, and B. Du, "Global context based automatic road segmentation via dilated convolutional neural network," Information Sciences, 2020.
[14] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[15] J. Sherrah, "Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery," arXiv preprint arXiv:1606.02585, 2016.
[16] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[17] K. Bittner, F. Adam, S. Cui, M. Körner, and P. Reinartz, "Building footprint extraction from VHR remote sensing images combined with normalized DSMs using fused fully convolutional networks," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 8, pp. 2615–2629, 2018.
[18] G. Chen, X. Zhang, Q. Wang, F. Dai, Y. Gong, and K. Zhu, "Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 5, pp. 1633–1644, 2018.
[19] S. Chakraborty, V. Balasubramanian, Q. Sun, S. Panchanathan, and J. Ye, "Active batch selection via convex relaxations with guaranteed solution bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 10, pp. 1945–1958, 2015.
[20] J. Ding, B. Chen, H. Liu, and M. Huang, "Convolutional neural network with data augmentation for SAR target recognition," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 3, pp. 364–368, 2016.
[21] B. Pan, Z. Shi, and X. Xu, "R-VCANet: A new deep-learning-based hyperspectral image classification method," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 5, pp. 1975–1986, 2017.
[22] Y. Yi, Z. Zhang, W. Zhang, C. Zhang, W. Li, and T. Zhao, "Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network," Remote Sensing, vol. 11, no. 15, p. 1774, 2019.
[23] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
[24] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, "Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery," in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 1057–1061.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[26] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1717–1724.
[27] B. Pan, Z. Shi, X. Xu, T. Shi, N. Zhang, and X. Zhu, "CoinNet: Copy initialization network for multispectral imagery semantic segmentation," IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 5, pp. 816–820, 2018.
[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.


[29] D. Marmanis, M. Datcu, T. Esch, and U. Stilla, "Deep learning earth observation classification using ImageNet pretrained networks," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, pp. 105–109, 2015.
[30] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[31] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[32] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2018–2025.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[35] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
[36] G. Lin, A. Milan, C. Shen, and I. Reid, "RefineNet: Multi-path refinement networks for high-resolution semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1925–1934.
[37] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, "Large kernel matters – improve semantic segmentation by global convolutional network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4353–4361.
[38] W. Zhang, Y. Zhu, and Q. Fu, "Semi-supervised deep transfer learning based on adversarial feature learning for label-limited SAR target recognition," IEEE Access, vol. 7, pp. 152412–152420, 2019.
[39] T. Panboonyuen, K. Jitkajornwanich, S. Lawawirojwong, P. Srestasathiern, and P. Vateekul, "Semantic segmentation on remotely sensed images using an enhanced global convolutional network with channel attention and domain specific transfer learning," Remote Sensing, vol. 11, no. 1, p. 83, 2019.
[40] Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris, "SpotTune: Transfer learning through adaptive fine-tuning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4805–4814.
[41] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," arXiv preprint arXiv:1502.02791, 2015.
[42] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[43] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[44] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
[45] N. K. Manaswi, N. K. Manaswi, and S. John, Deep Learning with Applications Using Python. Springer, 2018.
[46] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[47] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

BINGE CUI received the B.Sc., M.Sc., and Ph.D. degrees in computer science from Harbin Engineering University, Harbin, China, in 2000, 2003, and 2006, respectively. In 2006, he joined the College of Computer Science and Engineering, Shandong University of Science and Technology. From 2010 to 2011, he was a Visiting Scholar with the Department of Information System, City University of Hong Kong. From 2012 to 2014, he was a Postdoctoral Researcher with the First Institute of Oceanography, State Oceanic Administration, China. He is currently a Professor. His research interests include hyperspectral image classification and remote sensing image analysis with deep learning.

XIN CHEN received the bachelor's degree from Jiangsu Ocean University, Lianyungang, China, in 2018. He is currently pursuing the master's degree with the College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China. His current research interests include deep learning and remote sensing image processing.

YAN LU received the B.Sc. and M.Sc. degrees in computer science from Yanshan University, Qinhuangdao, China, in 1998 and 2000, and the Ph.D. degree in computer science from Fudan University, Shanghai, China, in 2003. From 2003 to 2005, she was a Postdoctoral Researcher in computer science and technology at Harbin Institute of Technology. In 2005, she joined the College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, China. She is currently an Associate Professor. Her research interests include hyperspectral image classification and object detection.

