
A Review on Deep Learning in UAV Remote Sensing

Preprint, compiled August 22, 2023


Lucas Prado Osco 1, José Marcato Junior 2, Ana Paula Marques Ramos 3, Lúcio André de Castro Jorge 4, Sarah Narges Fatholahi 5, Jonathan de Andrade Silva 6, Edson Takashi Matsubara 6, Hemerson Pistori 7, Wesley Nunes Gonçalves 6, and Jonathan Li 5

1 Faculty of Engineering and Architecture and Urbanism, University of Western São Paulo (UNOESTE), Rod. Raposo Tavares, km 572, Limoeiro, Presidente Prudente 19067-175, SP, Brazil; lucasosco@unoeste.br; pradoosco@gmail.com
2 Faculty of Engineering, Architecture and Urbanism and Geography, Federal University of Mato Grosso do Sul (UFMS), Av. Costa e Silva-Pioneiros, Cidade Universitária, Campo Grande 79070-900, MS, Brazil; jose.marcato@ufms.br
3 Department of Cartography, São Paulo State University (UNESP), Centro Educacional, R. Roberto Simonsen, 305, Presidente Prudente 19060-900, SP, Brazil; marques.ramos@unesp.br
4 National Research Center of Development of Agricultural Instrumentation, Brazilian Agricultural Research Agency, R. XV de Novembro, 1452, São Carlos 13560-970, SP, Brazil; lucio.jorge@embrapa.br
5 Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada; nfatholahi@uwaterloo.ca, junli@uwaterloo.ca
6 Faculty of Computing, Federal University of Mato Grosso do Sul (UFMS), Av. Costa e Silva-Pioneiros, Cidade Universitária, Campo Grande 79070-900, MS, Brazil; jonathan.andrade@ufms.br, edsontm@facom.ufms.br, wesley.goncalves@ufms.br
7 Inovisão, Catholic University of Dom Bosco, Av. Tamandaré, 6000, Campo Grande 79117-900, MS, Brazil; pistori@ucdb.br

arXiv:2101.10861v4 [cs.CV] 20 Aug 2023

Abstract
Deep Neural Networks (DNNs) learn hierarchical representations from data, bringing significant advances in image processing and time-series analysis, as well as in natural language, audio, video, and many other domains. In the field of remote sensing, research and literature reviews specifically involving DNN applications have been conducted to summarize the amount of information produced. Recently, applications based on Unmanned Aerial Vehicles (UAVs) have stood out in aerial sensing research, as they allow fast, less costly data collection at high spatial resolution. However, a literature review that combines the themes of "Deep Learning" (DL) and "remote sensing with UAVs" has not yet been conducted. The motivation for our work was to present a review of the fundamentals of DL applied to images collected by sensors onboard these aircraft. We especially present a description of the classification and segmentation techniques used in recent applications with data acquired by UAVs. For this, a total of 232 articles published in international scientific journal databases were examined. We gathered all this material and evaluated its characteristics in relation to, for example, the application, sensor, and type of network used. We discuss how DL presents promising results and has the potential for processing tasks associated with aerial image data collected by UAVs. Finally, we project future perspectives, commenting on the prominent paths of DL to be explored in aerial remote sensing. Our review takes a concise and objective approach to present, comment on, and summarize the state of the art in applications of sub-meter spatial resolution images with DNNs in various subfields of remote sensing, grouping them in the environmental, urban, and agricultural contexts.

1 Introduction

For investigations using remote sensing image data, multiple processing tasks depend on computer vision algorithms. In the past decade, applications conducted with statistical and Machine Learning (ML) algorithms were mainly used in classification/regression tasks. The increase of remote sensing systems allowed a wide collection of data from any target on the Earth's surface. Aerial imaging has become a common approach to acquiring data with the advent of Unmanned Aerial Vehicles (UAV). These are also known as Remotely Piloted Aircraft (RPA), or, as a commonly adopted term, drones (multi-rotor, fixed-wing, hybrid, etc.). These devices have grown in market availability for their relatively low cost and high operational capability to capture images quickly and easily. The high spatial resolution of UAV-based imagery and its capacity for multiple visits allowed the creation of large and detailed datasets to be dealt with.

Surface mapping with UAV platforms presents some advantages compared to orbital and other aerial sensing methods of acquisition. Less atmospheric interference, the possibility of flying at lower altitudes, and, mainly, the low operational cost have made this acquisition system popular in both commercial and scientific explorations. However, the visual inspection of multiple objects can still be a time-consuming, biased, and inaccurate operation. Currently, the real challenge in remote sensing approaches is to obtain automatic, rapid, and accurate information from this type of data. In recent years, the advent of Deep Learning (DL) techniques has offered robust and intelligent methods to improve the mapping of the Earth's surface.

DL is an Artificial Neural Network (ANN) method with multiple hidden layers and deeper combinations, which is responsible for optimizing and returning better learning patterns than a common ANN. There is an impressive amount of review material in the scientific journals explaining DL-based techniques, their historical evolution, and general usage, as well as detailing networks and functions. Highly detailed publications, such as LeCun et al. [113] and Goodfellow et al. [69], are both considered important material in this area. As computer processing and labeled examples (i.e., samples) became more available in recent years, the performance of Deep Neural Networks (DNNs) increased in image-processing applications. DNNs have been successfully applied in data-driven methods. However, much still needs to be covered to truly understand their potential, as well as their limitations. In this regard, several surveys on the application of DL in remote sensing were developed in both general and specific contexts to better explain its importance.

The contexts in which remote sensing literature surveys are presented vary. Zhang et al. [203] organized a revision
material which explains how DL methods were being applied, at the time, to image classification tasks. Later, Cheng et al. [39] investigated object detection in optical images, but focused more on traditional ANN and ML. A complete and systematic review was presented by Ball et al. [12] in a survey describing DL theories, tools, and the challenges of dealing with remote sensing data. Cheng et al. [40] produced a revision on image classification with examples produced in their own experiments. Also focusing on classification, Zhu et al. [215] summarized most of the current information needed to understand the DL methods used for this task. Additionally, a survey performed by Li et al. [114] helped to understand some DL applications regarding the overall performance of DNNs on publicly available datasets for the image classification task. Yao et al. [200] stated in their survey that DL will become the dominant method of image classification in the remote sensing community.

Although DL does provide promising results, many observations and examinations are still required. Interestingly enough, multiple remote sensing applications using hyperspectral imagery (HSI) data were in progress, which gained attention. In Petersson et al. [152], probably one of the first surveys on hyperspectral data was performed. In [172], a multidisciplinary review is presented about how DL models have been widely used in the field of HSI dataset processing. These authors highlighted that, among the distinct areas of application, remote sensing approaches are one of the most emerging. Regarding the use of DL models to process highly detailed remotely sensed HSI data, Signoroni et al. [172] summarized usage into classification tasks, object detection, semantic segmentation, and data enhancement, such as denoising, spatial super-resolution, and fusion. Adão et al. [1] present a recent review on hyperspectral imaging acquired by UAV-based sensors for agriculture and forestry applications, and show that there are manifold DL approaches to deal with HSI dataset complexity.

A more recent survey is presented by Jia et al. [98] regarding DL for hyperspectral image classification considering few labeled samples. They comment on how there is a notable gap between deep learning models and HSI datasets: DL models usually need sufficient labeled samples, but it is generally difficult to acquire many samples in an HSI dataset due to the difficulty and time-consuming nature of manual labeling. However, the issues of small-sample sets may be well addressed by the fusion of deep learning methods and related techniques, such as transfer learning and lightweight models. Deep learning is also a new approach for the domain of infrared thermal imagery processing in different application domains, especially with satellite-provided data. Some of these applications are the usage of convolutional layers to detect potholes on roads with terrestrial imagery [5], the detection of land surface temperatures from combined multispectral and microwave observations from orbital platforms [193], or the determination of sea surface temperature patterns to identify ocean temperature extremes [196] from orbital imagery.

Yet on the literature revision theme, a comparative review by Audebert et al. [8] was conducted by examining various families of network architectures while providing a publicly available toolbox to perform such methods. In this regard, another paper, written by Paoletti et al. [149], organized the source code of DNNs to be easily reproduced. Similar to [40], Li et al. [115] conducted a literature revision while presenting an experimental analysis with DNN methods. As of recently, literature revision has focused on more specific approaches within this theme. Some of these included DL methods for the enhancement of remote sensing observations, such as super-resolution, denoising, restoration, pan-sharpening, and image fusion techniques, as demonstrated by Tsagkatakis et al. [186] and Signoroni et al. [172]. Also, a meta-analysis by Ma et al. [128] was performed concerning the usage of DL algorithms in seven subfields of remote sensing: image fusion and image registration, scene classification, object detection, land use and land cover classification, semantic segmentation, and object-based image analysis (OBIA).

Although various remote sensing applications using DL can be verified from these recent reviews, it should be noted that the authors did not focus on specific surveying in the context of DL algorithms applied to UAV-image sets, which is something that, at the time of writing, has gained the attention of remote sensing investigations. We verified in the literature that, in general, similar DL methods are used for imagery acquired at different levels, resolutions, and domains, such as those from orbital, aerial, terrestrial, and proximal sensing platforms. However, as of recently, some of the proposed deep neural networks maintain high-resolution images into deeper layers [101]. This type of deep network may benefit from UAV-based data, taking advantage of its resolution. Indeed, there are orbital images with high spatial resolutions, but these are not as commonly available to the general public as UAV-based images. Because of that, these kinds of architectures associated with UAV-based data may be a surging trend in remote sensing applications.

Another interesting take on DL-based methods was related to image segmentation in a survey by Hossain et al. [83], whose theme was expanded by Yuan et al. [202] to include state-of-the-art algorithms. A summarized analysis by Zheng et al. [213] focused on remote sensing images with object detection approaches, indicating some of the challenges related to detection with few labeled samples, multi-scale issues, network structure problems, and cross-domain detection difficulties. In more of a "niche" type of research, environmental applications and land surface change detection were investigated in literature revision papers by Yuan et al. [201] and Khelifi et al. [106], respectively.

The aforementioned studies were evaluated with a text-processing method that returned a word cloud in which the word size denotes the frequency of the word within these papers (Fig. 1). An interesting observation regarding this word cloud is that the term "UAV" is underrepresented or not represented at all. This revision gap is a problem, since UAV image data is produced daily in large amounts, and no scientific investigation appears to offer a comprehensive literature revision to assist new research on this matter. In the UAV context, there are some revision papers published in important scientific journals from the remote sensing community. As of recently, a revision-survey [23] focused on the implications of ML methods being applied to UAV image processing, but no investigation was conducted on DL algorithms for this particular issue. This is an important theme, especially since UAV platforms are more easily available to the public and DL-based methods are being tested to provide accurate mapping in highly detailed imagery.
Figure 1: Word-cloud of different literature-revision papers related to the "remote sensing" and "deep learning" themes.

As mentioned, UAVs offer flexibility in data collection, since flights are programmed on users' demand; they are low-cost when compared to other platforms that offer similar spatial-resolution images; they produce a high level of detail in their data collection; they present dynamic data characteristics, since it is possible to embed RGB, multispectral, hyperspectral, thermal, and LiDAR sensors on them; and they are capable of gathering data from difficult-to-access places. Aside from that, sensors embedded in UAVs are known to generate data at different altitudes and points of view. These characteristics, alongside others, are known to produce a higher dynamic range of images than common sensing systems. This ensures that the same object is viewed from different angles, where not only its spatial and spectral information is affected, but also its form, texture, pattern, geometry, illumination, etc. This becomes a challenge for multidomain detection. As such, studies indicate that DL is the most prominent solution for dealing with these disadvantages. These studies, most of which are presented in this revision paper, were conducted within a series of data criteria and evaluated DL architectures in classifying, detecting, and segmenting various objects from UAV scenes.

To the best of our knowledge, there is a literature gap related to review articles combining both the "deep learning" and "UAV remote sensing" thematics. This survey is important to summarize the direction of DL applications in the remote sensing community, particularly related to UAV imagery. The purpose of this study is to provide a brief review of DL methods and their applications to solve classification, object detection, and semantic segmentation problems in the remote sensing field. Herein, we discuss the fundamentals of DL architectures, including recent proposals. There is no intention of summarizing the existing literature, but rather to present an examination of DL models while offering the necessary information to understand the state of the art. Our revision highlights traits of UAV-based image data, their applications, sensor types, and the techniques used in recent approaches in the remote sensing field. Additionally, we relate how DL models present promising results and project future perspectives of prominent paths to be explored. In short, this paper brings the following contributions:

1. A presentation of the fundamental ideas behind DL models, including classification, object detection, and semantic segmentation approaches, as well as the application of these concepts to UAV-image-based mapping tasks;
2. The examination of published material in scientific sources regarding sensor types and applications, categorized in environmental, urban, and agricultural mapping contexts;
3. The organization of publicly available datasets from previous research, conducted with UAV-acquired data, labeled for both object detection and segmentation tasks;
4. A description of the challenges and future perspectives of DL-based methods to be applied with UAV-based image data.

2 Deep Neural Networks Overview

DNNs are based on neural networks, which are composed of neurons (or units) with certain activations and parameters that transform input data (e.g., a UAV remote sensing image) into outputs (e.g., land use and land cover maps) while progressively learning higher-level features [128, 167]. This progressive feature learning occurs, among others, on layers between the input and the output, which are referred to as hidden layers [128]. DNNs are considered a DL method in their most traditional form (i.e., with 2 or more hidden layers). Their concept, based on an Artificial Intelligence (AI) modeled after the biological neurons' connections, has existed since the 1950s. But only later, with advances in computer hardware and the availability of a high number of labeled examples, did interest in them resurge in major scientific fields. In the remote sensing community, interest in DL algorithms has been gaining attention since the mid-2010s, specifically because these algorithms achieved significant success at digital image processing tasks [128, 105].

A DNN works similarly to an ANN: as a supervised algorithm, it uses a given number of input features to be trained, and these feature observations are combined through multiple operations, where a final layer is used to return the desired prediction. Still, this explanation does not do much to highlight the differences between traditional ANNs and DNNs. LeCun et al. [113], amongst the most cited articles in the DL literature, define DL as follows: "Deep-learning methods are representation-learning methods with multiple levels of representation". Representation-learning is a key concept in DL. It allows the DL algorithm to be fed with raw data, usually unstructured data such as images, texts, and videos, to automatically discover representations.

The most common DNNs (Fig. 2) are generally composed of dense layers, wherein activation functions are implemented. Activation functions compute the weighted sum of inputs and biases, which is used to decide whether a neuron can be activated or not [141]. These functions constitute decision functions that help in learning intrinsic patterns [105]; i.e., they are one of the main aspects of how each neuron learns from its interaction with the other neurons. Known as a piecewise linear function type, ReLU outputs the value 0 for all negative values of x. This function is,
at the time of writing, the most popular in current DNN models. Regardless, another potential activation function recently explored is Mish, a self-regularized non-monotonic activation function [105]. Aside from the activation function, another important aspect of how a DNN works relates to its layers, such as dropout, batch-normalization, convolution, deconvolution, max-pooling, encode-decode, memory cells, and others. The batch-normalization layer, for instance, is regularly used to solve issues with covariate-shift within feature-maps [105]. The organization in which the layers are composed, as well as their parameters, is one of the main aspects of the architecture.

Multiple types of architectures have been proposed in recent years to improve and optimize DNNs by implementing different kinds of layers, optimizers, loss functions, depth levels, etc. However, it is known that one of the major reasons behind DNNs' popularity today is also related to the high amount of available data to learn from. A rule of thumb conceived among data scientists indicates that at least 5,000 labeled examples per category is recommended [69]. But, as of today, DNN proposals have focused on improving these networks' capacity to predict features with fewer examples than that. Some specifically oriented applications may benefit from this, as it reduces the amount of labor required for sample collection by human inspection. Even so, it should be noted that, although this pursuit is being conducted, multiple takes are performed by the computer vision communities, and novel research includes methods for data augmentation, self-supervised, and unsupervised learning strategies, among others. A detailed discussion of this matter is presented in [105].

2.1 Convolutional and Recurrent Neural Networks

A DNN can be formed by different architectures, and the complexity of the model is related to how each layer and additional computational method is implemented. Different DL architectures are proposed regularly: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Deep Belief Networks (DBN) [12], and, more recently, Generative Adversarial Networks (GAN) [69]. However, the most common DNNs in the supervised category are usually classified as CNNs (Fig. 3) and RNNs [105].

As a different kind of DL network structure, RNNs refer to another supervised learning model. The main idea behind implementing RNNs regards their capability of improving their learning by repetitive observations of a given phenomenon or object, often associated with a time-series collection. A type of RNN currently implemented in multiple tasks is the Long Short-Term Memory (LSTM) [81]. In the remote sensing field, RNN models have been applied to time-series analysis tasks, aiming to produce, for example, land cover mapping [93, 84]. In a pixel-based time-series analysis aiming to discriminate classes of winter vegetation coverage using SAR Sentinel-1 data [84], it was verified that RNN models outperformed classical ML approaches. A recent approach [56] for accurate vegetation mapping combined a multiscale CNN to extract spatial features from UAV-RGB imagery and then fed an attention-based RNN to establish the sequential dependency between multitemporal features. The aggregated spatial-temporal features are used to predict the vegetable category. Such examples with remote sensing data demonstrate the potential in which RNNs are being used. Also, one prominent type of architecture is the CNN-LSTM method (Fig. 4). This network uses convolutional layers to extract important features from the given input image and feed the LSTM. Although few studies have implemented this type of network, it should be noted that it serves specific purposes, and its usage, for example, can be valuable for multitemporal applications.

As aforementioned, other types of neural networks, aside from CNNs and RNNs, are currently being proposed to also deal with image data. GANs are amongst the most innovative unsupervised DL models. GANs are composed of two networks, a generative and a discriminative one, that contest between themselves. The generative network is responsible for extracting features from a particular data distribution of interest, like images, while the discriminative network distinguishes between real data (reference or ground truth) and the data generated by the generative part of the GAN (fake data) [68, 128]. Recent approaches in the image processing context, like the classification of remote sensing images [123] and solutions to image-to-image translation problems [96], adopted GANs as the DL model, obtaining successful results.

In short, several DNNs are constantly being developed, on both scientific and image competition platforms, to surpass existing methods. However, as each year passes, some of these neural networks are often mentioned, remembered, or even improved by novel approaches. A summary of well-known DL methods built in recent years is presented in Fig. 5. A detailed take on this, which we recommend to anyone interested, is found in Khan et al. [105]. Alongside these and other creations and developments, researchers observed that higher depth-channel exploration and, as recently proposed, attention-based feature extraction neural networks are regarded as some of the most prominent approaches for DL. Initially, most of the proposed supervised DNNs, like CNN and RNN, or CNN-LSTM models, were created to deal with specific issues. Often, these approaches can be grouped into classification tasks, like scene-wise classification, object detection, semantic and instance segmentation (pixel-wise), and regression tasks.

2.2 Classification and Regression Approaches

When considering remote sensing data processed with DL-based algorithms, the following tasks can be highlighted: scene-wise classification, semantic and instance segmentation, and object detection. Scene-wise classification involves assigning a class label to each image (or patch), while the object detection task aims to draw bounding boxes around objects in an image (or patch) and label each of them according to the class label. Object detection can be considered a more challenging task since it requires locating the objects in the image and then performing their classification. Another manner of detecting objects in an image, instead of drawing bounding boxes, is to draw regions or structures around the boundary of objects, i.e., distinguishing the class of the object at the pixel level. This task is known as semantic segmentation. However, in semantic segmentation, it is not possible to distinguish multiple objects of the same category, as each pixel receives one class label [195]. To overcome this drawback, a task that combines semantic segmentation and object detection, named instance segmentation, was proposed to detect multiple objects in pixel-level masks, labeling each
Figure 2: A DNN architecture. This is a simple example of how a DNN may be built. Here, the initial layer (X_input) is composed of the collected data samples. This information is then extracted by hidden layers, whose weights are adjusted in a back-propagation manner, and is used by subsequent hidden layers to learn the features' characteristics. In the end, a final layer with an activation function suited to the given problem (classification or regression, for example) returns a prediction outcome (Y_label).
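For illustration only, the forward pass of a small fully-connected network like the one sketched in Figure 2 can be written in a few lines of NumPy. The layer sizes, random weights, and the ReLU/softmax choices below are our own assumptions, not taken from any reviewed work:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Piecewise linear activation: outputs 0 for all negative values of x
    return np.maximum(0.0, x)

def softmax(x):
    # Turns the final layer's scores into class probabilities
    e = np.exp(x - x.max())
    return e / e.sum()

# X_input: one flattened sample with 4 features
x = rng.normal(size=4)

# Two hidden layers with randomly initialized weights and zero biases
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
# Output layer: 3 hypothetical classes (e.g., land-cover labels)
W3, b3 = rng.normal(size=(3, 8)), np.zeros(3)

h1 = relu(W1 @ x + b1)       # first hidden layer
h2 = relu(W2 @ h1 + b2)      # second hidden layer
y = softmax(W3 @ h2 + b3)    # Y_label: prediction outcome

print(y.shape)               # three class probabilities summing to 1
```

Training such a network consists of adjusting W1..W3 and b1..b3 by back-propagating the prediction error, which this sketch omits.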

Figure 3: A CNN type of architecture with convolution and deconvolution layers. This example architecture is formed by convolutional layers, where a dropout layer is added between each conv layer, and a max-pooling layer is adopted each time the convolution window size is decreased. At the end, a deconvolutional layer with the same size as the last convolutional one uses information from the previous step to reconstruct the image at its original size. The final layer is a softmax, which returns the model's predictions.
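The two core operations in Figure 3, convolution and max-pooling, can be sketched directly in NumPy. The toy image and the edge-detecting kernel below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def conv2d(img, kernel):
    # 'Valid' 2-D convolution: slide the kernel over the image and take
    # the weighted sum at each position (no padding, stride 1)
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(img, size=2):
    # Non-overlapping max-pooling: keep the strongest activation per window
    H, W = img.shape
    out = img[:H - H % size, :W - W % size]
    return out.reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                  # horizontal difference kernel
feat = conv2d(img, edge)                        # 6x5 feature map
pooled = max_pool(feat)                         # 3x2 map after 2x2 pooling
print(feat.shape, pooled.shape)                 # (6, 5) (3, 2)
```

A real CNN stacks many such kernels per layer and learns their values during training; the pooling step is what progressively reduces the window size mentioned in the caption.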

mask with a class label [180, 36]. Instance segmentation, however, consists of a method that, while classifying the image with this pixel-wise approach, is able to individualize objects [170].

To produce a deep regression approach, the model needs to be adapted so that the last fully-connected layer of the architecture is changed to deal with a regression problem instead of a common classification one. With this adaptation, continuous values are estimated, differently from classification tasks. In comparison to classification, the regression task using DL is not often used; however, recent publications have shown its potential in remote sensing applications. One approach [111] performed a comprehensive analysis of deep regression methods and pointed out that well-known fine-tuned networks, like VGG-16 [192] and ResNet-50 [75], can provide interesting results. These methods, however, are normally developed for specific applications, which is a drawback for general-purpose solutions. Another important point is that, depending on the application, not always
Figure 4: An example of a neural network based on the CNN-LSTM type of architecture. The input image is processed with convolutional layers, and a max-pooling layer is used to introduce the information to the LSTM. Each memory cell is updated with weights from the previous cell. After this process, a flatten layer may be used to transform the data into an arrangement readable by a dense (fully-connected) layer, returning a classification prediction, for instance.
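The memory-cell update at the core of the LSTM stage in Figure 4 can be sketched as a single time step in NumPy. Weight shapes, the 0.1 scaling, and the feature sizes are illustrative assumptions; in a CNN-LSTM, each input vector x would be the CNN-extracted features of one image in the temporal sequence:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # One LSTM time step: forget (f), input (i), and output (o) gates plus
    # a candidate cell state (g), computed from the current input x and
    # the previous hidden state h_prev
    z = W @ x + U @ h_prev + b      # all four gate pre-activations at once
    n = h_prev.size
    f = sigmoid(z[0:n])             # forget gate
    i = sigmoid(z[n:2 * n])         # input gate
    o = sigmoid(z[2 * n:3 * n])     # output gate
    g = np.tanh(z[3 * n:4 * n])     # candidate values
    c = f * c_prev + i * g          # updated memory cell
    h = o * np.tanh(c)              # new hidden state
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 8, 4                  # e.g., 8 features per time step
W = rng.normal(size=(4 * n_hid, n_in)) * 0.1
U = rng.normal(size=(4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for t in range(3):                  # three multitemporal inputs
    x = rng.normal(size=n_in)       # stand-in for CNN-extracted features
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                      # (4,)
```

The final hidden state h would then be flattened and passed to a dense layer to produce the classification prediction described in the caption.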

deep regression succeeds. One strategy is to discretize the output space and treat the problem as a classification task. For UAV remote sensing applications, the strategy of using well-known networks is generally adopted: not only VGG-16 and ResNet-50, as investigated by [111], but also other networks, including AlexNet [108] and VGG-11, have been used. An important issue that could be investigated in future research, depending on the application, is the choice of optimizer. Algorithms with adaptive learning rates, such as AdaGrad, RMSProp, AdaDelta (an extension of AdaGrad), and Adam, are among the most commonly used.

Figure 5: A DL time-series indicating some popular architectures implemented in image classification (yellowish color), object detection (greenish color), and segmentation (bluish color). These networks often intertwine, and many adaptations have been proposed for them. Although it may appear that most DL methods were developed during the 2015-2017 period, it is important to note that many novel deep networks use the already developed methods as backbones, or alongside other types of architectures, mainly as the feature-extraction part of a much more complex structure.

2.2.1 Scene-Wise Classification, Object Detection, and Segmentation

Scene-wise classification, or scene recognition, refers to methods that associate a label/theme with one image (or patch) among numerous images, such as agricultural scenes, beach scenes, urban scenes, and others [219, 128]. Basic DNN methods were developed for this task, and they are among the most common networks for traditional image recognition tasks. In remote sensing applications, however, scene-wise classification is not usually applied; most applications benefit more from object detection and pixel-wise semantic segmentation approaches. For scene-wise classification, the method needs only the annotation of the class label of the image, while a task like object detection needs a bounding box drawn for every object in the image, which makes labeled datasets more costly to build. For instance or semantic segmentation, the specialist (i.e., the person who performs the annotation or object labeling) needs to draw a mask involving each pixel of the object, which demands more attention and precision in the annotation task, reducing, even more, the availability of datasets. Fig. 6 shows examples of both annotation approaches (object detection and instance segmentation).

Object detection methods can be divided into two mainstream categories: one-stage detectors (or regression-based methods) and two-stage detectors (or region proposal-based methods) [212, 126, 195]. The usual two-stage object detection pipeline is to generate region proposals (candidate rectangular bounding boxes) on the feature map, then classify each one into an object class label and refine the proposals with a bounding-box regression. A widely used strategy in the literature to generate proposals was introduced with the Faster-RCNN algorithm and its Region Proposal Network (RPN) [212]. Other state-of-the-art representatives of such algorithms are Cascade-RCNN [32], Trident-Net [185], Grid-RCNN [71], Dynamic-RCNN [52], and DetectoRS [44]. One-stage detectors, in turn, directly classify and locate objects without a region proposal step. Removing this component achieves a high detection speed but tends to reduce the accuracy of the results. These are known as region-free detectors, since they typically use cell-grid strategies to divide the image and predict the class label of each cell. Besides that, some detectors may serve both the one-stage and two-stage categories.

Object detection methods can also be described in terms of three components: a) the backbone, which is responsible for extracting semantic features from images; b) the neck, an intermediate component between the backbone and the head, used to enrich the features obtained by the backbone; and c) the head, which performs the detection and classification of the bounding boxes.

The backbone is a CNN that receives an image as input and outputs a feature map that describes the image with semantic features. In DL, the state-of-the-art is composed of the following backbones: VGG [192], ResNet [160], ResNeXt [161], HRNet [88], RegNet [157], Res2Net [158], and ResNeSt [159]. The neck component combines, at several scales, low-resolution and semantically strong features, capable of detecting large objects, with high-resolution and semantically weak features, capable of detecting small objects. This is done with the lateral and top-down connections of the convolutional layers of the Feature Pyramid Network (FPN) [60] and its variants, such as PAFPN [146] and NAS-FPN [136]. Although the FPN was originally designed for two-stage methods, it was later adapted to single-stage detectors by removing the RPN and adding a classification subnet and a bounding-box regression subnet. The head component is responsible for detecting the objects, with a softmax classification layer, which produces probabilities for all classes, and a regression layer, which predicts the relative offset of the bounding-box positions with respect to the ground truth.
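The FPN-style top-down pathway with lateral connections just described can be sketched in a few lines (a minimal NumPy sketch under simplifying assumptions: the 1x1 lateral convolutions become plain channel projections, upsampling is nearest-neighbour, and the channel counts are illustrative):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(c_maps, out_ch=8, seed=0):
    """Top-down pathway with lateral connections: the coarsest (semantically
    strongest) map is upsampled and added to a 1x1-projected finer map,
    yielding one out_ch-channel map per scale."""
    rng = np.random.default_rng(seed)
    # 1x1 convolutions reduce to pointwise channel projections
    laterals = [np.einsum('oc,chw->ohw',
                          rng.standard_normal((out_ch, c.shape[0])), c)
                for c in c_maps]
    p_maps = [laterals[-1]]                        # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        p_maps.append(lat + upsample2x(p_maps[-1]))  # lateral + top-down sum
    return p_maps[::-1]                            # finest-to-coarsest order

# Backbone maps at strides 8/16/32 (e.g., C3, C4, C5) for a 64x64 input
c3 = np.ones((16, 8, 8)); c4 = np.ones((32, 4, 4)); c5 = np.ones((64, 2, 2))
p3, p4, p5 = fpn_merge([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)
```

Each output map mixes the semantically strong coarse features with the spatial detail of the finer level, which is what allows a single head to operate at every scale.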

Figure 6: Labeled examples. The first row is a bounding-box (object detection) labeling example used to identify individual tree species in an urban environment. The second row is an instance segmentation labeling example used to detect rooftops in the same environment.
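The two label types in Figure 6 are not symmetric: a bounding box can always be derived from an instance mask, while the reverse does not hold, which is one reason mask annotation is the more costly of the two. A small NumPy sketch (the mask, candidate boxes, and the 0.5 IoU threshold are illustrative assumptions) that also shows the positive/negative assignment of candidate boxes against a ground-truth box used when training detectors:

```python
import numpy as np

def mask_to_box(mask):
    """Tightest bounding box (x0, y0, x1, y1) around a binary instance mask."""
    ys, xs = np.nonzero(mask)
    return (xs.min(), ys.min(), xs.max(), ys.max())

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes (inclusive coords)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / float(area(a) + area(b) - inter)

# A 10x10 image with one 4x4 object mask
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 3:7] = True
gt = mask_to_box(mask)                      # derived box label

# Candidate boxes labeled positive/negative against the ground truth (IoU >= 0.5)
candidates = [(3, 2, 6, 6), (4, 3, 7, 6), (0, 0, 2, 2)]
labels = ['positive' if iou(c, gt) >= 0.5 else 'negative' for c in candidates]
print(gt, labels)
```

The same IoU measure drives the positive/negative sampling and the non-max suppression filtering discussed below for one- and two-stage detectors.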

Despite the differences between object detectors (one- or two-stage), their universal problem consists of dealing with a large gap between positive samples (foreground) and negative samples (background) during training, i.e., a class imbalance problem that can deteriorate the accuracy of the results [38]. In these detectors, the candidate bounding boxes can be separated into two main classes: positive samples, which are bounding boxes that match the ground truth according to a metric, and negative samples, which do not match the ground truth. In this sense, a non-max suppression filter can be used to refine these dense candidates by removing overlaps with the most promising ones. The Libra-RCNN [147], ATSS [7], Guided Anchoring [61], FSAF [216], PAA [145], GFL [65], PISA [153], and VFNet [191] detectors explore different sampling strategies and new loss metrics to improve the quality of the selected positive samples and reduce the weight of the large number of negative samples.

Another theme explored in the DL literature is the strategy used to encode the bounding boxes, which influences the accuracy of one-stage detectors, as they do not use region proposal networks [191]. In [191], the authors represent the bounding boxes as a set of representative key-points and find the farthest top, bottom, left, and right points. CenterNet [51] detects the object center point instead of using bounding boxes, while CornerNet [112] estimates the top-left and bottom-right corners of the objects. SABL [165] uses a chunk-based strategy to discretize the image horizontally and vertically and estimate the offset of each side (bottom, top, left, and right). The VFNet [191] method proposes a loss function and a star-shaped bounding box (described by nine sampling points) to improve the localization of objects.

Regarding semantic segmentation and instance segmentation, these approaches are generally defined as a pixel-level classification problem [169]. The main difference between them is that the former is capable of identifying the pixels belonging to each class but cannot distinguish different objects of the same class in the image. Instance segmentation approaches, in turn, do not handle uncountable regions (e.g., the background), since they are concerned with identifying countable objects separately. For example, it may be problematic to identify, in an aerial urban image, the location of cars, trucks, and motorcycles as well as the asphalt pavement, which constitutes the background or region in which the other objects are located. To unify these two approaches, a method named panoptic segmentation was recently proposed in [148]. With panoptic segmentation, the pixels contained in uncountable regions (e.g., background) receive a specific value indicating it.

Considering the success of the RPN for object detection, some variants of Faster R-CNN were adapted to instance segmentation, such as Mask R-CNN [131], which, in parallel with the bounding-box regression branch, adds a new branch to predict the mask of the objects (mask generation). Cascade Mask R-CNN [31] and HTC [89] extend Mask R-CNN to refine, in a cascade manner, the object localization and mask estimation. PointRend [154] is a point-based method that reformulates the mask-generation branch as a rendering problem to iteratively select points around the contour of the object. Regarding semantic segmentation, methods like U-Net [163], SegNet [11], DeepLabV3+ [37], and the Deep Dual-domain Convolutional Neural Network (DDCN) [139] have also been regularly used and adapted in recent remote sensing investigations [140]. Another important remote sensing approach currently being investigated is the segmentation of objects from sparse annotations [91]. Still, as of today, CGNet [35] and DLNet [47] are considered state-of-the-art methods for semantic segmentation.

3 Deep Learning in UAV Imagery

To identify works related to DL in UAV remote sensing applications, we performed a search in the Web of Science (WOS) and Google Scholar databases. WOS is one of the most respected scientific databases and hosts a high number of scientific journals and publications. We conducted a search using the following string in the WOS: ("TS = ((deep learning OR CNN OR convolutional neural network) AND (UAV OR unmanned aerial vehicle OR drone OR RPAS) AND (remote sensing OR photogrammetry)) AND LANGUAGE: (English) AND Types of Document: (Article OR Book OR Book Chapter OR Book Review OR Letter OR Proceedings Paper OR Review); Indexes=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI. Stipulated-time=every-years."). We considered DL, but added CNN, as it is one of the main DL-based architectures used in remote sensing applications [128]. As such, published materials that use these terms in their titles, abstracts, or keywords were investigated and included. For these reasons, we opted for this string to achieve a generalist investigation.

We filtered the results to consider only papers that implemented approaches with UAV-based systems. A total of 190 papers were found in the WOS database, of which 136 were articles, 46 proceedings, and 10 reviews. An additional search was conducted in the Google Scholar database, adopting the same combination of keywords, to identify works not detected in the WOS. We performed a detailed evaluation of its results and selected only those that, although from respected journals, were not encountered in the WOS search. This resulted in an additional 34 articles, 16 proceedings, and 8 reviews. The entire dataset was composed of 232 articles + proceedings and 18 reviews from scientific journals indexed in those bases. These papers were then organized and revised. Fig. 7 demonstrates the main steps taken to map this research. The encountered publications were registered only in the last five years (from 2016 to 2021), which indicates how recent the integration of UAV-based approaches with DL methods is in the scientific journals.

The review articles gathered from those bases were separated and mostly used in the cloud text analysis of Fig. 1, while the remaining papers (articles and proceedings) were organized according to their category. A total of 283,785 words were analyzed for the word-cloud; we removed words with less than 5% occurrence, to cut lesser-used words unrelated to the theme, and words with more than 95% occurrence, to remove plain and simple words frequently used in the English language. The published articles and proceedings were divided in terms of DL-based networks (classification: scene-wise classification, segmentation, and object detection; and regression), sensor types (RGB, multispectral, hyperspectral, and LiDAR), and applications (environmental, urban, and agricultural contexts). We also provide, in a subsequent section, datasets from previously conducted research for further investigation by novel studies. These datasets were organized and their characteristics summarized accordingly.

Most of our research was composed of publications from peer-reviewed publishers in the area of remote sensing (Fig. 8). Even though the review articles encountered in the WOS and Google Scholar databases do mention, to some extent, UAV-based applications, none of them were dedicated to it. Towards the end of our paper, we examine state-of-the-art approaches, like real-time processing, data dimensionality reduction, domain adaptation, attention-based mechanisms, few-shot learning, open-set, semi-supervised, and unsupervised learning, among others. This information provides an overview of future opportunities and perspectives on DL methods applied to UAV-based images, where we discuss the implications and challenges of novel approaches.

The 232 papers (articles + proceedings) were investigated from a quantitative perspective, where we evaluated the number of occurrences per journal, the number of citations, the year of publication, and the location of the conducted applications according to country. We also prepared and organized a sampling portion in relation to the corresponding categories, as previously explained, identifying characteristics like the architecture used, the evaluation metric, the task conducted, the type of sensor, and the mapping-context objectives. After evaluating it, we adopted a qualitative approach by revising and presenting some of the applications conducted within the papers (UAV + DL) encountered in the scientific databases, summarizing the most prominent ones. This narrative over these applications was separated according to the respective categories related to the mapping context (environmental, urban, and agricultural). Later on, when presenting future perspectives and current trends in DL, we mention some of these papers alongside other investigations proposed in computer vision scientific journals that could potentially be used for remote sensing and UAV-based applications.

3.1 Sensors and Applications Worldwide

In the UAV-based imagery context, several applications have benefited from DL approaches. As these networks' usability increases throughout different remote sensing areas, researchers are also experimenting with their capability to substitute laborious human tasks, as well as to improve traditional measurements performed by shallow learning or conventional statistical methods. Recently, several articles and proceedings were published in renowned scientific journals. In general terms, the articles collected from the scientific databases demonstrated a pattern related to the architecture (CNN or RNN), the evaluation metric (classification or regression), the approach (object detection, segmentation, or scene-wise classification), the type of sensor (RGB, multispectral, hyperspectral, or LiDAR), and the mapping context (environmental, urban, or agricultural). These patterns can be viewed in a diagram (Fig. 9). The following observations can be extracted from this graphic:

1. The majority of networks in UAV-based applications still rely mostly on CNNs;

Figure 7: The schematic procedure adopted to organize the revised material according to their respective categories as proposed in this review.

Figure 8: The distribution of the evaluated scientific material according to data gathered from the Web of Science (WOS) and Google Scholar databases. The y-axis on the left represents the number (n) of published papers, illustrated by solid-colored boxes. The y-axis on the right represents the number of citations these publications have received since their publication, according to peer-reviewed scientific journals, illustrated by dashed lines in the same color as their corresponding solid-colored boxes.

2. Even though object detection is the most common type of approach, there have been many segmentation approaches in recent years;

3. Most of the used sensors are RGB, followed by multispectral, hyperspectral, and LiDAR; and

4. There is an interesting amount of papers published within the environmental context, with forest-related applications being the most common approach in this category, while both the urban and agricultural categories were almost evenly distributed among the adopted approaches.

Figure 9: Diagram describing proceedings and articles according to the defined categories, using the WOS and Google Scholar datasets.

The majority of papers published on UAV-based applications implemented a type of CNN (91.2%). Most of these articles used established architectures (Fig. 5), and a small portion proposed their own models and compared them against state-of-the-art networks. In fact, this comparison appears to be a crucial concern in recent publications, since it is necessary to ascertain the performance of a proposed method in relation to well-known DL-based models. Still, the popularity of the CNN architecture for remote sensing images is not new, mainly for reasons already stated in the previous sections. Besides that, even though present in a small number of articles, RNNs (8.8%), mostly composed of CNN-LSTM architectures, are an emerging trend in this area and appear to be the focus of novel proposals. As UAV systems are capable of operating mostly according to the users' own desires (i.e., they can acquire images from multiple dates in a more personalized manner), the same object can be viewed through a type of time-progression approach. This is beneficial for many applications that include the monitoring of stationary objects, like rivers, vegetation, or terrain slopes, for example.

Although classification (97.7%) is the most common evaluation approach implemented in these papers, regression (2.3%) is an important estimate and may be useful in future applications. The usage of regression metrics in remote sensing applications is worthwhile simply because it enables the estimation of continuous data. Applications that could benefit from regression analysis are present in environmental, urban, and agricultural contexts, among many others, and it is useful to return predictions of measured variables. Classification, on the other hand, is more of a common ground for remote sensing approaches and is implemented in every major task (object detection, pixel-wise semantic segmentation, and scene-wise classification).

The aforementioned DL-based architectures were mostly applied to object detection (53.9%) and image segmentation (40.7%) problems, while (scene-wise) classification (5.4%) was the least common. This preference for object detection may be related to UAV-based data specifically, since the high amount of detail of an object provided by the spatial resolution of the images is both an advantage and a challenge. It is an advantage because it increases the number of objects to be detected on the surface (thus, more labeled examples), and it is a challenge because it complicates both the recognition and segmentation of these objects (higher detail implies more features to be extracted and analyzed). Classification (scene-wise), on the other hand, is not as common in remote sensing applications, and image segmentation is often preferred, since assigning a class to each pixel of the image has more benefits for this type of analysis than only identifying a scene.

Following this, there is an interesting distribution pattern related to the application context. The data indicated that most of the applications were conducted in the environmental context (46.6%). This context includes approaches that deal with detection and classification tasks for land use and change, environmental hazards and disasters, erosion estimates, wild-life detection, forest tree inventory, the monitoring of difficult-to-access regions, and others. The urban and agricultural categories (27.2% and 26.4%, respectively) were associated with car and traffic detection and building, street, and rooftop extraction, as well as plant counting, plantation-row detection, weed infestation identification, and others. Interestingly, all of the LiDAR data applications were related to environmental mapping, while RGB images were mostly used in the urban context, followed by the agricultural one. Multispectral and hyperspectral data, however, were less implemented in the urban context in comparison with the other categories. As these categories benefit differently from DL-based methods, a more detailed examination is needed to understand their problems, challenges, and achievements. In the following subsections, we explain these issues and advances while citing some suitable examples from within our search database.

Lastly, another important observation regarding the categorization used here is that there is a visible dichotomy between the types of sensor used. Most of the published papers in this area evaluated the performance of DL-based networks with RGB sensors (52.4%). This was followed, respectively, by multispectral (24.3%), hyperspectral (17.8%), and LiDAR (5.5%) sensors. The preference for RGB sensors in UAV-based systems may be associated with their low cost and high market availability. As such, published articles may reflect this, since it is a viable option for practical reasons when considering the replicability of the method. It should be noted that the labeled examples in public databases are mostly RGB, which helps improvements and investigation with this type of data. Moreover, data obtained from multispectral, hyperspectral, and LiDAR sensors are used in more specific applications, which contributes to this division.

Most of the object detection applications relied on RGB data, while segmentation problems were dealt with using RGB, multispectral, hyperspectral, and LiDAR data. A possible explanation for this is that object detection often relies on the spatial, texture, pattern, and shape characteristics of the object in the image, whereas segmentation approaches form a diverse set of applications that benefit from the amount of spectral and terrain information provided by these sensors. In object detection, DL-based methods may have boosted the usage of RGB images, since simpler and traditional methods need additional spectral information to perform this task. Also, apart from the spectral information, LiDAR, for example, offers important features of the objects for the networks to learn and to refine the edges around them, specifically where their patterns are similar. Regardless, many of these approaches are related to the available equipment and the nature of the application itself, so it is difficult to pinpoint a specific reason.

3.2 Environmental Mapping

Environmental approaches with DNN-based methods hold the most diverse applications with remote sensing data, including UAV imagery. These applications adopt different sensors simply because of their divergent nature. To map natural habitats and their characteristics, studies often relied on methods and procedures specifically related to their goals, and no "universal" approach could be proposed or discovered. However, although DL-based methods have not reached this type of "universal" approach, they are countering some skepticism by being successfully implemented in the most unique scenarios. Although UAV-based practices still offer some challenges to both classification and regression tasks, DNN methods are proving to be generally capable of performing such tasks. Regardless, there is still much to be explored.

Several environmental practices could potentially benefit from deep networks like CNNs and RNNs: for example, monitoring and counting wild-life [15, 85, 176]; detecting and classifying vegetation in grasslands and heavily-forested areas [82, 73]; recognizing fire and smoke signals [110, 205]; analyzing land use, land cover, and terrain changes, which are often incorporated into environmental planning and decision-making models [109, 206]; and predicting and measuring environmental hazards [190, 25], among others. What follows is a brief description of recent material published in remote sensing scientific journals that aimed to solve some of these problems by integrating data from UAV-embedded sensors with DL-based methods.

One of the most common approaches related to environmental remote sensing applications regards land use, land cover, and other types of terrain analysis. A recent study [66] applied semantic segmentation networks to map land use over a mining extraction area. Another [3] combined information from a Digital Surface Model (DSM) with UAV-based RGB images and applied a type of feature fusion as input for a CNN model. To map coastal regions, an approach [26] with RGB data registered at multiple scales used a CNN in combination with a graphical method named conditional random field (CRF). Another research effort [150], with hyperspectral images and a combination of 2D and 3D convolutional layers, was developed to determine the discrepancy of land cover in the assigned land category of cadastral map parcels.

With a semantic segmentation approach, road extraction by a CNN was demonstrated in another investigation [116]. Another study [64] investigated the performance of an FCN to monitor household upgrading in unplanned settlements. Terrain analysis is a diversified topic at any cartographic scale, but for UAV-based images, in which most data acquisitions are composed of a high level of detail, DL-based methods are resulting in important discoveries, demonstrating the feasibility of these methods for this task. Still, although these studies are proving this feasibility, especially in comparison with other methods, novel research should focus on evaluating the performance of deep networks regarding their domain adaptation, as well as their generalization ability, e.g., using data at different spatial resolutions, multitemporal imagery, etc.

The detection, evaluation, and prediction of flooded areas represent another type of investigation with datasets provided by UAV-embedded sensors. A study [62] demonstrated the importance of CNNs for the segmentation of flooded regions, where the network was able to separate water from other targets like buildings, vegetation, and roads. One potential application that could be conducted with UAV-based data, but still needs to be further explored, is mapping and predicting regions of possible flooding with a multitemporal analysis, for example. This, as well as many other possibilities related to flooding, water-bodies,

and river courses [27], could be investigated with DL-based approaches.

For river analysis, an investigation [207] used a CNN architecture for image segmentation, fusing both positional and channel-wise attentive features to assist in river ice monitoring. Another study [97] compared LiDAR data with a point cloud generated by UAV mapping and demonstrated an interesting application of DL-based methods for point cloud classification and rapid Digital Elevation Model (DEM) generation for flood risk mapping. One application of CNNs to UAV data involved measuring hailstones in open areas [174]; for this approach, image segmentation was used on RGB images and returned the maximum and intermediate dimensions of the hailstones. Lastly, on this topic, a comparison [92] of CNNs and GANs to segment both river and vegetation areas demonstrated that a type of "fusion" between these networks, using a global classifier, had the advantage of increasing the efficiency of the segmentation.

UAV-based forest mapping and monitoring is also an emerging approach that has been gaining the attention of the scientific community and, at some level, governmental bodies. Forest areas often pose difficulties for precise monitoring and investigation, since they can be hard to access and may be dangerous to some extent. In this aspect, images taken from UAV-embedded sensors can be used to identify single tree species in forested environments and compose an inventory. Among the papers gathered, multiple types of sensors (RGB, both multi- and hyperspectral, and LiDAR) were used for this approach. One application investigated the performance of a 3D-CNN method to classify tree species in a boreal forest, focusing on pine, spruce, and birch trees, with a combination of RGB and hyperspectral data [138].

Single-tree detection and species classification by CNNs were also investigated in [57], in which three types of palm trees in the Amazon forest, considered important for the local population and native communities, were mapped with this type of approach. Another example [90] includes the implementation of a Deep Convolutional Generative Adversarial Network (DCGAN) to discriminate between healthy and diseased pinus trees in a heavily-dense forested park area. Another recent investigation [134] proposed a novel DL method to identify single tree species in highly-dense areas with UAV-hyperspectral imagery. These and other scientific studies demonstrate how well DL-based methods can deal with such environments.

Although the majority of approaches encountered in the databases for this category relate to tree-species mapping, UAV-acquired data were used for other applications in these natural environments. A recent study [208] proposed a method based on semantic segmentation and scene-wise classification of plants in UAV-based imagery. The method is based on a CNN that classifies individual plants by increasing the image scale while integrating features learned at small scales. This approach is an important contribution to multi-scale information fusion. Also related to vegetation identification, multiple CNN architectures were investigated in [74] to distinguish between plants and non-plants in UAV-based RGB images, achieving interesting performance.

Another application aside from vegetation mapping involves wild-life identification. Animal monitoring in open spaces and grasslands has also received attention, as DL-based object detection and semantic segmentation methods are providing interesting outcomes. A paper by [103] covers this topic and discusses, with practical examples, how CNNs may be used in conjunction with UAV-based images to recognize mammals in the African Savannah. This study describes the challenges related to this task and proposes a series of suggestions to overcome them, focusing mostly on imbalances in the labeled dataset. The identification of wild-life was not only performed in terrestrial environments but also in marine spaces, where a recent publication [70] implemented a CNN-based semantic segmentation method to identify cetacean species, mainly blue, humpback, and minke whales, in the ocean. These studies not only demonstrate that such methods can be highly accurate at different tasks but also imply the potential of DL approaches for UAVs in the current literature.

3.3 Urban Mapping

For urban environments, many DL-based proposals with UAV data have been presented in the literature in recent years. The high spatial resolution easily provided by UAV-embedded sensors is one of the main reasons behind their usage in these areas. Object detection and instance segmentation methods in those images are necessary to individualize, recognize, and map highly-detailed targets. Thus, many applications rely on CNNs and, in fewer cases, RNNs (CNN-LSTM) to deal with them. Some of the most common examples encountered in this category during our survey are the identification of pedestrians, car and traffic monitoring, the segmentation of individual tree species in urban forests, the detection of cracks in concrete surfaces and pavements, building extraction, etc. Most of these applications were conducted with RGB sensors and, in a few cases, spectral ones.

The usage of RGB sensors is, as aforementioned, a preferred option for small-budget experiments, but it is also related to another important characteristic of CNNs: features like the pixel size, form, and texture of an object are essential to its recognition. In this regard, novel experiments could compare the performance of DL-based methods on RGB imagery with other types of sensors. As low-budget systems are easy to implement in larger quantities, many urban monitoring activities could benefit from such investigations. In urban areas, the importance of UAV real-time monitoring is relevant, and it is one of the current objectives when implementing such applications.

The most common practices for UAV-based imagery in urban environments with DL-based methods involve the detection of vehicles and traffic. Car identification is an important task for urban monitoring and may be useful for real-time analysis of traffic flow in those areas. It is not an easy task, since vehicles can be occluded by different objects, like buildings and trees, for example. A recent approach using RGB video footage obtained with a UAV, as presented in [204], used an object detection CNN for this task. The authors also dealt with differences in traffic monitoring for motorcycles, where a frame-by-frame analysis enabled the neural network to determine whether the object in the image was a person (pedestrian) or a person riding a motorcycle, since differences in its pattern and frame movement indicated it. Regarding
Preprint – A Review on Deep Learning in UAV Remote Sensing 14

pedestrian traffic, an approach with thermal cameras presented by [43] demonstrated that CNNs are appropriate to detect persons with different camera rotations, angles, sizes, translations, and scales, corroborating the robustness of their learning and generalization capabilities.

Another important survey in those areas is the detection and localization of single-tree species, as well as the segmentation of their canopies. Identifying individual species of vegetation in urban locations is an important requisite for urban-environmental planning, since it assists in inventorying species and providing information for decision-making models. A recent study [49] applied object detection methods to detect and locate tree species threatened by extinction. Following their intentions, another research [183] evaluated semantic segmentation neural networks to map endangered tree species in urban environments. While one approach aimed to recognize the object to compose an inventory, the other was able to identify it and return important metrics, like its canopy area, for example. Indeed, some proposals that were implemented in a forest type of study could also be adopted in urban areas, and this leaves an open field for future research that intends to evaluate DL-based models in this environment. Urban areas pose different challenges for tree monitoring, so these applications need to consider their characteristics.

DL-based methods have also been used to recognize and extract infrastructure information. An interesting approach demonstrated by [24], based on semantic segmentation methods, was able to extract buildings in heavily urbanized areas with unique architectural styles and complex structures. Interestingly enough, a combination of RGB with a DSM improved building identification, indicating that the segmentation model was able to incorporate appropriate information related to the objects' height. This type of combinative approach, between spatial-spectral data and height, may be useful in other identification and recognition approaches. Also regarding infrastructure, another possible application in urban areas is the identification and location of utility poles [67]. This application, although a rather specific example, is important to maintain and monitor the conditions of poles regularly. This type of monitoring in urban environments benefits from DL-based model approaches, as it tends to substitute multiple human inspection tasks. Another application involves detecting cracks in concrete pavements and surfaces [20]. Because some regions of civil structures are hard to gain access to, UAV-based data with object detection networks may be useful for this task, returning a viable real-life application.

Another topic that is presenting important discoveries relates to land cover pixel segmentation in urban areas, as demonstrated by [18]. In this investigation, an unsupervised domain adaptation method based on GANs was implemented, working with different data from UAV-based systems, while being able to improve the image segmentation of buildings, low vegetation, trees, cars, and impervious surfaces. As aforementioned, GANs or DCGANs are quickly gaining the attention of computer vision communities due to their wide area of applications and the way they function, being trained to differentiate between real and fake data [68]. Regardless, their usage in UAV-based imagery is still underexplored, and future investigations regarding not only land change and land cover but also other types of applications may have their accuracies improved with them. Nonetheless, apart from differences in angles, rotation, scales, and other UAV-based imagery-related characteristics, the diversity in urban scenarios is a problem that should be considered by unsupervised approaches. Therefore, in the current state, DL-based networks may still rely on some supervised manner to guide image processing, specifically regarding domain shift factors.

3.4 Agricultural Mapping

Precision agriculture applications have greatly benefited from the integration between UAV-based imagery and DL methods in recent scientific investigations. The majority of issues related to these approaches involve object detection and feature extraction for counting plants and detecting plantation lines, recognizing plantation gaps, segmenting plant species and invasive species such as weeds, phenology and phenotype detection, and many others. These applications offer numerous possibilities for this type of mapping, especially since most of these tasks are still conducted manually by human-vision inspection. As a result, they can help precision farming practices by returning predictions with rapid, unbiased, and accurate results, influencing decision-making for the management of agricultural systems.

Regardless, although automatic methods do provide important information in this context, they face difficult challenges. Some of these include the similarity between the desired plant and invasive plants, hard-to-detect plants in high-density environments (i.e. presenting small spacing between plants and lines), plantation lines that do not follow a straight path, edge segmentation in mapping canopies with conflicts between shadow and illumination, and many others. Still, novel investigations aim to give these networks a more general capability in dealing with such problems. In this sense, approaches that implement methods in more than one condition or plantation are the main focus of recent publications. Thus, varied investigation scenarios are currently being proposed, with different types of plantations, sensors, flight altitudes, angles, spatial and spectral divergences, dates, phenological stages, etc.

An interesting approach that has the potential to be expanded to different orchards was used in [6]. There, a low-altitude flight approach was adopted with side-view angles to map yield by counting fruits with a CNN-based method. Counting fruits is not something entirely new in DL-based approaches; some papers demonstrated the effectiveness of bounding-box and point-feature methods to extract it [22, 182, 100], aside from several differences in occlusion, lighting, fruit size, and image corruption.

Today's deep networks demonstrate high potential in yield prediction, and some applications are adapted to CNN architectures mainly because of their benefits in image processing, one of which includes predicting pasture forage with only RGB images [33]. Another interesting example in crop-yield estimates is presented by [137], where a CNN-LSTM was used to predict yield with a spatial multitemporal approach. There, the authors implemented this structure since RNNs are more appropriate to learn from temporal data, while a 3D-CNN was used to process and classify the image. Although used less frequently than CNNs in the literature, there is emerging attention to LSTM architectures in precision agriculture approaches, which appear
to be an appropriate option for temporal monitoring of these areas.

Nonetheless, one of the most used and most benefited approaches in precision agriculture with DL-based networks is counting and detecting plants and plantation lines. Counting plants is essential to produce estimates regarding production rates, as well as, by geolocating them, to determine whether a problem occurred during the seedling process by identifying plantation gaps. In this regard, plantation-line identification alongside these gaps is also a desired application. Both object detection and image segmentation methods were implemented in the literature, but most approaches using image semantic segmentation algorithms rely on additional procedures, like using a blob detection method [107], for example. These additional steps may not always be desirable, and, to prove the generalization capability of one model, multiple tests under different conditions should be performed.

For plantation-line detection, segmentations are currently being implemented and often used to assist in more than one information extraction. In [143], semantic segmentation methods were applied to UAV-based multispectral data to extract canopy areas, and the study was able to demonstrate which spectral regions were more appropriate for it. A recent application with UAV-based data was also proposed in [144], where a CNN model is presented to simultaneously count and detect plants and plantation lines. This model is based on confidence map extraction and is an upgraded version of previous research on citrus-tree counting [142]. This CNN works by implementing some convolutional layers, a Pyramid Pooling Module (PPM) [211], and a Multi-Stage Module (MSM) with two information branches that, concatenated at the end of the MSM processes, share knowledge learned from one to another. This method ensured that the network learned to detect plants that are located on a plantation line, and understood that a plantation line is formed by a linear conjunction of plants. This type of method has also proved successful in dealing with highly-dense plantations. Another research [4] that aimed to count citrus trees with a bounding-box-based method also returned similar accuracies. However, it was conducted in a sparse plantation, which did not impose the same challenges faced in [142, 144]. Regardless, to deal with highly dense scenes, feature extraction from confidence maps appears to be an appropriate approach.

However, agricultural applications do not always involve plant counting or plantation-line detection. Similar to the wild-animal identification included in other published studies [103, 70], there is also an interest in cattle detection, which is still an onerous task for human inspection. In UAV-based imagery, some approaches included DL-based bounding-box methods [14], which were also successfully implemented. DNNs used for this task are still underexplored, but published investigations [162] argue that one of the main reasons behind the necessity of DL methods is the occurrence of changes in terrain (throughout the seasons of the year) and the non-uniform distribution of the animals throughout the area. On this matter, one interesting approach should involve the usage of real-time object detection during the flight, because it is difficult to track animal movement, even in open areas such as pastures, while a UAV system is acquiring data. Another agricultural application example refers to monitoring offshore aquaculture farms using UAV-underwater color imagery and DL models to classify them [16]. These examples reveal the widespread variety of agriculture problems that can be addressed with the integration of DL models and UAV remote sensing data.

Lastly, a field yet to be further explored in the literature is the identification and recognition of pest and disease indicators in plants using DL-based methods. Most recent approaches aimed to identify invasive species, commonly named "weeds", in plantation fields. In a demonstration with unsupervised data labeling, [45] evaluated the performance of a CNN-based method to predict weeds in the plantation lines of different crops. This pre-processing step to automatically generate labeled data, which is implemented outside the CNN model structure, is an interesting approach. However, others prefer to include a "one-step" network to deal with this situation, and different fronts are emerging in the literature. Unsupervised domain adaptation, in which the network extracts learning features from new, unviewed data, is one of the most currently pursued models.

A recent publication [118] proposed it for in-field cotton-boll status identification and counting. Regardless, with UAV-based data examples, this is still an issue. As for disease detection, a study [104] investigated the use of image segmentation for vine crops with multispectral images, and was able to separate visible symptoms (RGB), infrared symptoms (i.e. when considering only the infrared band), and an intersection between visible and infrared spectral data. Another interesting example regarding pest identification with UAV-based imagery was demonstrated in [179], where superpixel image samples of multiple pest species were considered, and activation filters used to recognize undesirable visual patterns were implemented alongside different DL-based architectures.

4 Publicly Available UAV-Based Datasets

As mentioned, one of the most important characteristics of DL-based methods is that they tend to increase their learning capabilities as the number of labeled examples used to train a network increases. In most of the early approaches to remote sensing data, CNNs were initialized with pre-trained weights from publicly available image repositories on the internet. However, most of these repositories are not from data acquired with remote sensing platforms. Still, there are some known aerial repositories with labeled examples, which were presented in recent years, such as the DOTA [197], UAVDT [50], VisDrone [9], WHU-RS19 [171], RSSCN7 [220], RSC11 [209], and Brazilian Coffee Scene [151] datasets. These and others are gaining notoriety in UAV-based applications and could potentially be used to pre-train or benchmark DL methods. These datasets not only serve as an additional option to start a network but also may help novel proposals to be compared against the evaluated methods.

Since there is still a scarce amount of labeled examples with UAV-acquired data, specifically multispectral and hyperspectral data, we aimed to provide UAV-based datasets in both urban and rural scenarios for future research to implement and compare the performance of novel DL-based methods with them. Table 1 summarizes some of the information related to these datasets and indicates recent publications in which previously conducted approaches were implemented, as well as the results achieved on them. They are available on the following
webpage, which is to be constantly updated with novel labeled datasets from here on: Geomatics and Computer Vision/Datasets.

5 Perspectives in Deep Learning with UAV Data

There is no denying that DL-based methods are a powerful and important tool to deal with the numerous amounts of data daily produced by remote sensing systems. What follows in this section is a short commentary on the near perspectives of some of the most emerging fields in the DL and remote sensing communities that could be implemented with UAV-based imagery. These topics, although individually presented here, have the potential to be combined, as already performed in some studies, contributing to the development of novel approaches.

In general, DL architectures require low-resolution input images (e.g., 512 × 512 pixels). High-resolution images are generally scaled down to the size required for processing. However, UAVs have the advantage of capturing images at a higher resolution than most other types of sensing platforms aside from proximal sensing, and the direct application of traditional architectures may not take advantage of this feature. As such, processing images with DL while maintaining high resolution in deeper layers is a challenge to be explored. In real-time applications, such as autonomous navigation, this processing must be fast, which opens up a range of research related to reducing the complexity of architectures while preserving accuracy. Regarding DL, some CNN architectures that try to maintain high resolution in deeper layers, such as HRNet, have recently been proposed [101]. These novel architectures can really take advantage of the high resolution of UAV images compared to commonly available orbital data.

To summarize, the topics addressed in this section compose some of the hot topics in the computer vision community, and their combination with remote sensing data can contribute to the development of novel approaches in the context of UAV mapping. In this regard, it is important to emphasize that these topics are not only currently being investigated by computer vision research, but they are also being rapidly implemented in multiple approaches aside from remote sensing. As other domains are investigated, novel ways of improving and adapting these networks can be achieved. Future studies in the remote sensing communities, specifically on UAV-based systems, may benefit from these improvements and incorporate them into their applications.

5.1 Real-Time Processing

Most of the environmental, urban, and agricultural applications presented in this study can benefit from real-time responses. Although UAV and DL-based combinations speed up the processing pipeline, these algorithms are highly compute-intensive. Usually, they do require post-processing in data centers or on dedicated Graphics Processing Unit (GPU) machines. Although DL is considered a fast method to extract information from data after its training, it still bottlenecks real-time applications, mainly because of the number of layers intrinsic to DL architectures. Research groups, especially from the IoT industry/academia, race to develop real-time DL methods because of it. The approach usually goes in two directions: developing faster algorithms and developing dedicated GPU processors.

DL models use 32-bit floating points to represent the weights of the neural network. A simple strategy known as quantization reduces the amount of memory required by DL models by representing the weights using 16, 8, or even 1 bit instead of 32-bit floating points. A 32-bit full-precision ResNet-18 [75] achieves 89.2% top-5 accuracy on the ImageNet dataset [94], while the ResNet-18 [75] ported to XNOR-Net achieves 73.2% top-5 accuracy on the same dataset. Quantization goes beyond the weights to all network components, and the literature reports quantized activation functions and gradient optimizations. The survey conducted in [72] gives an important overview of quantization methods. Also, knowledge distillation [79] is another example of training a smaller model, where a larger "teacher" network guides the learning process of a smaller "student" network.

Another strategy to develop fast DL models is to design layers with fewer parameters that are still capable of retaining predictive performance. MobileNets [86] and their variants are a good example of this idea. In specific tasks, such as object detection, it is possible to develop architectural enhancements for this approach, such as the Context Enhancement Module (CEM) and the Spatial Attention Module (SAM) [155]. When considering even smaller computational power, it is possible to find DL running on microcontroller units (MCUs), where the memory and computational power are 3-4 orders of magnitude smaller than those of mobile phones.

On the hardware side, the industry has already developed embedded AI platforms that run DL algorithms. NVIDIA's Jetson is amongst the most popular choices, as a survey [133] of studies using the Jetson platform and its applications demonstrates. Also, a broader survey on this theme, which considers GPU, ASIC, FPGA, and MCU AI platforms, can be read in [95]. Regardless, research in the context of UAV remote sensing is quite limited, and there is a gap that can be fulfilled by future works. Several applications can benefit from this technology, including, for example, agricultural spraying UAVs, which can recognize different types of weeds in real-time and simultaneously apply the spray. Other approaches may include real-time monitoring of trees in both urban and forest environments, as well as the detection of other types of objects that benefit from a rapid response.

5.2 Dimensionality Reduction

Due to recent advances in capture devices, hyperspectral images can be acquired even from UAVs. These images consist of tens to hundreds of spectral bands that can assist in the classification of objects in a given application. However, two main issues arise from the high dimensionality: i) the bands can be highly correlated, and ii) the computational cost of DL models increases excessively. High dimensionality can invoke a problem known as the Hughes phenomenon, also known as the curse of dimensionality, i.e., when the accuracy of a classification is reduced due to the introduction of noise and other implications encountered in hyperspectral or high-dimensional data [77]. Regardless, hyperspectral data may pose a hindrance to the accuracy of DL-based approaches, thus being an important issue to be considered in remote sensing practices. The classic approach to address high dimensionality is by applying a Principal Component Analysis (PCA) [120].
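As a minimal, illustrative sketch of this classic baseline, PCA can be applied per pixel to compress the spectral dimension before a DL model is trained. The cube dimensions, component count, and random values below are assumptions for demonstration only, not data from any cited study:

```python
import numpy as np

# Hypothetical UAV hyperspectral cube: height x width x bands (illustrative values only).
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 120))

h, w, bands = cube.shape
pixels = cube.reshape(-1, bands)         # one spectrum per pixel
centered = pixels - pixels.mean(axis=0)  # PCA requires mean-centered data

# PCA via SVD: project every spectrum onto the first k principal components.
k = 10                                   # illustrative number of retained components
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:k].T

reduced_cube = reduced.reshape(h, w, k)  # compact cube for a downstream model
print(f"{bands} bands -> {reduced_cube.shape[2]} components")
```

In practice, the reduced cube (or a band subset chosen by a selection method) replaces the raw bands as network input, cutting memory and compute roughly in proportion to the number of retained components.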
Table 1: UAV-based datasets that are publicly available from previous research.
Reference | Task | Target | Sensor | GSD (cm) | Best Method | Result
[49] | Detection | Trees | RGB | 0.82 | RetinaNet | AP = 92.64%
[183] | Segmentation | Trees | RGB | 0.82 | FC-DenseNet | F1 = 96.0%
[143] | Segmentation | Citrus | Multispectral | 12.59 | DDCN | F1 = 94.4%
[144] | Detection | Citrus | RGB | 2.28 | [144] | F1 = 96.5%
[144] | Detection | Corn | RGB | 1.55 | [144] | F1 = 87.6%
[142] | Detection | Citrus | Multispectral | 12.59 | [142] | F1 = 95.0%
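Most entries in Table 1 report F1, which for detection tasks depends on how predicted boxes are matched to ground truth. The sketch below is a simplified illustration of that metric; the greedy matching, the 0.5 IoU threshold, and the example boxes are assumptions, not the exact evaluation protocol of the cited works:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_score(preds, truths, thr=0.5):
    """Greedy IoU matching: each ground-truth box is matched at most once."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Two predicted boxes, two ground-truth trees; only one prediction overlaps well.
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
truths = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(round(f1_score(preds, truths), 2))  # prints 0.5
```

One true positive out of two predictions and two ground-truth boxes gives precision = recall = 0.5, hence F1 = 0.5; the published scores in Table 1 follow the same precision/recall logic, whatever matching rule each study adopted.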

Despite several proposals, PCA is generally not applied in conjunction with DL, but as a pre-processing step. Although this method may be one of the best-known approaches to reduce dimensionality when dealing with hyperspectral data, different approaches were already presented in the literature. A novel DL approach, implemented with UAV-based imagery, was demonstrated by Miyoshi et al. [134]. There, the authors proposed a one-step approach, conducted within the network's architecture, to consider a combination of bands of a hyperspectral sensor that were highly related to the labeled example provided in the input layer at the initial stage of the network. Another investigation [189] combines a band selection approach, spatial filtering, and a CNN to simultaneously extract the spectral and spatial features. Still, the future perspective to solve this issue appears to be a combination of spectral band selection and DL methods in an end-to-end approach. Thus, both selection and DL methods can exchange information and improve results. This can also contribute to understanding how DL operates with these images, which was partially accomplished in Miyoshi et al. [134].

5.3 Domain Adaptation and Transfer Learning

The training steps of DL models are generally carried out on images captured in a specific geographical region, in a short time period, or with a single capture device (also known as domains). When the model is used in practice, it is common for spectral shifts to occur between the training and test images due to differences in acquisition, geographic region, atmospheric conditions, among others [187]. Domain adaptation is a technique for adapting models trained in a source domain to a different, but still related, target domain. Therefore, domain adaptation is also viewed as a particular form of transfer learning [187]. On the other hand, transfer learning [217, 178] does include applications in which the characteristics of the domain's target space may differ from the source domain.

A promising research line for domain adaptation and transfer learning is to consider GANs [68, 53]. For example, [19] proposed the use of GANs to convert an image from the source domain to the target domain, causing the source images to mimic the characteristics of the images from the target domain. Recent approaches seek to align the distribution of the source and target domains, although they do not consider direct alignment at the level of the problem classes. Approaches that are attentive to class-level shifts may be more accurate, such as the category-sensitive domain adaptation proposed by [55]. Thus, these approaches reduce the domain shift related to the quality and characteristics of the training images and can be useful in practice for UAV remote sensing.

5.4 Attention-Based Mechanisms

Attention mechanisms aim to highlight the most valuable features or image regions by assigning different weights to them for a specific task. It is a topic that has recently been applied in remote sensing, providing significant improvements. As pointed out by [198], high-resolution images in remote sensing provide a large amount of information and exhibit minor intra-class variation, while it tends to increase. These variations and the large amount of information make the extraction of relevant features more difficult, since traditional CNNs process all regions with the same weight (relevance). Attention mechanisms, such as the one proposed by [198], are useful tools to focus the feature extraction on discriminative regions of the problem, be it image segmentation [46, 175, 214], scene-wise classification [218, 125], or object detection [121, 125], among others.

Besides, [175] argue that when remote sensing images are used, they are generally divided into patches for training the CNNs. Thus, objects can be divided into two or more sub-images, causing the discriminative and structural information to be lost. Attention mechanisms can be used to aggregate learning by focusing on relevant regions that describe the objects of interest, as presented in [175], through a global attention upsample module that provides global context and combines low- and high-level information. Recent advances in computer vision were achieved with attention mechanisms for classification (e.g., the Vision Transformer [48] and Data-efficient Image Transformers [184]) and object detection (e.g., DETR [28]) that have not yet been fully evaluated in remote sensing applications. Some directions also point to the use of attention mechanisms directly on a sequence of image patches [48, 184]. These new proposals can improve the results already achieved on remote sensing data, just as they have advanced the results on the traditional image datasets in computer vision (e.g., ImageNet [94]).

5.5 Few-Shot Learning

Although recent materials demonstrated the feasibility of DL-based methods for multiple tasks, they are still considered limited in terms of high generalization. This occurs when dealing with the same objects in different geographical areas or when new object classes are considered. Traditional solutions require retraining the model with a robust labeled dataset for the new area or object. Few-shot learning aims to cope with situations in which few labeled datasets are available. A recent study [119], in the context of scene classification, pointed out that few-shot methods in remote sensing are based on transfer learning and meta-learning. Meta-learning can be more flexible than transfer learning, and, when applied in the training set to extract meta-knowledge, contributes significantly to few-shot learning in the test set. An interesting strategy to cope with large intraclass
variation and interclass similarity is the implementation of the attention mechanism in the feature learning step, as previously described. The datasets used in the [119] study were not UAV-based; however, the strategy can be explored in UAV imagery.

In the context of UAV remote sensing, there are few studies on few-shot learning. Recently, an investigation [102] aimed at the detection of maize plants using the object detection method CenterNet. The authors adopted a transfer learning strategy using pre-trained models from other geographical areas and dates. Fewer images from the new area (in total, 150 images, compared to the previous training with 600 images) were used for fine-tuning the model. Based on the literature survey, there is a research gap to be further explored in the context of object detection using few-shot learning in UAV remote sensing. The main idea behind this is to consider fewer labeled datasets for training, which may help in some remote applications where data availability is scarce or presents few occurrences.

5.6 Semi-Supervised Learning and Unsupervised Learning

With the increasing availability of remote sensing images, the labeling task for supervised training of DL models is expensive and time-consuming. Thus, the performance of DL models is impacted due to the lack of a large amount of labeled training images. Efforts have been made to consider unlabeled images in training through unsupervised (unlabeled images only) and semi-supervised (labeled and unlabeled images) learning. In remote sensing, most semi-supervised or unsupervised approaches are based on transfer learning, which usually requires a supervised pre-trained model [127]. In this regard, a recent study [99] proposed a promising approach for unlabeled remote sensing images that defines spatial augmentation criteria for relating close sub-images. Regardless, this is still an underdeveloped practice with UAV-based data and should be investigated in novel approaches.

Future perspectives point to the use of contrastive loss [10, 181, 80, 76] and clustering-based approaches [30, 29]. Recent publications have shown interesting results with the use of contrastive loss that have not yet been fully evaluated in remote sensing. For example, [76] proposed an approach based on contrastive loss that surpassed the performance of its supervised pre-trained counterpart. As for clustering-based methods, they often group images with similar characteristics [30]. On this matter, a research [30] presented an approach that groups the data while reinforcing the consistency between the cluster assignments produced for a pair of images (same images with two augmentations). An efficient and effective way to use a large number of unlabeled images can considerably improve the performance, mainly related to the generalizability of the models.

5.7 Multitask Learning

Multitask learning aims to perform multiple tasks simultaneously
detecting plants and plantation lines in UAV-based imagery. The proposed network benefited from the contributions of considering both tasks in the same structure, since the plants must essentially belong to a plantation line. In short, improvements occurred in the detection task when line detection was considered at the same time. This approach can be further explored in several UAV-based remote sensing applications.

5.8 Open-Set

The main idea of an open-set is to deal with unknown or unseen classes during the inference on the testing set [17]. As the authors mention, recognition in real-world scenarios is "open-set", different from neural networks' nature, which is "close-set". Consequently, the testing set is classified considering only the classes used during the training; therefore, unknown or unseen classes are not rejected during the test. There are few studies regarding open-set in the context of remote sensing. Regarding semantic segmentation of aerial imagery, a study by [173] presented an approach considering the open-set context. There, an adaptation of a close-set semantic segmentation method, adding a probability threshold after the softmax, was conducted. Later, a post-processing step based on morphological filters was applied to the pixels classified as unknown to verify whether they are inside pixels or from borders. Another interesting approach is to combine open-set and domain adaptation methods, as proposed by [2] in the remote sensing context.

5.9 Photogrammetric Processing

Although not as developed as other practices, DL-based methods can be adopted for processing and optimizing the UAV photogrammetric processing task. This process aims to generate a dense point cloud and an orthomosaic, and it is based on Structure-from-Motion (SfM) and Multi-View Stereo (MVS) techniques. In SfM, the interior and exterior orientation parameters are estimated, and a sparse point cloud is generated. A matching technique between the images is applied in SfM. A recent survey on image matching [129] concluded that this thematic is still an open problem and pointed out the potential of DL for this task. The authors mentioned that DL techniques are mainly applied to feature detection and description, and further investigations on feature matching can be explored. Finally, they pointed out that a promising direction is the customization of modern feature matching techniques to attend to SfM.

Regarding DL for UAV image matching, there is a lack of work, indicating a potential for future exploration. In the UAV photogrammetric process, DL can also be used in filtering the DSM, which is essential to generate high-quality orthoimages. Previous work [63] showed the potential of using DL to filter the DSM and generate the DTM. Further investigations are required in this thematic, mainly considering UAV data. Besides, another task that can benefit from DL is the color balancing between images when generating an orthomosaic from thousands of
ously. Several advantages are mentioned in [42], including fast images, corresponding to extensive areas.
learning and the minimization of overfitting problems. Recently,
in the context of UAV remote sensing, there were some impor- 6 Conclusions
tant researches already developed. A study [194] proposed a
method to conduct three tasks (semantic segmentation, height es- DL is still considered up to the time of writing, a “black-box”
timation, and boundary detection), which also considered bound- type of solution for most of the problems, although novel re-
ary attention modules. Another research [144] simultaneously search is advancing in minimizing this notion at considerable
proportions. Regardless, in the remote sensing domain, it has already provided important discoveries in most of its implementations. Our literature revision has focused on the application of these methods in UAV-based image processing. In this sense, we structured our study to offer a comprehensive approach to the subject while presenting an overview of state-of-the-art techniques and perspectives regarding their usage. As such, we hope that this literature revision may serve as an inclusive survey to summarize the UAV applications based on DNNs. Thus, in the evaluated context, this review concludes that:

1. In the context of UAV remote sensing, most of the published materials are based on object detection methods and RGB sensors; however, some applications, such as those in precision agriculture and forest-related tasks, benefit from multi/hyperspectral data;

2. There is a need for additional labeled, publicly available datasets obtained with UAVs to train and benchmark the networks. In this context, we contributed by providing a repository with some of our UAV datasets in both agricultural and environmental applications;

3. Even though CNNs are the most adopted architecture, other methods based on CNN-LSTMs and GANs are gaining attention in UAV remote sensing and image applications, and future UAV remote sensing works may benefit from their inclusion;

4. DL, when assisted by GPU processing, can provide fast inference solutions. However, there is still a need for further investigation regarding real-time processing using embedded systems on UAVs; and, lastly,

5. Some promising thematics, such as open-set, attention-based mechanisms, few-shot, and multitask learning, can be combined to provide novel approaches in the context of UAV remote sensing; these thematics can also contribute significantly to the generalization capacity of the DNNs.

Abbreviations

The following abbreviations are used in this manuscript:

AdaGrad Adaptive Gradient Algorithm
AI Artificial Intelligence
ANN Artificial Neural Network
CEM Context Enhanced Module
CNN Convolutional Neural Network
DCGAN Deep Convolutional Generative Adversarial Network
DDCN Deep Dual-Domain Convolutional Neural Network
DL Deep Learning
DNN Deep Neural Network
DEM Digital Elevation Model
DSM Digital Surface Model
FPS Frames per Second
GAN Generative Adversarial Network
GPU Graphics Processing Unit
KL Kullback-Leibler
LSTM Long Short-Term Memory
IoU Intersection over Union
ML Machine Learning
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MRE Mean Relative Error
MSE Mean Squared Error
MSLE Mean Squared Logarithmic Error
MSM Multi-Stage Module
MVS Multi-View Stereo
NAS Neural Architecture Search
PCA Principal Component Analysis
PPM Pyramid Pooling Module
r Correlation Coefficient
RMSE Root Mean Squared Error
RNN Recurrent Neural Network
ROC Receiver Operating Characteristics
RPA Remotely Piloted Aircraft
SAM Spatial Attention Module
SGD Stochastic Gradient Descent
SfM Structure from Motion
UAV Unmanned Aerial Vehicle
WOS Web of Science
Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001. The authors are funded by the Support Foundation for the Development of Education, Science, and Technology of the State of Mato Grosso do Sul (FUNDECT; 71/009.436/2022) and the Brazilian National Council for Scientific and Technological Development (CNPq; 433783/2018-4; 310517/2020-6; 405997/2021-3; 308481/2022-4; 305296/2022-1).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

[1] Adão, T., Hruška, J., Pádua, L., Bessa, J., Peres, E., Morais, R. & Sousa, J. Hyperspectral Imaging: A Review on UAV-Based Sensors, Data Processing and Applications for Agriculture and Forestry. Remote Sensing. 9 (2020), https://www.mdpi.com/2072-4292/9/11/1110

[2] Adayel, R., Bazi, Y., Alhichri, H. & Alajlan, N. Deep Open-Set Domain Adaptation for Cross-Scene Classification based on Adversarial Learning and Pareto Ranking. Remote Sensing. 12, 1716 (2020,5), http://dx.doi.org/10.3390/rs12111716

[3] Al-Najjar, H., Kalantar, B., Pradhan, B., Saeidi, V., Halin, A., Ueda, N. & Mansor, S. Land Cover Classification from fused DSM and UAV Images Using Convolutional Neural Networks. Remote Sensing. 11 (2019), https://www.mdpi.com/2072-4292/11/12/1461
[4] Ampatzidis, Y. & Partel, V. UAV-based high throughput phenotyping in citrus utilizing multispectral imaging and artificial intelligence. Remote Sensing. 11 (2019)

[5] Aparna, Bhatia, Y., Rai, R., Gupta, V., Aggarwal, N. & Akula, A. Convolutional neural networks based potholes detection using thermal imaging. Journal Of King Saud University - Computer And Information Sciences. (2019)

[6] Apolo-Apolo, O., Martínez-Guanter, J., Egea, G., Raja, P. & Pérez-Ruiz, M. Deep learning techniques for estimation of the yield and size of citrus fruits using a UAV. European Journal Of Agronomy. 115, 126030 (2020)

[7] Zhang, S., Chi, C., Yao, Y., Lei, Z. & Li, S. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. ArXiv Preprint ArXiv:1912.02424. (2019)

[8] Audebert, N., Le Saux, B. & Lefevre, S. Deep learning for classification of hyperspectral data: A comparative review. IEEE Geoscience And Remote Sensing Magazine. 7, 159-173 (2019)

[9] B, P., Wen, L., Du, D., Bian, X., Ling, H., Hu, Q., Nie, Q., Cheng, H., Liu, C., Liu, X., Ma, W., Wu, H., Wang, L., Schumann, A., Brown, C. & Lagani, R. VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results. (Springer, Cham, 2019)

[10] Bachman, P., Hjelm, R. & Buchwalter, W. Learning Representations by Maximizing Mutual Information Across Views. Advances In Neural Information Processing Systems. 32 pp. 15535-15545 (2019)

[11] Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions On Pattern Analysis And Machine Intelligence. 39, 2481-2495 (2017)

[12] Ball, J., Anderson, D. & Chan, C. A comprehensive survey of deep learning in remote sensing: Theories, tools and challenges for the community. ArXiv. 11 (2017)

[13] Balzer, W., Takahashi, M., Ohta, J. & Kyuma, K. Weight quantization in Boltzmann machines. Neural Networks. 4, 405-409 (1991)

[14] Barbedo, J., Koenigkan, L., Santos, T. & Santos, P. A study on the detection of cattle in UAV images using deep learning. Sensors (Switzerland). 19, 1-14 (2019)

[15] Barbedo, J., Koenigkan, L., Santos, P. & Ribeiro, A. Counting Cattle in UAV Images—Dealing with Clustered Animals and Animal/Background Contrast Changes. Sensors. 20 (2020), https://www.mdpi.com/1424-8220/20/7/2126

[16] Bell, T., Nidzieko, N., Siegel, D., Miller, R., Cavanaugh, K., Nelson, N. & ... Griffith, M. The Utility of Satellites and Autonomous Remote Sensing Platforms for Monitoring Offshore Aquaculture Farms: A Case Study for Canopy Forming Kelps. Frontiers In Marine Science. (2020)

[17] Bendale, A. & Boult, T. Towards Open Set Deep Networks. Proceedings Of The IEEE Conference On Computer Vision And Pattern Recognition (CVPR). pp. 14 (2016,6)

[18] Benjdira, B., Bazi, Y., Koubaa, A. & Ouni, K. Unsupervised domain adaptation using generative adversarial networks for semantic segmentation of aerial images. Remote Sensing. 11 (2019)

[19] Benjdira, B., Bazi, Y., Koubaa, A. & Ouni, K. Unsupervised Domain Adaptation Using Generative Adversarial Networks for Semantic Segmentation of Aerial Images. Remote Sensing. 11 (2019), https://www.mdpi.com/2072-4292/11/11/1369

[20] Bhowmick, S., Nagarajaiah, S. & Veeraraghavan, A. Vision and deep learning-based algorithms to detect and quantify cracks on concrete surfaces from UAV videos. Sensors (Switzerland). 20, 1-19 (2020)

[21] Bhuiyan, M., Witharana, C. & Liljedahl, A. Use of Very High Spatial Resolution Commercial Satellite Imagery and Deep Learning to Automatically Map Ice-Wedge Polygons across Tundra Vegetation Types. Journal Of Imaging. 6 (2020), https://www.mdpi.com/2313-433X/6/12/137

[22] Biffi, L., Mitishita, E., Liesenberg, V., Dos Santos, A., Gonçalves, D., Estrabis, N., Silva, J., Osco, L., Ramos, A., Centeno, J., Schimalski, M., Rufato, L., Neto, S., Junior, J. & Gonçalves, W. ATSS deep learning-based approach to detect apple fruits. Remote Sensing. 13, 1-23 (2021)

[23] Bithas, P., Michailidis, E., Nomikos, N., Vouyioukas, D. & Kanatas, A. A survey on machine-learning techniques for UAV-based communications. Sensors (Switzerland). 19, 1-39 (2019)

[24] Boonpook, W., Tan, Y. & Xu, B. Deep learning-based multi-feature semantic segmentation in building extraction from images of UAV photogrammetry. International Journal Of Remote Sensing. 42, 1-19 (2021)

[25] Bui, D., Tsangaratos, P., Nguyen, V., Liem, N. & Trinh, P. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. CATENA. 188 pp. 104426 (2020)

[26] Buscombe, D. & Ritchie, A. Landscape Classification with Deep Neural Networks. Geosciences. 8 (2018), https://www.mdpi.com/2076-3263/8/7/244

[27] Carbonneau, P., Dugdale, S., Breckon, T., Dietrich, J., Fonstad, M., Miyamoto, H. & Woodget, A. Adopting deep learning methods for airborne RGB fluvial scene classification. Remote Sensing Of Environment. 251 (2020,12,15)

[28] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A. & Zagoruyko, S. End-to-End Object Detection with Transformers. Computer Vision – ECCV 2020. pp. 213-229 (2020)

[29] Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P. & Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. (2021)

[30] Caron, M., Bojanowski, P., Joulin, A. & Douze, M. Deep Clustering for Unsupervised Learning of Visual Features. Computer Vision – ECCV 2018. pp. 139-156 (2018)
[31] Cai, Z. & Vasconcelos, N. Cascade R-CNN: high quality object detection and instance segmentation. IEEE Transactions On Pattern Analysis And Machine Intelligence. (2019)

[32] Cai, Z. & Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. 2018 IEEE/CVF Conference On Computer Vision And Pattern Recognition. pp. 6154-6162 (2018)

[33] Castro, W., Junior, J., Polidoro, C., Osco, L., Gonçalves, W., Rodrigues, L., Santos, M., Jank, L., Barrios, S., Valle, C., Simeão, R., Carromeu, C., Silveira, E., Jorge, L. & Matsubara, E. Deep learning applied to phenotyping of biomass in forages with UAV-based RGB imagery. Sensors (Switzerland). 20, 1-18 (2020)

[34] Wu, T., Tang, S., Zhang, R. & Zhang, Y. CGNet: A Light-weight Context Guided Network for Semantic Segmentation. ArXiv Preprint ArXiv:1811.08201. (2018)

[35] Wu, T., Tang, S., Zhang, R., Cao, J. & Zhang, Y. CGNet: A light-weight context guided network for semantic segmentation. IEEE Transactions On Image Processing. 30 pp. 1169-1179 (2020)

[36] Chen, L., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. (2016)

[37] Chen, L., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions On Pattern Analysis And Machine Intelligence. 40, 834-848 (2018)

[38] Chen, J., Wu, Q., Liu, D. & Xu, T. Foreground-Background Imbalance Problem in Deep Object Detectors: A Review. 2020 IEEE Conference On Multimedia Information Processing And Retrieval (MIPR). pp. 285-290 (2020)

[39] Cheng, G. & Han, J. A survey on object detection in optical remote sensing images. ISPRS Journal Of Photogrammetry And Remote Sensing. 117 pp. 11-28 (2016), http://dx.doi.org/10.1016/j.isprsjprs.2016.03.014

[40] Cheng, G., Han, J. & Lu, X. Remote sensing image scene classification: Benchmark and state of the art. ArXiv. (2017)

[41] Chollet, F. Xception: Deep learning with depthwise separable convolutions. Proceedings Of The IEEE Conference On Computer Vision And Pattern Recognition. pp. 1251-1258 (2017)

[42] Crawshaw, M. Multi-Task Learning with Deep Neural Networks: A Survey. (2020)

[43] Oliveira, D. & Wehrmeister, M. Using deep learning and low-cost RGB and thermal cameras to detect pedestrians in aerial images captured by multirotor UAV. Sensors (Switzerland). 18 (2018)

[44] Qiao, S., Chen, L. & Yuille, A. DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution. ArXiv Preprint ArXiv:2006.02334. (2020)

[45] Dian Bah, M., Hafiane, A. & Canals, R. Deep learning with unsupervised data labeling for weed detection in line crops in UAV images. Remote Sensing. 10, 1-22 (2018)

[46] Ding, L., Tang, H. & Bruzzone, L. LANet: Local Attention Embedding to Improve the Semantic Segmentation of Remote Sensing Images. IEEE Transactions On Geoscience And Remote Sensing. 59, 426-435 (2021)

[47] Yin, M., Yao, Z., Cao, Y., Li, X., Zhang, Z., Lin, S. & Hu, H. Disentangled Non-Local Neural Networks. ECCV. (2020)

[48] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J. & Houlsby, N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. (2020)

[49] Santos, A., Marcato Junior, J., Araújo, M., Di Martini, D., Tetila, E., Siqueira, H., Aoki, C., Eltner, A., Matsubara, E., Pistori, H., Feitosa, R., Liesenberg, V. & Gonçalves, W. Assessment of CNN-based methods for individual tree detection on images captured by RGB cameras attached to UAVs. Sensors (Switzerland). 19, 1-11 (2019)

[50] Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q. & Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. Lecture Notes In Computer Science (including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics). 11214 LNCS pp. 375-391 (2018)

[51] Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q. & Tian, Q. CenterNet: Keypoint triplets for object detection. Proceedings Of The IEEE International Conference On Computer Vision. 2019-October pp. 6568-6577 (2019)

[52] Zhang, H., Chang, H., Ma, B., Wang, N. & Chen, X. Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training. ArXiv Preprint ArXiv:2004.06002. (2020)

[53] Elshamli, A., Taylor, G., Berg, A. & Areibi, S. Domain Adaptation Using Representation Learning for the Classification of Remote Sensing Images. IEEE Journal Of Selected Topics In Applied Earth Observations And Remote Sensing. 10, 4198-4209 (2017)

[54] Elsken, T., Metzen, J., Hutter, F. & Others Neural architecture search: A survey. J. Mach. Learn. Res. 20, 1-21 (2019)

[55] Fang, B., Kou, R., Pan, L. & Chen, P. Category-Sensitive Domain Adaptation for Land Cover Mapping in Aerial Scenes. Remote Sensing. 11 (2019), https://www.mdpi.com/2072-4292/11/22/2631

[56] Feng, Q., Yang, J., Liu, Y., Ou, C., Zhu, D., Niu, B., Liu, J. & Li, B. Multi-temporal unmanned aerial vehicle remote sensing for vegetable mapping using an attention-based recurrent convolutional neural network. Remote Sensing. 12 (2020)

[57] Ferreira, M., Almeida, D., Papa, D., Minervino, J., Veras, H., Formighieri, A., Santos, C., Ferreira, M., Figueiredo, E. & Ferreira, E. Individual
tree detection and species classification of Amazonian palms using UAV images and deep learning. Forest Ecology And Management. 475, 118397 (2020), https://doi.org/10.1016/j.foreco.2020.118397

[58] Fiesler, E., Choudry, A. & Caulfield, H. Weight discretization paradigm for optical neural networks. Optical Interconnections And Networks. 1281 pp. 164-173 (1990)

[59] Foody, G. Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Remote Sensing Of Environment. 239, 111630 (2020), https://doi.org/10.1016/j.rse.2019.111630

[60] Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. Feature Pyramid Networks for Object Detection. 2017 IEEE Conference On Computer Vision And Pattern Recognition (CVPR). pp. 936-944 (2017)

[61] Wang, J., Chen, K., Yang, S., Loy, C. & Lin, D. Region Proposal by Guided Anchoring. IEEE Conference On Computer Vision And Pattern Recognition. pp. 12 (2019)

[62] Gebrehiwot, A., Hashemi-Beni, L., Thompson, G., Kordjamshidi, P. & Langan, T. Deep Convolutional Neural Network for Flood Extent Mapping Using Unmanned Aerial Vehicles Data. Sensors. 19 (2019), https://www.mdpi.com/1424-8220/19/7/1486

[63] Gevaert, C., Persello, C., Nex, F. & Vosselman, G. A deep learning approach to DTM extraction from imagery using rule-based training labels. ISPRS Journal Of Photogrammetry And Remote Sensing. 142 pp. 106-123 (2018)

[64] Gevaert, C., Persello, C., Sliuzas, R. & Vosselman, G. Monitoring household upgrading in unplanned settlements with unmanned aerial vehicles. International Journal Of Applied Earth Observation And Geoinformation. 90, 102117 (2020), https://doi.org/10.1016/j.jag.2020.102117

[65] Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J. & Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. ArXiv Preprint ArXiv:2006.04388. (2020)

[66] Giang, T., Dang, K., Toan Le, Q., Nguyen, V., Tong, S. & Pham, V. U-Net Convolutional Networks for Mining Land Cover Classification Based on High-Resolution UAV Imagery. IEEE Access. 8 pp. 186257-186273 (2020)

[67] Gomes, M., Silva, J., Gonçalves, D., Zamboni, P., Perez, J., Batista, E., Ramos, A., Osco, L., Matsubara, E., Li, J., Junior, J. & Gonçalves, W. Mapping utility poles in aerial orthoimages using ATSS deep learning method. Sensors (Switzerland). 20, 1-14 (2020)

[68] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. Generative Adversarial Networks. (2014)

[69] Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning. (MIT Press, 2016)

[70] Gray, P., Bierlich, K., Mantell, S., Friedlaender, A., Goldbogen, J. & Johnston, D. Drones and convolutional neural networks facilitate automated and accurate cetacean species identification and photogrammetry. Methods In Ecology And Evolution. 10, 1490-1500 (2019)

[71] Lu, X., Li, B., Yue, Y., Li, Q. & Yan, J. Grid R-CNN Plus: Faster and Better. CoRR. abs/1906.05688 (2019), http://arxiv.org/abs/1906.05688

[72] Guo, Y. A survey on methods and theories of quantized neural networks. ArXiv Preprint ArXiv:1808.04752. (2018)

[73] Hamdi, Z., Brandmeier, M. & Straub, C. Forest damage assessment using deep learning on high resolution remote sensing data. Remote Sensing. 11, 1-14 (2019)

[74] Hamylton, S., Morris, R., Carvalho, R., Roder, N., Barlow, P., Mills, K. & Wang, L. Evaluating techniques for mapping island vegetation from unmanned aerial vehicle (UAV) images: Pixel classification, visual interpretation and machine learning approaches. International Journal Of Applied Earth Observation And Geoinformation. 89, 102085 (2020), https://doi.org/10.1016/j.jag.2020.102085

[75] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proceedings Of The IEEE Computer Society Conference On Computer Vision And Pattern Recognition. 2016-December pp. 770-778 (2016)

[76] He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. 2020 IEEE/CVF Conference On Computer Vision And Pattern Recognition (CVPR). pp. 9726-9735 (2020)

[77] Hennessy, A., Clarke, K. & Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sensing. 12, 113 (2020)

[78] Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Improving neural networks by preventing co-adaptation of feature detectors. CoRR. abs/1207.0580 (2012), http://arxiv.org/abs/1207.0580

[79] Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. ArXiv Preprint ArXiv:1503.02531. (2015)

[80] Hjelm, D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A. & Bengio, Y. Learning deep representations by mutual information estimation and maximization. ICLR 2019. pp. 24 (2019,4)

[81] Hochreiter, S. & Schmidhuber, J. Long Short-Term Memory. Neural Computation. 9 (1997)

[82] Horning, N., Fleishman, E., Ersts, P., Fogarty, F. & Wohlfeil Zillig, M. Mapping of land cover with open-source software and ultra-high-resolution imagery acquired with unmanned aerial vehicles. Remote Sensing In Ecology And Conservation. 6, 487-497 (2020)

[83] Hossain, M. & Chen, D. Segmentation for Object-Based Image Analysis (OBIA): A review of algorithms and challenges from remote sensing perspective. ISPRS Journal Of Photogrammetry And Remote Sensing. 150, 115-134 (2019), https://doi.org/10.1016/j.isprsjprs.2019.02.009

[84] Ho Tong Minh, D., Ienco, D., Gaetano, R., Lalande, N., Ndikumana, E., Osman, F. & Maurel, P. Deep Recurrent Neural Networks for Winter Vegetation Quality Mapping
via Multitemporal SAR Sentinel-1. IEEE Geoscience And Remote Sensing Letters. 15, 464-468 (2018)

[85] Hou, J., He, Y., Yang, H., Connor, T., Gao, J., Wang, Y., Zeng, Y., Zhang, J., Huang, J., Zheng, B. & Zhou, S. Identification of animal individuals using deep learning: A case study of giant panda. Biological Conservation. 242 pp. 108414 (2020)

[86] Howard, A., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. & Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. ArXiv Preprint ArXiv:1704.04861. (2017)

[87] Howard, A., Sandler, M., Chu, G., Chen, L., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V. & Others Searching for MobileNetV3. Proceedings Of The IEEE International Conference On Computer Vision. pp. 1314-1324 (2019)

[88] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W. & Xiao, B. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Transactions On Pattern Analysis And Machine Intelligence. pp. 1-1 (2020)

[89] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., Loy, C. & Lin, D. Hybrid task cascade for instance segmentation. IEEE Conference On Computer Vision And Pattern Recognition. pp. 10 (2019)

[90] Hu, G., Yin, C., Wan, M., Zhang, Y. & Fang, Y. Recognition of diseased Pinus trees in UAV images using deep learning and AdaBoost classifier. Biosystems Engineering. 194 pp. 138-151 (2020), https://doi.org/10.1016/j.biosystemseng.2020.03.021

[91] Hua, Y., Marcos, D., Mou, L., Zhu, X. & Tuia, D. Semantic Segmentation of Remote Sensing Images with Sparse Annotations. IEEE Geoscience And Remote Sensing Letters. (2021)

[92] Ichim, L. & Popescu, D. Segmentation of Vegetation and Flood from Aerial Images Based on Decision Fusion of Neural Networks. Remote Sensing. 12 (2020), https://www.mdpi.com/2072-4292/12/15/2490

[93] Ienco, D., Gaetano, R., Dupaquier, C. & Maurel, P. Land Cover Classification via Multitemporal Spatial Data by Deep Recurrent Neural Networks. IEEE Geoscience And Remote Sensing Letters. 14, 1685-1689 (2017)

[94] ImageNet. ImageNet Object Localization Challenge. (2018), https://www.kaggle.com/c/imagenet-object-localization-challenge

[95] Imran, H., Mujahid, U., Wazir, S., Latif, U. & Mehmood, K. Embedded Development Boards for Edge-AI: A Comprehensive Report. ArXiv Preprint ArXiv:2009.00803. (2020)

[96] Isola, P., Zhu, J., Zhou, T. & Efros, A. Image-to-Image Translation with Conditional Adversarial Networks. (2018)

[97] Jakovljevic, G., Govedarica, M., Alvarez-Taboada, F. & Pajic, V. Accuracy Assessment of Deep Learning Based Classification of LiDAR and UAV Points Clouds for DTM Creation and Flood Risk Mapping. Geosciences. 9 (2019), https://www.mdpi.com/2076-3263/9/7/323

[98] Jia, S., Jiang, S., Lin, Z., Li, N., Xu, M. & Yu, S. A survey: Deep learning for hyperspectral image classification with few labeled samples. Neurocomputing. 448 pp. 179-204 (2021)

[99] Kang, J., Fernandez-Beltran, R., Duan, P., Liu, S. & Plaza, A. Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast. IEEE Transactions On Geoscience And Remote Sensing. pp. 1-13 (2020)

[100] Kang, H. & Chen, C. Fast implementation of real-time fruit detection in apple orchards using deep learning. Computers And Electronics In Agriculture. 168, 105108 (2020), https://doi.org/10.1016/j.compag.2019.105108

[101] Kannojia, S. & Jaiswal, G. Effects of Varying Resolution on Performance of CNN based Image Classification: An Experimental Study. International Journal Of Computer Sciences And Engineering. 6, 451-456 (2018)

[102] Karami, A., Crawford, M. & Delp, E. Automatic Plant Counting and Location Based on a Few-Shot Learning Technique. IEEE Journal Of Selected Topics In Applied Earth Observations And Remote Sensing. 13 pp. 5872-5886 (2020)

[103] Kellenberger, B., Marcos, D. & Tuia, D. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sensing Of Environment. 216 pp. 139-153 (2018)

[104] Kerkech, M., Hafiane, A. & Canals, R. Vine disease detection in UAV multispectral images using optimized image registration and deep learning segmentation approach. Computers And Electronics In Agriculture. 174 (2020)

[105] Khan, A., Sohail, A., Zahoora, U. & Qureshi, A. A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review. 53 pp. 5455-5516 (2020), https://doi.org/10.1007/s10462-020-09825-6

[106] Khelifi, L. & Mignotte, M. Deep Learning for Change Detection in Remote Sensing Images: Comprehensive Review and Meta-Analysis. IEEE Access. 8 pp. 126385-126400 (2020)

[107] Kitano, B., Mendes, C., Geus, A., Oliveira, H. & Souza, J. Corn Plant Counting Using Deep Learning and UAV Images. IEEE Geoscience And Remote Sensing Letters. pp. 1-5 (2019)

[108] Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Proceedings Of The 25th International Conference On Neural Information Processing Systems - Volume 1. pp. 1097-1105 (2012)

[109] Kussul, N., Lavreniuk, M., Skakun, S. & Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geoscience And Remote Sensing Letters. 14, 778-782 (2017)

[110] Alexandra Larsen, A., Hanigan, I., Reich, B., Qin, Y., Cope, M., Morgan, G. & Rappold, A. A deep learning approach to identify smoke plumes in satellite imagery in
near-real time for health risk communication. Journal Of Exposure Science & Environmental Epidemiology. 31 pp. 170-176 (2020)

[111] Lathuilière, S., Mesejo, P., Alameda-Pineda, X. & Horaud, R. A Comprehensive Analysis of Deep Regression. IEEE Transactions On Pattern Analysis And Machine Intelligence. 42, 2065-2081 (2020)

[112] Law, H. & Deng, J. CornerNet: Detecting Objects as Paired Keypoints. International Journal Of Computer Vision. 128, 642-656 (2020)

[113] Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature. 521, 436-444 (2015)

[114] Li, Y., Zhang, H., Xue, X., Jiang, Y. & Shen, Q. Deep learning for remote sensing image classification: A survey. Wiley Interdisciplinary Reviews: Data Mining And Knowledge Discovery. 8, 1-17 (2018)

[115] Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P. & Benediktsson, J. Deep learning for hyperspectral image classification: An overview. IEEE Transactions On Geoscience And Remote Sensing. 57, 6690-6709 (2019)

[116] Li, Y., Peng, B., He, L., Fan, K., Li, Z. & Tong, L. Road extraction from unmanned aerial vehicle remote sensing images based on improved neural networks. Sensors (Switzerland). 19 (2019)

[117] Li, Y., Du, X., Wan, F., Wang, X. & Yu, H. Rotating machinery fault diagnosis based on convolutional neural network and infrared thermal imaging. Chinese Journal Of Aeronautics. 33, 427-438 (2020)

[118] Li, Y., Cao, Z., Lu, H. & Xu, W. Unsupervised domain adaptation for in-field cotton boll status identification. Computers And Electronics In Agriculture. 178 pp. 105745 (2020)

[119] Li, L., Han, J., Yao, X., Cheng, G. & Guo, L. DLA-MatchNet for Few-Shot Remote Sensing Image Scene Classification. IEEE Transactions On Geoscience And Remote Sensing. pp. 1-10 (2020)

[120] Licciardi, G., Marpu, P., Chanussot, J. & Benediktsson, J. Linear Versus Nonlinear PCA for the Classification of Hyperspectral Data Based on the Extended Morphological Profiles. IEEE Geoscience And Remote Sensing

[124] Lin, J., Chen, W., Lin, Y., Cohn, J., Gan, C. & Han, S. MCUNet: Tiny deep learning on IoT devices. ArXiv Preprint ArXiv:2007.10319. (2020)

[125] Li, Y., Huang, Q., Pei, X., Jiao, L. & Shang, R. RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images. Remote Sensing. 12 (2020), https://www.mdpi.com/2072-4292/12/3/389

[126] Liu, L., Ouyang, W., Wang, X., Fieguth, W., Chen, J., Liu, X. & Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. International Journal Of Computer Vision. pp. 261-318 (2019)

[127] Liu, W. & Qin, R. A MultiKernel Domain Adaptation Method for Unsupervised Transfer Learning on Cross-Source and Cross-Region Remote Sensing Data Classification. IEEE Transactions On Geoscience And Remote Sensing. 58, 4279-4289 (2020)

[128] Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G. & Johnson, B. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS Journal Of Photogrammetry And Remote Sensing. 152 pp. 166-177 (2019)

[129] Ma, J., Jiang, X., Fan, A., Jiang, J. & Yan, J. Image Matching from Handcrafted to Deep Features: A Survey. International Journal Of Computer Vision. 129, 23-79 (2021,1), https://doi.org/10.1007/s11263-020-01359-2

[130] Mambou, S., Maresova, P., Krejcar, O., Selamat, A. & Kuca, K. Breast Cancer Detection Using Infrared Thermal Imaging and a Deep Learning Model. Sensors. 18 (2018), https://www.mdpi.com/1424-8220/18/9/2799

[131] He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. 2017 IEEE International Conference On Computer Vision (ICCV). pp. 2980-2988 (2017)

[132] Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N. & Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. (2020)

[133] Mittal, S. A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform. Journal Of Systems Architecture. 97 pp. 428-442 (2019)

[134] Miyoshi, G., Arruda, M., Osco, L., Marcato Junior, J., Gonçalves, D., Imai, N., Tommaselli, A.,
[121] Li, C., Xu, C., Cui, Z., Wang, D., Zhang, T. & Yang, J. ing Method to Identify Single Tree Species in UAV-
Feature-Attentioned Object Detection in Remote Sensing Based Hyperspectral Images. Remote Sensing. 12 (2020),
Imagery. 2019 IEEE International Conference On Image https://www.mdpi.com/2072-4292/12/8/1294
Processing (ICIP). pp. 3886-3890 (2019)
[135] Naitzat, G., Zhitnikov, A. & Lim, L. Topology of deep
[122] Lin, T., Maire, M., Belongie, S., Bourdev, L., Gir- neural networks. Journal Of Machine Learning Research.
shick, R., Hays, J., Perona, P., Ramanan, D., Zit- 21 pp. 1-40 (2020)
nick, C. & Dollár, P. Microsoft COCO: Common Ob-
jects in Context. (2014), http://arxiv.org/abs/1405.0312, [136] Ghiasi, G., Lin, T. & Le, Q. Nas-fpn: Learning scalable
cite arxiv:1405.0312Comment: 1) updated annotation feature pyramid architecture for object detection. Pro-
pipeline description and figures; 2) added new section ceedings Of The IEEE Conference On Computer Vision
describing datasets splits; 3) updated author list And Pattern Recognition. pp. 7036-7045 (2019)
[123] Lin, D., Fu, K., Wang, Y., Xu, G. & Sun, X. MARTA [137] Nevavuori, P., Narra, N., Linna, P. & Lipping, T. Crop
GANs: Unsupervised Representation Learning for Re- yield prediction using multitemporal UAV data and spatio-
mote Sensing Image Classification. IEEE Geoscience And temporal deep learning models. Remote Sensing. 12, 1-18
Remote Sensing Letters. 14, 2092-2096 (2017) (2020)
Preprint – A Review on Deep Learning in UAV Remote Sensing 25
[138] Nezami, S., Khoramshahi, E., Nevalainen, O., Pölönen, I. & Honkavaara, E. Tree Species Classification of Drone Hyperspectral and RGB Imagery with Deep Learning Convolutional Neural Networks. Remote Sensing. 12 (2020)
[139] Nogueira, K., Dalla Mura, M., Chanussot, J., Schwartz, W. & Dos Santos, J. Dynamic multicontext segmentation of remote sensing images based on convolutional networks. IEEE Transactions On Geoscience And Remote Sensing. 57, 7503-7520 (2019)
[140] Nogueira, K., Machado, G., Gama, P., Silva, C., Balaniuk, R. & Santos, J. Facing erosion identification in railway lines using pixel-wise deep-based approaches. Remote Sensing. 12, 1-21 (2020)
[141] Nwankpa, C., Ijomah, W., Gachagan, A. & Marshall, S. Activation functions: Comparison of trends in practice and research for deep learning. ArXiv Preprint ArXiv:1811.03378. (2018)
[142] Osco, L., Arruda, M., Marcato Junior, J., Silva, N., Ramos, A., Moryia, É., Imai, N., Pereira, D., Creste, J., Matsubara, E., Li, J. & Gonçalves, W. A convolutional neural network approach for counting and geolocating citrus-trees in UAV multispectral imagery. ISPRS Journal Of Photogrammetry And Remote Sensing. 160, 97-106 (2020), https://doi.org/10.1016/j.isprsjprs.2019.12.010
[143] Osco, L., Nogueira, K., Marques Ramos, A., Faita Pinheiro, M., Furuya, D., Gonçalves, W., Castro Jorge, L., Marcato Junior, J. & Santos, J. Semantic segmentation of citrus-orchard using deep neural networks and multispectral UAV-based imagery. Precision Agriculture. (2021)
[144] Osco, L., Arruda, M., Gonçalves, D., Dias, A., Batistoti, J., Souza, M., Gomes, F., Ramos, A., Castro Jorge, L., Liesenberg, V., Li, J., Ma, L., Junior, J. & Gonçalves, W. A CNN Approach to Simultaneously Count Plants and Detect Plantation-Rows from UAV Imagery. (2020)
[145] Kim, K. & Lee, H. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. European Conference On Computer Vision (ECCV). pp. 22 (2020)
[146] Liu, S., Qi, L., Qin, H., Shi, J. & Jia, J. Path Aggregation Network for Instance Segmentation. Proceedings Of IEEE Conference On Computer Vision And Pattern Recognition (CVPR). pp. 11 (2018)
[147] Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W. & Lin, D. Libra R-CNN: Towards balanced learning for object detection. Proceedings Of The IEEE Computer Society Conference On Computer Vision And Pattern Recognition. 2019-June pp. 821-830 (2019)
[148] Kirillov, A., He, K., Girshick, R., Rother, C. & Dollár, P. Panoptic Segmentation. 2019 IEEE/CVF Conference On Computer Vision And Pattern Recognition (CVPR). pp. 9396-9405 (2019)
[149] Paoletti, M., Haut, J., Plaza, J. & Plaza, A. Deep learning classifiers for hyperspectral imaging: A review. ISPRS Journal Of Photogrammetry And Remote Sensing. 158, 279-317 (2019), https://doi.org/10.1016/j.isprsjprs.2019.09.006
[150] Park, S. & Song, A. Discrepancy Analysis for Detecting Candidate Parcels Requiring Update of Land Category in Cadastral Map Using Hyperspectral UAV Images: A Case Study in Jeonju, South Korea. Remote Sensing. 12 (2020), https://www.mdpi.com/2072-4292/12/3/354
[151] Penatti, O., Nogueira, K. & Dos Santos, J. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?. IEEE Computer Society Conference On Computer Vision And Pattern Recognition Workshops. 2015-October pp. 44-51 (2015)
[152] Petersson, H., Gustafsson, D. & Bergström, D. Hyperspectral image analysis using deep learning - A review. 2016 6th International Conference On Image Processing Theory, Tools And Applications, IPTA 2016. (2017)
[153] Cao, Y., Chen, K., Loy, C. & Lin, D. Prime sample attention in object detection. IEEE Conference On Computer Vision And Pattern Recognition. pp. 9 (2020)
[154] Kirillov, A., Wu, Y., He, K. & Girshick, R. PointRend: Image Segmentation As Rendering. Proceedings Of The IEEE/CVF Conference On Computer Vision And Pattern Recognition (CVPR). pp. 10 (2020)
[155] Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y. & Sun, J. ThunderNet: Towards real-time generic object detection on mobile devices. Proceedings Of The IEEE International Conference On Computer Vision. pp. 6718-6727 (2019)
[156] Rastegari, M., Ordonez, V., Redmon, J. & Farhadi, A. XNOR-Net: ImageNet classification using binary convolutional neural networks. European Conference On Computer Vision. pp. 525-542 (2016)
[157] Radosavovic, I., Kosaraju, R., Girshick, R., He, K. & Dollár, P. Designing Network Design Spaces. 2020 IEEE/CVF Conference On Computer Vision And Pattern Recognition (CVPR). pp. 10425-10433 (2020)
[158] Gao, S., Cheng, M., Zhao, K., Zhang, X., Yang, M. & Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Transactions On Pattern Analysis And Machine Intelligence. 43, 652-662 (2021)
[159] Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., Manmatha, R., Li, M. & Smola, A. ResNeSt: Split-Attention Networks. (2020)
[160] He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference On Computer Vision And Pattern Recognition (CVPR). pp. 770-778 (2016)
[161] Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated Residual Transformations for Deep Neural Networks. 2017 IEEE Conference On Computer Vision And Pattern Recognition (CVPR). pp. 5987-5995 (2017)
[162] Rivas, A., Chamoso, P., González-Briones, A. & Corchado, J. Detection of cattle using drones and convolutional neural networks. Sensors (Switzerland). 18, 1-15 (2018)
[163] Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. Lecture Notes In Computer Science (including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics). 9351 pp. 234-241 (2015)
[164] Ruder, S. An overview of gradient descent optimization algorithms. (2017)
[165] Wang, J., Zhang, W., Cao, Y., Chen, K., Pang, J., Gong, T., Shi, J., Loy, C. & Lin, D. Side-Aware Boundary Localization for More Precise Object Detection. European Conference On Computer Vision (ECCV). pp. 21 (2020)
[166] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings Of The IEEE Computer Society Conference On Computer Vision And Pattern Recognition. pp. 4510-4520 (2018)
[167] Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks. 61 pp. 85-117 (2015)
[168] Sultana, F., Sufian, A. & Dutta, P. Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey. Knowledge-Based Systems. 201-202 pp. 106062 (2020)
[169] Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N. & Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. (2020)
[170] Sharma, V. & Mir, R. A comprehensive and systematic look up into deep learning based object detection techniques: A review. Computer Science Review. 38 pp. 100301 (2020)
[171] Sheng, G., Yang, W., Xu, T. & Sun, H. High-resolution satellite scene classification using a sparse coding based multiple feature combination. International Journal Of Remote Sensing. 33, 2395-2412 (2012)
[172] Signoroni, A., Savardi, M., Baronio, A. & Benini, S. Deep Learning Meets Hyperspectral Image Analysis: A Multidisciplinary Review. Journal Of Imaging. 5 (2019), https://www.mdpi.com/2313-433X/5/5/52
[173] Da Silva, C., Nogueira, K., Oliveira, H. & Santos, J. Towards Open-Set Semantic Segmentation Of Aerial Images. 2020 IEEE Latin American GRSS ISPRS Remote Sensing Conference (LAGIRS). pp. 16-21 (2020)
[174] Soderholm, J., Kumjian, M., McCarthy, N., Maldonado, P. & Wang, M. Quantifying hail size distributions from the sky – application of drone aerial photogrammetry. Atmospheric Measurement Techniques. 13, 747-754 (2020), https://amt.copernicus.org/articles/13/747/2020/
[175] Su, Y., Wu, Y., Wang, M., Wang, F. & Cheng, J. Semantic Segmentation of High Resolution Remote Sensing Image Based on Batch-Attention Mechanism. IGARSS 2019 - 2019 IEEE International Geoscience And Remote Sensing Symposium. pp. 3856-3859 (2019)
[176] Sundaram, D. & Loganathan, A. FSSCaps-DetCountNet: fuzzy soft sets and CapsNet-based detection and counting network for monitoring animals from aerial images. Journal Of Applied Remote Sensing. 14, 1-30 (2020), https://doi.org/10.1117/1.JRS.14.026521
[177] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. & Rabinovich, A. Going deeper with convolutions. Proceedings Of The IEEE Conference On Computer Vision And Pattern Recognition. pp. 1-9 (2015)
[178] Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C. & Liu, C. A survey on deep transfer learning. International Conference On Artificial Neural Networks. pp. 270-279 (2018)
[179] Tetila, E., Machado, B., Menezes, G., Da Silva Oliveira, A., Alvarez, M., Amorim, W., De Souza Belete, N., Da Silva, G. & Pistori, H. Automatic Recognition of Soybean Leaf Diseases Using UAV Images and Deep Convolutional Neural Networks. IEEE Geoscience And Remote Sensing Letters. 17, 903-907 (2020)
[180] Thoma, M. A Survey of Semantic Segmentation. (2016)
[181] Tian, Y., Krishnan, D. & Isola, P. Contrastive Multiview Coding. CoRR. abs/1906.05849 (2019), http://arxiv.org/abs/1906.05849
[182] Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E. & Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Computers And Electronics In Agriculture. 157, 417-426 (2019)
[183] Torres, D., Feitosa, R., Happ, P., La Rosa, L., Junior, J., Martins, J., Bressan, P., Gonçalves, W. & Liesenberg, V. Applying fully convolutional architectures for semantic segmentation of a single tree species in urban environment on high resolution UAV optical imagery. Sensors (Switzerland). 20, 1-20 (2020)
[184] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A. & Jégou, H. Training data-efficient image transformers & distillation through attention. (2020)
[185] Li, Y., Chen, Y., Wang, N. & Zhang, Z. Scale-Aware Trident Networks for Object Detection. 2019 IEEE/CVF International Conference On Computer Vision (ICCV). pp. 6053-6062 (2019)
[186] Tsagkatakis, G., Aidini, A., Fotiadou, K., Giannopoulos, M., Pentari, A. & Tsakalides, P. Survey of deep-learning approaches for remote sensing observation enhancement. Sensors (Switzerland). 19, 1-39 (2019)
[187] Tuia, D., Persello, C. & Bruzzone, L. Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent Advances. IEEE Geoscience And Remote Sensing Magazine. 4, 41-57 (2016)
[188] U, S., K., P. & K, S. Computer aided diagnosis of obesity based on thermal imaging using various convolutional neural networks. Biomedical Signal Processing And Control. 63 pp. 102233 (2021)
[189] Vaddi, R. & Manoharan, P. CNN based hyperspectral image classification using unsupervised band selection and structure-preserving spatial features. Infrared Physics & Technology. 110 pp. 103457 (2020)
[190] Dao, D., Jaafari, A., Bayat, M., Mafi-Gholami, D., Qi, C., Moayedi, H., Phong, T., Ly, H., Le, T., Trinh, P., Luu, C., Quoc, N., Thanh, B. & Pham, B. A spatially explicit deep learning neural network model for the prediction of landslide susceptibility. CATENA. 188 pp. 104451 (2020)
[191] Zhang, H., Wang, Y., Dayoub, F. & Sünderhauf, N. VarifocalNet: An IoU-aware Dense Object Detector. ArXiv Preprint ArXiv:2008.13367. (2020)
[192] Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference On Learning Representations. pp. 14 (2015)
[193] Wang, S., Zhou, J., Lei, T., Wu, H., Zhang, X., Ma, J. & Zhong, H. Estimating land surface temperature from satellite passive microwave observations with the traditional neural network, deep belief network, and convolutional neural network. Remote Sensing. 12 (2020)
[194] Wang, Y., Ding, W., Zhang, R. & Li, H. Boundary-Aware Multitask Learning for Remote Sensing Imagery. IEEE Journal Of Selected Topics In Applied Earth Observations And Remote Sensing. 14 pp. 951-963 (2021)
[195] Wu, X., Sahoo, D. & Hoi, S. Recent advances in deep learning for object detection. Neurocomputing. 396 pp. 39-64 (2020)
[196] Xavier Prochaska, J., Cornillon, P. & Reiman, D. Deep learning of sea surface temperature patterns to identify ocean extremes. Remote Sensing. 13, 1-18 (2021)
[197] Xia, G., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M. & Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings Of The IEEE Computer Society Conference On Computer Vision And Pattern Recognition. pp. 3974-3983 (2018)
[198] Xu, R., Tao, Y., Lu, Z. & Zhong, Y. Attention-Mechanism-Containing Neural Networks for High-Resolution Remote Sensing Image Classification. Remote Sensing. 10 (2018), https://www.mdpi.com/2072-4292/10/10/1602
[199] Yang, T., Howard, A., Chen, B., Zhang, X., Go, A., Sandler, M., Sze, V. & Adam, H. NetAdapt: Platform-aware neural network adaptation for mobile applications. Proceedings Of The European Conference On Computer Vision (ECCV). pp. 285-300 (2018)
[200] Yao, C., Luo, X., Zhao, Y., Zeng, W. & Chen, X. A review on image classification of remote sensing using deep learning. 2017 3rd IEEE International Conference On Computer And Communications, ICCC 2017. pp. 1947-1955 (2018)
[201] Yuan, Q., Shen, H., Li, T., Li, Z., Li, S., Jiang, Y., Xu, H., Tan, W., Yang, Q., Wang, J., Gao, J. & Zhang, L. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sensing Of Environment. 241, 111716 (2020), https://doi.org/10.1016/j.rse.2020.111716
[202] Yuan, X., Shi, J. & Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Systems With Applications. 169, 114417 (2021), https://doi.org/10.1016/j.eswa.2020.114417
[203] Zhang, L., Zhang, L. & Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience And Remote Sensing Magazine. 4, 22-40 (2016)
[204] Zhang, H., Liptrott, M., Bessis, N. & Cheng, J. Real-time traffic analysis using deep learning techniques and UAV based video. 2019 16th IEEE International Conference On Advanced Video And Signal Based Surveillance, AVSS 2019. pp. 1-5 (2019)
[205] Zhang, G., Wang, M. & Liu, K. Forest Fire Susceptibility Modeling Using a Convolutional Neural Network for Yunnan Province of China. International Journal Of Disaster Risk Science. 10, 386-403 (2019), https://doi.org/10.1007/s13753-019-00233-1
[206] Zhang, X., Han, L., Han, L. & Zhu, L. How Well Do Deep Learning-Based Methods for Land Cover Classification and Object Detection Perform on High Resolution Remote Sensing Imagery?. Remote Sensing. 12 (2020), https://www.mdpi.com/2072-4292/12/3/417
[207] Zhang, X., Jin, J., Lan, Z., Li, C., Fan, M., Wang, Y., Yu, X. & Zhang, Y. ICENET: A semantic segmentation deep network for river ice by fusing positional and channel-wise attentive features. Remote Sensing. 12, 1-22 (2020)
[208] Zhang, C., Atkinson, P., George, C., Wen, Z., Diazgranados, M. & Gerard, F. Identifying and mapping individual plants in a highly diverse high-elevation ecosystem using UAV imagery and deep learning. ISPRS Journal Of Photogrammetry And Remote Sensing. 169, 280-291 (2020), https://doi.org/10.1016/j.isprsjprs.2020.09.025
[209] Zhao, L., Tang, P. & Huo, L. Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. Journal Of Applied Remote Sensing. 10, 1-21 (2016), https://doi.org/10.1117/1.JRS.10.035004
[210] Zhao, B., Zhong, Y., Xia, G. & Zhang, L. Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Transactions On Geoscience And Remote Sensing. 54, 2108-2123 (2016)
[211] Zhao, H., Shi, J., Qi, X., Wang, X. & Jia, J. Pyramid Scene Parsing Network. (2017)
[212] Zhao, Z., Zheng, P., Xu, S. & Wu, X. Object Detection With Deep Learning: A Review. IEEE Transactions On Neural Networks And Learning Systems. 30, 3212-3232 (2019)
[213] Zheng, Z., Lei, L., Sun, H. & Kuang, G. A Review of Remote Sensing Image Object Detection Algorithms Based on Deep Learning. 2020 IEEE 5th International Conference On Image, Vision And Computing, ICIVC 2020. pp. 34-43 (2020)
[214] Zhou, D., Wang, G., He, G., Long, T., Yin, R., Zhang, Z., Chen, S. & Luo, B. Robust Building Extraction for High Spatial Resolution Remote Sensing Images with Self-Attention Network. Sensors. 20 (2020), https://www.mdpi.com/1424-8220/20/24/7241
[215] Zhu, X., Tuia, D., Mou, L., Xia, G., Zhang, L., Xu, F. & Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience And Remote Sensing Magazine. 5, 8-36 (2017)
[216] Zhu, C., He, Y. & Savvides, M. Feature selective anchor-free module for single-shot object detection. Proceedings Of The IEEE Computer Society Conference On Computer Vision And Pattern Recognition. 2019-June pp. 840-849 (2019)
[217] Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H. & He, Q. A comprehensive survey on transfer learning. Proceedings Of The IEEE. 109, 43-76 (2020)
[218] Zhu, R., Yan, L., Mo, N. & Liu, Y. Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images. Remote Sensing. 11 (2019), https://www.mdpi.com/2072-4292/11/17/1996
[219] Zou, Q., Ni, L., Zhang, T. & Wang, Q. Deep Learning Based Feature Selection for Remote Sensing Scene Classification. IEEE Geoscience And Remote Sensing Letters. 12, 2321-2325 (2015)
[220] Zou, Q., Ni, L., Zhang, T. & Wang, Q. Remote Sensing Scene Classification. IEEE Geoscience And Remote Sensing Letters. 12, 2321-2325 (2015)