Article
Real-Time Object Detection and Tracking for Unmanned Aerial
Vehicles Based on Convolutional Neural Networks
Shao-Yu Yang 1 , Hsu-Yung Cheng 1, * and Chih-Chang Yu 2
Abstract: This paper presents a system applied to unmanned aerial vehicles based on the Robot Operating
System (ROS). The study addresses the challenges of efficient object detection and real-time target
tracking for unmanned aerial vehicles. The system utilizes a pruned YOLOv4 architecture for fast
object detection and the SiamMask model for continuous target tracking. A Proportional Integral
Derivative (PID) module adjusts the flight attitude, enabling stable target tracking automatically in
indoor and outdoor environments. The contributions of this work include exploring the feasibility of
pruning existing models systematically to construct a real-time detection and tracking system for
drone control with very limited computational resources. Experiments validate the system’s feasibility,
demonstrating efficient object detection, accurate target tracking, and effective attitude control. This
ROS-based system contributes to advancing UAV technology in real-world environments.
Keywords: UAV; deep learning; ROS; convolutional neural network; pruned network; target tracking
network; PID control
tools for handling inter-system connections, enabling developers to establish, control, and
monitor UAV systems more easily. It also offers numerous software modules and libraries
for various functionalities, such as sensor data processing, motion control, and mission
planning. The goal of this paper is to apply ROS technology to UAVs, allowing for a
modular system design and simplifying the overall development process, making UAV
development more accessible and efficient.
Many newly introduced drones on the market are equipped with tracking and fol-
lowing capabilities. Typically, this functionality is achieved through electronic devices
worn by the target, utilizing GPS coordinates to track and follow the target. However,
in environments where GPS signals are unavailable, such as tunnels or basements, GPS
positioning becomes ineffective or even impossible. Therefore, image tracking serves as
an auxiliary method to realize target tracking. One significant aspect of this research is to
explore the feasibility of controlling UAV systems via detection and tracking techniques
based on images. By utilizing detection and tracking techniques, drones can capture target
images through cameras and perform real-time analysis to achieve precise detection and
tracking of the position and orientation of the target. With the rapid development of deep
learning, many deep learning-based models have been proposed to address the problem of
object detection based on images. R-CNN [4] applies high-capacity convolutional neural
networks to bottom-up region proposals in order to localize and segment objects. SPP-
Net [5] utilizes spatial pyramid pooling to eliminate the requirement of fixed size input
images. Faster R-CNN [6] improves R-CNN and SPP-Net to reduce the training and testing
speed while also increasing the detection accuracy. The Single Shot MultiBox Detector
(SSD) [7] utilizes multi-scale convolutional bounding box outputs attached to multiple
feature maps at the top of the network to detect objects in images using a single deep
neural network. Unlike prior works that treat detection as a classification problem, the
work named You Only Look Once (YOLO) [8] considers object detection as a regression
problem to spatially separated bounding boxes and the associated class probabilities. A
single neural network that can be optimized end-to-end is used to predict bounding boxes
and class probabilities directly from full images in one evaluation. Therefore, YOLO has
achieved great success. YOLO9000 and YOLOv2 [9] improve the original YOLO by in-
troducing the concepts of batch normalization [10], high resolution classifier, convolution
with anchor boxes [6], dimension clusters, direct location prediction, fine-grained features,
and multi-scale training. YOLOv3 [11] introduces a set of incremental refinements that further
improve the detector. YOLOv4 [12] performs extensive experiments on the techniques of weighted
residual connections, cross-stage partial connections, cross mini-batch normalization, self-
adversarial training, mish-activation, mosaic data augmentation, drop-block regularization,
and CIoU loss, and it combines a subset of these techniques to achieve state-of-the-art
results. Person detection is a specialized form of object detection designed to identify the
specific class “person” within images or video frames. Therefore, we utilize YOLOv4 to
perform the detection task for the drone. To further reduce the computation complexity
of YOLOv4 so that it can be applied in the environment with very limited computational
resources, we perform pruning on the original YOLOv4 model.
Pruning methods have been proposed to reduce the complexity of CNN models [13–17].
Channel pruning intends to exploit the redundancy of feature maps between channels and
remove channels with the minimal performance loss [13]. Li et al. [14] proposed pruning
deep learning models using both channel-level and layer-level compression techniques. Liu
et al. [16] designed a pruning method that can be directly applied to existing modern CNN
architectures by enforcing channel-level sparsity in the network to reduce the model size,
decrease the run-time memory footprint and lower the number of computing operations
while maintaining the accuracy of the model. In [17], the authors demonstrate how to prune
YOLOv3 and YOLOv4 models and then deploy them on OpenVINO with an increased
frame rate and little accuracy loss. Since we utilize YOLOv4 for detection in the framework,
we refer to the pruning methods described in [16,17].
Tracking algorithms based on Siamese Networks have become mainstream for visual
tracking recently [18]. Bertinetto et al. [19] utilized a fully convolutional Siamese network
that can be trained end-to-end for tracking applications. Zhu et al. [20] designed a distractor-
aware module to perform incremental learning, which is able to transfer the general
embedding to the current video domain effectively. Li et al. [21] proposed a tracker based
on a Siamese region proposal network that is trained offline with large-scale image pairs.
A ResNet-driven Siamese tracker is trained in [22]. SiamMask [23] improves the offline
training procedure of popular fully convolutional Siamese methods for visual tracking by
augmenting their loss with a binary segmentation task.
In this work, we utilize the ROS [24] (Robot Operating System) to implement image
detection and tracking for controlling UAVs. Due to hardware constraints on the laptop,
lightweight models are required. Therefore, for the object detector, we train a convolutional
neural network based on the YOLOv4 architecture and prune it accordingly. In this work,
the target object for detection is a person. We employ the pruned version of the YOLOv4
object detector and the SiamMask [23] monocular object tracker to detect and track the
target person captured by the camera of the drone. Our system consists of four main
components: (1) object detection, (2) target tracking, (3) Proportional Integral Derivative
(PID) control, and (4) the UAV driver package. We utilize the Tello drone for implementing
the object detection and tracking system. During the tracking process, the UAV control
parameters include the roll, pitch, yaw, and altitude, all of which are controlled using PID
controllers. These PID controllers take the position and distance of the target object as
inputs. The position and distance are calculated using the monocular front-facing camera
of the UAV.
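To illustrate how these components can be wired together under ROS, a minimal rospy node is sketched below. The topic names, the command mapping, and the detect_or_track() helper are hypothetical placeholders used only to show the subscribe-process-publish pattern; they are not the released implementation.

```python
#!/usr/bin/env python
# Minimal sketch of the subscribe-process-publish wiring described above.
# Topic names and the detect_or_track() stub are illustrative placeholders,
# not the actual implementation used in this work.
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist
from cv_bridge import CvBridge


def detect_or_track(frame):
    """Stand-in for Pruned-YOLOv4 detection, SiamMask tracking, and the PID
    controllers; returns (roll, pitch, yaw, altitude) commands (here: hover)."""
    return 0.0, 0.0, 0.0, 0.0


class TrackerNode:
    def __init__(self):
        self.bridge = CvBridge()
        # Hypothetical topic names; the real Tello driver package may differ.
        self.cmd_pub = rospy.Publisher("/tello/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/tello/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, "bgr8")   # 30 Hz image stream
        roll, pitch, yaw, alt = detect_or_track(frame)
        cmd = Twist()
        cmd.linear.y, cmd.linear.x = roll, pitch          # lateral / forward-backward
        cmd.linear.z, cmd.angular.z = alt, yaw            # vertical / rotation
        self.cmd_pub.publish(cmd)


if __name__ == "__main__":
    rospy.init_node("uav_detect_track")
    TrackerNode()
    rospy.spin()
```

In practice, the detector, tracker, and PID controllers can equally well run as separate ROS nodes that communicate over topics, which is what makes the modular design easy to extend.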
2. Approach
The details of the methods used in the proposed framework, including object detection,
model pruning, and visual tracking, are elaborated in this section. Figure 1 illustrates the
system framework. A laptop computer (PC) is connected to the Tello drone via Wi-Fi for
communication. The drone transmits images at a constant frequency of 30 Hz, which is
preconfigured in the drone’s driver software. These images are processed on the PC using
a pruned version of the YOLOv4 algorithm for object detection. Users have the ability
to select bounding boxes based on their requirements. The system utilizes the Siamese
network, called SiamMask, for object tracking. Based on the tracked object’s position and
distance, a tracking algorithm based on a PID controller is employed to calculate estimates
of the roll, pitch, yaw, and altitude. These estimated values for the roll, pitch, yaw, and
altitude are then sent back to the drone to initiate the tracking process and to utilize the
texture information of the background to enhance the final results.
The overall flowchart of the system architecture is shown in Figure 1. The drone sends
the image feed to the PC, where the received images are processed using Pruned-YOLOv4
for person detection. If a person is detected in the image, their bounding box is displayed
on the screen. The green boxes in Figure 1 represent the human detection results. If the user
selects a specific object of interest by clicking on its bounding box, the system extracts the
person within that bounding box as a template frame for the SiamMask network, enabling
subsequent tracking. The tracking algorithm calculates the error between the target and the
center of the frame. This error serves as the input for the PID controller, which generates
flight commands for the yaw, roll, and altitude. As for the fourth flight command, pitch, it
is calculated based on the relative distance of the tracked object using its position data. If
no target is detected in the image, the drone maintains its position until a target appears.
Figure 1. System framework.
Figure 2. Training steps for the detection model.
The second and third hyperparameters to adjust are the batch size and subdivisions. These settings are adjusted based on the GPU's performance. The batch size hyperparameter represents the number of images to load during training, with a default value of 64. However, if the GPU's memory size is insufficient, it will not be able to load 64 images at once. To address this issue, each batch is further subdivided into multiple sub-batches. Each sub-batch is fed into the GPU one by one until the batch is completed. In this study, we set the batch size and subdivisions to 64 and 8, respectively. The fourth hyperparameter to adjust is the number of iterations (note that in the Darknet framework, training is measured in iterations, not epochs). According to the Darknet framework's guidelines, each object class should have at least 2000 iterations. Since we have only one class, the number of iterations should exceed 2000. We set the number of iterations to 2200 to achieve higher accuracy.
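For reference, these three settings correspond to the batch, subdivisions, and max_batches keys of the Darknet .cfg file, where max_batches is the iteration count. The helper below, with an assumed file name, is one possible way to apply them; it is only a sketch.

```python
# Sketch: apply the hyperparameters discussed above to a Darknet .cfg file.
# batch, subdivisions, and max_batches are standard Darknet [net] keys
# (max_batches is the iteration count); the file name is hypothetical.
def patch_cfg(path, batch=64, subdivisions=8, max_batches=2200):
    overrides = {"batch": batch, "subdivisions": subdivisions, "max_batches": max_batches}
    patched = []
    with open(path) as f:
        for line in f:
            key = line.split("=")[0].strip()
            if "=" in line and key in overrides:
                line = "{}={}\n".format(key, overrides[key])
            patched.append(line)
    with open(path, "w") as f:
        f.writelines(patched)

# patch_cfg("yolov4-person.cfg")  # hypothetical single-class training config
```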
B. Pruning Stage
Due to the hardware limitations of the laptop, a lightweight model needs to be used. Therefore, after training the model using the Darknet framework, it needs to be pruned to achieve the goal of lightweighting. We use the metrics of accuracy (mAP@0.5) and inference speed (BFLOPs) to evaluate the pruned model. However, it is important to note that there is a trade-off between accuracy and inference speed. Assuming the hardware configuration is fixed, when the model is pruned to a very small size, its inference speed may increase but its accuracy is typically reduced.
Before pruning, the weights from the Darknet framework undergo a basic training process. Once the basic training is completed, the obtained model is pruned using the pruning strategy from [27]. This strategy involves first conducting sparse training on the model, where the channel sparsity in deep models helps with channel pruning. To facilitate channel pruning, each channel in the convolutional layers is associated with a scaling factor. During training, L1 regularization is applied to these scaling factors to automatically identify unimportant channels. Channels with smaller scaling factor values (orange color) are pruned (left side). After pruning, we obtain a compact model (right side), which is then fine-tuned to achieve comparable (or even higher) accuracy with the fully trained network. The pruning process is illustrated in Figure 3.
Figure 3. Pruning method.
C. Sparsity Training
We add a Batch Normalization (BN) layer after each convolutional layer in YOLOv4 to accelerate convergence and improve generalization. The BN layer utilizes batch statistics to normalize the convolutional features as:

$$ y = \gamma \times \frac{x - \bar{x}}{\sqrt{\sigma^2 + \varepsilon}} + \beta \quad (1) $$
Here, x̄ and σ² represent the mean and variance of the input features in the mini-batch, respectively. γ and β represent the trainable scale factor and bias in the BN layer. In this study, we directly use the scale factor in the BN layer as an indicator of channel importance. To effectively distinguish between important and unimportant channels, we apply L1 regularization to γ, enabling channel-level sparse training. The loss function for sparse training is shown as:
$$ L = loss_{yolo} + \alpha \sum_{\gamma \in \Gamma} f(\gamma) \quad (2) $$
The function f (γ) represents the L1 norm applied to γ, which is widely used in the
sparsification step. α represents the penalty factor that balances the two loss terms.
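A minimal PyTorch-style sketch of the sparsity term in Equation (2) is given below, under the assumption that the detector is expressed with standard BatchNorm2d layers; the YOLO loss itself is omitted.

```python
import torch.nn as nn

def bn_sparsity_penalty(model, alpha=1e-4):
    """L1 penalty on all BN scale factors (the gamma terms in Eq. (2))."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()   # m.weight holds gamma
    return alpha * penalty

# total_loss = yolo_loss + bn_sparsity_penalty(model)   # Eq. (2)
```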
The effectiveness of pruning depends on the sparsity of the model. Prior to sparse
training, the distribution of γ in the BN layer of YOLOv4 is expected to be uniform. After
sparse training, most of the γ values in the BN layer are compressed toward zero. This
brings two benefits:
(1) Achieving network pruning and compression to improve model efficiency: The
weights in the BN layer are typically used for standardizing and scaling each input
sample in the network. When the weights are close to zero, the corresponding stan-
dardization and scaling operations are reduced, thereby reducing the computational
complexity.
(2) By sparsifying the weights of the BN layer close to zero, it becomes possible to identify
parameters that have minimal impact on network performance and prune them.
D. Channel cutting
Once sparse training is completed, channel cutting can be performed. Here is an
explanation of how to proceed with channel cutting. First, the total number of channels in
the backbone is computed. Once the number of channels is determined, the corresponding
γ values are stored in a variable and sorted in ascending order. The next step is to decide
which channels to keep and which ones to prune. This can be achieved by setting a pruning
rate, which represents the proportion of channels to be pruned. The pruning rate is typically
a value between 0 and 1, where a higher value indicates a greater degree of pruning. By
following these steps, the channel-cutting process can be carried out to selectively retain or
remove channels based on the specified pruning rate.
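The selection step can be summarized as in the following sketch, again assuming a PyTorch-style model: all γ values are gathered, sorted in ascending order, and a global threshold is taken at the position implied by the pruning rate.

```python
import torch
import torch.nn as nn

def channel_keep_masks(model, prune_rate=0.5):
    """Global threshold on BN gammas: sort ascending, cut the lowest fraction."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    sorted_gammas, _ = torch.sort(gammas)                       # ascending order
    idx = min(int(prune_rate * len(sorted_gammas)), len(sorted_gammas) - 1)
    threshold = sorted_gammas[idx]
    return [m.weight.data.abs() > threshold                     # True = keep channel
            for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
```

Channels whose mask entry is False would then be removed from the corresponding convolutional layer and from the layers that consume its output.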
E. Layer cutting
Within the YOLOv4 backbone, there are multiple CSPX modules, where each CSPX
module consists of three CBL layers and X ResUnit modules. The resulting features of
these modules are concatenated together, as depicted in Figure 4a. For layer cutting, we
mainly prune the ResUnit within YOLOv4. The architecture of the ResUnit is illustrated
in Figure 4b, which consists of two CBL layers and a shortcut connection. The CBL layer
comprises a Conv layer, a BN layer, and a Leaky ReLU activation function, as shown in
Figure 4c. In layer cutting, the mean values of γ for each layer are first sorted, and by
evaluating the previous CBL layer of each shortcut, the minimum value can be selected for
layer pruning. To ensure the structural integrity of YOLOv4, when pruning one ResUnit,
both the shortcut layer and the preceding CBL layer are simultaneously pruned, resulting
in the pruning of three layers in total.
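The selection rule can be sketched as follows; the per-ResUnit mean γ values are assumed to have been collected beforehand, and the data structure is purely illustrative.

```python
def select_resunits_to_cut(mean_gamma_per_resunit, num_to_cut):
    """mean_gamma_per_resunit: {resunit index: mean gamma of the CBL layer that
    precedes its shortcut}. Returns the indices with the smallest mean gamma;
    each selected ResUnit is removed together with its shortcut and the
    preceding CBL layer (three layers per cut)."""
    ranked = sorted(mean_gamma_per_resunit, key=mean_gamma_per_resunit.get)
    return ranked[:num_to_cut]
```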
F. Fine-tuning
Different pruning strategies and threshold settings yield different effects on the pruned
model. Sometimes, the accuracy of the pruned model may even increase, although in
most cases, pruning can have a negative impact on model accuracy. In such cases, it is
necessary to perform fine-tuning on the pruned model to compensate for the accuracy loss
caused by pruning. Fine-tuning is crucial for restoring the accuracy of the pruned model.
In our experiments, we directly retrained the Pruned-YOLOv4 using the same training
hyperparameters as the normal training process for YOLOv4.
Figure 4. (a) CSPX, (b) ResUnit, and (c) CBL layer in YOLOv4.
2.3. Object Tracking and Drone Control
The laptop performs real-time detection on the images received from the drone, allowing users to track the detected targets until the tracking is completed. The objects detected using Pruned-YOLOv4 are represented by bounding boxes, each containing four coordinates (x, y, w, h). Here, x and y represent the coordinates of the top left corner of the bounding box, while w and h represent the width and height of the bounding box, respectively. Once the coordinates are obtained, we continuously detect the user's mouse position. If the mouse click falls within a bounding box, the four coordinates of the bounding box are passed to the object tracking module, which utilizes SiamMask [23]. SiamMask is a target-tracking algorithm based on Siamese Neural Networks [28]. Siamese Neural Networks were initially proposed by Bromley and LeCun to address signature verification problems [29] and have since been widely applied in various fields, such as image matching and target tracking.

In the task of target tracking, Siamese Neural Networks employ two identical subnetworks with shared parameters and weights. The tracking template is fed into the network, and the output weights are obtained. These weights are then matched with the output weights of the search region to calculate the similarity score. The target's location to be tracked is determined by computing the response score map. Building upon the traditional Siamese network, SiamMask incorporates target segmentation computation, which allows for the extraction of the target's contour. This helps mitigate the effects of target feature variations caused by rotation and deformation.

While performing tracking, SiamMask simultaneously returns an image with the bounding box of the tracked object. This bounding box contains information about the
object's position in the image. The bounding box of the tracked object consists of four points: $(x_{min}, y_{min})$, $(x_{max}, y_{min})$, $(x_{min}, y_{max})$, and $(x_{max}, y_{max})$.
We can use these four points to calculate the center of the object's position, which is computed as:

$$ (x_{center}, y_{center}) = \left( \frac{x_{min} + x_{max}}{2}, \frac{y_{min} + y_{max}}{2} \right) \quad (3) $$
In order to track an object accurately, it is necessary to know the exact center position of the drone's screen. This is because the detected object's center should always align with the center of the drone's screen for proper tracking. The calculation of the disparity between the center of the drone's screen and the object's center is performed as:

$$ e_x = imgx_{center} - x_{center} \quad (4) $$

$$ e_y = imgy_{center} - y_{center} \quad (5) $$

$e_x$ and $e_y$ should always be equal to or close to zero to achieve effective tracking.
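Equations (3)-(5) translate directly into a few lines of code. The sketch below assumes that the screen center is simply half of the frame width and height; the variable names are illustrative.

```python
def tracking_errors(box, frame_w, frame_h):
    """box = (x_min, y_min, x_max, y_max) from the tracker.
    Implements Eqs. (3)-(5): object center and its offset from the frame center
    (here taken as half of the frame width and height)."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2.0          # Eq. (3)
    y_center = (y_min + y_max) / 2.0
    e_x = frame_w / 2.0 - x_center            # Eq. (4)
    e_y = frame_h / 2.0 - y_center            # Eq. (5)
    return e_x, e_y
```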
The drone has a total of four control parameters: roll, yaw, altitude, and pitch. Roll controls the drone's lateral movement, yaw controls the drone's clockwise or counterclockwise rotation, altitude controls the drone's vertical movement, and pitch controls the drone's forward or backward movement. Figure 5 illustrates the basic flight maneuvers of the drone.
Figure 5. Fundamental maneuvers of a drone.
Next, we will explain how PID controls drone flight. It is evident that by using the center point of the tracked object and the center point of the screen, we can obtain the error in the X-axis. This error is related to the drone's roll for lateral movement and yaw for clockwise or counterclockwise rotation. If the drone detects that the object is moving left or right, we can choose to adjust the drone's heading to face the object or perform lateral movements to keep up with it. Additionally, there is the pitch axis, which involves forward and backward movements. By subtracting the distance between the drone and the real object from the desired ideal distance, we can calculate the distance error and control the drone's forward or backward movements accordingly. Finally, regarding altitude, by subtracting the Y-coordinate of the tracked object from the Y-coordinate of the screen center, we can calculate the altitude error and control the drone's vertical movement accordingly.
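The per-axis mapping described above can be sketched with one PID controller per axis, as follows. The gains and the reference distance are illustrative values that would require tuning; roll can be driven from the same x-axis error when lateral translation is preferred over yaw.

```python
class PID:
    """Textbook discrete PID controller; gains are illustrative, not the tuned
    values used in the experiments."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# One controller per axis, fed by the errors defined in Eqs. (4) and (5) and by
# the distance error; the reference distance of 150 corresponds to about 1.5 m.
yaw_pid, alt_pid, pitch_pid = PID(0.4, 0.0, 0.1), PID(0.5, 0.0, 0.1), PID(0.3, 0.0, 0.05)

def flight_commands(e_x, e_y, distance, dt, ref_distance=150):
    yaw = yaw_pid.step(e_x, dt)                           # rotate toward the target
    altitude = alt_pid.step(e_y, dt)                      # climb/descend to center it
    pitch = pitch_pid.step(ref_distance - distance, dt)   # hold the reference distance
    return yaw, altitude, pitch
```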
Figure 7. Examples of object movement (a) Roll (b) Yaw.
3. Experimental Results
In this section, we demonstrate and analyze the performance of the detection model
pruning, object detection, tracking and drone control.
Cut ResUnit Numbers    Precision    Recall    mAP@0.5
11                     0.914        0.175     0.5336
12                     0.91         0.132     0.494
13                     0.91         0.131     0.483
14                     0.912        0.116     0.463
15                     0.929        0.0188    0.335
Cut ResUnit Numbers    Params    Size of .weights    BFLOPs
11                     4.6 M     18 M                19.490
12                     4.4 M     17.2 M              19.220
13                     4.3 M     17.1 M              18.991
14                     4.2 M     16.8 M              18.610
15                     4.15 M    16.2 M              17.869
3.2. Subject Tracking and Drone Control
The aim of this study is to explore the feasibility and effectiveness of automated control in real-world applications. To achieve this goal, we design a series of experiments to simulate the exploration needs of drones in real environments and require the drones to successfully track target objects automatically. To ensure the reliability of the experimental results, we perform experiments in both indoor and outdoor environments. By conducting experiments in these locations, we are able to better assess the adaptability and performance of the automated control in various real-world scenarios. Figures 8 and 9 list the selected outdoor scenes and indoor scenes in the experimental videos, respectively. The subjects being tracked include ten different people. Each person is tracked for 50 to 90 s in outdoor and indoor environments five times. During the tracking process, a random number of 0 to 7 other people would appear as passersby in the scene.
Figure 8. Selected scenes in outdoor environments.
Figure 10. Drone three-axis orientation.

B. Analysis of PID control for drones
In this sub-section, we demonstrate how the drone continuously tracks the target object in flight and adjusts its flight actions as the target object moves. Two experimental videos are selected to demonstrate the tracking and control processes. In video 1, the tracked subject walks on a flat surface, as shown in Figure 11a. The four directions that the subject moves are represented as the red, blue, yellow, and purple arrows in Figure 11a. The response of the PID control to the error in the x-axis position of the tracked object in video 1 is plotted in Figure 11b. Figure 11b shows that as the target object moves, the error increases, and the PID control quickly corrects the error to minimize it toward zero. The drone continuously tracks the target object, while the PID control attempts to reduce the x-axis error of the target object. For the error in the y-axis, since the subject in video 1 does not undergo significant changes in height, we can observe from Figure 11c that there is not a significant variation in the y-axis error. As for the distance in the z-axis position of the tracked object, the distance between the drone and the target object is fixed at a reference value of 150, which corresponds to a distance of 1.5 m on the ground between the target object and the drone. When the target object moves forward and backward, the drone has to continuously track the target object. Figure 11d plots the response of the PID control to the error in the z-axis position of the tracked object in the selected video. It demonstrates that the error varies with the distance between the target object and the drone, and the PID control attempts to reduce the z-axis error of the target object.
In video 2, the tracked subject moves upstairs and downstairs, as shown in Figure 12a. The red arrow and blue arrow represent the directions moving up and down the stairs, respectively. In video 2, the drone needs to follow the subject and fly straight up or down along the stairs. The main focus is to test whether the drone can adjust the y-axis error in real time. Figure 12a–c show the response of the PID control to the error in the x-axis and y-axis positions and the distance in the z-axis position of the tracked object in video 2, respectively. Figure 12 demonstrates that the PID control continuously adjusts the x-axis and y-axis errors to approach zero as the subject moves forward and backward while ascending or descending the stairs.
C. Tracking Accuracy Evaluation
The mean absolute errors between the target and the center of the frame for tracking are listed in Table 10. The errors in the x-axis are slightly higher than the errors in the y-axis because the subjects change their moving directions in most experimental videos. When the targets being tracked are moving on the ground plane without ascending or descending stairs, the errors in the y-axis are close to zero. The tracking errors in the outdoor environments are higher than the errors in indoor environments due to the influence of wind.
Table 10. Mean absolute error of tracking.
          Indoor    Outdoor
x-axis    128.95    137.62
y-axis    98.73     116.98
3.3. Discussion
It is a challenging issue to balance the size of the model parameters and the accuracy of the model in the process of pruning. Through analyzing the Precision, Recall, and mAP as well as the parameter sizes under various pruning rates, we are able to determine a suitable pruning rate to balance the trade-offs. Also, fine-tuning the pruned model is helpful to recover and increase the model accuracy. The PID control process, which continuously minimizes the error between the subject being tracked and the center of the frame, can complete the task of automatic drone control and maintain a stable flight path in real time. This is attributed to the ability of the automatic control system to promptly adjust to the position and movement of the target object to minimize the error between the reference position and the actual position. This allows the drone to track target objects accurately and respond quickly to changes. Automated control is of great significance to human–machine collaboration. It extends the high cognitive capabilities of human operators. At the same time, automated control ensures good execution efficiency and stability. Based on the above observations and analysis, automatic drone control has obvious advantages. It can provide accurate and stable tracking capabilities while taking into account the execution efficiency of automated control. This collaborative model allows researchers and operators to participate in drone missions and leverage their respective expertise while achieving higher efficiency with the help of automatic control systems.
4. Conclusions
In this paper, we propose an implementation method for an object detection and target tracking system based on the Robot Operating System (ROS) and apply it to the Tello drone. The system achieves efficient object detection and target-tracking capabilities in real-time environments. We utilize the pruned YOLOv4 architecture as the detection model
and select SiamMask as the tracking model. Additionally, we introduce a PID module
to calculate the errors and determine the flight attitude and action. For the detection
module, we choose the pruned YOLOv4 architecture, which provides a faster execution
speed while maintaining the detection accuracy. By reducing the redundant parameters
and computations in the model, we achieve lightweight and accelerated performance. This
allows our system to efficiently perform object detection tasks in real-time environments.
For the tracking module, we adopt the SiamMask model. SiamMask is a single-object
tracking method capable of real-time target tracking. In our system, SiamMask is used to
track the objects detected by YOLOv4, enabling continuous object tracking and positioning.
Furthermore, we introduce the PID module to calculate the errors and adjust the flight
attitudes. PID is a classical control algorithm that computes control signals based on the
current error, accumulated error, and rate of error change, aiming to bring the system
output closer to the desired value. In our system, the PID module calculates errors based
on the target object’s position and the drone’s current state, and adjusts the drone’s attitude
control signals to stably track the target object. Through flight experiments, we validate
the feasibility of applying this system in everyday environments. The pruned YOLOv4
model provides efficient object detection capabilities, enabling fast target detection in real-
time environments. SiamMask is used for tracking the target object, and the PID module
accurately calculates the errors and adapts to different flight situations, allowing the drone
to stably track the target object.
References
1. Attention Drone Geeks! Here’s Some Answers You’ve Been Looking for. The Local Brand. Available online: https://thelocalbrand.
com/attention-drone-geeks-some-answers/ (accessed on 18 November 2023).
2. Amazon Plans to Start Drone Deliveries in the UK and Italy Next Year. Engadget. Available online: https://www.engadget.com/
amazon-plans-to-start-drone-deliveries-in-the-uk-and-italy-next-year-185027120.html (accessed on 18 November 2023).
3. Operation and Certification of Small Unmanned Aircraft. Federal Aviation Administration. Available online: https:
//www.federalregister.gov/documents/2016/06/28/2016-15079/operation-and-certification-of-small-unmanned-aircraft-
systems#h-33 (accessed on 18 November 2023).
4. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA,
23–28 June 2014; pp. 580–587.
5. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings
of the 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; pp. 346–361.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks. In
Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada,
7–12 December 2015; pp. 91–99.
7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of
the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of
the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016;
pp. 779–788.
9. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE International Conference on
Computer Vision (CVPR 2017), Venice, Italy, 22–29 October 2017; pp. 6517–6525.
10. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In
Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML 2015), Lille, France,
6–11 July 2015; pp. 448–456.
11. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [CrossRef]
12. Bochkovskiy, A.; Wang, C.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
[CrossRef]
13. He, Y.; Zhang, X.; Sun, J. Channel Pruning for Accelerating Very Deep Neural Networks. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1398–1406.
14. Li, Q.; Li, H.; Meng, L. Deep Learning Architecture Improvement Based on Dynamic Pruning and Layer Fusion. Electronics 2023,
12, 1208. [CrossRef]
15. Liu, X.; Li, C.; Jiang, Z.; Han, L. Low-Complexity Pruned Convolutional Neural Network Based Nonlinear Equalizer in Coherent
Optical Communication Systems. Electronics 2023, 12, 3120. [CrossRef]
16. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks Through Network Slimming.
In Proceedings of the 30th IEEE International Conference on Computer Vision (CVPR 2017), Venice, Italy, 22–29 October 2017;
pp. 2736–2744.
17. Pruned-OpenVINO-YOLO. TNTWEN. Available online: https://github.com/TNTWEN/Pruned-OpenVINO-YOLO (accessed
on 10 May 2023).
18. Li, J.; Zhang, K.; Gao, Z.; Yang, L.; Zhuo, L. SiamPRA: An Effective Network for UAV Visual Tracking. Electronics 2023, 12, 2374.
[CrossRef]
19. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking.
In Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016;
pp. 850–865.
20. Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-aware Siamese Networks for Visual Object Tracking. In Proceedings
of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018; pp. 101–117.
21. Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018;
pp. 8971–8980.
22. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks.
In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA,
15–20 June 2019; pp. 4282–4291.
23. Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W.; Torr, P. Fast Online Object Tracking and Segmentation: A Unifying Approach. In
Proceedings of the 32nd IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA,
15–20 June 2019; pp. 1328–1338.
24. Quigley, M.; Gerkey, B.; Conley, K.; Faust, J.; Foote, T.; Leibs, J.; Berger, E.; Wheeler, R.; Ng, A. ROS: An Open-Source Robot
Operating System. In Proceedings of the IEEE International Conference on Robotics and Automation, Workshop on Open Source
Software (ICRA 2009), Kobe, Japan, 12–17 May 2009; pp. 1–6.
25. Tello Edu. Ryze Robotics. Available online: https://www.ryzerobotics.com/zh-tw/tello-edu?site=brandsite&from=
landing_page (accessed on 18 November 2023).
26. YOLOv4 Baseline Training. Available online: https://github.com/AlexeyAB/Darknet (accessed on 1 June 2023).
27. Zhang, P.; Zhong, Y.; Li, X. SlimYolov3: Narrower, Faster, and Better for Real-Time UAV Applications. In Proceedings of the
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW 2019), Seoul, Republic of Korea, 27–28 October
2019; pp. 37–45.
28. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition. In Proceedings of the
International Conference on Machine Learning Deep Learning Workshop (ICML 2015), Lille, France, 6–11 July 2015; pp. 1–8.
29. Bromley, J.; LeCun, Y. Signature Verification Using a “Siamese” Time Delay Neural Network. In Proceedings of the Advances in
the 6th Neural Information Processing Systems, Denver, CO, USA, November 1993; pp. 737–744.
30. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in
Context. In Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September
2014; pp. 740–755.
31. Drone-Face-Tracking. Available online: https://github.com/murtazahassan/Drone-Face-Tracking (accessed on 12 March 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.