
Article

Research on Path Planning of Agricultural UAV Based on Improved Deep Reinforcement Learning
Haitao Fu 1 , Zheng Li 1 , Weijian Zhang 1 , Yuxuan Feng 1 , Li Zhu 1 , Xu Fang 2 and Jian Li 1, *

1 College of Information Technology, Jilin Agricultural University, Changchun 130118, China; fht@jlau.edu.cn (H.F.); 20231308@mails.jlau.edu.cn (Z.L.); 20231257@mails.jlau.edu.cn (W.Z.); fengyuxuan@jlau.edu.cn (Y.F.); zhuli@jlau.edu.cn (L.Z.)
2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798,
Singapore; fa0001xu@e.ntu.edu.sg
* Correspondence: lijian@jlau.edu.cn; Tel.: +86-139-4419-5488

Abstract: Traditional manual or semi-mechanized pesticide spraying methods often suffer from
issues such as redundant coverage and cumbersome operational steps, which fail to meet current
pest and disease control requirements. Therefore, there is an urgent need to develop an efficient pest
control technology system. This paper builds upon the Deep Q-Network algorithm by integrating
the Bi-directional Long Short-Term Memory structure to propose the BL-DQN algorithm. Based on
this, a path planning framework for pest and disease control using agricultural drones is designed.
This framework comprises four modules: remote sensing image acquisition via the Google Earth
platform, task area segmentation using a deep learning U-Net model, rasterized environmental map
creation, and coverage path planning. The goal is to enhance the efficiency and safety of pesticide
application by drones in complex agricultural environments. Through simulation experiments, the
BL-DQN algorithm achieved a 41.68% improvement in coverage compared with the traditional DQN
algorithm. The repeat coverage rate for BL-DQN was 5.56%, which is lower than the 9.78% achieved
by the DQN algorithm and the 31.29% of the Depth-First Search (DFS) algorithm. Additionally, the
number of steps required by BL-DQN was only 80.1% of that of the DFS algorithm. In terms of
target point guidance, the BL-DQN algorithm also outperformed both DQN and DFS, demonstrating
superior performance.

Keywords: precision agriculture; deep Q-learning; Bi-directional Long Short-Term Memory; pest control; remote sensing

Citation: Fu, H.; Li, Z.; Zhang, W.; Feng, Y.; Zhu, L.; Fang, X.; Li, J. Research on Path Planning of Agricultural UAV Based on Improved Deep Reinforcement Learning. Agronomy 2024, 14, 2669. https://doi.org/10.3390/agronomy14112669

Academic Editor: Gniewko Niedbała

Received: 14 October 2024; Revised: 31 October 2024; Accepted: 10 November 2024; Published: 13 November 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Monitoring data indicate that in 2024, major crops in China, such as grains, oilseeds, and vegetables, will confront severe pest and disease threats, with affected areas expected to reach 15.541 million hectares, representing a significant increase compared with previous years. The potential food loss is projected to exceed 150 million tons [1]. Currently, traditional manual or semi-mechanical pesticide spraying methods suffer from incomplete coverage and low efficiency, failing to meet the current pest and disease control requirements [2,3]. Thus, accelerating the development of an efficient modern pest control technology system is urgent. The application of drones in pest and disease control is of great significance. By optimizing the flight paths of individual or swarm drones, more precise spraying can be achieved, thereby reducing redundant coverage, improving control efficiency, conserving resources, and lowering costs [4–7].

Recently, drone technology has gained increasing global applications due to its low cost, ease of use, and operational capabilities in high-risk or hard-to-reach areas. This trend has brought significant benefits and opportunities across various fields, including agriculture, healthcare, and military applications. Agricultural drones, as part of precision agriculture technology, differ from drones used in other sectors by typically requiring sensors such as hyperspectral and thermal imaging systems. These sensors are employed
to monitor plant health, soil moisture, and other agricultural concerns. Additionally,
agricultural drones may be equipped with liquid spraying systems for precise pesticide or
fertilizer application, a feature that is relatively uncommon in other types of drones [8–11].
The coverage path planning (CPP) problem, as a critical research area in drone path
planning, aims to design a path within a given area such that the drone can traverse
every point or cover each sub-region of the map in the fewest possible steps [12–14].
Solutions to the CPP problem can be roughly categorized into four types. First, Depth-First
Search (DFS) is a graph traversal algorithm that explores as far as possible along each
branch before backtracking. It is often used for solving problems that can be modeled as a
graph, where the goal is to visit all nodes or find a specific path. DFS uses a stack to manage
the nodes to be explored and systematically searches each branch to its end before retracing
its steps [15]. Second, heuristic algorithms, such as the A* algorithm, model the search
space as a tree structure and use heuristic search techniques to solve the problem [16]. Third,
the Artificial Potential Field (APF) method, as a local obstacle avoidance path planning
algorithm, completes the planning task by simulating the potential fields between the
target and obstacles [17]. Fourth, the Deep Q-Network (DQN) algorithm, based on deep
reinforcement learning (DRL) [18,19], approximates the Q-value function through deep
neural networks, allowing the agent to learn and optimize path planning strategies [20].
Cai et al. proposed a coverage path planning algorithm based on an improved A* algo-
rithm, which efficiently accomplishes coverage tasks for cleaning robots by incorporating a
U-turn search algorithm. However, despite its effectiveness in high-precision maps, the A*
algorithm is associated with significant computational complexity and node redundancy
issues [21].
Wang et al. proposed a multi-agent coverage path planning method based on the
Artificial Potential Field (APF) theory, which guides the movement of agents by simulating
the interactions of forces in a physical field. However, the APF method is susceptible to
becoming trapped in local optima, faces challenges with complex parameter adjustments,
and may encounter potential singularity issues during planning [22].
Tang et al. proposed a coverage path planning method based on region-optimal
decomposition, which combines an improved Depth-First Search (DFS) algorithm with a
genetic algorithm to achieve efficient coverage inspection tasks for drones in port environ-
ments [23]. Although the improved DFS algorithm successfully completes the coverage
path planning tasks, it may encounter local optima in certain complex environments and
faces challenges in ensuring safety during actual drone flight operations.
Unlike classical algorithms that rely on predetermined rules, the DQN algorithm,
based on deep reinforcement learning, introduces two optimization techniques: “target
network” and “experience replay”. The target network updates its parameters at regular
intervals to maintain relative stability in the target values, thereby reducing fluctuations
during the training process. Experience replay allows for the reuse of past experiences,
sampling from diverse previous interactions to mitigate the instability issues caused by data
correlation. Through this continuous improvement of experience, the drone is capable of
making decisions under complex environmental conditions, making it particularly effective
in dynamic and challenging environments [24–26].
In recent years, Mirco Theile and colleagues have addressed the CPP problem for
drones under fluctuating power limitations by leveraging the DDQN algorithm to balance
battery budgets, achieving full map coverage [27]. S.Y. Luis and
colleagues approached the problem of patrolling water resources by modeling it as a
Markov Decision Process (MDP) and using DQN and DDQN algorithms for training.
However, the large number of parameters involved made it challenging to ensure algorithm
stability [28].

In the field of agricultural applications, Li and their team introduced an algorithm for multi-region task path planning utilizing a DDQN to tackle the difficulties associated
with precise fertilization using agricultural drones [29]. (1) This research partially extends
and supplements the application of DQN within the scope of agricultural drone CPP;
however, it primarily focuses on the formation control of drones in multi-task areas and
does not adequately account for the variations in actual farmland terrain, limiting the
generalizability of the algorithm. (2) Furthermore, this study did not consider the drone
recovery issue when designing the reward function, resulting in uncertainty regarding
the landing position after task completion, which poses significant recovery challenges.
(3) Although the improved DDQN algorithm has demonstrated some success in path
planning, it still exhibits shortcomings in obstacle avoidance, leading to issues such as the
presence of overlapping areas within the task region and difficulties in evading obstacles.
To address these concerns, this paper attempts to integrate a Bi-directional Long Short-Term
Memory (Bi-LSTM) structure with the DQN algorithm, resulting in an improved BL-DQN
algorithm. Through the use of the Google Earth platform, multiple farmland areas were
randomly selected to construct planning maps, and the reward function was adjusted
to better fit real agricultural application scenarios, thereby enhancing the algorithm’s
generalizability and optimizing issues related to drone recovery, repeated regions, and
obstacle avoidance.
The primary contributions of this paper are outlined as follows:
1. A framework for pest and disease control path planning for agricultural drones
has been developed using the BL-DQN algorithm. This framework includes four
modules: remote sensing image acquisition via the Google Earth platform, task area
segmentation using the deep learning U-Net model, grid-based environmental map
creation, and coverage path planning.
2. A new BL-DQN algorithm is proposed, which effectively integrates the Bi-LSTM
structure with the target network in the DQN algorithm to achieve high-performance
information processing and learning.
3. To address the drone task retrieval issue, a target-oriented reward function is designed,
taking into account the priority of target areas, path efficiency, and task requirements.
The organization of this paper is structured as follows: Section 2 details the creation
of environmental maps, design of the reward function, and improvements to the DQN
algorithm; Section 3 describes the experimental design, presents experimental validation
and results analysis, and outlines future prospects; and Section 4 concludes with a summary
of research findings.

2. Materials and Methods


The path planning framework for pest and disease control in agricultural drones
proposed in this paper utilizes the GE platform in conjunction with the deep learning
U-Net algorithm to construct the task environment maps. The drone then employs the
BL-DQN algorithm to complete the coverage task and locate target arrival points, thus
facilitating the path planning task for pest and disease control. The comprehensive structure
of the proposed approach is depicted in Figure 1.
Figure 1. The comprehensive process of the framework.

2.1. Designed Planning Area Description

This research explores notable topographical differences in various agricultural production settings, focusing on Jilin Province to address the diversity found in farmlands. Situated in the heart of the Northeast Plain, Jilin Province is part of one of the globe's three main black soil areas and serves as a vital agricultural region and significant grain production hub in China. The province features diverse terrain with higher elevations in the southeast and lower elevations in the northwest. Its landscape predominantly comprises plains and mountainous regions, covering an area of approximately 187,400 square km, with elevations ranging from 5 m to 2691 m.

High-resolution remote sensing images of farmlands in Jilin Province from April 2021 to October 2022 were acquired using the Google Earth platform. The geographic coordinates of the study area range from 123°01′ to 128°08′ east longitude and 43°14′ to 45°36′ north latitude [30]. Six regions were randomly selected for analysis, as depicted in Figure 2. Furthermore, the U-Net model was utilized for detailed segmentation of the farmland areas, classifying them into two categories: (1) task areas and (2) non-task areas, as illustrated in Figure 3.
Additionally, this study employs a gridded map approach, dividing the environment
into a 10 × 10 grid and using a two-dimensional integer array for storage and operations.
Based on this, the drone’s environment map is defined as a state matrix with 100 elements,
where each element represents a grid cell on the map. The side length of the map is denoted
by L, and M( x,y) represents the environmental state at position (x, y) on the map. Each
position is assigned one of five distinct values to characterize its specific environmental
features, as illustrated in Table 1.
In this paper, each grid cell is considered as a unit of the map for each movement
of the drone. When an action Ai is executed, the corresponding state matrix changes,
transitioning from the current state Si to the next state Si+1 , as illustrated in Figure 4.
Figure 2. Remote sensing map extraction (task areas are highlighted with red boxes).

Figure 3. Designed planning area process.


Table 1. Map state corresponding to different values.

Value (M(x,y))    Description
0                 Non-task area
1                 Task area
2                 Current position of UAV
3                 Target location
4                 Obstacle area
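To make the encoding in Table 1 concrete, the sketch below builds such a 10 × 10 state matrix in Python with NumPy. It is a minimal illustration, not the authors' implementation: the constant names, helper function, and example cell positions are assumptions.

```python
import numpy as np

# Cell codes from Table 1 (constant names are illustrative)
NON_TASK, TASK, UAV, TARGET, OBSTACLE = 0, 1, 2, 3, 4

def build_map(task_cells, uav_pos, target_pos, obstacle_cells, size=10):
    """Build a size x size state matrix M(x, y) using the Table 1 encoding."""
    M = np.full((size, size), NON_TASK, dtype=int)
    for x, y in task_cells:
        M[x, y] = TASK
    for x, y in obstacle_cells:
        M[x, y] = OBSTACLE
    M[target_pos] = TARGET
    M[uav_pos] = UAV
    return M

# Example: a small synthetic map (cell positions are made up for illustration)
grid = build_map(
    task_cells=[(x, y) for x in range(2, 8) for y in range(2, 8)],
    uav_pos=(0, 0),
    target_pos=(9, 9),
    obstacle_cells=[(4, 4), (5, 5)],
)
print(grid)
```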

Figure 4. UAV action selection space.

Here, Ai = {1, 2, 3, 4} represents the four allowed movement directions for the drone at its current position: left, right, down, and up, respectively.
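As a rough sketch of how an action from Ai = {1, 2, 3, 4} could produce the transition from state Si to Si+1, the snippet below moves the UAV marker by one cell and flags a collision when the move leaves the grid or enters an obstacle cell. The row/column convention, the choice to mark vacated cells as covered (value 0), and the function names are assumptions for illustration, not the paper's code.

```python
import numpy as np

# Action codes: 1 = left, 2 = right, 3 = down, 4 = up (per the text above);
# here "left/right" change the column and "down/up" change the row (assumed convention).
MOVES = {1: (0, -1), 2: (0, 1), 3: (1, 0), 4: (-1, 0)}

def step(state, uav_pos, action):
    """Apply one action to the state matrix; return (new_state, new_pos, collided)."""
    dx, dy = MOVES[action]
    x, y = uav_pos
    nx, ny = x + dx, y + dy
    # Leaving the map or entering an obstacle cell (value 4) counts as a collision.
    if not (0 <= nx < state.shape[0] and 0 <= ny < state.shape[1]) or state[nx, ny] == 4:
        return state, uav_pos, True
    new_state = state.copy()
    new_state[x, y] = 0    # mark the vacated cell as covered/non-task (one possible bookkeeping choice)
    new_state[nx, ny] = 2  # move the UAV marker
    return new_state, (nx, ny), False
```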
2.2. Basic Theory

2.2.1. Deep Q-Learning

In 2015, Volodymyr Mnih proposed the DQN algorithm [31], which integrates the characteristics of DL with the principles of reinforcement learning (RL). This method utilizes DL models to directly acquire control strategies from complex sensory information and assesses the value through neural networks. The DQN algorithm implements an experience replay system, in which experience tuples created while the agent interacts with the environment are saved in a replay buffer, as shown in Equation (1). These experiences are then randomly sampled from the buffer for training.

$$< s_t, a_t, r_t, s_{t+1} > \quad (1)$$

Furthermore, DQN utilizes two networks that share the same structure and parameters: the policy network and the target network. Throughout training, only the parameters of the policy network are consistently updated. At specific intervals, these parameters are transferred to the target network to reduce instability caused by the frequent updates to the target Q-values. The DQN algorithm has shown impressive performance in various classic Atari 2600 games, reaching near-human-level proficiency through learning, and has consequently become a significant subject in artificial intelligence research in recent years.
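The two mechanisms described above, experience replay over the tuples of Equation (1) and periodic target-network synchronization, can be sketched roughly as follows. This is a generic illustration of the techniques rather than the authors' code; the class name, capacity, and batch size simply echo Table 2 later in the paper, and the networks are assumed to be PyTorch modules.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores <s_t, a_t, r_t, s_{t+1}> tuples (Equation (1)) and samples them uniformly."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=128):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(policy_net, target_net):
    """Copy the policy-network parameters into the target network at fixed intervals."""
    target_net.load_state_dict(policy_net.state_dict())
```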


2.2.2. Reward
Since the model relies solely on feedback rewards obtained through interactions with
the environment to guide learning, the design of an effective reward mechanism typically
determines the efficiency and accuracy of the model’s learning process. Therefore, a well-
designed reward function should be simple and reflect the specific requirements of the task.
The reward function used in traditional DQN path planning algorithms is represented by
Equation (2).

$$r_t = \begin{cases} r_{overlay}, & \text{Task area coverage} \\ r_{crash}, & \text{Collision} \\ 0, & \text{Other cases} \end{cases} \quad (2)$$
Based on the different outcomes at the next timestep, the rewards are divided into
three parts. The action r_overlay for reaching the target is given a positive value to encourage the model to find the target. Conversely, the action r_crash for collisions is assigned a negative
value to penalize collision behavior. However, sparse rewards, which only occur when
reaching the target or experiencing collisions, result in a lack of valuable feedback during
each action. This not only reduces learning efficiency and increases exploration difficulty
but also complicates policy optimization [32].

2.3. A Reward Function Based on Goal Orientation


To tackle the issues of high complexity and slow convergence in traditional reward
function strategies, this paper focuses on practical tasks for agricultural drones. Specifically,
its goal is to devise the best route from the starting point to the target location, minimizing
steps, covering the entire task area, and evading obstacles. The reward function has been
optimized to improve these aspects, and a new reward function design method is proposed,
as illustrated in Equation (3).

$$r_t = \begin{cases} r_{reach}, & \text{Reaching the target point} \\ r_{crash}, & \text{Collision} \\ r_{overlay}, & \text{Task area coverage} \\ r_{step}, & \text{Maximum remaining score / Maximum number of steps} \\ 0, & \text{Other cases} \end{cases} \quad (3)$$

1. Coverage Reward: This approach allocates rewards according to the percentage of the task area on the map that has been explored.
2. Target Guidance Reward: This method sets a reward to guide the drone towards
the target points.
These optimizations enable the drone to cover the map more quickly and reach target
points, thereby accelerating the model’s convergence speed.
The map coverage reward is given by Equation (4), where Mnewx ,newy denotes the
agent’s existing location. Areas marked with the number 0 represent non-task regions,
while those marked with the number 1 indicate task areas. Cinitial is the initial number of
task areas on the map, Ccurrent is the count of task areas at the present moment, and φ is the
adjustable coverage reward scaling factor. The coverage rate is evaluated by comparing the
number of current task areas and the distance to the target point, which allows for dynamic
adjustments of both the coverage reward and the target guidance reward. This process
effectively guides the drone in selecting a new position.

$$r_{overlay} = \begin{cases} 0.05 + \varphi e^{\frac{C_{initial} - C_{current}}{C_{initial}}} + R_{goal}, & \text{if } M_{new_x,new_y} = 1 \\ -0.1, & \text{if } M_{new_x,new_y} = 0 \end{cases} \quad (4)$$

The target guidance reward for the drone is designed as follows. In these Equations, current_distance refers to the distance between the initial position and the destination, while new_distance is the distance to the destination from the new position after the movement. M_x and M_y represent the current position coordinates of the drone on the map, M_newx and M_newy are the coordinates of the agent's position after executing the current action, selected based on a greedy strategy, and G_x and G_y are the coordinates of the target location on the map. β is the adjustable scaling factor for the target point reward. Equations (5)–(7) illustrate these calculations.
To calculate the distance measured from the current location to the destination, use the Euclidean distance equation:

$$current\_distance = \sqrt{(M_x - G_x)^2 + (M_y - G_y)^2} \quad (5)$$

To calculate the distance between the new position and the destination, use the Euclidean distance equation:

$$new\_distance = \sqrt{(M_{new_x} - G_x)^2 + (M_{new_y} - G_y)^2} \quad (6)$$

To calculate the reward for reaching the target R_goal based on the difference between the two distances, use the following Equation:

$$R_{goal} = \begin{cases} \beta(current\_distance - new\_distance), & \text{if } new\_distance < current\_distance \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

To avoid the program getting stuck in local optima and to reduce the training burden,
this paper presents three solutions, where “True” indicates that the task has been completed
or failed, signaling that the current episode has ended, as illustrated in Equation (8).
1. The current episode is terminated when the drone collides with an obstacle.
2. The current episode is terminated if the drone exceeds the maximum step limit or if
the drone has not scored for a number of consecutive steps beyond the specified threshold.
3. The current episode is terminated when the drone reaches the target point and there are no uncovered areas within the task zone.

$$R, Done = \begin{cases} r_{crash}\ (-1 \text{ and True}), & \text{if } M_{new_x,new_y} = 4 \\ r_{step}\ (-0.1 \text{ and True}), & \text{if } S \ge S_{max} \text{ or } N \ge N_{max} \\ r_{reach}\ (1 \text{ and True}), & \text{if } M_{new_x,new_y} = 3 \text{ and } ones\_ratio = 0 \end{cases} \quad (8)$$
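A compact sketch of how the goal-oriented reward of Equations (4)–(8) could be computed per step is given below. The overall structure follows the equations above, but the reconstruction of Equation (4) and the default scaling factors are assumptions about the implementation, not the authors' code.

```python
import math

def goal_reward(cur_pos, new_pos, goal, beta=1.0):
    """R_goal per Equations (5)-(7): reward a shrinking Euclidean distance to the target."""
    cur_d = math.dist(cur_pos, goal)   # Equation (5)
    new_d = math.dist(new_pos, goal)   # Equation (6)
    return beta * (cur_d - new_d) if new_d < cur_d else 0.0  # Equation (7)

def overlay_reward(cell_value, c_initial, c_current, r_goal, phi=1.0):
    """Coverage reward per Equation (4) for the cell the UAV moves onto (reconstruction)."""
    if cell_value == 1:  # uncovered task cell
        return 0.05 + phi * math.exp((c_initial - c_current) / c_initial) + r_goal
    return -0.1          # covered or non-task cell

def terminal_reward(cell_value, steps, no_score_steps, covered_all, s_max=100, n_max=5):
    """Terminal cases per Equation (8): collision, step/stall limits, or task completion."""
    if cell_value == 4:                            # obstacle -> r_crash
        return -1.0, True
    if steps >= s_max or no_score_steps >= n_max:  # step or no-reward limit -> r_step
        return -0.1, True
    if cell_value == 3 and covered_all:            # target reached, no task cells left -> r_reach
        return 1.0, True
    return 0.0, False
```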

2.4. BL-DQN
Bi-LSTM is an extended LSTM network that simultaneously considers data from both
past and future contexts by employing two LSTM layers operating in opposite directions
to capture contextual relationships. This structure enables the model to obtain more
comprehensive information when processing data, thereby enhancing performance. This
paper extends the DQN algorithm by integrating the Bi-LSTM structure, which improves
the focus on multi-temporal action values through deep neural networks. The network
architecture consists of two LSTM layers in different directions to increase model depth
and capture more complex sequence dependencies, and a fully connected layer that maps
the high-dimensional feature vectors from the Bi-LSTM layers to a low-dimensional space
matching the number of output classes for the task, as illustrated in Figure 5c.
The entire model inference process is illustrated in Figure 5b. Initially, the input data
were converted into an array and subjected to normalization. Subsequently, the data were
transformed into tensors and dimensions were added to meet the input requirements of the
subsequent model, as shown in Figure 5a. The preprocessed data were then input into the
model after being flattened and reshaped, allowing the model to generate output results
that provided guidance for path planning.
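A minimal PyTorch sketch of a Q-network in the spirit of Figure 5c and Equations (9) and (10), with one bidirectional LSTM (a forward and a backward layer of 128 units each, matching Table 2) followed by a fully connected head over the four actions, is shown below. The input formatting (the flattened, normalized 10 × 10 grid as a single time step) and other details are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class BiLSTMQNet(nn.Module):
    """Bi-LSTM feature extractor plus a fully connected head mapping to 4 action values."""
    def __init__(self, input_size=100, hidden_size=128, n_actions=4):
        super().__init__()
        # bidirectional=True yields the forward/backward hidden states of Equation (9)
        self.bilstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                              num_layers=1, batch_first=True, bidirectional=True)
        # Equation (10): Q(s, a; θ) = W · h_t^BiLSTM + b
        self.fc = nn.Linear(2 * hidden_size, n_actions)

    def forward(self, state):
        # state: (batch, seq_len=1, 100), the flattened 10 x 10 grid as one time step
        out, _ = self.bilstm(state)
        return self.fc(out[:, -1, :])  # Q-values for the 4 actions

# Example usage with a random normalized map
net = BiLSTMQNet()
q_values = net(torch.rand(1, 1, 100))
print(q_values.shape)  # torch.Size([1, 4])
```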
Figure 5. (a) Data processing. (b) Model inference process. (c) Model structure.

At each time step t, the Bi-LSTM's output consisted of the merging of hidden states from both the forward and backward LSTM layers, as shown in Equation (9).

$$h_t^{BiLSTM} = [h_t, h'_t] \quad (9)$$

Here, h_t and h'_t denote the hidden states of the forward and backward LSTM layers at time step t. Subsequently, the output of the Bi-LSTM layer will be processed through a fully connected layer to estimate the Q-values.

$$Q(s, a) \approx Q(s, a; \theta) = W \cdot h_t^{BiLSTM} + b \quad (10)$$

The Q-value update equation is given by Equation (11).

$$Q'(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \quad (11)$$

Here, Q(s, a) denotes the Q-value for taking action a in state s, where α represents the learning rate, r is the reward, and γ is the discount factor. A larger γ emphasizes long-term rewards more heavily, whereas a smaller γ prioritizes short-term gains.

This paper employs a greedy strategy to balance the concepts of exploitation and exploration when selecting actions at each time step, as shown in Equation (12). In this equation, RN represents a random number generated in the range of 0 to 1 for each time step. ϵ is a hyperparameter used to balance exploitation and exploration, dynamically adjusted throughout the training phase, as shown in Equation (13). When RN > ϵ, the action with the highest Q-value for the current state is chosen for exploitation. Otherwise, a random action is selected for exploration.

$$Action = \begin{cases} \arg\max_a Q(s, a; \theta), & \text{if } RN > \epsilon \\ \text{random action}, & \text{otherwise} \end{cases} \quad (12)$$

$$\epsilon = \epsilon_{min} + (\epsilon_{initial} - \epsilon_{min}) \times e^{-\,\text{Decay rate} \times \text{current episode}} \quad (13)$$

This paper uses the Smooth L1 Loss as the loss function, which smooths the input
values close to zero to reduce the occurrence of extreme gradient values during gradient
descent. This loss function applies squared error for small errors and linear error for larger
errors. By computing the loss between Qvalue and Qtarget , and optimizing parameters
through backpropagation, the agent learns the actions that maximize expected rewards in a
given state after extensive training and optimization, as shown in Equation (14).
$$L(y, \hat{y}) = \begin{cases} 0.5\,(Q(s, a; \theta) - y)^2, & \text{if } |Q(s, a; \theta) - y| < 1 \\ |Q(s, a; \theta) - y| - 0.5, & \text{otherwise} \end{cases} \quad (14)$$

Here, y represents the Qtarget value given by the target network, as shown in Equation (15).

$$y = r + \gamma \max_{a'} Q(s', a') \quad (15)$$
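The loss of Equation (14) corresponds to PyTorch's built-in SmoothL1Loss, with the target y of Equation (15) supplied by the target network. A brief sketch of one loss computation is given below; the batch layout and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def td_loss(policy_net, target_net, batch, gamma=0.95):
    """Smooth L1 loss between Q(s, a; θ) and y = r + γ max_a' Q(s', a') (Equations (14)-(15))."""
    states, actions, rewards, next_states, dones = batch
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)  # y from Equation (15), zeroed at episode end
    return nn.SmoothL1Loss()(q_values, targets)
```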

3. Results and Discussion


This section validates the robustness and efficiency of the agricultural drone pesticide
application path planning algorithm through simulation experiments. This includes es-
tablishing the simulated training environment tasks, setting algorithm parameters, and
optimizing the model path.

3.1. Experimental Setup


All simulation experiments in this study were performed on a desktop computer
equipped with NVIDIA GeForce RTX 3090 GPUs (24 GB × 2) and running the Ubuntu
operating system, using Python 3.11.5 for programming. The parameter settings for the
proposed algorithm are shown in Table 2.

Table 2. The parameters of the BL-DQN algorithm.

Parameter        Value        Description
EP               50,000       maximum episode
S_max            100          maximum step size
N_max            5            maximum consecutive unrewarded steps
γ                0.95         discount rate
B                128          batch size
M                100,000      experience replay buffer capacity
LR               1 × 10⁻³     learning rate
ϵ_initial        1.0          initial exploration rate
ϵ_min            0.1          minimum exploration rate
Decay rate       3000         controlling the speed at which ϵ decreases
Layers LSTM1     128          the number of neurons in LSTM1
Layers LSTM2     128          the number of neurons in LSTM2
n                10           network update frequency
N_actions        4            action space size
Optimizer        Adam         optimizer

Table 2 presents the hyperparameter configurations for the BL-DQN algorithm. The
maximum training episodes (EP ) was set to 50,000, with a maximum step count per episode
(Smax ) of 100 to prevent excessive training time. The maximum number of consecutive steps
without reward (Nmax ) was set to 5, encouraging exploration in the absence of rewards. The
discount factor (γ) was 0.95, highlighting the importance of long-term returns. The batch
size (B) was set to 128, and the capacity of the experience replay buffer (M) was 100,000. The
Agronomy 2024, 14, 2669 11 of 19

learning rate (LR) was 1 × 10−3 , affecting the step size for weight adjustments. The initial
exploration rate (ϵ_initial) was set to 1.0 to encourage the agent to explore by selecting
random actions during the early training phase. The minimum exploration rate (ϵ_min) was
0.1, ensuring that the agent retains some randomness for exploration in the later training
stages. The decay rate was set to 3000, controlling the speed at which ϵ decreases.
Each LSTM layer contained 128 neurons (Layers LSTM1 and Layers LSTM2 ), enhancing the
model’s expressive capability. The network update frequency (n) was set to 10, meaning
that the parameters of the policy network would be transferred to the target network every
10 training episodes. The action space size (N_actions) was 4, corresponding to movements
in four directions. Finally, the Adam optimizer was selected to accelerate convergence and
improve learning efficiency.
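For reference, the Table 2 settings could be gathered into a single configuration object along the following lines; this is a plain restatement of the table with assumed key names, not code from the paper.

```python
BLDQN_CONFIG = {
    "max_episodes": 50_000,       # EP
    "max_steps": 100,             # S_max
    "max_no_reward_steps": 5,     # N_max
    "gamma": 0.95,                # discount rate
    "batch_size": 128,            # B
    "replay_capacity": 100_000,   # M
    "learning_rate": 1e-3,        # LR
    "epsilon_initial": 1.0,
    "epsilon_min": 0.1,
    "decay_rate": 3000,           # controls how fast epsilon decreases
    "lstm_hidden": 128,           # neurons per LSTM direction (LSTM1 and LSTM2)
    "target_update_every": 10,    # n, in episodes
    "n_actions": 4,
    "optimizer": "Adam",
}
```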

3.2. Results and Analysis


Due to its high applicability and flexibility, the DFS algorithm was easy to deploy
quickly and ensured that all potential paths were explored during the search process. This
makes it suitable for tasks requiring complete coverage of specific areas. Therefore, this
study conducted a detailed comparison of the DFS algorithm with the BL-DQN and DQN
algorithms on six randomly selected grid maps. In these experiments, black grids represent
obstacles, gray grids indicate the starting points for the drones, white grids denote task
areas, and brown grids mark the target points on the map.
The pseudocode for the agricultural drone path planning algorithm and its training
process is illustrated in Algorithm 1.
The hyperparameters were initialized, and in each episode, the agent initialized the
map and reset the reward and loss values. During the episode, the agent determined its
actions based on the current ϵ value: if a randomly generated number was less than ϵ, a
random action was selected for exploration; otherwise, the action with the highest Q-value
for the current state was chosen for exploitation. After executing the action, the agent
observed the reward received and updated the environmental map. Whenever the episode
count reached a multiple of n, the parameters of the policy network were copied to the
target network, and the loss was calculated to update the network weights. This process
continued until the maximum number of episodes (EP ) was reached.
The results of the path planning are depicted in Figures 6–11, where the areas enclosed
by red boxes represent repeated paths. Under the predefined task conditions, the BL-DQN
algorithm, as a type of statistical learning method, demonstrated superior performance
with respect to coverage, number of steps, and repeat coverage rates in comparison to both
the DQN and DFS algorithms, while also effectively considering target point planning. The
DFS algorithm achieved complete coverage of the map but failed to plan specifically for
target points, resulting in numerous non-predefined action trajectories that significantly
reduced the operational safety of the drones. Additionally, the traditional DQN algorithm
did not meet the task requirements for complete regional coverage or target-oriented
planning. In contrast, the proposed BL-DQN algorithm exhibited exceptional performance
across all metrics, including complete coverage of task areas, repeat coverage rates, number
of steps, and overall task completion.

Algorithm 1 BL-DQN algorithm for agricultural UAV path planning

1. Input: EP ← 50,000 // Maximum episodes
2. n ← 10 // Network update frequency
3. ϵ_initial ← 1.0 // Initial exploration rate
4. ϵ_min ← 0.1 // Minimum exploration rate
5. Initialize hyperparameters (learning rate, gamma)
6. Initialize Policy and Target networks with parameters
7. for episode in range(EP) do
8.   Initialize Map
9.   Set episode_reward to 0
10.  Set episode_loss to 0
11.  done ← false
12.  while (not done) do
13.    if random() < ϵ then
14.      action ← random_action()
15.    else
16.      action ← argmax(Q(state, action))
17.    end if
18.    Execute action in Map and observe Reward R and done
19.    Update Map environment
20.    if episode mod n = 0 then
21.      Copy parameters from PolicyNet to TargetNet
22.    end if
23.    Calculate Loss
24.    Update networks using backpropagation
25.  end while
26. end for
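The ϵ-greedy branch of Algorithm 1 (lines 13–17) together with the schedule of Equations (12) and (13) can be sketched as follows. The extracted form of Equation (13) is ambiguous about how the decay constant enters the exponent; dividing the episode index by the decay constant from Table 2 (3000) is an assumption made here so that ϵ decreases gradually.

```python
import math
import random

def epsilon_by_episode(episode, eps_init=1.0, eps_min=0.1, decay=3000):
    """Exponential epsilon schedule in the spirit of Equation (13) (decay form assumed)."""
    return eps_min + (eps_init - eps_min) * math.exp(-episode / decay)

def select_action(q_values, epsilon, n_actions=4):
    """Epsilon-greedy rule of Equation (12) and Algorithm 1, lines 13-17."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                            # explore
    return int(max(range(n_actions), key=lambda a: q_values[a]))      # exploit
```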

Figure 6. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 1.

Figure 7. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 2.


Figure 8. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 3.

Figure 9. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 4.

Figure 10. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 5.

Figure 11. Path planning results of the DQN (left), DFS (middle), and BL-DQN (right) on Map 6.
As shown in Figure 12, the loss values of the BL-DQN algorithm were generally lower
than those of the DQN algorithm. During training, the BL-DQN demonstrated greater
stability and more efficient problem-solving strategy learning capabilities, with better
generalization performance. Although there may be significant fluctuations in the loss
values of the BL-DQN under certain specific conditions, leading to occasional performance
degradation, its overall performance remained superior to that of the DQN algorithm.

Figure 12. Comparison of loss between BL-DQN algorithm and DQN algorithm.

As illustrated in Figure 13, the reward value of the proposed BL-DQN algorithm significantly outperformed that of the DQN algorithm during training. However, as the complexity of the map increased, the model's performance also exhibited greater fluctuations.

Figure 13. Comparison of reward between BL-DQN algorithm and DQN algorithm.

To perform a quantitative assessment of the three algorithms, the research measured the repetition rate, the steps involved in path planning, coverage rate, number of collisions, target point arrival status, and adherence to rules (including collisions with obstacles and deviations from the specified direction) for each algorithm in completing the task area coverage. The analysis results are shown in Table 3. The analysis shows that the BL-DQN algorithm surpasses the other algorithms in terms of drone path planning, coverage rate, number of steps, target point guidance, and adherence to rules. After 50,000 training iterations, the DQN algorithm did not achieve full coverage and effective target point guidance. Although the DFS algorithm showed stable coverage, it did not match the BL-DQN in terms of target point accuracy and task completion, and it exhibited higher repeat rates and rule violations.

Table 3. Comparison of the experimental results.

Map    Algorithm  Step  Repeat (%)  Coverage (%)  Reach Target  Offense Against Rule  Complete the Task
Map 1  Ours       52    8.3%        100%          True          False                 True
Map 1  DQN        24    12.5%       43.75%        False         False                 False
Map 1  DFS        65    35.42%      100%          False         True                  False
Map 2  Ours       50    11.11%      97.78%        False         False                 False
Map 2  DQN        33    13.33%      62.22%        False         False                 False
Map 2  DFS        62    37.78%      100%          False         True                  False
Map 3  Ours       51    10.87%      100%          True          False                 True
Map 3  DQN        29    6.52%       56.52%        False         False                 False
Map 3  DFS        58    26.09%      100%          False         True                  False
Map 4  Ours       51    4.08%       100%          True          False                 True
Map 4  DQN        31    10.2%       53.06%        False         False                 False
Map 4  DFS        69    40.82%      100%          False         True                  False
Map 5  Ours       49    4.08%       95.92%        False         False                 False
Map 5  DQN        40    18.37%      63.26%        False         False                 False
Map 5  DFS        65    32.65%      100%          False         False                 False
Map 6  Ours       53    3.92%       100%          True          False                 True
Map 6  DQN        38    9.8%        64.71%        False         False                 False
Map 6  DFS        63    23.53%      100%          False         False                 False
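The step, repeat, and coverage figures in Table 3 can be computed from a recorded trajectory roughly as follows. The exact definitions the authors used (for example, whether the repeat rate is normalized by the number of steps or by the number of cells) are not spelled out in the text, so the formulas below are assumptions for illustration only.

```python
def evaluate_path(path, task_cells):
    """Rough per-map metrics in the spirit of Table 3.

    path:       ordered list of visited (x, y) cells
    task_cells: set of cells that must be covered
    """
    visited = set()
    repeats = 0
    for cell in path:
        if cell in visited:
            repeats += 1
        visited.add(cell)
    coverage = len(visited & set(task_cells)) / len(task_cells) * 100
    repeat_rate = repeats / len(path) * 100
    return {"steps": len(path), "coverage_pct": round(coverage, 2),
            "repeat_pct": round(repeat_rate, 2)}

# Toy example with a four-cell task area (illustrative only)
print(evaluate_path([(0, 0), (0, 1), (1, 1), (0, 1), (1, 0)],
                    task_cells={(0, 0), (0, 1), (1, 0), (1, 1)}))
```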

3.3. Discussion
The BL-DQN algorithm outperformed traditional DQN and DFS algorithms in terms
of the number of steps, coverage rate, and repeat coverage rate. It also achieved significant
advancements in task-oriented guidance, indicating that the BL-DQN algorithm enhances
efficiency in path planning while better optimizing drone recovery issues. However,
despite the notable optimization effects demonstrated by the proposed BL-DQN algorithm
in simulated environments, several limitations remain in the current research. In recent
years, Pan and colleagues have made innovative improvements to the traditional APF
method for formation control in three-dimensional constrained spaces by introducing
the concept of rotational potential fields. They developed a novel formation controller
utilizing potential function methods [33]. Additionally, Zhou and associates proposed a
biologically inspired path planning algorithm for real-time obstacle avoidance in unmapped
environments for unmanned aerial vehicles [34]. Fang et al. proposed a solution that
integrates distributed network localization with formation maneuvering control. This
approach utilizes relative measurement information among agents to achieve real-time
positioning and coordinated formation management of multi-agent systems in multi-
dimensional spaces [35]. Enrique Aldao and colleagues introduced a real-time obstacle
avoidance algorithm based on optimal control theory, suitable for autonomous navigation
of UAVs in dynamic indoor environments. By integrating pre-registered three-dimensional
model information with onboard sensor data, this algorithm optimizes UAV flight paths and
effectively avoids collisions with both fixed and moving obstacles in the environment [36].
He et al. proposed a new stability analysis method for dealing with hybrid systems with
double time delays, which has important implications for the design of control strategies
in the field of UAVs [37]. Considering the research trends of the past three years, there
remains room for exploration in the following areas within this study.

1. A limitation of the current model is its reliance on pre-defined map data. Future
research should focus on integrating real-time environmental data, such as weather
conditions, crop growth dynamics, pest distribution information, and other distur-
bances, to enable dynamic adjustments in path planning, ensuring the stability of the
drones. The development of such adaptive algorithms will substantially enhance the
robustness and effectiveness of the model in practical agricultural applications.
2. Extending the existing single-agent model to a multi-agent framework holds promise
for further improving operational efficiency and coverage in large-scale farmland.
Investigating how to coordinate multiple drones for joint path optimization, while
considering communication constraints and task allocation strategies, represents a
challenging yet promising direction for future research.
3. As depicted in Figures 7 and 10, the complexity of the maps resulted in target points
being unmet in Map 2 and Map 5. This indicates that there is potential for enhance-
ment. Future efforts will focus on refining the model and adjusting parameters to
improve planning efficacy.

4. Conclusions and Future Work


This study improves the Deep Q-Network algorithm by incorporating a Bi-directional
Long Short-Term Memory structure, resulting in the BL-DQN algorithm. A target point-
oriented reward function suitable for complex farmland environments was designed based
on this, and a path planning framework for agricultural drones was developed. This
framework includes four modules: remote sensing image acquisition based on the Google
Earth platform, task area segmentation using the deep learning U-Net model, grid-based
environmental map creation, and coverage path planning. Through simulation experiments,
the BL-DQN algorithm achieved a 41.68% improvement in coverage compared with the
traditional DQN algorithm. The repeat coverage rate for the BL-DQN was 5.56%, which is
lower than the 9.78% achieved by the DQN algorithm and the 31.29% of the DFS algorithm.
Additionally, the number of steps required by the BL-DQN was only 80.1% of that of the
DFS algorithm. In terms of target point guidance, the BL-DQN algorithm also outperformed
both the DQN and DFS, demonstrating superior performance.
These improvements not only highlight the advantages of the BL-DQN algorithm,
but also hold significant practical implications for enhancing precision and intelligence in
modern agriculture. This indicates that drones equipped with the BL-DQN algorithm can
more effectively cover target areas during pest and disease control operations, reducing
the impact of multiple applications and missed spray areas. Consequently, this leads to
significant savings in time and energy, lowers operational costs, and improves overall
efficiency in crop management.
Although positive results were achieved under the assumption of a constant search
environment, future research will focus on integrating real-time environmental data (such
as weather conditions, crop growth dynamics, and pest distribution) into path planning to
develop dynamic environment-adaptive algorithms. Additionally, coordinating multiple
drone fleet path planning while considering communication constraints and task allocation
strategies will be explored, with the aim of adapting the framework for agricultural drones
to further enhance precision farming efficiency and intelligence.

Author Contributions: Conceptualization, H.F., Z.L., X.F. and J.L.; methodology, J.L. and W.Z.;
software, H.F. and Y.F.; investigation, L.Z. and W.Z.; resources, Z.L. and Y.F.; writing—original draft,
Z.L.; writing—review and editing, H.F.; visualization, Z.L.; supervision, H.F., J.L. and X.F.; funding
acquisition, L.Z.; validation, L.Z. and X.F.; data curation, W.Z.; project administration, X.F. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Jilin Province Science and Technology Development Plan
Project (20240302092GX).
Data Availability Statement: The original contributions presented in the study are included in the
article; further inquiries can be directed to the corresponding author.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Liu, J. Trend forecast of major crop diseases and insect pests in China in 2024. China Plant Prot. Guide 2024, 44, 37–40.
2. Tudi, M.; Li, H.; Li, H.; Wang, L.; Lyu, J.; Yang, L.; Tong, S.; Yu, Q.J.; Ruan, H.D.; Atabila, A.; et al. Exposure Routes and Health
Risks Associated with Pesticide Application. Toxics 2022, 10, 335. [CrossRef]
3. Benbrook, C.M. Trends in Glyphosate Herbicide Use in the United States and Globally. Environ. Sci. Eur. 2016, 28, 3. [CrossRef]
4. Fang, X.; Xie, L.; Li, X. Distributed Localization in Dynamic Networks via Complex Laplacian. Automatica 2023, 151, 110915.
[CrossRef]
5. Kim, J.; Kim, S.; Ju, C.; Son, H.I. Unmanned Aerial Vehicles in Agriculture: A Review of Perspective of Platform, Control, and
Applications. IEEE Access 2019, 7, 105100–105115. [CrossRef]
6. Fang, X.; Li, J.; Li, X.; Xie, L. 2-D Distributed Pose Estimation of Multi-Agent Systems Using Bearing Measurements. J. Autom.
Intell. 2023, 2, 70–78. [CrossRef]
7. He, Y.; Zhu, D.; Chen, C.; Wang, Y. Data-Driven Control of Singularly Perturbed Hybrid Systems with Multi-Rate Sampling. ISA
Trans. 2024, 148, 490–499. [CrossRef]
8. Nazarov, D.; Nazarov, A.; Kulikova, E. Drones in Agriculture: Analysis of Different Countries. BIO Web Conf. 2023, 67. [CrossRef]
9. Ayamga, M.; Akaba, S.; Nyaaba, A.A. Multifaceted Applicability of Drones: A Review. Technol. Forecast. Soc. Change 2021, 167,
120677. [CrossRef]
10. Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A Review on UAV-Based Applications for Precision Agriculture. Information 2019,
10, 349. [CrossRef]
11. An, D.; Chen, Y. Non-Intrusive Soil Carbon Content Quantification Methods Using Machine Learning Algorithms: A Comparison
of Microwave and Millimeter Wave Radar Sensors. J. Autom. Intell. 2023, 2, 152–166. [CrossRef]
12. Cabreira, T.M.; Brisolara, L.B.; Paulo, R.F., Jr. Survey on Coverage Path Planning with Unmanned Aerial Vehicles. Drones 2019,
3, 4. [CrossRef]
13. Aggarwal, S.; Kumar, N. Path Planning Techniques for Unmanned Aerial Vehicles: A Review, Solutions, and Challenges. Comput.
Commun. 2020, 149, 270–299. [CrossRef]
14. Fang, X.; Xie, L. Distributed Formation Maneuver Control Using Complex Laplacian. IEEE Trans. Autom. Control 2024, 69,
1850–1857. [CrossRef]
15. Tarjan, R. Depth-First Search and Linear Graph Algorithms. In Proceedings of the 12th Annual Symposium on Switching and
Automata Theory (swat 1971), East Lansing, MI, USA, 13–15 October 1971; pp. 114–121.
16. Tang, G.; Tang, C.; Claramunt, C.; Hu, X.; Zhou, P. Geometric A-Star Algorithm: An Improved A-Star Algorithm for AGV Path
Planning in a Port Environment. IEEE Access 2021, 9, 59196–59210. [CrossRef]
17. Sang, H.; You, Y.; Sun, X.; Zhou, Y.; Liu, F. The Hybrid Path Planning Algorithm Based on Improved A* and Artificial Potential
Field for Unmanned Surface Vehicle Formations. OCEAN Eng. 2021, 223, 108709. [CrossRef]
18. Hu, L.; Hu, H.; Naeem, W.; Wang, Z. A Review on COLREGs-Compliant Navigation of Autonomous Surface Vehicles: From
Traditional to Learning-Based Approaches. J. Autom. Intell. 2022, 1, 100003. [CrossRef]
19. Ning, Z.; Xie, L. A Survey on Multi-Agent Reinforcement Learning and Its Application. J. Autom. Intell. 2024, 3, 73–91. [CrossRef]
20. Li, L.; Wu, D.; Huang, Y.; Yuan, Z.-M. A Path Planning Strategy Unified with a COLREGS Collision Avoidance Function Based on
Deep Reinforcement Learning and Artificial Potential Field. Appl. Ocean Res. 2021, 113, 102759. [CrossRef]
21. Cai, Z.; Li, S.; Gan, Y.; Zhang, R.; Zhang, Q. Research on Complete Coverage Path Planning Algorithms Based on A* Algorithms.
Open Cybern. Syst. J. 2014, 8, 418–426.
22. Wang, Z.; Zhao, X.; Zhang, J.; Yang, N.; Wang, P.; Tang, J.; Zhang, J.; Shi, L. APF-CPP: An Artificial Potential Field Based
Multi-Robot Online Coverage Path Planning Approach. IEEE Robot. Autom. Lett. 2024, 9, 9199–9206. [CrossRef]
23. Tang, G.; Tang, C.; Zhou, H.; Claramunt, C.; Men, S. R-DFS: A Coverage Path Planning Approach Based on Region Optimal
Decomposition. Remote Sens. 2021, 13, 1525. [CrossRef]
24. Liu, L.; Wang, X.; Yang, X.; Liu, H.; Li, J.; Wang, P. Path Planning Techniques for Mobile Robots: Review and Prospect. Expert Syst.
Appl. 2023, 227, 120254. [CrossRef]
25. Qin, H.; Shao, S.; Wang, T.; Yu, X.; Jiang, Y.; Cao, Z. Review of Autonomous Path Planning Algorithms for Mobile Robots. Drones
2023, 7, 211. [CrossRef]
26. Patle, B.K.; Babu, L.G.; Pandey, A.; Parhi, D.R.K.; Jagadeesh, A. A Review: On Path Planning Strategies for Navigation of Mobile
Robot. Def. Technol. 2019, 15, 582–606. [CrossRef]
27. Theile, M.; Bayerlein, H.; Nai, R.; Gesbert, D.; Caccamo, M. UAV Coverage Path Planning under Varying Power Constraints
Using Deep Reinforcement Learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 1444–1449.
28. Luis, S.Y.; Reina, D.G.; Marín, S.L.T. A Deep Reinforcement Learning Approach for the Patrolling Problem of Water Resources
Through Autonomous Surface Vehicles: The Ypacarai Lake Case. IEEE Access 2020, 8, 204076–204093. [CrossRef]
29. Li, J.; Zhang, W.; Ren, J.; Yu, W.; Wang, G.; Ding, P.; Wang, J.; Zhang, X. A Multi-Area Task Path-Planning Algorithm for
Agricultural Drones Based on Improved Double Deep Q-Learning Net. Agriculture 2024, 14, 1294. [CrossRef]

30. Ma, C.; Wang, L.; Chen, Y.; Wu, J.; Liang, A.; Li, X.; Jiang, C.; Omrani, H. Evolution and Drivers of Production Patterns of Major
Crops in Jilin Province, China. Land 2024, 13, 992. [CrossRef]
31. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.;
Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [CrossRef]
32. Guo, S.; Zhang, X.; Du, Y.; Zheng, Y.; Cao, Z. Path Planning of Coastal Ships Based on Optimized DQN Reward Function. J. Mar.
Sci. Eng. 2021, 9, 210. [CrossRef]
33. Pan, Z.; Zhang, C.; Xia, Y.; Xiong, H.; Shao, X. An Improved Artificial Potential Field Method for Path Planning and Formation
Control of the Multi-UAV Systems. IEEE Trans. Circuits Syst. II-Express Briefs 2022, 69, 1129–1133. [CrossRef]
34. Zhou, Y.; Su, Y.; Xie, A.; Kong, L. A Newly Bio-Inspired Path Planning Algorithm for Autonomous Obstacle Avoidance of UAV.
Chin. J. Aeronaut. 2021, 34, 199–209. [CrossRef]
35. Fang, X.; Xie, L.; Li, X. Integrated Relative-Measurement-Based Network Localization and Formation Maneuver Control. IEEE
Trans. Autom. Control 2024, 69, 1906–1913. [CrossRef]
36. Aldao, E.; Gonzalez-deSantos, L.M.; Michinel, H.; Gonzalez-Jorge, H. UAV Obstacle Avoidance Algorithm to Navigate in
Dynamic Building Environments. Drones 2022, 6, 16. [CrossRef]
37. He, Y.; Zhu, G.; Gong, C.; Shi, P. Stability Analysis for Hybrid Time-Delay Systems with Double Degrees. IEEE Trans. Syst. Man
Cybern. -Syst. 2022, 52, 7444–7456. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
