Collaborative Coverage Path Planning of UAVs Using RL

Abstract—With the continuous application of unmanned aerial vehicles (UAVs) in national defense and civil use, the UAV cluster system, in which multiple UAVs cooperate to perform tasks, has become a key research topic in many countries. This paper focuses on the problem of multi-UAV Coverage Path Planning (mCPP), based on Reinforcement Learning (RL), to exploit all points of interest within an area, where each UAV starts from a random position and carries a camera during the mission. A number of optimal algorithms have been proposed for the coverage path planning of a single UAV; however, the problem remains under-explored for multiple UAVs. As such, we leverage deep reinforcement learning with Double Deep Q-learning Networks (DDQN) to learn a global optimal control policy for a team of UAVs under certain power constraints, so that they cooperate effectively to explore a wider area. Regarding the task area as a 2D plane, we divide it into a collection of uniform grid cells, each representing a section of the environment. The camera field of view of each UAV covers the cell directly underneath it. Simulation results demonstrate that, wherever the start positions are, the UAV cluster can fully cover the whole task area under the energy constraints and achieve autonomous collaboration. The proposed method also has great potential for application to dynamic environments.

Keywords-UAV cluster system, mCPP, DDQN, full coverage, energy constraints

I. INTRODUCTION

With the advantages of high mobility, flexible deployment, and low cost, unmanned aerial vehicles (UAVs) have served as an emerging facility widely applied to terrain coverage, agricultural production, environmental reconnaissance, air rescue, disaster warning, and other social industries [1-2]. More recently, the UAV cluster system composed of several collaborative UAVs has become the focus of attention in various application domains for its higher execution efficiency. In this paper, we investigate the problem of multi-UAV Coverage Path Planning (mCPP) to achieve complete coverage under battery duration, which is a fundamental issue for the UAV cluster system.

Generally, the Coverage Path Planning (CPP) task aims to determine an optimal path that travels over all points within an area of interest while reducing redundant paths [2]. For the case of a single agent, it is relatively easy to design a non-repetitive route or minimize the path length. A recent survey [3] analyzed different CPP approaches for UAVs, where the CPP problem is typically solved by splitting the target area into non-intersecting shapes, so that the UAV can travel over the maximum coverage while avoiding obstacles and minimizing the path length. As for multiple UAVs, the studies in [1, 4] decompose the terrain and assign the resulting sub-regions among the UAVs, transforming the mCPP problem into independent single-UAV CPP problems. However, those decomposition methods rely on prior knowledge of the environment and do not allow the UAVs to coordinate with each other. It is also difficult to predict trajectories in a large-scale environment with random start positions.

With the help of deep learning techniques, Reinforcement Learning (RL) has shown impressive progress, for example in playing video games [5], path planning for data harvesting [6], and V2X resource allocation [7]. This paper aims to address the coverage path planning of a UAV cluster based on an RL algorithm, specifically the Double Deep Q-learning Networks (DDQN) method. Various RL methods [5, 11, 12] have been developed for UAV path planning; however, they are designed only for a single UAV. With an increasing number of UAVs, the path planning problem becomes more complicated and uncertain. In this paper, we discretize the task area by splitting it into grid cells of equal size, and design the state set and action set of the UAV cluster. More importantly, we formulate reward and punishment mechanisms that help the UAV cluster achieve better path planning ability with higher accumulated scores. In addition, the use of two deep neural networks weakens the dependence between the target value and the network parameters, which speeds up convergence of the training process.

The main contributions of this paper are as follows:

• We introduce a novel control policy based on DDQN for a UAV cluster to completely cover the task area.

• The proposed method generalizes over random start positions and balances the requirements of full coverage, path shortening, and energy limitation.

• The control method enables the UAV cluster to autonomously and cooperatively plan flight trajectories without inter-vehicle communication or prior knowledge of the task area.

The remainder of this paper is organized as follows: Section II introduces the multi-UAV mobility and scenario model, Section III describes the proposed DDQN-based approach for the UAV cluster, Section IV presents the simulation results and discussion, and Section V concludes the paper with a summary.
II. SYSTEM MODEL

In this section, we present the key models for the UAV cluster coverage path planning. In order to implement the RL approach, we make some simplifications and reasonable assumptions in our models.

A. Task Scenario Model

It is assumed that there are n UAVs performing a coverage reconnaissance mission in an open area, which can be represented by a square grid world of K km x K km with cell size C km x C km. In this way, we abstract and discretize the area into a grid of size N x N, where N = ceil(K/C). As shown in Figure 1, the red dotted lines represent the warning areas, in which UAVs have to adjust their direction to prevent flying out of bounds.

Figure 1. Task environment of the model

B. UAV Model

In Figure 1, each yellow point represents one UAV with a random start position. As the reconnaissance range of small UAVs is limited, the field of view of a UAV can be approximated by a single grid cell of the area map. Thus, the cells along a UAV trajectory form the covered area: a covered cell is marked as 1 no matter how many times it is overlapped, and an uncovered cell is marked as 0. By counting the number of grid cells marked 1, we can calculate the coverage rate at each moment.

Because the DDQN algorithm can only handle discrete variables, the flight direction D of each UAV agent is discretized into a few fixed directions. As shown in Figure 2, the UAV can choose one flight direction from north, south, west, or east, marked by 1, 2, 3, and 4, namely D ∈ {1, 2, 3, 4}. Also, the total moving distance of each UAV is limited to 300 steps in consideration of the battery capacity.

Figure 2. Flight directions of the UAV

III. METHODOLOGY

In this section, we describe the RL-based method to address the aforementioned mCPP issue in detail. RL is usually modeled as a Markov Decision Process (MDP), which is defined through the tuple (S, A, P, R). S indicates the state space of the UAV cluster, which is the set of observations obtained by the UAV cluster while interacting with the environment. A describes the joint action space of the UAVs. P is a probability function determining how the state transitions. R represents a reward function that evaluates the action A selected by the agents according to the goals.

A. State space

The state space reflects the observations from the environment over a sequence of discrete time slots. Note that the environment is dynamic because of the mobility of the UAV cluster during the mission. In this paper, we adopt centralized learning with distributed implementation to achieve collaboration among UAVs. Therefore, we need to collect the global position information from the UAV cluster as the basis for decision making. An individual UAV i can obtain its current position s_i^t = (x_i^t, y_i^t) by GPS as a sub-state, where x_i^t and y_i^t indicate its X-axis and Y-axis positions at time t. Then, the joint state space is presented by

S^t = {s_1^t, s_2^t, ..., s_n^t}    (1)

where s_i^t is the sub-state of UAV i, i = 1, 2, ..., n. In this way, the position states of all other UAVs are available to each individual UAV.

B. Action space

As mentioned before, mCPP aims to build a strategy that autonomously makes multiple UAVs find optimal directions to completely cover the task area as efficiently as possible, while the DDQN-based method requires discrete variables to optimize. As a result, the action space of each UAV is composed of 4 directions, namely north, south, west, and east, and each UAV is respectively assigned a flight direction based on DDQN. As such, the size of the joint action space is 4^n with 4 actions and n UAVs. The joint action space is expressed as

A^t = {a_1^t, a_2^t, ..., a_n^t},  a_i^t ∈ {1, 2, 3, 4}    (2)
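To make the discretization above concrete, the following Python sketch shows one possible grid-world abstraction with coverage marking, the joint state of (1), and the joint action of (2). The class name GridCoverageEnv, the default parameter values, and the boundary handling are illustrative assumptions, not the authors' implementation, and the reward terms of Section III.C are omitted.

import math
import random

import numpy as np

class GridCoverageEnv:
    """Hypothetical grid world for n UAVs covering a K km x K km area with cell size C km."""

    # action codes 1-4 mapped to (dx, dy): north, south, west, east
    MOVES = {1: (0, 1), 2: (0, -1), 3: (-1, 0), 4: (1, 0)}

    def __init__(self, n_uavs=4, K=10.0, C=1.0, max_steps=300):
        self.n = n_uavs
        self.N = math.ceil(K / C)          # grid is N x N, N = ceil(K/C)
        self.max_steps = max_steps         # battery limit expressed as flight steps
        self.reset()

    def reset(self):
        self.covered = np.zeros((self.N, self.N), dtype=np.int8)  # 1 = visited, 0 = not yet
        # random start position for each UAV, as assumed in the paper
        self.pos = [(random.randrange(self.N), random.randrange(self.N)) for _ in range(self.n)]
        for x, y in self.pos:
            self.covered[x, y] = 1
        self.steps = 0
        return self.state()

    def state(self):
        # joint state S^t: the grid-cell positions of all UAVs, Eq. (1)
        return tuple(self.pos)

    def coverage_rate(self):
        return self.covered.sum() / (self.N * self.N)

    def step(self, joint_action):
        # joint action A^t: one direction code in {1, 2, 3, 4} per UAV, Eq. (2)
        for i, a in enumerate(joint_action):
            dx, dy = self.MOVES[a]
            x, y = self.pos[i]
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.N and 0 <= ny < self.N:   # an out-of-bound move keeps the UAV hovering
                self.pos[i] = (nx, ny)
            self.covered[self.pos[i]] = 1
        self.steps += 1
        done = self.coverage_rate() == 1.0 or self.steps >= self.max_steps
        return self.state(), self.coverage_rate(), done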
C. Reward Function

Using a reward signal to formalize the idea of a goal is one of the most distinctive features of reinforcement learning, and it plays a critical role in solving problems with hard-to-optimize objectives [11]. A reward function should encourage correct actions as well as punish false behaviors of the agents. According to the final objective and constraints, the reward is defined as a simple number r^t at each time step, and the whole UAV cluster shares the same reward, so that collaborative behavior among the UAVs is motivated [12]. In particular, we decompose the main target, i.e., we perform credit assignment, to instruct the UAV agents by generating dense, informative signals. In response to the main goal of full coverage, the coverage reward is formulated in (3) in terms of c^t, the coverage rate of the UAV cluster at time t, and c^{t+1}, the coverage rate at the next moment after the UAV cluster takes the joint action A^t, where division rounded down is applied. It is worth noting that, with the expansion of coverage, the reward increases nonlinearly: the gap to the final goal determines the intensity of the incentive that guides the agents, and the final goal corresponds to the maximum reward. Moreover, a larger coverage means the paths of the UAVs are more likely to overlap. As such, reducing repeated paths is necessary, and we set a punishment with a negative reward of -3 in that case.

Meanwhile, with regard to disallowed behavior, such as flying out of bounds, the dangerous action is forecast within the warning area. Once a UAV is predicted to fly out of range at the next step, it is forced to hover at its current position until its next action is legal. The punishment caused by this stationary coverage therefore also helps to avoid out-of-bound behavior.

In addition to illegal behavior and overlapping coverage, we take the battery capacity into consideration as a constraint of the mCPP mission, which is translated into a limited number of flight steps for each UAV. This forms the other part of the reward function, given in (4), where T_max acts as the maximum number of flight steps, T is the actual average number of steps of each UAV, and a discount factor turns the extra expense into a linearly varying negative reward.

According to (3) and (4), the total reward r^t is summarized in (5) as the weighted sum of the two parts, where w_1 and w_2 are the respective weights. However, it is difficult to predict the agents' future performance by simply depending on the score of the next state. Therefore, the goal of RL is to maximize the total amount of discounted return that the agents receive, that is, to maximize not the immediate reward but the cumulative discounted reward in the long run [11]. The expected return is also called the action-value function or Q-value, denoted as follows:

Q^pi(s, a) = E_pi[ sum_{k=0}^{inf} gamma^k R^{t+k} | S^t = s, A^t = a ]    (6)

where gamma is the discount rate determining the present value of future rewards. A larger gamma means that rewards further in the future are taken into account in the total return until the mission is finished. On the contrary, the UAV cluster would be concerned only with maximizing immediate rewards when gamma = 0.

D. Learning algorithm

1) Q-learning

Q-learning is a model-free RL algorithm that generalizes over various situations and does not require prior information on the state transition function [13]. For the case in this paper, Q-learning is based on a cycle of interaction between the multi-agent system (namely the UAV cluster) and the task environment, through which an optimized behavior rule is trained to obtain as high a reward as possible. Figure 3 illustrates this cycle of interaction: the UAV cluster receives the current state S^t from the environment, each UAV then determines a flight action, and the individual actions are integrated into a joint action A^t. Subsequently, the environment feeds back a reward R^t to the UAV cluster to evaluate its performance and evolves to the next state S^{t+1}. It can be seen that Q-learning is devoted to learning an improved policy pi that maps states to actions through the action-value function Q(s, a), as formalized in (7), where the Q-value is the one denoted in (6). Hence, for a given state, an optimal policy can be obtained simply by selecting the action as

a^{t,*} = arg max_a Q(S^t, a)    (8)

That is, the action to be taken is the one that maximizes the Q-value.

Figure 3. The cycle of interaction for Q-learning
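For reference, the greedy selection in (8) and one interaction cycle of Figure 3 can be written out in a few lines. The following is a generic tabular Q-learning sketch with assumed names (Q, ACTIONS, the learning rate alpha, and the dummy transition at the end); the paper itself replaces the table with the deep networks of the next subsection.

import random
from collections import defaultdict

# Tabular Q-learning pieces: greedy selection per Eq. (8) and the standard
# one-step temporal-difference update.  ACTIONS, alpha, and the dummy
# transition below are illustrative assumptions, not values from the paper.
ACTIONS = (1, 2, 3, 4)              # north, south, west, east
Q = defaultdict(float)              # Q[(state, action)] -> estimated value

def greedy_action(state):
    # Eq. (8): pick the action that maximizes the Q-value in this state
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def epsilon_greedy(state, eps):
    # explore with probability eps, otherwise exploit the greedy action
    return random.choice(ACTIONS) if random.random() < eps else greedy_action(state)

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # move Q(s, a) toward r + gamma * max_a' Q(s', a')
    target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# one dummy interaction cycle (Figure 3): state -> action -> reward, next state
s, s_next = ((0, 0),), ((0, 1),)    # joint positions of a one-UAV "cluster"
a = epsilon_greedy(s, eps=0.3)
q_update(s, a, reward=1.0, next_state=s_next)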
2) Double Deep Q-learning Networks

In general, Q-learning can be implemented by constructing a table of dimension |S| x |A|. However, for a case with large-scale state and action spaces, the traditional tabular Q-learning method is inefficient and inaccurate in searching for optimal actions, because many state-action pairs are seldom visited. As a result, the widely used Deep Neural Network (DNN) is adopted in this paper to address this sophisticated, large-scale problem. If we rely on a single DNN both to calculate the target Q-value and to evaluate the Q-value while updating the network parameters based on that target, convergence is hindered by the excessive dependence between them. For this reason, we use two DNNs of identical structure: one selects actions and updates its network parameters, while the other is only responsible for calculating target Q-values and asynchronously copies the parameters of the former network. With the RMSProp optimizer, we train a Deep Q-Network (DQN) so that the expected temporal difference (TD) error converges efficiently, achieving the optimal mapping from the input state S^t to the output Q-values.

In the training phase, we store previous experience in a memory buffer M to break the correlations between successive training data in the sequence. To be specific, after the UAV cluster performs one cycle of interaction, a training sample represented by the tuple (S^t, A^t, R^t, S^{t+1}) is collected into M. When the number of samples grows to the maximum size of M, a new sample randomly replaces one sample stored in the buffer. As such, the DQN can be trained with a mini-batch of diverse and uncorrelated data at every episode. At each training step, the UAV cluster leverages a dynamic soft policy, i.e., epsilon-greedy, to guarantee that both exploration and exploitation are considered. This policy indicates that the action with the maximal estimated value is chosen with probability 1 - epsilon, while a random action is chosen with probability epsilon, and epsilon decreases gradually as the number of training steps increases.

In particular, there are two separate networks, called the Q-evaluate network and the Q-target network, used to construct the TD error as

delta^t = R^t + gamma * max_{a'} Q(S^{t+1}, a'; theta') - Q(S^t, A^t; theta)    (9)

where theta and theta' indicate the parameters of the Q-evaluate network and the Q-target network, respectively. Note that theta' is copied from theta of the Q-evaluate network periodically, being updated after several rounds, and gamma is the discount factor. Formula (9) means that when the optimal Q-value is acquired, the TD error converges to zero. For simplicity, we use y^t to represent the former term of (9) as the target value, i.e.,

y^t = R^t + gamma * max_{a'} Q(S^{t+1}, a'; theta')    (10)

As such, with y^t from the Q-target network acting as a label and the evaluative Q-value from the Q-evaluate network, the loss function for updating the weights theta of the DQN can be expressed as

L(theta) = E[ (y^t - Q(S^t, A^t; theta))^2 ]    (11)

Nevertheless, it is neither robust nor proper to blindly select the action that maximizes the Q-value of the Q-target network, which may lead to overestimation of Q-values under certain conditions. As a result, a further improvement is made in Double Deep Q-learning Networks (DDQN), which separates the selection of the action that generates the target Q-value from the calculation of the target Q-value itself. The Q-evaluate network is used to choose the action that maximizes Q; subsequently, the Q-target network outputs the target Q-value by evaluating S^{t+1} together with the previously selected action. This operation ensures that the target Q-value is appropriate rather than simply the maximum one, so as to avoid choosing an over-estimated action. Thus, we finally adopt the DDQN method in this paper, and the loss function for our model is given by

L(theta) = E[ (y_DDQN^t - Q(S^t, A^t; theta))^2 ]    (12)

where the target value y_DDQN^t is described by

y_DDQN^t = R^t + gamma * Q(S^{t+1}, arg max_{a'} Q(S^{t+1}, a'; theta); theta')    (13)

3) Training and Testing

The entire structure of the DDQN algorithm is shown in Figure 4. We also summarize the procedure for solving the mCPP problem for the UAV cluster based on DDQN in Algorithm 1.
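The difference between the DQN target in (10) and the DDQN target in (13) can be seen in the following NumPy sketch, which computes both targets for a small assumed batch. The array values are placeholders and terminal-state handling is omitted.

import numpy as np

def dqn_targets(reward, q_target_next, gamma=0.9):
    # Eq. (10): y = r + gamma * max_a' Q_target(s', a')
    return reward + gamma * q_target_next.max(axis=1)

def ddqn_targets(reward, q_eval_next, q_target_next, gamma=0.9):
    # Eq. (13): the Q-evaluate network chooses the action, the Q-target
    # network evaluates it, which mitigates over-estimation.
    best_actions = q_eval_next.argmax(axis=1)
    return reward + gamma * q_target_next[np.arange(len(reward)), best_actions]

reward = np.array([1.0, -3.0])                   # batch of 2 rewards (illustrative)
q_eval_next = np.array([[0.2, 0.9, 0.1, 0.4],    # Q-evaluate outputs for s'
                        [0.5, 0.3, 0.8, 0.1]])
q_target_next = np.array([[0.3, 0.7, 0.2, 0.6],  # Q-target outputs for s'
                          [0.4, 0.2, 0.9, 0.3]])
print(dqn_targets(reward, q_target_next))
print(ddqn_targets(reward, q_eval_next, q_target_next))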
Figure 4. The structure of DDQN

Algorithm 1 UAV coverage path planning with DDQN
1: Set task environment parameters and start environment modeling
2: Build neural network model and initialize network parameters
3: for each episode do:
4:   Update the location and coverage of UAV cluster nodes
5:   for each step do

IV. SIMULATION RESULTS AND DISCUSSION

Simulation results are presented in this section to illustrate the performance of the proposed method.

We assume that all UAVs in one cluster keep the same flight altitude and speed during the mission. Within every episode, the UAV cluster generates its initial positions randomly. The experiments are conducted with the parameters in Table I.

TABLE I. SIMULATION PARAMETERS
Parameter            Value
Number of UAVs n     4
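For orientation, the training procedure summarized in Algorithm 1 and Section III.D (experience replay, epsilon-greedy exploration, DDQN targets, and periodic copying of the Q-evaluate parameters into the Q-target network) might be sketched as follows. The network sizes, hyperparameters, and the placeholder random transitions below are illustrative assumptions, not the configuration of Table I or the authors' implementation.

import random
from collections import deque

import torch
import torch.nn as nn

N_UAVS, STATE_DIM, N_JOINT_ACTIONS = 4, 8, 4 ** 4    # state: (x, y) per UAV

def make_net():
    # two networks of identical structure, as described in Section III.D
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, N_JOINT_ACTIONS))

q_eval, q_target = make_net(), make_net()
q_target.load_state_dict(q_eval.state_dict())
optimizer = torch.optim.RMSprop(q_eval.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)                         # replay buffer M
gamma, eps, batch_size = 0.9, 1.0, 32

for step in range(200):                               # stand-in for the episode/step loops
    s = torch.rand(STATE_DIM)                         # placeholder state from the environment
    if random.random() < eps:                         # epsilon-greedy joint action
        a = random.randrange(N_JOINT_ACTIONS)
    else:
        a = int(q_eval(s).argmax())
    r, s_next = random.uniform(-3, 10), torch.rand(STATE_DIM)  # placeholder feedback
    memory.append((s, a, r, s_next))
    eps = max(0.05, eps * 0.99)                       # decay exploration over time

    if len(memory) >= batch_size:
        batch = random.sample(list(memory), batch_size)
        ss = torch.stack([b[0] for b in batch])
        aa = torch.tensor([b[1] for b in batch])
        rr = torch.tensor([b[2] for b in batch])
        ss_next = torch.stack([b[3] for b in batch])
        with torch.no_grad():                         # DDQN target, Eq. (13)
            best = q_eval(ss_next).argmax(dim=1)
            y = rr + gamma * q_target(ss_next).gather(1, best.unsqueeze(1)).squeeze(1)
        q_sa = q_eval(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q_sa, y)        # Eq. (12)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if step % 50 == 0:                                # periodic target-network sync
        q_target.load_state_dict(q_eval.state_dict())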
The network output is the Q-values of the joint actions. Furthermore, this verifies that it is not necessary for the UAV cluster to obtain prior knowledge about the geographic information of the task area. Each individual UAV only needs to upload its GPS location periodically to the dispatching center, and all UAVs receive and obey the commands coherently, without the extra delay and energy consumption of inter-UAV communication.
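Since the networks score joint actions, the dispatching center has to translate a single joint-action index back into one flight direction per UAV before broadcasting commands. A minimal sketch of such a mapping is shown below; the base-4 encoding and the function names are assumptions for illustration, as the paper does not specify how the joint action is indexed.

# Hypothetical mapping between a joint-action index in {0, ..., 4**n - 1}
# and per-UAV direction codes {1: north, 2: south, 3: west, 4: east}.
def decode_joint_action(index, n_uavs):
    directions = []
    for _ in range(n_uavs):
        directions.append(index % 4 + 1)   # low "digit" first: UAV 0, UAV 1, ...
        index //= 4
    return directions

def encode_joint_action(directions):
    index = 0
    for d in reversed(directions):
        index = index * 4 + (d - 1)
    return index

assert decode_joint_action(encode_joint_action([1, 4, 2, 3]), 4) == [1, 4, 2, 3]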
REFERENCES

[2] Galceran, E., Carreras, M. (2013). A survey on coverage path planning for robotics. Robotics and Autonomous Systems, 61(12), 1258-1276.
[3] Cabreira, T. M., Brisolara, L. B., Ferreira Jr, P. R. (2019). Survey on coverage path planning with unmanned aerial vehicles. Drones, 3(1), 4.
[4] Maza, I., Ollero, A. (2007). Multiple UAV cooperative searching operation using polygon area decomposition and efficient coverage algorithms. In: Alami, R., Chatila, R., Asama, H. (Eds.), Distributed Autonomous Robotic Systems 6. Springer, Tokyo. pp. 221-230.
[5] Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., Wu, Y. (2021). The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955.
[6] Bayerlein, H., Theile, M., Caccamo, M., Gesbert, D. (2021). Multi-UAV path planning for wireless data harvesting with deep reinforcement learning. IEEE Open Journal of the Communications Society, 2, 1171-1187.
[7] Ye, H., Li, G. Y., Juang, B. H. F. (2019). Deep reinforcement learning based resource allocation for V2V communications. IEEE Transactions on Vehicular Technology, 68(4), 3163-3173.
[8] Theile, M., Bayerlein, H., Nai, R., Gesbert, D., Caccamo, M. (2020). UAV coverage path planning under varying power constraints using deep reinforcement learning. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. pp. 1444-1449.
[9] Maciel-Pearson, B. G., Marchegiani, L., Akcay, S., Atapour-Abarghouei, A., Garforth, J., Breckon, T. P. (2019). Online deep reinforcement learning for autonomous UAV navigation and exploration of outdoor environments. arXiv preprint arXiv:1912.05684.
[10] Piciarelli, C., Foresti, G. L. (2019). Drone patrolling with reinforcement learning. In: Proceedings of the 13th International Conference on Distributed Smart Cameras. pp. 1-6.
[11] Thrun, S., Littman, M. L. (2000). Reinforcement learning: An introduction. AI Magazine, 21(1), 103-103.
[12] Liang, L., Ye, H., Li, G. Y. (2019). Spectrum sharing in vehicular networks based on multi-agent reinforcement learning. IEEE Journal on Selected Areas in Communications, 37(10), 2282-2292.
[13] Wu, F., Zhang, H., Wu, J., Han, Z., Poor, H. V., Song, L. (2021). UAV-to-device underlay communications: Age of information minimization by multi-agent deep reinforcement learning. IEEE Transactions on Communications.