DL-DRL: A Double-Level Deep Reinforcement Learning Approach for Large-Scale Task Scheduling of Multi-UAV

Saved in:
Bibliographic Details
Title: DL-DRL: A Double-Level Deep Reinforcement Learning Approach for Large-Scale Task Scheduling of Multi-UAV
Authors: Xiao Mao, Guohua Wu, Mingfeng Fan, Zhiguang Cao, Witold Pedrycz
Source: IEEE Transactions on Automation Science and Engineering, 22:1028-1044
Publication Status: Preprint
Publisher Information: Institute of Electrical and Electronics Engineers (IEEE), 2025.
Year of Publication: 2025
Topics: Deep reinforcement learning, Signal Processing (eess.SP), FOS: Computer and information sciences, Operations Research, Computer Science - Machine Learning, Theory and Algorithms, multi-UAV task scheduling, 0211 other engineering and technologies, 02 engineering and technology, interactive training, Systems Engineering and Industrial Engineering, Machine Learning (cs.LG), Computer Science - Robotics, divide and conquer-based framework, 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Robotics (cs.RO)
Description: Exploiting unmanned aerial vehicles (UAVs) to execute tasks has been gaining popularity recently. To solve the underlying task scheduling problem, deep reinforcement learning (DRL) based methods demonstrate a notable advantage over conventional heuristics, as they rely less on hand-engineered rules. However, their decision space becomes prohibitively large as the problem scales up, which deteriorates computational efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF), in which we decompose multi-UAV task scheduling into task allocation and route planning. In particular, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate tasks to the different UAVs, and we exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks given each UAV's maximum flight distance. To effectively train the two models, we design an interactive training strategy (ITS) that comprises pre-training, intensive training, and alternate training. Experimental results show that our DL-DRL performs favorably against learning-based and conventional baselines, including OR-Tools, in terms of solution quality and computational efficiency. We also verify the generalization performance of our approach by applying it to larger problem sizes of up to 1,000 tasks. Moreover, an ablation study shows that our ITS helps achieve a balance between performance and training efficiency.
Note: 13 pages, 7 figures
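The abstract outlines a two-stage decomposition: an upper-level policy allocates tasks to UAVs and a lower-level policy routes each UAV within its flight-distance budget. The following minimal Python sketch illustrates only that divide-and-conquer structure; the function names and the random/greedy placeholder policies are hypothetical stand-ins, not the authors' trained DRL models.

import math
import random

# Hypothetical sketch of the divide-and-conquer framework (DCF) from the abstract:
# an upper-level policy allocates tasks to UAVs, then a lower-level policy builds
# each UAV's route under a maximum flight distance. The trained DRL policies are
# replaced here by simple random/greedy placeholders.

def upper_level_allocate(tasks, num_uavs):
    """Placeholder for the upper-level (encoder-decoder) allocation policy."""
    allocation = {k: [] for k in range(num_uavs)}
    for task in tasks:
        allocation[random.randrange(num_uavs)].append(task)
    return allocation

def lower_level_route(depot, tasks, max_distance):
    """Placeholder for the lower-level (attention-based) routing policy:
    greedily visits the nearest remaining task while the return trip to the
    depot still fits within the flight-distance budget."""
    route, pos, used = [], depot, 0.0
    remaining = list(tasks)
    while remaining:
        nxt = min(remaining, key=lambda t: math.dist(pos, t))
        step = math.dist(pos, nxt)
        if used + step + math.dist(nxt, depot) > max_distance:
            break
        route.append(nxt)
        used += step
        pos = nxt
        remaining.remove(nxt)
    return route

if __name__ == "__main__":
    random.seed(0)
    depot = (0.0, 0.0)
    tasks = [(random.random(), random.random()) for _ in range(100)]
    allocation = upper_level_allocate(tasks, num_uavs=4)
    routes = {k: lower_level_route(depot, ts, max_distance=3.0)
              for k, ts in allocation.items()}
    executed = sum(len(r) for r in routes.values())
    print(f"Executed {executed} of {len(tasks)} tasks")

In the paper's approach, both placeholder functions would be replaced by learned policy networks and trained jointly via the interactive training strategy (pre-training, intensive training, alternate training).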
Document Type: Article
File Description: application/pdf
ISSN: 1558-3783, 1545-5955
DOI: 10.1109/tase.2024.3358894
DOI (arXiv): 10.48550/arxiv.2208.02447
Access URL: http://arxiv.org/abs/2208.02447
Rights: IEEE Copyright; arXiv Non-Exclusive Distribution; CC BY-NC-ND
Accession Number: edsair.doi.dedup.....ce4165eb787048762aa8c8f97648ffe6
Database: OpenAIRE