Novel task decomposed multi-agent twin delayed deep deterministic policy gradient algorithm for multi-UAV autonomous path planning

Bibliographic details
Published in: Knowledge-based systems, Vol. 287; p. 111462
Main authors: Zhou, Yatong; Kong, Xiaoran; Lin, Kuo-Ping; Liu, Liangyu
Format: Journal Article
Language: English
Published: Elsevier B.V., 5 March 2024
ISSN:0950-7051, 1872-7409
Description
Summary:
• This study develops a novel TD-MATD3 model to improve convergence efficiency.
• A novel reward function is also designed to facilitate the convergence of the algorithm.
• TD-MATD3 achieves superior performance in complex dynamic environments.

Path planning is one of the most essential parts of task planning. However, path planning for multiple unmanned aerial vehicles (UAVs) is challenging because the UAVs must cooperate and the environment is uncertain. This study proposes the novel task decomposed multi-agent twin delayed deep deterministic policy gradient (TD-MATD3) algorithm, which enables UAVs to execute path planning in complex multi-obstacle environments. TD-MATD3 improves upon the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm by decomposing the path planning task into a navigation task module for flying to the target and an obstacle avoidance task module for avoiding obstacles and other UAVs. Specifically, TD-MATD3 splits the Actor-Critic network structure of MATD3 into two corresponding parts according to the reward functions of the two task modules. The navigation features output by the Actor-Critic network of the navigation task module are then fed into the Actor-Critic network of the obstacle avoidance task module to guide the UAVs through the overall path planning task. A novel reward function is also proposed to facilitate convergence of the algorithm. Experimental results indicate that TD-MATD3 accelerates convergence and improves the quality of convergence during training, and that it achieves a higher success rate in complex dynamic environments than multi-agent deep deterministic policy gradient (MADDPG) and MATD3 on the multi-UAV path planning problem.
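The data flow described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DecomposedActor`, the layer sizes, the random weights, and the helper names are all hypothetical, and only the actor side of the Actor-Critic structure is shown. The `td3_target` function shows the clipped double-Q target that TD3-style methods (and therefore MATD3) are generally built on; how TD-MATD3 combines it with the two reward functions is not stated in this record.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights for one dense layer (illustrative initialisation only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def dense_tanh(weights, x):
    """Bias-free dense layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class DecomposedActor:
    """Hypothetical two-module actor: navigation features feed the avoidance module."""

    def __init__(self, nav_dim, obs_dim, feat_dim, act_dim, seed=0):
        rng = random.Random(seed)
        self.nav_layer = make_layer(nav_dim, feat_dim, rng)
        self.avoid_layer = make_layer(obs_dim + feat_dim, act_dim, rng)

    def act(self, nav_obs, avoid_obs):
        # Navigation module: target-related observation -> navigation features.
        nav_features = dense_tanh(self.nav_layer, nav_obs)
        # Avoidance module: its own observation plus the navigation features.
        return dense_tanh(self.avoid_layer, list(avoid_obs) + nav_features)


def td3_target(reward, gamma, q1_next, q2_next, done):
    """Clipped double-Q target: bootstrap from the smaller of the twin critics."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


actor = DecomposedActor(nav_dim=3, obs_dim=4, feat_dim=2, act_dim=2)
action = actor.act([0.1, -0.2, 0.3], [0.0, 0.5, -0.5, 1.0])
target = td3_target(reward=1.0, gamma=0.9, q1_next=2.0, q2_next=3.0, done=0.0)
```

In this sketch the avoidance module produces the final action, so the navigation module influences behaviour only through the features it passes on, which is one plausible reading of "guide UAVs to complete the overall path planning task".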
DOI:10.1016/j.knosys.2024.111462