Novel task decomposed multi-agent twin delayed deep deterministic policy gradient algorithm for multi-UAV autonomous path planning

Bibliographic Details
Published in:	Knowledge-based systems Vol. 287; p. 111462
Main Authors:	Zhou, Yatong, Kong, Xiaoran, Lin, Kuo-Ping, Liu, Liangyu
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 05.03.2024
Subjects:	Decomposed Actor-Critic network; DRL; MATD3; Multiple UAVs; Path planning
ISSN:	0950-7051, 1872-7409

Description
Summary:	• This study develops a novel TD-MATD3 model to improve convergence efficiency.
• A novel reward function is also designed to facilitate the convergence of the algorithm.
• TD-MATD3 achieves superior performance in complex dynamic environments.
Path planning is one of the most essential parts of task planning. However, path planning for multiple unmanned aerial vehicles (UAVs) is challenging because it must account for cooperation among the UAVs and for the uncertainty of the environment. This study proposes a novel task decomposed multi-agent twin delayed deep deterministic policy gradient (TD-MATD3) algorithm that enables UAVs to execute path planning in complex multi-obstacle environments. TD-MATD3 improves upon the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm by decomposing the path planning task into a navigation task module for flying to the target and an obstacle avoidance task module for avoiding obstacles and other UAVs. Specifically, TD-MATD3 decomposes the Actor-Critic network structure of MATD3 into two corresponding parts according to the reward functions of the two task modules. The navigation features output by the Actor-Critic network of the navigation task module are then input to the Actor-Critic network of the obstacle avoidance task module to guide the UAVs in completing the overall path planning task. A novel reward function is also proposed to facilitate convergence of the algorithm. Experimental results indicate that TD-MATD3 accelerates convergence and improves the converged performance during training, and that it achieves a higher success rate in complex dynamic environments than multi-agent deep deterministic policy gradient (MADDPG) and MATD3 on the multi-UAV path planning problem.
DOI:	10.1016/j.knosys.2024.111462
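
Illustration:	The summary describes a decomposed network in which navigation features are passed into the obstacle avoidance module. The following is a minimal sketch of that data flow for the actor side only, not the authors' implementation. The class name DecomposedActor, the split of the observation into nav_obs and avoid_obs, the layer sizes, and the ReLU and tanh activations are all assumptions made for illustration; the record does not specify them. Training, the twin critics, and the reward function are omitted.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def relu(x):
    return max(0.0, x)


class DecomposedActor:
    """Hypothetical actor split into a navigation part and an avoidance part."""

    def __init__(self, nav_obs_dim, avoid_obs_dim, feature_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Navigation module: target-related observation -> navigation features.
        self.nav_layer = init_layer(nav_obs_dim, feature_dim, rng)
        # Avoidance module: obstacle observation + navigation features -> action.
        self.avoid_hidden = init_layer(avoid_obs_dim + feature_dim, feature_dim, rng)
        self.avoid_out = init_layer(feature_dim, action_dim, rng)

    def navigation_features(self, nav_obs):
        return dense(nav_obs, *self.nav_layer, relu)

    def act(self, nav_obs, avoid_obs):
        features = self.navigation_features(nav_obs)
        # The navigation features are concatenated with the avoidance input,
        # so the final action is guided by the navigation module.
        hidden = dense(list(avoid_obs) + features, *self.avoid_hidden, relu)
        # tanh keeps each action component in [-1, 1] (an assumed action range).
        return dense(hidden, *self.avoid_out, math.tanh)


actor = DecomposedActor(nav_obs_dim=4, avoid_obs_dim=6, feature_dim=8, action_dim=2)
action = actor.act([0.2, -0.1, 0.5, 0.0], [0.9, 0.8, 0.1, 0.3, 0.7, 0.4])
print(len(action))
```

In a full multi-agent setup each UAV would hold one such actor, and each task module would be trained against its own reward signal, as the summary indicates; how the critics are split is not detailed in this record.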