Research on Decision-Making Strategies for Multi-Agent UAVs in Island Missions Based on Rainbow Fusion MADDPG Algorithm

Bibliographic Details
Published in: Drones (Basel), Vol. 9, No. 10, p. 673
Main Authors: Yang, Chaofan; Zhang, Bo; Zhang, Meng; Wang, Qi; Zhu, Peican
Medium: Journal Article
Language: English
Published: Basel: MDPI AG, 01.10.2025
ISSN: 2504-446X
Description
Summary: To address the limitations of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in autonomous control tasks, namely low convergence efficiency, poor training stability, inadequate adaptability of confrontation strategies, and difficulty with sparse-reward tasks, this paper proposes an enhanced algorithm that integrates the Rainbow module. The proposed algorithm improves long-term reward optimization through prioritized experience replay (PER) and a multi-step temporal-difference (TD) update mechanism. Additionally, a dynamic reward allocation strategy is introduced to enhance the collaborative and adaptive decision-making capabilities of agents in complex adversarial scenarios, and behavioral cloning is employed to accelerate convergence during the initial training phase. Extensive experiments are conducted on the MaCA simulation platform for 5 vs. 5 to 10 vs. 10 UAV island-capture missions. The results demonstrate that Rainbow-MADDPG outperforms the original MADDPG on several key metrics: (1) the average reward improves across all confrontation scales, with notable gains in the 6 vs. 6 and 7 vs. 7 tasks, where it reaches a reward value of 14, a 6.05-fold and 2.5-fold improvement over the baseline, respectively; (2) convergence speed increases by 40%; and (3) the combat effectiveness preservation rate is double that of the baseline. Moreover, the algorithm achieves the highest average reward in quasi-rectangular island scenarios, demonstrating strong adaptability to large-scale dynamic game environments. This study provides an innovative technical solution to the imbalance between strategy stability and efficiency in multi-agent autonomous control tasks, with significant application potential in UAV defense, cluster cooperative tasks, and related fields.
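
The abstract highlights two Rainbow components folded into MADDPG: prioritized experience replay and multi-step TD updates. The following is a minimal sketch of how these two mechanisms are commonly combined in a single replay buffer; it is not the authors' implementation, and the class name, hyperparameter defaults, and transition format are illustrative assumptions:

```python
from collections import deque

import numpy as np


class PrioritizedNStepBuffer:
    """Minimal sketch: proportional PER combined with n-step returns."""

    def __init__(self, capacity=100_000, n_step=3, gamma=0.99,
                 alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.n_step = n_step        # horizon of the multi-step TD target
        self.gamma = gamma
        self.alpha = alpha          # how strongly TD error shapes sampling
        self.beta = beta            # importance-sampling correction strength
        self.eps = eps              # keeps every priority strictly positive
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.n_queue = deque(maxlen=n_step)

    def push(self, state, action, reward, next_state, done):
        # Accumulate transitions until an n-step return can be formed.
        self.n_queue.append((state, action, reward, next_state, done))
        if len(self.n_queue) < self.n_step and not done:
            return
        # R = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
        ret = 0.0
        s0, a0 = self.n_queue[0][0], self.n_queue[0][1]
        for i, (_, _, r, ns, d) in enumerate(self.n_queue):
            ret += (self.gamma ** i) * r
            next_state, done = ns, d
            if d:
                break
        # New samples enter with the current maximum priority so they are
        # guaranteed to be replayed at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        item = (s0, a0, ret, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            self.storage[self.pos] = item
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity
        if done:
            # Episode boundary; a fuller implementation would also flush
            # the shorter-than-n suffixes before clearing.
            self.n_queue.clear()

    def sample(self, batch_size):
        prios = self.priorities[:len(self.storage)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.storage) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights.astype(np.float32)

    def update_priorities(self, idx, td_errors):
        # Priorities are |TD error| + eps, as in proportional PER.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a typical training loop, the critic's absolute TD errors for each sampled batch would be fed back through update_priorities, and the importance-sampling weights would scale the critic loss; the paper's dynamic reward allocation and behavioral-cloning warm start are separate mechanisms not sketched here.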
DOI: 10.3390/drones9100673