Research on Decision-Making Strategies for Multi-Agent UAVs in Island Missions Based on Rainbow Fusion MADDPG Algorithm
| Published in: | Drones (Basel), Vol. 9, No. 10, p. 673 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 1 October 2025 |
| Keywords: | |
| ISSN: | 2504-446X |
| Online access: | Full text |
| Abstract: | The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm has several limitations in autonomous control tasks: low convergence efficiency, poor training stability, inadequate adaptability of confrontation strategies, and difficulty handling sparse-reward tasks. To address them, this paper proposes an enhanced algorithm that integrates the Rainbow module. The proposed algorithm improves long-term reward optimization through prioritized experience replay (PER) and multi-step TD updates. In addition, a dynamic reward allocation strategy is introduced to strengthen the collaborative and adaptive decision-making of agents in complex adversarial scenarios, and behavioral cloning is used to accelerate convergence during the initial training phase. Extensive experiments are conducted on the MaCA simulation platform for UAV island capture missions ranging from 5 vs. 5 to 10 vs. 10. The results show that Rainbow-MADDPG outperforms the original MADDPG on several key metrics: (1) The average reward improves at every confrontation scale, most notably in the 6 vs. 6 and 7 vs. 7 tasks, where it reaches 14, a 6.05-fold and a 2.5-fold improvement over the baseline, respectively. (2) The convergence speed increases by 40%. (3) The combat effectiveness preservation rate is double that of the baseline. Moreover, the algorithm achieves its highest average reward in quasi-rectangular island scenarios, demonstrating strong adaptability to large-scale dynamic game environments. The study offers a technical solution to the problems of strategy stability and efficiency imbalance in multi-agent autonomous control tasks, with application potential in UAV defense, cluster cooperative tasks, and related fields. |
|---|---|
| DOI: | 10.3390/drones9100673 |
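
The abstract names two mechanisms, multi-step TD updates and prioritized experience replay (PER), without giving formulas. The sketch below illustrates the standard textbook form of each: an n-step discounted return with a bootstrapped tail value, and proportional prioritization over absolute TD errors. It is not taken from the paper; the function names and the values of `gamma`, `alpha`, and `eps` are assumptions chosen for illustration only.

```python
import numpy as np


def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Return sum_k gamma^k * r_k + gamma^n * bootstrap_value.

    `rewards` holds the n rewards collected along the trajectory segment;
    `bootstrap_value` is the critic's estimate at the state reached after them.
    gamma=0.99 is an assumed default, not a value from the paper.
    """
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i^alpha / sum_j p_j^alpha, p_i = |delta_i| + eps.

    alpha and eps are assumed hyperparameters; alpha=0 gives uniform sampling.
    """
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()


if __name__ == "__main__":
    # Three-step return with an illustrative discount of 0.5.
    print(n_step_return([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5))
    # Transitions with larger TD error get a larger sampling probability.
    print(per_probabilities([0.1, 1.0, 2.0]))
```

With a longer reward window, a sparse terminal reward reaches earlier states in fewer updates, which is the usual motivation for pairing multi-step targets with prioritized sampling in sparse-reward tasks.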