Research on Decision-Making Strategies for Multi-Agent UAVs in Island Missions Based on Rainbow Fusion MADDPG Algorithm

Bibliographic Details
Published in: Drones (Basel), Vol. 9, No. 10, p. 673
Main Authors: Yang, Chaofan; Zhang, Bo; Zhang, Meng; Wang, Qi; Zhu, Peican
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 1 October 2025
ISSN: 2504-446X
DOI: 10.3390/drones9100673
Online Access: https://doaj.org/article/c5bec8cc6a844a1488e726ad84ea037f
Abstract
To address several limitations of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in autonomous control tasks, including low convergence efficiency, poor training stability, inadequate adaptability of confrontation strategies, and difficulty with sparse-reward tasks, this paper proposes an enhanced algorithm that integrates the Rainbow module. The proposed algorithm improves long-term reward optimization through prioritized experience replay (PER) and a multi-step TD updating mechanism. Additionally, a dynamic reward allocation strategy is introduced to enhance the collaborative and adaptive decision-making capabilities of agents in complex adversarial scenarios. Furthermore, behavioral cloning is employed to accelerate convergence during the initial training phase. Extensive experiments are conducted on the MaCA simulation platform for 5 vs. 5 to 10 vs. 10 UAV island capture missions. The results demonstrate that Rainbow-MADDPG outperforms the original MADDPG on several key metrics: (1) the average reward improves across all confrontation scales, most notably in the 6 vs. 6 and 7 vs. 7 tasks, where it reaches 14, a 6.05-fold and 2.5-fold improvement over the baseline, respectively; (2) convergence speed increases by 40%; (3) the combat effectiveness preservation rate is double that of the baseline. Moreover, the algorithm achieves the highest average reward in quasi-rectangular island scenarios, demonstrating strong adaptability to large-scale dynamic game environments. This study provides an innovative technical solution to the imbalance between strategy stability and efficiency in multi-agent autonomous control tasks, with significant application potential in UAV defense, cluster cooperative tasks, and related fields.
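As background for the multi-step TD updating mechanism the abstract mentions, the standard n-step target that such a critic regresses toward can be written as follows. This is the textbook form under the usual deterministic-policy-gradient conventions; the paper's exact variant is not reproduced in this record, and the symbols for agent i's target networks are conventional notation rather than quotations from the paper.

```latex
% Standard n-step TD target for a deterministic-policy critic.
% Q'_i and \mu'_i denote agent i's target critic and target actor
% (conventional MADDPG notation; the paper's variant may differ).
y_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k}
          + \gamma^{n}\, Q'_i\bigl(s_{t+n},\, \mu'_i(s_{t+n})\bigr)
```

Relative to the one-step target used by vanilla MADDPG, the n-step sum propagates sparse rewards backward through n transitions at once, which is the mechanism credited in the abstract for improved long-term reward optimization.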
Highlights
What are the main findings?
* This study presents an enhanced algorithm that integrates the Rainbow module to improve the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for multi-agent UAV cooperative and competitive scenarios.
* The proposed algorithm incorporates prioritized experience replay (PER) and multi-step TD updating to optimize long-term reward perception and enhance learning efficiency (a minimal sketch of these two mechanisms follows below); behavioral cloning is also employed to accelerate convergence during initial training.
What is the implication of the main finding?
* Experimental results on a UAV island capture simulation demonstrate that the enhanced algorithm outperforms the original MADDPG, with a 40% increase in convergence speed and a doubled combat power preservation rate.
* The algorithm is a robust and efficient solution for complex, dynamic, multi-agent game environments.
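The highlights name PER and multi-step TD updating as the core Rainbow components folded into MADDPG. Below is a minimal sketch of how these two mechanisms are commonly combined in a single replay buffer. It illustrates the general technique only, not the authors' implementation; every identifier (NStepPrioritizedReplay, alpha, beta, n_step) is a hypothetical choice.

```python
# Minimal sketch: a prioritized replay buffer with n-step returns, the two
# Rainbow components highlighted above. Illustrative only; not the paper's code.
from collections import deque

import numpy as np


class NStepPrioritizedReplay:
    def __init__(self, capacity=100_000, n_step=3, gamma=0.99, alpha=0.6):
        self.capacity, self.n_step = capacity, n_step
        self.gamma, self.alpha = gamma, alpha
        self.buffer, self.priorities = [], []
        self.pending = deque()  # transitions awaiting their n-step return

    def push(self, state, action, reward, next_state, done):
        self.pending.append((state, action, reward, next_state, done))
        if len(self.pending) < self.n_step and not done:
            return
        # Collapse the pending window into one n-step transition:
        # R = r_t + gamma * r_{t+1} + ...; bootstrap from the window's last state.
        s0, a0 = self.pending[0][0], self.pending[0][1]
        R = sum((self.gamma ** k) * t[2] for k, t in enumerate(self.pending))
        s_n, d_n = self.pending[-1][3], self.pending[-1][4]
        if len(self.buffer) >= self.capacity:  # drop the oldest entry when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append((s0, a0, R, s_n, d_n))
        # New samples get the current maximum priority so they are seen at least once.
        self.priorities.append(max(self.priorities, default=1.0))
        self.pending.popleft()
        if done:  # tail windows at episode end are dropped for brevity
            self.pending.clear()

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias of nonuniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        return [self.buffer[i] for i in idx], idx, weights / weights.max()

    def update_priorities(self, idx, td_errors, eps=1e-6):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + eps
```

In use, a MADDPG-style critic would sample a batch, weight each n-step TD error by the returned importance weights when forming its loss, and feed the absolute TD errors back through update_priorities so that surprising transitions are replayed more often.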
Audience: Academic
Authors and ORCIDs:
– Yang, Chaofan
– Zhang, Bo (ORCID: 0000-0002-4568-8035)
– Zhang, Meng (ORCID: 0000-0002-8744-2922)
– Wang, Qi (ORCID: 0000-0002-7028-4956)
– Zhu, Peican (ORCID: 0000-0002-8389-1093)
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Subject Terms: Algorithms; Analysis; Behavior; Cloning; Collaboration; Control algorithms; Control tasks; Convergence; Decision making; Deep learning; Drone aircraft; Efficiency; Expected values; Game theory; Immune system; MADDPG; Missions; Modules; multi-agent; Multi-agent systems; multi-step TD update; Multiagent systems; Optimization; prioritized experience replay; rainbow; Rainbows; reinforcement learning; Simulation; Stability; Strategy; Teaching methods