Research on Decision-Making Strategies for Multi-Agent UAVs in Island Missions Based on Rainbow Fusion MADDPG Algorithm
| Published in: | Drones (Basel) Vol. 9; no. 10; p. 673 |
|---|---|
| Main Authors: | Yang, Chaofan; Zhang, Bo; Zhang, Meng; Wang, Qi; Zhu, Peican |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 01.10.2025 |
| Subjects: | Algorithms; Analysis; Behavior cloning; Collaboration; Control algorithms; Control tasks; Convergence; Decision making; Deep learning; Drone aircraft; Efficiency; Expected values; Game theory; Immune system; MADDPG; Missions; Modules; Multi-agent systems; Multi-step TD update; Optimization; Prioritized experience replay; Rainbow; Reinforcement learning; Simulation; Stability; Strategy; Teaching methods |
| ISSN: | 2504-446X |
| Online Access: | https://doaj.org/article/c5bec8cc6a844a1488e726ad84ea037f (DOAJ); https://www.proquest.com/docview/3265861106 (ProQuest) |
| Abstract | To address the limitations of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in autonomous control tasks, including low convergence efficiency, poor training stability, inadequate adaptability of confrontation strategies, and difficulty in handling sparse-reward tasks, this paper proposes an enhanced algorithm that integrates the Rainbow module. The proposed algorithm improves long-term reward optimization through prioritized experience replay (PER) and a multi-step TD update mechanism. Additionally, a dynamic reward allocation strategy is introduced to enhance the collaborative and adaptive decision-making capabilities of agents in complex adversarial scenarios. Furthermore, behavioral cloning is employed to accelerate convergence during the initial training phase. Extensive experiments are conducted on the MaCA simulation platform for 5 vs. 5 to 10 vs. 10 UAV island capture missions. The results demonstrate that Rainbow-MADDPG outperforms the original MADDPG on several key metrics: (1) the average reward improves across all confrontation scales, with notable gains in the 6 vs. 6 and 7 vs. 7 tasks, where reward values of 14 represent 6.05-fold and 2.5-fold improvements over the baseline, respectively; (2) convergence speed increases by 40%; (3) the combat effectiveness preservation rate is double that of the baseline. Moreover, the algorithm achieves its highest average reward in quasi-rectangular island scenarios, demonstrating strong adaptability to large-scale dynamic game environments. This study provides an innovative technical solution to the challenges of strategy stability and efficiency imbalance in multi-agent autonomous control tasks, with significant application potential in UAV defense, swarm cooperation, and related fields. |
|---|---|
| Highlights | What are the main findings? (1) This study presents an enhanced algorithm that integrates the Rainbow module to improve the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for multi-agent UAV cooperative and competitive scenarios. (2) The proposed algorithm incorporates prioritized experience replay (PER) and multi-step TD updating to optimize long-term reward perception and enhance learning efficiency; behavioral cloning is also employed to accelerate convergence during initial training. What is the implication of the main finding? (1) Experimental results on a UAV island-capture simulation demonstrate that the enhanced algorithm outperforms the original MADDPG, with a 40% increase in convergence speed and a doubled combat power preservation rate. (2) The algorithm is a robust and efficient solution for complex, dynamic, multi-agent game environments. |
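
The abstract names two Rainbow components, prioritized experience replay (PER) and multi-step TD updating, that are typically combined in a single replay buffer. The sketch below is a minimal illustration of that standard combination, not the authors' implementation; the class name `NStepPERBuffer` and all hyperparameter values are assumptions.

```python
import numpy as np
from collections import deque

class NStepPERBuffer:
    """Replay buffer combining n-step returns with proportional PER."""

    def __init__(self, capacity, n_steps=3, gamma=0.99, alpha=0.6):
        self.capacity = capacity
        self.n_steps = n_steps      # horizon of the multi-step return
        self.gamma = gamma          # discount factor
        self.alpha = alpha          # priority exponent (0 = uniform sampling)
        self.window = deque(maxlen=n_steps)
        self.storage = []           # finished n-step transitions
        self.priorities = []        # one priority per stored transition

    def add(self, state, action, reward, next_state, done):
        self.window.append((state, action, reward, next_state, done))
        if len(self.window) < self.n_steps and not done:
            return  # not enough steps accumulated yet
        # n-step return: R = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
        R = 0.0
        for k, (_, _, r, ns, d) in enumerate(self.window):
            R += (self.gamma ** k) * r
            if d:
                break
        s0, a0 = self.window[0][0], self.window[0][1]
        if len(self.storage) >= self.capacity:   # drop oldest when full
            self.storage.pop(0)
            self.priorities.pop(0)
        self.storage.append((s0, a0, R, ns, d, k + 1))
        # New samples get the current max priority so they are seen at least once.
        self.priorities.append(max(self.priorities, default=1.0))
        if done:
            self.window.clear()  # do not mix transitions across episodes

    def sample(self, batch_size, beta=0.4):
        # Proportional prioritization: P(i) ~ p_i ** alpha.
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias PER introduces.
        weights = (len(self.storage) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.storage[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority = |TD error| + eps, refreshed after each learning step.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + eps
```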
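The dynamic reward allocation strategy is described only at a high level in the abstract, so the following is a purely hypothetical sketch of the general idea: blend each agent's individual reward with a contribution-weighted share of the shared team reward, with the blend shifting toward the team term as training proceeds. The function name, the contribution measure, and the annealing schedule are all assumed.

```python
import numpy as np

def allocate_rewards(individual_rewards, team_reward, contributions,
                     episode, anneal_episodes=5000):
    """Blend per-agent rewards with a contribution-weighted share of the
    shared team reward. All names and the schedule here are illustrative."""
    individual_rewards = np.asarray(individual_rewards, dtype=float)
    contributions = np.asarray(contributions, dtype=float)
    # Turn raw contribution scores (e.g., damage dealt or area captured,
    # an illustrative choice) into shares; fall back to an even split.
    total = contributions.sum()
    if total > 0:
        shares = contributions / total
    else:
        shares = np.full(len(contributions), 1.0 / len(contributions))
    # The weight shifts from individual toward team reward over training,
    # rewarding exploration early and cooperation later (assumed schedule).
    w_team = min(1.0, episode / anneal_episodes)
    return (1.0 - w_team) * individual_rewards + w_team * shares * team_reward
```

For instance, with `team_reward = 10`, equal contributions among five agents, and `episode >= anneal_episodes`, each agent would receive 2.0.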
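Behavioral cloning as a warm start is a supervised step: before reinforcement learning begins, the actor network is regressed onto demonstration state-action pairs. A minimal PyTorch sketch, assuming a continuous-action actor and an MSE objective (both illustrative choices, not confirmed by the paper):

```python
import torch
import torch.nn as nn

def bc_pretrain(actor, demo_states, demo_actions,
                epochs=50, lr=1e-3, batch_size=256):
    """Supervised warm-up: fit the actor to (state, action) demonstrations.
    `demo_states` and `demo_actions` are float tensors of matching length."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # squared error against demonstrated actions
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(demo_states, demo_actions),
        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for states, actions in loader:
            opt.zero_grad()
            loss = loss_fn(actor(states), actions)
            loss.backward()
            opt.step()
    return actor
```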
| Audience | Academic |
| Copyright | 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
| DOI | 10.3390/drones9100673 |
| EISSN | 2504-446X |
| GeographicLocations | China |
| ORCID | Zhang, Bo: 0000-0002-4568-8035; Zhang, Meng: 0000-0002-8744-2922; Wang, Qi: 0000-0002-7028-4956; Zhu, Peican: 0000-0002-8389-1093 |