A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition

Detailed Bibliography
Published in: Algorithms, Volume 17, Issue 12, p. 579
Main authors: Sheng, Lei; Chen, Honghui; Chen, Xiliang
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.12.2024
ISSN: 1999-4893
Abstract Deterministic strategy (policy) algorithms are widely used in Multi-Agent Deep Reinforcement Learning (MADRL) for collaborative tasks, yet achieving stable, high-performance cooperative behavior with them remains a significant challenge. To balance exploration and exploitation for multi-agent ant robots acting in a partially observable continuous action space, this study introduces a multi-agent centralized strategy gradient algorithm grounded in a local state transition mechanism. The algorithm learns local state and local state-action representations from local observations and action values, thereby establishing a “local state transition” mechanism autonomously. Used as the input of the actor network, the automatically extracted local observation representation reduces the input state dimension, strengthens the local state features closely tied to the local state transition, and encourages each agent to exploit the features that influence its next observed state. To mitigate non-stationarity and credit-assignment issues in multi-agent environments, a centralized critic network evaluates the current joint strategy. The proposed algorithm, NST-FACMAC, is evaluated against other multi-agent deterministic strategy algorithms in a continuous control simulation environment built on a multi-agent ant robot. The experimental results show accelerated convergence and higher average reward in cooperative multi-agent ant simulation environments. Notably, in the four simulated environments Ant-v2 (2 × 4), Ant-v2 (2 × 4d), Ant-v2 (4 × 2), and Manyant (2 × 3), the algorithm improves on the best baseline by approximately 1.9%, 4.8%, 11.9%, and 36.1%, respectively. These findings underscore the algorithm’s effectiveness in stabilizing multi-agent ant robot control within dynamic environments.
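The mechanism the abstract describes, learning a compact local-state representation by predicting the agent's next local observation from its current observation and action, can be sketched briefly. The following PyTorch fragment is a minimal illustrative sketch, not the paper's implementation: the module name (LocalStateTransition), layer widths, representation size, and the MSE prediction objective are all assumptions.

import torch
import torch.nn as nn

class LocalStateTransition(nn.Module):
    # Encoder: local observation o_t -> compact representation z_t.
    # Transition head: (z_t, a_t) -> predicted next local observation.
    def __init__(self, obs_dim, act_dim, repr_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, repr_dim),
        )
        self.transition = nn.Sequential(
            nn.Linear(repr_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, obs_dim),
        )

    def forward(self, obs, act):
        z = self.encoder(obs)
        pred_next_obs = self.transition(torch.cat([z, act], dim=-1))
        return z, pred_next_obs

def transition_loss(model, obs, act, next_obs):
    # Auxiliary prediction loss (assumed here to be MSE): training the
    # encoder this way pushes z_t to retain exactly the local features
    # that drive the agent's own next observation.
    _, pred_next_obs = model(obs, act)
    return nn.functional.mse_loss(pred_next_obs, next_obs)

Feeding z, rather than the raw observation, to the actor is what would realize the abstract's claim of a reduced input state dimension focused on transition-relevant features.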
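The abstract also pairs decentralized actors with a centralized critic that evaluates the joint strategy, the centralized-training-with-decentralized-execution pattern behind FACMAC. Below is a minimal sketch of that pairing under the same assumptions as above; note that the real NST-FACMAC critic follows FACMAC's factored form, which this simplified monolithic critic deliberately omits.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Decentralized actor: consumes the learned representation z instead of
    # the raw observation; Tanh bounds continuous actions to [-1, 1].
    def __init__(self, repr_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class CentralizedCritic(nn.Module):
    # Scores the joint observation-action pair, so each agent's update sees
    # the other agents' actions; this is what mitigates non-stationarity and
    # eases credit assignment.
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def actor_loss(critic, joint_obs, actors, reprs):
    # Deterministic policy-gradient objective: ascend the centralized
    # Q-value with respect to every actor's parameters at once.
    joint_act = torch.cat([actor(z) for actor, z in zip(actors, reprs)], dim=-1)
    return -critic(joint_obs, joint_act).mean()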
Audience Academic
Author Sheng, Lei
Chen, Xiliang
Chen, Honghui
ContentType Journal Article
Copyright COPYRIGHT 2024 MDPI AG
2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/a17120579
Discipline Computer Science
EISSN 1999-4893
ISSN 1999-4893
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
PublicationDate 2024-12-01
PublicationPlace Basel
PublicationTitle Algorithms
PublicationYear 2024
Publisher MDPI AG
StartPage 579
SubjectTerms Actors
Actresses
Algorithms
automatic representation
Automation
Collaborative learning
Control simulation
Control stability
Cooperation
Cooperative control
Critics
Data mining
Deep learning
deterministic strategy gradient algorithm
Efficiency
Exploitation
exploration
Learning strategies
Machine learning
multi-agent reinforcement learning
Multiagent systems
Network reliability
Representations
Robot control
Robots
state transition
Title A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
URI https://www.proquest.com/docview/3149503562
https://doaj.org/article/972010c0e91d40d3be242eefb050f166
Volume 17