Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multiagent Reinforcement Learning
| Published in: | IEEE Transactions on Automatic Control, Volume 70, Issue 11, pp. 7109-7124 |
|---|---|
| Main Authors: | Dai, Pengcheng; Mo, Yuanqiu; Yu, Wenwu; Ren, Wei |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2025 |
| ISSN: | 0018-9286, 1558-2523 |
| Online Access: | Get full text |
| Abstract | This article studies the networked multiagent reinforcement learning problem, where the objective of the agents is to collaboratively maximize the discounted average cumulative rewards. Different from the existing methods that suffer from limited expressiveness due to linear function approximation, we propose a distributed neural policy gradient algorithm that features two innovatively designed neural networks, specifically for the approximate Q-functions and policy functions of agents. This distributed neural policy gradient algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via time-varying communication networks to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates its local policy parameter solely based on its own approximate Q-function. In the convergence analysis, we first establish the global convergence of agents for the joint policy evaluation in the distributed critic step. Subsequently, we rigorously demonstrate the global convergence of the overall distributed neural policy gradient algorithm with respect to the objective function. Finally, the effectiveness of the proposed algorithm is demonstrated by comparing it with a centralized algorithm through simulation in a robot path planning environment. |
|---|---|
| Author | Mo, Yuanqiu Ren, Wei Yu, Wenwu Dai, Pengcheng |
| Author_xml | – Dai, Pengcheng (ORCID: 0000-0003-1455-7216); email: Jldaipc@163.com; School of Mathematics, Southeast University, Nanjing, China
– Mo, Yuanqiu (ORCID: 0000-0002-0500-5201); email: yuanqiumo@seu.edu.cn; School of Mathematics, Southeast University, Nanjing, China
– Yu, Wenwu (ORCID: 0000-0003-0301-9180); email: wwyu@seu.edu.cn; School of Mathematics, Frontiers Science Center for Mobile Information Communication and Security, Southeast University, Nanjing, China
– Ren, Wei (ORCID: 0000-0002-2818-9752); email: ren@ece.ucr.edu; Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA |
| CODEN | IETAA9 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 |
| DOI | 10.1109/TAC.2025.3570065 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Mechanical & Transportation Engineering Abstracts Technology Research Database Engineering Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Engineering Research Database Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| Discipline | Engineering |
| EISSN | 1558-2523 |
| EndPage | 7124 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 62233004; 62303112 funderid: 10.13039/501100001809 – fundername: Natural Science Foundation of Jiangsu Province grantid: BK20230826 funderid: 10.13039/501100004608 – fundername: National Science and Technology Major Project of China grantid: 2022ZD0120001; 2022ZD0120002 – fundername: Jiangsu Provincial Scientific Research Center of Applied Mathematics grantid: BK20233002 |
| ISSN | 0018-9286 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-0500-5201 0000-0002-2818-9752 0000-0003-0301-9180 0000-0003-1455-7216 |
| PageCount | 16 |
| PublicationDate | 2025-11-01 |
| PublicationPlace | New York |
| PublicationTitle | IEEE transactions on automatic control |
| PublicationTitleAbbrev | TAC |
| PublicationYear | 2025 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 7109 |
| SubjectTerms | Algorithms Approximation algorithms Communication networks Convergence Distributed neural policy gradient algorithm Function approximation global convergence Linear functions Linear programming Machine learning Multiagent systems networked multiagent reinforcement learning (NMARL) Neural networks Parameters Reinforcement learning Reviews Scalability Training Vectors |
| URI | https://ieeexplore.ieee.org/document/11003569 https://www.proquest.com/docview/3266595465 |
| Volume | 70 |
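For orientation, the abstract describes a two-step scheme: a distributed critic step in which each agent mixes its neighbors' approximate Q-function parameters over the communication network, and a decentralized actor step that each agent runs using only its own approximate Q-function. The following is a minimal illustrative sketch of that structure only, not the authors' algorithm; all function names, array shapes, the weight matrix, and step sizes are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of the two-step structure described in the abstract.
# Everything here (names, shapes, the weight matrix W, step sizes) is an
# illustrative assumption, not the paper's actual algorithm or code.

def distributed_critic_step(theta, W, td_grads, beta):
    """Distributed critic step: each agent mixes its neighbors' approximate
    Q-function parameters (consensus via the network weight matrix W) and
    then applies its own local temporal-difference correction.

    theta:    (N, d) critic parameters, one row per agent
    W:        (N, N) doubly stochastic weights of the communication network
    td_grads: (N, d) local TD update directions estimated by each agent
    beta:     critic step size
    """
    mixed = W @ theta                 # consensus averaging with neighbors
    return mixed + beta * td_grads    # local policy-evaluation update

def decentralized_actor_step(phi, pg_grads, eta):
    """Decentralized actor step: each agent updates its local policy
    parameters using only its OWN approximate Q-function (no communication).

    phi:      (N, p) local policy parameters
    pg_grads: (N, p) local policy-gradient estimates
    eta:      actor step size
    """
    return phi + eta * pg_grads

# Toy usage with random placeholders for the gradient estimates.
rng = np.random.default_rng(0)
N, d, p = 4, 8, 6                        # agents, critic dim, actor dim
W = np.full((N, N), 1.0 / N)             # example doubly stochastic matrix
theta = rng.normal(size=(N, d))
phi = rng.normal(size=(N, p))
theta = distributed_critic_step(theta, W, rng.normal(size=(N, d)), beta=0.05)
phi = decentralized_actor_step(phi, rng.normal(size=(N, p)), eta=0.01)
```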