Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multiagent Reinforcement Learning

Detailed Bibliography
Published in: IEEE Transactions on Automatic Control, vol. 70, no. 11, pp. 7109-7124
Main Authors: Dai, Pengcheng; Mo, Yuanqiu; Yu, Wenwu; Ren, Wei
Format: Journal Article
Language: English
Published: New York: IEEE, 01.11.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 0018-9286, 1558-2523
Abstract This article studies the networked multiagent reinforcement learning problem, where the objective of the agents is to collaboratively maximize the discounted average cumulative rewards. Different from existing methods that suffer from poor expressiveness due to linear function approximation, we propose a distributed neural policy gradient algorithm that features two innovatively designed neural networks, specifically for the approximate Q-functions and policy functions of the agents. This distributed neural policy gradient algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via a time-varying communication network to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates its local policy parameter solely based on its own approximate Q-function. In the convergence analysis, we first establish the global convergence of the agents for the joint policy evaluation in the distributed critic step. Subsequently, we rigorously demonstrate the global convergence of the overall distributed neural policy gradient algorithm with respect to the objective function. Finally, the effectiveness of the proposed algorithm is demonstrated by comparing it with a centralized algorithm through simulation in the robot path planning environment.
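To make the two-step structure described in the abstract concrete, the following is a minimal numerical sketch, not the authors' implementation: the time-varying neighbor rule, the td_gradient and policy_gradient routines, and the step sizes are hypothetical placeholders, and the paper's neural-network parameterizations are replaced by plain parameter vectors. It only illustrates the information flow: critic parameters are mixed with neighbors' parameters before a local update, while each actor update uses only that agent's own critic.

import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 8                                      # number of agents, parameter dimension
W = [rng.normal(size=dim) for _ in range(n_agents)]       # critic (approximate Q-function) parameters
Theta = [rng.normal(size=dim) for _ in range(n_agents)]   # local policy parameters

def neighbors(i, t):
    """Toy time-varying graph: agent i hears from one (alternating) neighbor plus itself."""
    return [(i - 1 - (t % 2)) % n_agents, i]

def td_gradient(w, theta):
    """Placeholder for agent i's local temporal-difference critic gradient (depends on sampled data in the real algorithm)."""
    return -0.1 * w

def policy_gradient(theta, w):
    """Placeholder for agent i's policy gradient, built only from its OWN critic parameters w."""
    return -0.1 * theta + 0.01 * w

alpha, beta = 0.5, 0.05                                    # illustrative critic / actor step sizes
for t in range(100):
    # Distributed critic step: average the Q-function parameters received from neighbors,
    # then apply a local critic update.
    mixed = [np.mean([W[j] for j in neighbors(i, t)], axis=0) for i in range(n_agents)]
    W = [m + alpha * td_gradient(m, Theta[i]) for i, m in enumerate(mixed)]

    # Decentralized actor step: each agent updates its policy from its own critic only,
    # with no exchange of policy parameters.
    Theta = [th + beta * policy_gradient(th, W[i]) for i, th in enumerate(Theta)]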
Author Mo, Yuanqiu
Ren, Wei
Yu, Wenwu
Dai, Pengcheng
Author_xml – sequence: 1
  givenname: Pengcheng
  orcidid: 0000-0003-1455-7216
  surname: Dai
  fullname: Dai, Pengcheng
  email: Jldaipc@163.com
  organization: School of Mathematics, Southeast University, Nanjing, China
– sequence: 2
  givenname: Yuanqiu
  orcidid: 0000-0002-0500-5201
  surname: Mo
  fullname: Mo, Yuanqiu
  email: yuanqiumo@seu.edu.cn
  organization: School of Mathematics, Southeast University, Nanjing, China
– sequence: 3
  givenname: Wenwu
  orcidid: 0000-0003-0301-9180
  surname: Yu
  fullname: Yu, Wenwu
  email: wwyu@seu.edu.cn
  organization: School of Mathematics, Frontiers Science Center for Mobile Information Communication and Security, Southeast University, Nanjing, China
– sequence: 4
  givenname: Wei
  orcidid: 0000-0002-2818-9752
  surname: Ren
  fullname: Ren, Wei
  email: ren@ece.ucr.edu
  organization: Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA
CODEN IETAA9
Cites_doi 10.1109/JSAC.2019.2933973
10.1016/B978-1-55860-307-3.50049-6
10.1007/978-3-319-91578-4
10.1109/TNNLS.2021.3139138
10.1109/TWC.2019.2933417
10.1109/TNN.1998.712192
10.1109/TCYB.2021.3082639
10.1109/TII.2019.2933443
10.1109/CDC40024.2019.9029257
10.1016/j.automatica.2016.12.020
10.1109/TNSE.2020.3018871
10.1016/j.ifacol.2020.12.2021
10.1109/TAC.2025.3570065
10.1109/TCYB.2020.3015811
10.1609/aaai.v32i1.11794
10.1287/moor.2023.1370
10.1016/j.neucom.2007.11.026
10.1145/3219819.3220096
10.1109/TITS.2019.2901791
10.1109/JPROC.2018.2817461
10.1109/TAC.2010.2041686
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
7TB
8FD
FR3
JQ2
L7M
L~C
L~D
DOI 10.1109/TAC.2025.3570065
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2523
EndPage 7124
ExternalDocumentID 10_1109_TAC_2025_3570065
11003569
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62233004; 62303112
  funderid: 10.13039/501100001809
– fundername: Natural Science Foundation of Jiangsu Province
  grantid: BK20230826
  funderid: 10.13039/501100004608
– fundername: National Science and Technology Major Project of China
  grantid: 2022ZD0120001; 2022ZD0120002
– fundername: Jiangsu Provincial Scientific Research Center of Applied Mathematics
  grantid: BK20233002
GroupedDBID -~X
.DC
0R~
29I
3EH
4.4
5GY
5VS
6IK
97E
AAJGR
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
ACNCT
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
F5P
HZ~
H~9
IAAWW
IBMZZ
ICLAB
IDIHD
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
RIA
RIE
RNS
TAE
TN5
VH1
VJK
~02
AAYXX
CITATION
7SC
7SP
7TB
8FD
FR3
JQ2
L7M
L~C
L~D
IEDL.DBID RIE
ISICitedReferencesCount 0
ISSN 0018-9286
IngestDate Thu Oct 30 15:56:50 EDT 2025
Sat Nov 29 06:59:57 EST 2025
Wed Nov 05 07:07:51 EST 2025
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-0500-5201
0000-0002-2818-9752
0000-0003-0301-9180
0000-0003-1455-7216
PQID 3266595465
PQPubID 85475
PageCount 16
ParticipantIDs ieee_primary_11003569
proquest_journals_3266595465
crossref_primary_10_1109_TAC_2025_3570065
PublicationCentury 2000
PublicationDate 2025-11-01
PublicationDateYYYYMMDD 2025-11-01
PublicationDate_xml – month: 11
  year: 2025
  text: 2025-11-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on automatic control
PublicationTitleAbbrev TAC
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References Zhou (ref30) 2023
ref14
ref11
ref10
ref32
ref2
ref1
ref17
Wagner (ref25) 2013; 26
ref24
ref23
ref20
Zhang (ref16) 2018
ref21
Sutton (ref22) 1999; 12
Doan (ref19) 2019
ref28
Xu (ref26) 2020; 33
ref27
Rashid (ref12) 2018
Wang (ref15) 2020
Wang (ref18) 2019
ref8
ref7
ref9
ref4
ref3
Nesterov (ref29) 2018
Mei (ref31) 2020
ref6
ref5
Wang (ref13) 2020
References_xml – ident: ref8
  doi: 10.1109/JSAC.2019.2933973
– ident: ref11
  doi: 10.1016/B978-1-55860-307-3.50049-6
– volume-title: Introductory Lectures on Convex Optimization
  year: 2018
  ident: ref29
  doi: 10.1007/978-3-319-91578-4
– ident: ref17
  doi: 10.1109/TNNLS.2021.3139138
– ident: ref9
  doi: 10.1109/TWC.2019.2933417
– ident: ref1
  doi: 10.1109/TNN.1998.712192
– ident: ref3
  doi: 10.1109/TCYB.2021.3082639
– ident: ref2
  doi: 10.1109/TII.2019.2933443
– start-page: 2563
  volume-title: Proc. Uncertainty Artif. Intell. Conf.
  year: 2023
  ident: ref30
  article-title: Convergence rates for localized actor-critic in networked Markov potential games
– year: 2020
  ident: ref13
  article-title: QPLEX: Duplex dueling multi-agent Q-learning
– ident: ref20
  doi: 10.1109/CDC40024.2019.9029257
– ident: ref6
  doi: 10.1016/j.automatica.2016.12.020
– ident: ref7
  doi: 10.1109/TNSE.2020.3018871
– ident: ref21
  doi: 10.1016/j.ifacol.2020.12.2021
– ident: ref27
  doi: 10.1109/TAC.2025.3570065
– ident: ref5
  doi: 10.1109/TCYB.2020.3015811
– volume: 12
  start-page: 1057
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 1999
  ident: ref22
  article-title: Policy gradient methods for reinforcement learning with function approximation
– ident: ref14
  doi: 10.1609/aaai.v32i1.11794
– start-page: 6846
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2018
  ident: ref12
  article-title: Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning
– ident: ref23
  doi: 10.1287/moor.2023.1370
– ident: ref24
  doi: 10.1016/j.neucom.2007.11.026
– ident: ref10
  doi: 10.1145/3219819.3220096
– start-page: 6820
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2020
  ident: ref31
  article-title: On the global convergence rates of softmax policy gradient methods
– volume: 26
  start-page: 3101
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2013
  ident: ref25
  article-title: Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
– start-page: 1626
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2019
  ident: ref19
  article-title: Finite-time analysis of distributed TD(0) with linear function approximation on multi-agent reinforcement learning
– ident: ref4
  doi: 10.1109/TITS.2019.2901791
– volume: 33
  start-page: 4358
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2020
  ident: ref26
  article-title: Improving sample complexity bounds for (natural) actor-critic algorithms
– year: 2020
  ident: ref15
  article-title: Off-policy multi-agent decomposed policy gradients
– ident: ref32
  doi: 10.1109/JPROC.2018.2817461
– ident: ref28
  doi: 10.1109/TAC.2010.2041686
– start-page: 5872
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2018
  ident: ref16
  article-title: Fully decentralized multi-agent reinforcement learning with networked agents
– year: 2019
  ident: ref18
  article-title: Neural policy gradient methods: Global optimality and rates of convergence
SSID ssj0016441
Score 2.4914813
Snippet This article studies the networked multiagent reinforcement learning problem, where the objective of agents is to collaboratively maximize the discounted...
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 7109
SubjectTerms Algorithms
Approximation algorithms
Communication networks
Convergence
Distributed neural policy gradient algorithm
Function approximation
global convergence
Linear functions
Linear programming
Machine learning
Multiagent systems
networked multiagent reinforcement learning (NMARL)
Neural networks
Parameters
Reinforcement learning
Reviews
Scalability
Training
Vectors
Title Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multiagent Reinforcement Learning
URI https://ieeexplore.ieee.org/document/11003569
https://www.proquest.com/docview/3266595465
Volume 70
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Xplore
  customDbUrl:
  eissn: 1558-2523
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0016441
  issn: 0018-9286
  databaseCode: RIE
  dateStart: 19630101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
linkProvider IEEE