Safe multi-agent reinforcement learning for multi-robot control

Detailed bibliography
Published in: Artificial Intelligence, Volume 319, Article 103905
Main authors: Gu, Shangding; Grudzien Kuba, Jakub; Chen, Yuanpei; Du, Yali; Yang, Long; Knoll, Alois; Yang, Yaodong
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2023
Subjects: Constrained Markov game; Constrained policy optimisation; Safe multi-agent benchmarks; Safe multi-robot control
ISSN:0004-3702, 1872-7921
Online access: Get full text
Abstract A challenging problem in robotics is how to control multiple robots cooperatively and safely in real-world applications. Yet multi-robot control methods built from the perspective of safe multi-agent reinforcement learning (MARL) have rarely been studied. To fill this gap, we investigate safe MARL for multi-robot control on cooperative tasks, in which each robot must not only satisfy its own safety constraints while maximising its reward, but also consider those of other robots to guarantee safe team behaviour. Firstly, we formulate the safe MARL problem as a constrained Markov game and solve it theoretically via policy optimisation. The proposed algorithm guarantees monotonic improvement in reward and satisfaction of safety constraints at every iteration. Secondly, as approximations to the theoretical solution, we propose two safe multi-agent policy gradient methods: Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian. Thirdly, we develop the first three safe MARL benchmarks, Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG), to expand the toolkit of the MARL and robot-control research communities. Finally, experimental results on the three benchmarks indicate that our methods achieve state-of-the-art performance in balancing reward improvement against satisfaction of safety constraints, compared with strong baselines. Demos and code are available at https://sites.google.com/view/aij-safe-marl/.
Highlights:
• The problem of safe multi-agent reinforcement learning is formulated.
• The Multi-Agent Constrained Policy Optimisation (MACPO) method is proposed.
• MACPO guarantees both satisfaction of safety constraints and monotonic performance improvement.
• Three safe MARL benchmarks are developed: Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG).
• Experiments on multiple benchmark environments confirm the effectiveness of MACPO and MAPPO-Lagrangian.
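For context, the constrained Markov game named in the abstract can be written in the standard constrained-MDP style. The following is a sketch in conventional notation (joint policy \pi, shared reward r, per-agent costs c^i with budgets d^i); the paper's exact symbols may differ:

\max_{\pi}\; J(\pi)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,\mathbf{a}_t)\right]
\quad\text{subject to}\quad
J_c^{i}(\pi)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, c^{i}(s_t,\mathbf{a}_t)\right]\le d^{i}\quad\forall i,

where \mathbf{a}_t is the joint action of all agents, so every agent's cost constraint depends on the team's behaviour; this coupling is what distinguishes the multi-agent setting from per-robot constrained RL.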
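The MAPPO-Lagrangian method mentioned in the abstract relaxes such a constrained problem with Lagrange multipliers. The Python snippet below is a minimal, generic sketch of that relaxation only; the function names and scalar interface are illustrative assumptions, not the authors' released code (see the project site linked in the abstract for the real implementation):

# Generic Lagrangian relaxation of a safety-constrained RL objective:
#   max_pi J(pi)  s.t.  Jc(pi) <= d
# becomes the saddle-point problem
#   min_{lam >= 0} max_pi  J(pi) - lam * (Jc(pi) - d).
# All names here are hypothetical placeholders.

def lagrangian_objective(reward_return: float, cost_return: float,
                         cost_limit: float, lam: float) -> float:
    """Penalised objective the policy ascends: J - lam * (Jc - d)."""
    return reward_return - lam * (cost_return - cost_limit)

def update_multiplier(lam: float, cost_return: float,
                      cost_limit: float, lr: float = 0.01) -> float:
    """Dual ascent on the multiplier: grow lam while the safety budget
    is exceeded, shrink it otherwise, and keep it non-negative."""
    return max(0.0, lam + lr * (cost_return - cost_limit))

# Example dual step after an episode whose cost exceeded the budget:
lam = update_multiplier(lam=0.5, cost_return=12.0, cost_limit=10.0)  # -> 0.52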
ArticleNumber 103905
Author Du, Yali
Yang, Long
Knoll, Alois
Yang, Yaodong
Grudzien Kuba, Jakub
Chen, Yuanpei
Gu, Shangding
Author_xml – sequence: 1
  givenname: Shangding
  orcidid: 0000-0002-2722-3779
  surname: Gu
  fullname: Gu, Shangding
  organization: Department of Computer Science, Technical University of Munich, Germany
– sequence: 2
  givenname: Jakub
  surname: Grudzien Kuba
  fullname: Grudzien Kuba, Jakub
  organization: Department of Statistics, University of Oxford, UK
– sequence: 3
  givenname: Yuanpei
  surname: Chen
  fullname: Chen, Yuanpei
  organization: Institute for Artificial Intelligence, Peking University, China
– sequence: 4
  givenname: Yali
  surname: Du
  fullname: Du, Yali
  organization: Department of Informatics, King's College London, UK
– sequence: 5
  givenname: Long
  surname: Yang
  fullname: Yang, Long
  organization: Institute for Artificial Intelligence, Peking University, China
– sequence: 6
  givenname: Alois
  surname: Knoll
  fullname: Knoll, Alois
  organization: Department of Computer Science, Technical University of Munich, Germany
– sequence: 7
  givenname: Yaodong
  surname: Yang
  fullname: Yang, Yaodong
  email: yaodong.yang@pku.edu.cn
  organization: Institute for Artificial Intelligence, Peking University, China
ContentType Journal Article
Copyright 2023
DOI 10.1016/j.artint.2023.103905
Discipline Computer Science
EISSN 1872-7921
ExternalDocumentID 10_1016_j_artint_2023_103905
S0004370223000516
ISICitedReferencesCount 74
ISSN 0004-3702
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Safe multi-robot control
Safe multi-agent benchmarks
Constrained policy optimisation
Constrained Markov game
Language English
ORCID 0000-0002-2722-3779
OpenAccessLink https://kclpure.kcl.ac.uk/portal/en/publications/4b6e2578-0b6d-455a-84f9-b317b80838cc
PublicationDate June 2023
PublicationTitle Artificial intelligence
PublicationYear 2023
Publisher Elsevier B.V
StartPage 103905
SubjectTerms Constrained Markov game
Constrained policy optimisation
Safe multi-agent benchmarks
Safe multi-robot control
Title Safe multi-agent reinforcement learning for multi-robot control
URI https://dx.doi.org/10.1016/j.artint.2023.103905
Volume 319