Safe multi-agent reinforcement learning for multi-robot control
| Published in: | Artificial Intelligence, Volume 319, Article 103905 |
|---|---|
| Main Authors: | Gu, Shangding; Grudzien Kuba, Jakub; Chen, Yuanpei; Du, Yali; Yang, Long; Knoll, Alois; Yang, Yaodong |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.06.2023 |
| Subjects: | Constrained Markov game; Constrained policy optimisation; Safe multi-agent benchmarks; Safe multi-robot control |
| ISSN: | 0004-3702 (print); 1872-7921 (electronic) |
| DOI: | 10.1016/j.artint.2023.103905 |
| Online Access: | Get full text |
| Abstract | A challenging problem in robotics is how to control multiple robots cooperatively and safely in real-world applications. Yet, developing multi-robot control methods from the perspective of safe multi-agent reinforcement learning (MARL) has barely been studied. To fill this gap, in this study, we investigate safe MARL for multi-robot control on cooperative tasks, in which each individual robot has to not only meet its own safety constraints while maximising its reward, but also consider those of others to guarantee safe team behaviours. Firstly, we formulate the safe MARL problem as a constrained Markov game and employ policy optimisation to solve it theoretically. The proposed algorithm guarantees monotonic improvement in reward and satisfaction of safety constraints at every iteration. Secondly, as approximations to the theoretical solution, we propose two safe multi-agent policy gradient methods: Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian. Thirdly, we develop the first three safe MARL benchmarks: Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG), to expand the toolkit of the MARL and robot control research communities. Finally, experimental results on the three safe MARL benchmarks indicate that our methods achieve state-of-the-art performance in balancing reward improvement against safety-constraint satisfaction, compared with strong baselines. Demos and code are available at https://sites.google.com/view/aij-safe-marl/. |
|---|---|
| Highlights | • The problem of safe multi-agent reinforcement learning is formulated. • The Multi-Agent Constrained Policy Optimisation (MACPO) method is proposed. • MACPO guarantees both satisfaction of safety constraints and monotonic performance improvement. • Three safe MARL benchmarks are developed: Safe Multi-Agent MuJoCo (Safe MAMuJoCo), Safe Multi-Agent Robosuite (Safe MARobosuite) and Safe Multi-Agent Isaac Gym (Safe MAIG). • Experiments on multiple benchmark environments confirm the effectiveness of MACPO and MAPPO-Lagrangian. |
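
The abstract formulates safety as a constrained Markov game. As a reading aid, the following is a minimal sketch of the per-agent constrained objective that formulation implies, written in generic constrained-MDP notation (joint policy $\pi$, reward $r$, cost functions $c_j^i$, budgets $d_j^i$, discount $\gamma$); the symbols are standard, not taken verbatim from the paper.

```latex
% Agent i's safe policy-optimisation problem (generic CMDP-style notation):
\max_{\pi^i}\; J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\right]
\quad\text{s.t.}\quad
J_j^i(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, c_j^i(s_t, a_t^i)\right] \;\le\; d_j^i \quad \forall j.
```

Each agent maximises the shared reward return while keeping every one of its expected cost returns below its budget, which is what "meet its own safety constraints while maximising its reward" means in the abstract.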
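MAPPO-Lagrangian, named in the abstract and highlights, relaxes such a constraint with a Lagrange multiplier. Below is a short, hypothetical PyTorch sketch of the dual (multiplier) update that pattern implies; the names `log_lam`, `update_multiplier`, `mean_episode_cost` and `cost_budget` are illustrative, and this is the generic Lagrangian-relaxation recipe, not the authors' actual implementation.

```python
import torch

# Lagrangian relaxation of one agent's constrained objective:
#   L(pi, lam) = J_r(pi) - lam * (J_c(pi) - d)
# The policy maximises L; the multiplier lam performs gradient ascent on the
# constraint violation, growing whenever measured costs exceed the budget d.

log_lam = torch.zeros(1, requires_grad=True)          # lam = exp(log_lam) stays >= 0
lam_optimizer = torch.optim.Adam([log_lam], lr=1e-2)

def update_multiplier(mean_episode_cost: float, cost_budget: float) -> float:
    """One dual-ascent step on lam; returns the updated multiplier value."""
    lam = log_lam.exp()
    # Ascent on lam * (J_c - d) is descent on its negation.
    dual_loss = -lam * (mean_episode_cost - cost_budget)
    lam_optimizer.zero_grad()
    dual_loss.backward()
    lam_optimizer.step()
    return log_lam.exp().item()
```

The cost-penalised reward, r minus lam times c, then feeds an otherwise standard MAPPO policy update, so the multiplier automatically trades reward against safety over training.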
| Authors and Affiliations: | 1. Gu, Shangding (Department of Computer Science, Technical University of Munich, Germany; ORCID: 0000-0002-2722-3779); 2. Grudzien Kuba, Jakub (Department of Statistics, University of Oxford, UK); 3. Chen, Yuanpei (Institute for Artificial Intelligence, Peking University, China); 4. Du, Yali (Department of Informatics, King's College London, UK); 5. Yang, Long (Institute for Artificial Intelligence, Peking University, China); 6. Knoll, Alois (Department of Computer Science, Technical University of Munich, Germany); 7. Yang, Yaodong (Institute for Artificial Intelligence, Peking University, China; corresponding author: yaodong.yang@pku.edu.cn) |
|---|---|
| Open Access Link: | https://kclpure.kcl.ac.uk/portal/en/publications/4b6e2578-0b6d-455a-84f9-b317b80838cc |