Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward
Saved in:

| Published in: | Proceedings of ... International Joint Conference on Neural Networks, pp. 1 - 8 |
|---|---|
| Main Authors: | Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin |
| Format: | Conference Paper |
| Language: | English |
| Published: | IEEE, 01.07.2020 |
| ISSN: | 2161-4407 |
| Online Access: | Get full text |
| Abstract | In partially observable, fully cooperative games, agents generally tend to maximize the global reward with joint actions, so it is difficult for each agent to deduce its own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with a counterfactual reward mechanism, termed the CoRe algorithm. CoRe computes the difference in the global reward under the condition that the agent does not take its actual action but takes other actions, while the other agents keep their actual actions fixed. This approach can determine each agent's contribution to the global reward. We evaluate CoRe in a simplified Pig Chase game with a decentralised Deep Q-Network (DQN) framework. The proposed method helps agents learn end-to-end collaborative behaviors. Compared with other DQN variants trained on the global reward, CoRe significantly improves learning efficiency and achieves better results. In addition, CoRe performs well in game environments of various sizes. |
|---|---|
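The record does not include the paper's exact formula, but the mechanism the abstract describes can be sketched directly: for each agent, hold all other agents' actions fixed, swap that agent's action for each alternative, and compare the resulting global rewards against the actual one. A minimal sketch follows; the mean-over-alternatives baseline and all function names here are illustrative assumptions, not taken from the paper.

```python
def counterfactual_credits(global_reward_fn, joint_action, n_actions):
    """Counterfactual credit for each agent: the actual global reward
    minus the average global reward obtained when only that agent's
    action is swapped for an alternative, all other agents held fixed.
    (Mean baseline is an assumption; the paper may weight differently.)"""
    actual = global_reward_fn(joint_action)
    credits = []
    for i, a_i in enumerate(joint_action):
        alternatives = []
        for alt in range(n_actions):
            if alt == a_i:
                continue  # only the counterfactual (non-taken) actions
            modified = list(joint_action)
            modified[i] = alt
            alternatives.append(global_reward_fn(tuple(modified)))
        credits.append(actual - sum(alternatives) / len(alternatives))
    return credits

# Toy example: the global reward is 1 only when both agents cooperate.
reward = lambda ja: 1.0 if ja == (1, 1) else 0.0
print(counterfactual_credits(reward, (1, 1), n_actions=2))  # → [1.0, 1.0]
```

Each agent receives full credit here because changing its action alone destroys the joint reward, which is exactly the per-agent contribution signal a shared global reward cannot provide.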
| Author | Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin |
| Affiliation | State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/IJCNN48605.2020.9207169 |
| Discipline | Computer Science |
| EISBN | 1728169267 9781728169262 |
| EISSN | 2161-4407 |
| EndPage | 8 |
| ExternalDocumentID | 9207169 |
| Genre | orig-research |
| ISICitedReferencesCount | 4 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-July |
| PublicationDateYYYYMMDD | 2020-07-01 |
| PublicationDate_xml | – month: 07 year: 2020 text: 2020-July |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of ... International Joint Conference on Neural Networks |
| PublicationTitleAbbrev | IJCNN |
| PublicationYear | 2020 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Collaboration; cooperative games; counterfactual reward; deep reinforcement learning; Games; Learning (artificial intelligence); Machine learning; Multi-agent systems; reinforcement learning; Task analysis; Training |
| Title | Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward |
| URI | https://ieeexplore.ieee.org/document/9207169 |