Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward


Bibliographic Details
Published in: Proceedings of ... International Joint Conference on Neural Networks, pp. 1 - 8
Main authors: Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2020
Subjects:
ISSN: 2161-4407
Online access: Full text
Abstract In partially observable, fully cooperative games, agents generally tend to maximize the global reward through joint actions, so it is difficult for each agent to deduce its own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with a counterfactual reward mechanism, termed the CoRe algorithm. CoRe computes the difference in the global reward between the case where an agent takes its actual action and the cases where it takes alternative actions while the other agents keep their actual actions fixed. This difference determines each agent's contribution to the global reward. We evaluate CoRe on a simplified Pig Chase game within a decentralised Deep Q-Network (DQN) framework. The proposed method helps agents learn collaborative behaviors end to end. Compared with other DQN variants trained on the global reward, CoRe significantly improves learning efficiency and achieves better results. In addition, CoRe performs well in game environments of various sizes.
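
The following is a minimal Python sketch of the counterfactual reward idea described in the abstract: for each agent, replace its actual action with each alternative action while the other agents keep their actual actions fixed, and credit the agent with the gap between the actual global reward and the counterfactual global rewards. The names (counterfactual_rewards, reward_fn) are illustrative, and using the average over alternative actions as the baseline is an assumption; the paper's exact formulation and its integration into the decentralised DQN training loop may differ.

def counterfactual_rewards(reward_fn, joint_action, action_space_sizes):
    """Per-agent credit from counterfactual global rewards (illustrative sketch).

    reward_fn(joint_action) -> float stands in for a global reward oracle
    (e.g. a simulator rollout or a learned reward model); joint_action is a
    list of discrete action indices, one per agent.
    """
    actual_reward = reward_fn(joint_action)
    credits = []
    for i, n_actions in enumerate(action_space_sizes):
        # Re-evaluate the global reward with agent i's action swapped out,
        # while every other agent keeps its actual action fixed.
        counterfactuals = []
        for alt_action in range(n_actions):
            if alt_action == joint_action[i]:
                continue  # only consider actions the agent did not take
            modified = list(joint_action)
            modified[i] = alt_action
            counterfactuals.append(reward_fn(modified))
        # Assumed baseline: the average counterfactual global reward.
        baseline = (sum(counterfactuals) / len(counterfactuals)
                    if counterfactuals else actual_reward)
        credits.append(actual_reward - baseline)
    return credits


# Toy check: two agents, global reward 1 only if both choose action 1.
if __name__ == "__main__":
    reward_fn = lambda a: float(a[0] == 1 and a[1] == 1)
    print(counterfactual_rewards(reward_fn, [1, 1], [2, 2]))  # -> [1.0, 1.0]
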
Author Shao, Kun
Tang, Zhentao
Zhao, Dongbin
Zhu, Yuanheng
Author_xml – sequence: 1
  givenname: Kun
  surname: Shao
  fullname: Shao, Kun
organization: Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing, China
– sequence: 2
  givenname: Yuanheng
  surname: Zhu
  fullname: Zhu, Yuanheng
organization: Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing, China
– sequence: 3
  givenname: Zhentao
  surname: Tang
  fullname: Tang, Zhentao
organization: Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing, China
– sequence: 4
  givenname: Dongbin
  surname: Zhao
  fullname: Zhao, Dongbin
organization: Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing, China
ContentType Conference Proceeding
DOI 10.1109/IJCNN48605.2020.9207169
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Computer Science
EISBN 1728169267
9781728169262
EISSN 2161-4407
EndPage 8
ExternalDocumentID 9207169
Genre orig-research
ISICitedReferencesCount 4
Language English
PageCount 8
PublicationCentury 2000
PublicationDate 2020-July
PublicationDecade 2020
PublicationTitle Proceedings of ... International Joint Conference on Neural Networks
PublicationTitleAbbrev IJCNN
PublicationYear 2020
Publisher IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Collaboration
cooperative games
counterfactual reward
deep reinforcement learning
Games
Learning (artificial intelligence)
Machine learning
Multi-agent systems
reinforcement learning
Task analysis
Training
Title Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward
URI https://ieeexplore.ieee.org/document/9207169