Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward

Detailed Bibliography
Published in: Proceedings of ... International Joint Conference on Neural Networks, pp. 1-8
Main authors: Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin
Format: Conference paper
Language: English
Published: IEEE, 01.07.2020
ISSN: 2161-4407
Description
Summary: In partially observable, fully cooperative games, agents generally maximize a global reward through joint actions, which makes it difficult for any individual agent to deduce its own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with a counterfactual reward mechanism, termed the CoRe algorithm. CoRe computes the difference in global reward between the agent's actual action and its alternative actions, while the other agents keep their actual actions fixed. This difference quantifies each agent's contribution to the global reward. We evaluate CoRe in a simplified Pig Chase game with a decentralised Deep Q Network (DQN) framework. The proposed method helps agents learn end-to-end collaborative behaviors. Compared with other DQN variants trained on the global reward, CoRe significantly improves learning efficiency and achieves better results. In addition, CoRe performs well in game environments of various sizes.
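The counterfactual mechanism described in the summary can be illustrated with a minimal sketch. The snippet below credits one agent with the gap between the actual global reward and the average global reward obtained when that agent is swapped to each of its alternative actions while the other agents keep their actual actions. The function names, the averaging over alternatives, and the toy coordination reward are illustrative assumptions; the paper's exact formulation may differ.

    # Hedged sketch of a CoRe-style counterfactual reward (illustrative only;
    # the paper may aggregate alternative actions differently).

    def counterfactual_reward(global_reward, joint_action, agent, actions):
        """Credit for `agent`: actual global reward minus the mean global
        reward when that agent is replaced by each alternative action,
        with the other agents' actual actions held fixed."""
        actual = global_reward(joint_action)
        alternatives = []
        for alt in actions:
            if alt == joint_action[agent]:
                continue  # skip the action actually taken
            cf = list(joint_action)
            cf[agent] = alt  # replace only this agent's action
            alternatives.append(global_reward(tuple(cf)))
        return actual - sum(alternatives) / len(alternatives)

    # Toy two-agent coordination game: global reward 1 only if both pick action 1.
    reward = lambda a: 1.0 if a == (1, 1) else 0.0
    print(counterfactual_reward(reward, (1, 1), agent=0, actions=[0, 1]))  # -> 1.0
    print(counterfactual_reward(reward, (0, 1), agent=0, actions=[0, 1]))  # -> -1.0

In this sketch, positive credit marks an agent whose actual choice was pivotal to the global reward and negative credit marks a choice that cost the team, which is the kind of per-agent signal the decentralised DQN learners would be trained on.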
DOI: 10.1109/IJCNN48605.2020.9207169