Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward

In partially observable fully cooperative games, agents generally tend to maximize global rewards with joint actions, so it is difficult for each agent to deduce their own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with counterf...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Proceedings of ... International Joint Conference on Neural Networks s. 1 - 8
Hlavní autori: Shao, Kun, Zhu, Yuanheng, Tang, Zhentao, Zhao, Dongbin
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: IEEE 01.07.2020
Predmet:
ISSN:2161-4407
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:In partially observable fully cooperative games, agents generally tend to maximize global rewards with joint actions, so it is difficult for each agent to deduce their own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with counterfactual reward mechanism, which is termed as CoRe algorithm. CoRe computes the global reward difference in condition that the agent does not take its actual action but takes other actions, while other agents fix their actual actions. This approach can determine each agent's contribution for the global reward. We evaluate CoRe in a simplified Pig Chase game with a decentralised Deep Q Network (DQN) framework. The proposed method helps agents learn end-to-end collaborative behaviors. Compared with other DQN variants with global reward, CoRe significantly improves learning efficiency and achieves better results. In addition, CoRe shows excellent performances in various size game environments.
ISSN:2161-4407
DOI:10.1109/IJCNN48605.2020.9207169