Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward

Bibliographic Details
Published in:	Proceedings of ... International Joint Conference on Neural Networks pp. 1 - 8
Main Authors:	Shao, Kun; Zhu, Yuanheng; Tang, Zhentao; Zhao, Dongbin
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.07.2020
Subjects:	Collaboration; cooperative games; counterfactual reward; deep reinforcement learning; Games; Learning (artificial intelligence); Machine learning; Multi-agent systems; reinforcement learning; Task analysis; Training
ISSN:	2161-4407

Description
Summary:	In partially observable, fully cooperative games, agents generally aim to maximize the global reward through joint actions, so it is difficult for each agent to deduce its own contribution. To address this credit assignment problem, we propose a multi-agent reinforcement learning algorithm with a counterfactual reward mechanism, termed the CoRe algorithm. CoRe computes the difference in global reward that results when an agent takes other actions instead of its actual action while the other agents keep their actual actions fixed. This difference determines each agent's contribution to the global reward. We evaluate CoRe in a simplified Pig Chase game with a decentralised Deep Q Network (DQN) framework. The proposed method helps agents learn end-to-end collaborative behaviors. Compared with other DQN variants that use the global reward, CoRe significantly improves learning efficiency and achieves better results. In addition, CoRe performs well in game environments of various sizes.
DOI:	10.1109/IJCNN48605.2020.9207169
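
The counterfactual reward idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the global reward can be re-evaluated for hypothetical joint actions, and it assumes the differences are averaged over an agent's alternative actions, since the summary does not say how they are aggregated. The names `counterfactual_rewards` and `toy_reward` are hypothetical, and the toy game stands in for Pig Chase.

```python
from typing import Callable, List, Tuple

JointAction = Tuple[int, ...]


def counterfactual_rewards(
    global_reward: Callable[[JointAction], float],
    joint_action: JointAction,
    n_actions: int,
) -> List[float]:
    """Return one reward per agent: the actual global reward minus the mean
    global reward obtained when only that agent's action is replaced by each
    of its alternatives (averaging is an assumption of this sketch)."""
    actual = global_reward(joint_action)
    rewards = []
    for i, a_i in enumerate(joint_action):
        # Alternatives for agent i; all other agents keep their actual actions.
        alternatives = [a for a in range(n_actions) if a != a_i]
        baseline = sum(
            global_reward(joint_action[:i] + (alt,) + joint_action[i + 1:])
            for alt in alternatives
        ) / len(alternatives)
        rewards.append(actual - baseline)
    return rewards


def toy_reward(joint_action: JointAction) -> float:
    # Hypothetical two-agent game: the team earns 1.0 only when agent 0
    # picks action 1; agent 1's choice never affects the outcome.
    return 1.0 if joint_action[0] == 1 else 0.0


print(counterfactual_rewards(toy_reward, (1, 0), n_actions=2))  # [1.0, 0.0]
```

In this toy game both agents receive the same global reward of 1.0, but the counterfactual difference credits only agent 0, whose action actually changed the outcome.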