FMRQ: A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks
Detailed Bibliography
Published in: IEEE Transactions on Cybernetics, Vol. 47, No. 6, pp. 1367-1379
Main Authors: Zhang, Zhen; Zhao, Dongbin; Gao, Junwei; Wang, Dongqing; Dai, Yujie
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2017
ISSN: 2168-2267 (print), 2168-2275 (online)
Description
Summary: In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to reach one of the optimal Nash equilibria so as to optimize the performance index in multiagent systems. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four cases of two-player two-action games and one case of a three-player two-action game. It is demonstrated that FMRQ can converge to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments on tasks with multiple states and finite steps are conducted: one is box-pushing and the other is the distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the compared algorithms.
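The abstract's core idea, scoring each action by how often it yielded the highest global immediate reward rather than by the reward itself, can be illustrated with a minimal sketch. This is not the paper's actual FMRQ algorithm (which also handles multi-state, finite-step tasks); the payoff matrix, class names, and hyperparameters below are illustrative assumptions for a two-player two-action repeated game with two optimal Nash equilibria.

```python
import random

# Hypothetical shared payoff for a fully cooperative 2-player, 2-action
# repeated game: both (0, 0) and (1, 1) are optimal Nash equilibria.
PAYOFF = [[10, -5],
          [-5, 10]]
R_MAX = 10  # highest global immediate reward

class FreqAgent:
    """Independent learner that values each action by the frequency
    with which playing it produced the maximal global reward."""
    def __init__(self, n_actions=2, eps=0.1):
        self.plays = [0] * n_actions  # times each action was played
        self.hits = [0] * n_actions   # times it led to reward R_MAX
        self.eps = eps

    def value(self, a):
        # Empirical frequency of obtaining the maximum reward with action a.
        return self.hits[a] / self.plays[a] if self.plays[a] else 0.0

    def act(self):
        # Epsilon-greedy over the frequency values.
        if random.random() < self.eps:
            return random.randrange(len(self.plays))
        return max(range(len(self.plays)), key=self.value)

    def update(self, a, reward):
        self.plays[a] += 1
        self.hits[a] += reward == R_MAX

random.seed(0)
agents = [FreqAgent(), FreqAgent()]
for _ in range(2000):
    a0, a1 = agents[0].act(), agents[1].act()
    r = PAYOFF[a0][a1]  # global reward, shared by both agents
    agents[0].update(a0, r)
    agents[1].update(a1, r)

# Greedy joint policy after learning: both agents settle on the same
# optimal action, since miscoordination never attains R_MAX.
greedy = [max(range(2), key=ag.value) for ag in agents]
print(greedy, PAYOFF[greedy[0]][greedy[1]])
```

Note the design choice the abstract hints at: because a miscoordinated joint action can never achieve the maximal global reward, the frequency signal penalizes it even when its immediate reward would occasionally look acceptable, which pushes independent learners toward one of the optimal equilibria without observing each other's actions.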
DOI: 10.1109/TCYB.2016.2544866