Hindsight-aware deep reinforcement learning algorithm for multi-agent systems

Detailed bibliography
Published in:	International Journal of Machine Learning and Cybernetics, Vol. 13, No. 7, pp. 2045-2057
Main authors:	Li, Chengjing; Wang, Li; Huang, Zirong
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V., 1 July 2022
Subjects:	Algorithms, Artificial Intelligence, Buffers, Communication, Complex Systems, Computational Intelligence, Control, Curricula, Deep learning, Efficiency, Engineering, Failure, Machine learning, Mechatronics, Multiagent systems, Original Article, Pattern Recognition, Robotics, Success, Systems Biology, Teaching methods, Multi-agent system, Hindsight experience replay, Reinforcement learning
ISSN:	1868-8071, 1868-808X

Description
Abstract:	Classic reinforcement learning algorithms generate experiences by the agent's constant trial and error, which leads to a large number of failure experiences stored in the replay buffer. As a result, the agents can only learn through these low-quality experiences. In the case of multi-agent systems, this problem is more serious. MADDPG (Multi-Agent Deep Deterministic Policy Gradient) has achieved significant results in solving multi-agent problems by using a framework of centralized training with decentralized execution. Nevertheless, the problem of too many failure experiences in the replay buffer has not been resolved. In this paper, we propose HMADDPG (Hindsight Multi-Agent Deep Deterministic Policy Gradient) to mitigate the negative impact of failure experiences. HMADDPG has a hindsight unit, which allows the agents to reflect and produces pseudo experiences that tend to succeed. Pseudo experiences are stored in the replay buffer, so that the agents can combine two kinds of experiences to learn. We have evaluated our algorithm on a number of environments. The results show that the algorithm can guide agents to learn better strategies and can be applied in multi-agent systems which are cooperative, competitive, or mixed cooperative and competitive.
DOI:	10.1007/s13042-022-01505-x
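The abstract describes a hindsight unit that turns failed episodes into pseudo experiences stored alongside the real ones in the replay buffer. This record does not give the paper's actual relabeling rule, so the following is only a minimal sketch of the general idea. It assumes a goal-conditioned task with a sparse reward and applies the "final" relabeling strategy from Hindsight Experience Replay (HER) to each agent. The names `Transition`, `reward_fn`, `make_transition`, `hindsight_relabel`, and `ReplayBuffer`, and the 0.1 success tolerance, are hypothetical illustrations and not HMADDPG's implementation.

```python
import math
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def reward_fn(achieved: Point, goal: Point, tol: float = 0.1) -> float:
    """Hypothetical sparse reward: 0.0 if within `tol` of the goal, else -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


@dataclass(frozen=True)
class Transition:
    """One multi-agent step; each tuple holds one entry per agent (hypothetical layout)."""
    obs: Tuple[Point, ...]
    actions: Tuple[Point, ...]
    achieved: Tuple[Point, ...]  # where each agent ended up after this step
    goals: Tuple[Point, ...]     # goal each agent was conditioned on
    rewards: Tuple[float, ...]


def make_transition(obs, actions, achieved, goals) -> Transition:
    # Rewards are always recomputed from (achieved, goal), so relabeled
    # goals automatically receive consistent rewards.
    rewards = tuple(reward_fn(a, g) for a, g in zip(achieved, goals))
    return Transition(tuple(obs), tuple(actions), tuple(achieved), tuple(goals), rewards)


def hindsight_relabel(episode: Sequence[Transition]) -> List[Transition]:
    """HER 'final' strategy (an assumption here): pretend each agent's goal was the
    position it actually reached at the end of the episode."""
    final_goals = episode[-1].achieved
    return [make_transition(t.obs, t.actions, t.achieved, final_goals) for t in episode]


class ReplayBuffer:
    """Hypothetical FIFO buffer holding real and hindsight (pseudo) transitions together."""

    def __init__(self, capacity: int = 100_000) -> None:
        self.data: deque = deque(maxlen=capacity)  # oldest entries are evicted first

    def add_episode(self, episode: Sequence[Transition]) -> None:
        self.data.extend(episode)                     # real, possibly failed, experience
        self.data.extend(hindsight_relabel(episode))  # pseudo experience

    def sample(self, batch_size: int, rng: Optional[random.Random] = None) -> List[Transition]:
        # Mixed minibatch drawn uniformly from both kinds of experience.
        rng = rng or random.Random()
        return rng.sample(list(self.data), min(batch_size, len(self.data)))
```

For a two-agent episode in which neither agent reaches its goal, every real transition carries a reward of -1.0 for both agents, while the relabeled final transition carries 0.0 for both, so training minibatches contain at least some successful outcomes. Relabeling only with the final state is the simplest HER strategy; HER also describes sampling later states from the same episode as substitute goals, and the paper may use a different rule altogether.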