Hindsight-aware deep reinforcement learning algorithm for multi-agent systems

Classic reinforcement learning algorithms generate experiences by the agent's constant trial and error, which leads to a large number of failure experiences stored in the replay buffer. As a result, the agents can only learn through these low-quality experiences. In the case of multi-agent syst...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International journal of machine learning and cybernetics Jg. 13; H. 7; S. 2045 - 2057
Hauptverfasser: Li, Chengjing, Wang, Li, Huang, Zirong
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Berlin/Heidelberg Springer Berlin Heidelberg 01.07.2022
Springer Nature B.V
Schlagworte:
ISSN:1868-8071, 1868-808X
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Classic reinforcement learning algorithms generate experiences by the agent's constant trial and error, which leads to a large number of failure experiences stored in the replay buffer. As a result, the agents can only learn through these low-quality experiences. In the case of multi-agent systems, this problem is more serious. MADDPG (Multi-Agent Deep Deterministic Policy Gradient) has achieved significant results in solving multi-agent problems by using a framework of centralized training with decentralized execution. Nevertheless, the problem of too many failure experiences in the replay buffer has not been resolved. In this paper, we propose HMADDPG (Hindsight Multi-Agent Deep Deterministic Policy Gradient) to mitigate the negative impact of failure experience. HMADDPG has a hindsight unit, which allows the agents to reflect and produces pseudo experiences that tend to succeed. Pseudo experiences are stored in the replay buffer, so that the agents can combine two kinds of experiences to learn. We have evaluated our algorithm on a number of environments. The results show that the algorithm can guide agents to learn better strategies and can be applied in multi-agent systems which are cooperative, competitive, or mixed cooperative and competitive.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1868-8071
1868-808X
DOI:10.1007/s13042-022-01505-x