Hindsight-aware deep reinforcement learning algorithm for multi-agent systems
| Published in: | International Journal of Machine Learning and Cybernetics, Vol. 13, No. 7, pp. 2045-2057 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Berlin/Heidelberg: Springer Berlin Heidelberg, 1 July 2022; Springer Nature B.V |
| ISSN: | 1868-8071, 1868-808X |
| Summary: | Classic reinforcement learning algorithms generate experiences through the agent's repeated trial and error, so the replay buffer fills with a large number of failed experiences, and the agents can learn only from this low-quality data. The problem is more severe in multi-agent systems. MADDPG (Multi-Agent Deep Deterministic Policy Gradient) has achieved significant results on multi-agent problems by using a framework of centralized training with decentralized execution, but it does not resolve the problem of too many failed experiences in the replay buffer. In this paper, we propose HMADDPG (Hindsight Multi-Agent Deep Deterministic Policy Gradient) to mitigate the negative impact of failed experiences. HMADDPG has a hindsight unit, which allows the agents to reflect and produce pseudo experiences that tend to succeed. These pseudo experiences are stored in the replay buffer, so that the agents can learn from both real and pseudo experiences. We evaluated our algorithm on a number of environments. The results show that it can guide agents to learn better strategies and can be applied to multi-agent systems that are cooperative, competitive, or mixed cooperative and competitive. |
|---|---|
| DOI: | 10.1007/s13042-022-01505-x |
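
The summary says that pseudo experiences are stored in the replay buffer alongside real ones, but this record does not describe how the hindsight unit produces them. As a rough illustration only, the sketch below assumes a goal-relabeling scheme in the style of Hindsight Experience Replay: each agent's goal is replaced with the state it actually reached at the end of the episode, and a sparse reward is recomputed. The names `Transition`, `sparse_reward`, `hindsight_relabel`, and `ReplayBuffer` are hypothetical and are not taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass, replace
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One joint step; every field holds one entry per agent (hypothetical layout)."""
    obs: Tuple[Vector, ...]
    actions: Tuple[Vector, ...]
    rewards: Tuple[float, ...]
    goals: Tuple[Vector, ...]      # goal each agent was pursuing
    achieved: Tuple[Vector, ...]   # state each agent actually reached after the step


def sparse_reward(achieved: Vector, goal: Vector, tol: float = 1e-6) -> float:
    """Assumed reward: 0.0 when the goal is reached, -1.0 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def hindsight_relabel(episode: List[Transition]) -> List[Transition]:
    """Assumed hindsight step: pretend each agent's final achieved state was its goal."""
    final = episode[-1].achieved
    pseudo = []
    for t in episode:
        rewards = tuple(sparse_reward(a, g) for a, g in zip(t.achieved, final))
        pseudo.append(replace(t, goals=final, rewards=rewards))
    return pseudo


class ReplayBuffer:
    """Stores real and pseudo transitions together, as the summary describes."""

    def __init__(self, capacity: int) -> None:
        self._data = deque(maxlen=capacity)

    def add_episode(self, episode: List[Transition]) -> None:
        self._data.extend(episode)                     # real experiences
        self._data.extend(hindsight_relabel(episode))  # pseudo experiences

    def sample(self, batch_size: int, rng=random) -> List[Transition]:
        pool = list(self._data)
        return rng.sample(pool, min(batch_size, len(pool)))

    def __len__(self) -> int:
        return len(self._data)
```

Under these assumptions, an episode in which no agent reached its original goal still contributes at least one zero-reward (successful) pseudo transition per agent, which a MADDPG-style learner could then sample together with the real data.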