Structural relational inference actor-critic for multi-agent reinforcement learning


Bibliographic details
Published in: Neurocomputing (Amsterdam), Vol. 459, pp. 383-394
Main authors: Zhang, Xianjie; Liu, Yu; Xu, Xiujuan; Huang, Qiong; Mao, Hangyu; Carie, Anil
Format: Journal Article
Language: English
Published: Elsevier B.V., 12 October 2021
ISSN:0925-2312, 1872-8286
Description
Summary:
• A novel MARL algorithm involving the interaction relationship between agents.
• This algorithm is based on centralized training and decentralized execution.
• We apply the variational autoencoder model to obtain the latent relations.
• We employ graph attention network to gather the information of neighbor agents.

Multi-agent reinforcement learning (MARL) is essential for a wide range of high-dimensional scenarios and complicated tasks with multiple agents. Many attempts have been made for agents with prior domain knowledge and predefined structure. However, the interaction relationship between agents in a multi-agent system (MAS) in general is usually unknown, and previous methods could not tackle dynamical activities in an ever-changing environment. Here we propose a multi-agent Actor-Critic algorithm called Structural Relational Inference Actor-Critic (SRI-AC), which is based on the framework of centralized training and decentralized execution. SRI-AC utilizes the latent codes in variational autoencoder (VAE) to represent interactions between paired agents, and the reconstruction error is based on Graph Neural Network (GNN). With this framework, we test whether the reinforcement learning learners could form an interpretable structure while achieving better performance in both cooperative and competitive scenarios. The results indicate that SRI-AC could be applied to complex dynamic environments to find an interpretable structure while obtaining better performance compared to baseline algorithms.
DOI: 10.1016/j.neucom.2021.07.014
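
The summary describes two mechanisms: latent codes that represent the interaction between each pair of agents, and an attention step that gathers information from neighbor agents. The following is a minimal illustrative sketch of that idea in NumPy, not the authors' implementation. Everything specific in it is an assumption that the record does not state: the Gumbel-softmax relaxation used to sample relation codes, the convention that relation type 0 means "no interaction", the dot-product attention form, the random (untrained) weight matrices, and all function and variable names.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def infer_relations(obs, w_enc, rng, temperature=0.5):
    """Sample a soft relation code for every ordered pair of agents.

    ASSUMPTION: a linear encoder plus a Gumbel-softmax relaxation; the
    record only says that VAE latent codes represent pairwise interactions.
    """
    n = obs.shape[0]
    receivers = np.repeat(obs[:, None, :], n, axis=1)  # (n, n, d): entry [i, j] is obs[i]
    senders = np.repeat(obs[None, :, :], n, axis=0)    # (n, n, d): entry [i, j] is obs[j]
    logits = np.concatenate([receivers, senders], axis=-1) @ w_enc  # (n, n, k)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
    return softmax((logits + gumbel) / temperature, axis=-1)


def aggregate_neighbors(obs, relations, w_q, w_k, w_v):
    """Attention-weighted sum of neighbor features, gated by inferred relations."""
    d_h = w_q.shape[1]
    scores = (obs @ w_q) @ (obs @ w_k).T / np.sqrt(d_h)  # (n, n) dot-product scores
    np.fill_diagonal(scores, -np.inf)                    # an agent does not attend to itself
    attention = softmax(scores, axis=1)
    # ASSUMPTION: relation type 0 means "no interaction", so it closes the gate.
    edge_strength = 1.0 - relations[..., 0]
    weights = attention * edge_strength
    return weights @ (obs @ w_v), weights


# Hypothetical toy setup: 4 agents, 6-dim observations, 3 relation types.
rng = np.random.default_rng(0)
n_agents, obs_dim, n_types, hidden = 4, 6, 3, 8
obs = rng.normal(size=(n_agents, obs_dim))
w_enc = rng.normal(size=(2 * obs_dim, n_types))
w_q, w_k, w_v = (rng.normal(size=(obs_dim, hidden)) for _ in range(3))

relations = infer_relations(obs, w_enc, rng)
messages, weights = aggregate_neighbors(obs, relations, w_q, w_k, w_v)
print(relations.shape, messages.shape)
```

In this sketch the `weights` matrix plays the role of the interpretable structure mentioned in the summary: entry `[i, j]` indicates how strongly agent `i` draws on agent `j`. In a centralized-training setup, the resulting `messages` would presumably feed a critic, while each actor acts on its own observation.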