Graph Soft Actor-Critic Reinforcement Learning for Large-Scale Distributed Multirobot Coordination

Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm to train dis...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	IEEE transaction on neural networks and learning systems Ročník 36; číslo 1; s. 665 - 676
Hlavní autori:	Hu, Yifan, Fu, Junjie, Wen, Guanghui
Médium:	Journal Article
Jazyk:	English
Vydavateľské údaje:	United States IEEE 01.01.2025
Predmet:	Distributed coordination graph neural network (GNN) Multi-robot systems multiagent reinforcement learning (MARL) multirobot system Protocols Reinforcement learning Robot kinematics Scalability Simulation soft \text{actor(!{-}!)critic} (SAC) algorithm Task analysis
ISSN:	2162-237X, 2162-2388, 2162-2388
On-line prístup:	Získať plný text
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm to train distributed coordination policies on the graph by leveraging the ability of information extraction of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by the GNNs is designed for distributed decision-making in continuous action spaces. Second, a scalable centralized value function network is designed based on a novel GNN-based value function decomposition technique. Then, based on the designed actor and the critic networks, a GNN-based MARL algorithm named graph soft actor-critic (G-SAC) is proposed and utilized to train the distributed policies in an effective and centralized fashion. Finally, two custom multirobot coordination environments are built, under which the simulation results are performed to empirically demonstrate both the sample efficiency and the scalability of G-SAC as well as the strong zero-shot generalization ability of the trained policy in large-scale multirobot coordination problems.
Bibliografia:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	2162-237X 2162-2388 2162-2388
DOI:	10.1109/TNNLS.2023.3329530