Distributed Actor-Critic Algorithms for Multiagent Reinforcement Learning Over Directed Graphs

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, no. 10, pp. 7210-7221
Main Authors:	Dai, Pengcheng; Yu, Wenwu; Wang, He; Baldi, Simone
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 01.10.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Approximation algorithms; Convergence; Directed graph; Directed graphs; distributed actor–critic (AC) algorithm; Function approximation; Graph theory; Graphs; Linear functions; MARL; Matrices (mathematics); multiagent reinforcement learning (MARL); Multiagent systems; Protocols; push-sum protocol; Q-learning; Reinforcement; Stochasticity; Topology; Weight
ISSN:	2162-237X, 2162-2388

Description
Summary:	Actor-critic (AC) cooperative multiagent reinforcement learning (MARL) over directed graphs is studied in this article. The goal of the agents in MARL is to maximize the globally averaged return in a distributed way, i.e., each agent can only exchange information with its neighboring agents. AC methods proposed in the literature require the communication graphs to be undirected and the weight matrices to be doubly stochastic (more precisely, the weight matrices are row stochastic and their expectations are column stochastic). In contrast to these methods, we propose a distributed AC algorithm for MARL over a directed graph with fixed topology that only requires the weight matrix to be row stochastic. We also study MARL over directed graphs (possibly not connected) with changing topologies, and propose a different distributed AC algorithm based on the push-sum protocol that only requires the weight matrices to be column stochastic. Convergence of the proposed algorithms is proven for linear function approximation of the action value function. Simulations are presented to demonstrate the effectiveness of the proposed algorithms.
DOI:	10.1109/TNNLS.2021.3139138
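
Illustration (not part of the record): the summary refers to the push-sum protocol with column-stochastic weight matrices. The following is a minimal sketch of classical push-sum averaging on a directed graph, not the article's actor-critic algorithm. The function name `push_sum_average`, the example graph, and the equal-split weighting rule are assumptions made for illustration, and the sketch assumes a fixed, strongly connected graph, which is a simpler setting than the changing topologies the summary describes.

```python
def push_sum_average(values, out_neighbors, num_rounds=200):
    """Estimate the network-wide average of `values` over a directed graph.

    out_neighbors[j] lists the nodes that node j can send to (self excluded).
    Hypothetical helper for illustration; not taken from the article.
    """
    n = len(values)
    x = [float(v) for v in values]  # running "sum" shares
    w = [1.0] * n                   # running "weight" shares
    for _ in range(num_rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for j in range(n):
            # Node j splits its mass equally among itself and its
            # out-neighbors, so column j of the implied weight matrix
            # sums to 1 (column stochastic). Rows need not sum to 1.
            targets = [j] + list(out_neighbors[j])
            share = 1.0 / len(targets)
            for i in targets:
                new_x[i] += share * x[j]
                new_w[i] += share * w[j]
        x, w = new_x, new_w
    # The ratio x_i / w_i corrects the bias that a non-doubly-stochastic
    # matrix would otherwise introduce into plain consensus averaging.
    return [xi / wi for xi, wi in zip(x, w)]


# Assumed example: a directed ring with one extra edge (strongly connected,
# but not undirected). The true average of the initial values is 4.0.
graph = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
estimates = push_sum_average([1.0, 2.0, 3.0, 10.0], graph)
```

Each node only needs to know its own out-neighbors to build its column of weights, which is why column stochasticity is a natural requirement on directed graphs where senders cannot coordinate with all of a receiver's other in-neighbors.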