Push-Sum Distributed Online Optimization With Bandit Feedback

Bibliographic details
Published in:	IEEE Transactions on Cybernetics, Vol. 52, No. 4, pp. 2263-2273
Main authors:	Wang, Cong; Xu, Shengyuan; Yuan, Deming; Zhang, Baoyong; Zhang, Zhengqiang
Format:	Journal Article
Language:	English
Published:	United States, IEEE, 1 April 2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Computational geometry; Convergence; Convex analysis; Convex functions; Convexity; Cost function; Distributed online optimization; Feedback; Graph theory; Graphical representations; Heuristic algorithms; Iterative methods; Multi-agent systems; Multiagent systems; Nodes; one-point bandit feedback; Optimization; push-sum algorithm
ISSN:	2168-2267, 2168-2275

Description
Abstract:	In this article, we study distributed online convex optimization over multiagent systems in which communication between nodes is modeled by a class of time-varying, uniformly strongly connected directed graphs. The problem is posed under bandit feedback: at each time step, only the value of the cost function at the committed point is revealed to each node. Nodes then update their decisions by exchanging information only with their neighbors. For Lipschitz continuous and strongly convex cost functions, we develop a distributed online convex optimization algorithm that achieves sublinear individual regret for every node. The algorithm combines the push-sum scheme, which removes the requirement of doubly stochastic weight matrices, with a one-point gradient estimator, which needs the function value at only one point per iteration instead of gradient information about the loss function. The expected regret of the proposed algorithm scales as $\mathcal{O}(T^{2/3} \ln^{2/3}(T))$, where $T$ is the number of iterations. To validate the performance of the algorithm, we present a simulation on a common numerical example.
DOI:	10.1109/TCYB.2020.2999309
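
Illustrative sketch (not part of the catalog record): the abstract names two building blocks, a push-sum scheme for directed graphs and a one-point gradient estimator. The Python below is a minimal sketch of how these two pieces are commonly combined, not the authors' algorithm. The function names (`push_sum_step`, `one_point_gradient`, `bandit_push_sum_round`), the parameters `out_neighbors`, `alpha`, and `delta`, and the estimator form `(dim / delta) * f(z + delta * u) * u` are assumptions made for illustration. The abstract does not specify the step sizes, the exploration radius, or any projection onto a constraint set, so those are left to the caller or omitted.

```python
import math
import random


def push_sum_step(x, y, out_neighbors):
    """One push-sum mixing round on a directed graph.

    x: list of vectors (one per node); y: list of scalar push-sum weights.
    out_neighbors[j]: nodes that node j sends to (assumed to include j).
    Each sender splits its mass equally among its out-neighbors, so the
    implied weight matrix is column stochastic, not doubly stochastic.
    """
    n, dim = len(x), len(x[0])
    new_x = [[0.0] * dim for _ in range(n)]
    new_y = [0.0] * n
    for j in range(n):
        share = 1.0 / len(out_neighbors[j])
        for i in out_neighbors[j]:
            new_y[i] += share * y[j]
            for k in range(dim):
                new_x[i][k] += share * x[j][k]
    return new_x, new_y


def unit_sphere_sample(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def one_point_gradient(f, z, delta, rng):
    """Assumed one-point estimate: (dim / delta) * f(z + delta * u) * u."""
    dim = len(z)
    u = unit_sphere_sample(dim, rng)
    # The single function value below is the only feedback that is used.
    value = f([z[k] + delta * u[k] for k in range(dim)])
    return [(dim / delta) * value * u[k] for k in range(dim)]


def bandit_push_sum_round(x, y, costs, out_neighbors, alpha, delta, rng):
    """Mix with push-sum, then take a gradient-free step at every node.

    Projection onto a constraint set is omitted in this sketch.
    """
    w, new_y = push_sum_step(x, y, out_neighbors)
    new_x = []
    for i, f_i in enumerate(costs):
        z_i = [w_ik / new_y[i] for w_ik in w[i]]  # ratio removes graph bias
        g_i = one_point_gradient(f_i, z_i, delta, rng)
        new_x.append([w_ik - alpha * g_ik for w_ik, g_ik in zip(w[i], g_i)])
    return new_x, new_y
```

Because `out_neighbors` is passed on every call, a caller can supply a different directed graph in each round, which is one way to mimic the time-varying communication described in the abstract. With no gradient step, repeated calls to `push_sum_step` drive every ratio `x[i] / y[i]` toward the average of the initial values on a fixed strongly connected graph with self-loops.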