Adaptive Weighting Push-SUM for Decentralized Optimization With Statistical Diversity

Bibliographic Details
Published in:	IEEE Transactions on Control of Network Systems, Vol. 12, no. 3, pp. 2337-2349
Main Authors:	Zhou, Yiming; Cheng, Yifei; Xu, Linli; Chen, Enhong
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2025
Subjects:	Algorithms; Analytical models; Communication networks; Control systems; Convergence; directed decentralized optimization; distributed algorithms/control; learning; Network systems; Optimization; Protocols; Robustness; Training; Upper bound; Upper bounds; Vectors; Weighting methods
ISSN:	2325-5870, 2372-2533

Description
Summary:	Statistical diversity is a property of data distribution and can hinder the optimization of a decentralized network. However, the theoretical limitations of the Push-SUM protocol reduce the performance of optimization algorithms based on it when handling statistical diversity. In this article, we theoretically and empirically reduce the negative impact of statistical diversity on decentralized optimization using the Push-SUM protocol. Specifically, we propose the adaptive weighting Push-SUM protocol, a theoretical generalization of the original Push-SUM protocol that contains the original as a special case. Our theoretical analysis shows that, with sufficient communication, the upper bound on the consensus distance for the new protocol reduces to $O(1/N)$, whereas it remains at $O(1)$ for the Push-SUM protocol. We adopt stochastic gradient descent (SGD) and momentum SGD on the new protocol and prove that the convergence rate of these two algorithms with respect to statistical diversity is $O(N/T)$ on the new protocol, while it is $O(Nd/T)$ on the Push-SUM protocol, where $d$ is the parameter size of the training model. To address statistical diversity in practical applications of the new protocol, we develop the Moreau weighting method for its generalized weight matrix definition. This method, derived from the Moreau envelope, is an approximate optimization of the distance penalty of the Moreau envelope. We verify via deep learning experiments that the adaptive weighting Push-SUM protocol is more efficient in practice than the Push-SUM protocol.
DOI:	10.1109/TCNS.2025.3566329
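
Note:	For readers unfamiliar with the baseline named in the summary, the following is a minimal sketch of the classical Push-SUM averaging protocol on a directed graph. It is not the article's adaptive weighting variant or its Moreau weighting method, whose definitions are not given in this record. The function names, the example graph, the equal-share mixing rule, and the particular mean-squared form of the consensus distance are all assumptions made for illustration.

```python
def push_sum(values, out_neighbors, rounds):
    """Classical Push-SUM averaging on a directed graph (illustrative sketch).

    values:        initial scalar held by each node.
    out_neighbors: out_neighbors[i] lists the nodes that node i sends to;
                   a self-loop is added for every node (an assumption here).
    rounds:        number of communication rounds.
    Returns each node's de-biased estimate x_i / w_i.
    """
    n = len(values)
    x = [float(v) for v in values]  # running numerators
    w = [1.0] * n                   # push-sum weights, all start at 1
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = set(out_neighbors[i]) | {i}
            share = 1.0 / len(targets)  # equal split keeps total mass constant
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # Dividing by w corrects the bias caused by unequal in-degrees.
    return [xi / wi for xi, wi in zip(x, w)]


def consensus_distance(estimates):
    """Mean squared deviation of the node estimates from their mean.

    One common way to measure disagreement; the article may define it differently.
    """
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical 4-node strongly connected digraph: a ring plus one extra edge.
    graph = [[1], [2], [3], [0, 2]]
    estimates = push_sum([1, 2, 3, 6], graph, rounds=200)
    # Each estimate approaches the true average of the initial values, 3.0.
    print(estimates, consensus_distance(estimates))
```

Each node splits its value and weight equally among its out-neighbors, so the mixing is column-stochastic rather than doubly stochastic; the ratio of the two accumulated quantities is what recovers the network-wide average on a directed graph.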