Adaptive Weighting Push-SUM for Decentralized Optimization With Statistical Diversity


Bibliographic details
Published in: IEEE Transactions on Control of Network Systems, Vol. 12, No. 3, pp. 2337-2349
Main authors: Zhou, Yiming; Cheng, Yifei; Xu, Linli; Chen, Enhong
Format: Journal Article
Language: English
Publisher: Piscataway: IEEE, 1 September 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2325-5870, 2372-2533
Description
Summary: Statistical diversity is a property of data distribution and can hinder the optimization of a decentralized network. However, the theoretical limitations of the Push-SUM protocol reduce the performance in handling the statistical diversity of optimization algorithms based on it. In this article, we theoretically and empirically reduce the negative impact of statistical diversity on decentralized optimization using the Push-SUM protocol. Specifically, we propose the adaptive weighting Push-SUM protocol, a theoretical generalization of the original Push-SUM protocol, where the latter is a special case of the former. Our theoretical analysis shows that, with sufficient communication, the upper bound on the consensus distance for the new protocol reduces to $O(1/N)$, whereas it remains at $O(1)$ for the Push-SUM protocol. We adopt stochastic gradient descent (SGD) and momentum SGD on the new protocol and prove that the convergence rate of these two algorithms to statistical diversity is $O(N/T)$ on the new protocol, while it is $O(Nd/T)$ on the Push-SUM protocol, where $d$ is the parameter size of the training model. To address statistical diversity in practical applications of the new protocol, we develop the Moreau weighting method for its generalized weight matrix definition. This method, derived from the Moreau envelope, is an approximate optimization of the distance penalty of the Moreau envelope. We verify that the adaptive weighting Push-SUM protocol is practically more efficient than the Push-SUM protocol via deep learning experiments.
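Note: For orientation, the baseline Push-SUM protocol that the summary refers to can be sketched as below. This is a minimal sketch of classic push-sum averaging on a directed graph, not the article's adaptive weighting variant or its Moreau weighting method. The function name `push_sum`, the uniform splitting of mass among out-neighbors, and the example graph are assumptions made for illustration only.

```python
def push_sum(values, out_neighbors, rounds=50):
    """Classic push-sum averaging on a directed graph (illustrative sketch).

    values        -- initial scalar held by each node
    out_neighbors -- dict: node index -> list of nodes it sends to (hypothetical layout)
    """
    n = len(values)
    x = [float(v) for v in values]  # running numerators
    w = [1.0] * n                   # push-sum weights, all start at 1
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Each node splits its mass uniformly among its out-neighbors
            # and itself, which makes the mixing matrix column-stochastic.
            targets = out_neighbors[i] + [i]
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # The ratio x_i / w_i is each node's estimate of the network average.
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical strongly connected 4-node directed graph.
graph = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 10.0], graph, rounds=100)
```

Because the total of the numerators and the total of the weights are both preserved in every round, each ratio approaches the average of the initial values (4.0 for the inputs above) as the number of rounds grows.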
DOI: 10.1109/TCNS.2025.3566329