Convergence Analysis of Distributed Gradient Descent Algorithms With One and Two Momentum Terms

Detailed bibliography
Published in:	IEEE Transactions on Cybernetics, Vol. 54, No. 3, pp. 1511-1522
Main authors:	Liu, Bing; Chai, Li; Yi, Jingwen
Format:	Journal Article
Language:	English
Publication details:	United States: IEEE, 01.03.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Control theory; Convergence; Convergence rate; Cost function; distributed optimization; Linear programming; Momentum; momentum term; Network topology; Newton method; Privacy; Routh criterion; Signal processing algorithms
ISSN:	2168-2267, 2168-2275

Description
Summary:	For centralized optimization, it is well known that adding one momentum term (also called the heavy-ball method) can achieve a faster convergence rate than the gradient method. However, for the distributed counterpart, there are quite few results on the effect of added momentum terms on the convergence rate. This article studies the issue in the distributed setup, where $N$ agents minimize the sum of their individual cost functions using local communication over a network. The cost functions are twice continuously differentiable. We first study the algorithm with one momentum term and develop a distributed heavy-ball (D-HB) method by adding one momentum term to the distributed gradient algorithm. By borrowing tools from control theory, we provide a simple convergence proof and an explicit expression for the optimal convergence rate. Furthermore, we consider the case of adding two momentum terms and propose a distributed double-heavy-ball (D-DHB) method. We show that adding one momentum term allows faster convergence, while adding two momentum terms offers no further advantage. Finally, simulation examples are given to illustrate our findings.
DOI:	10.1109/TCYB.2022.3218663
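
Illustration:	The summary's opening claim, that one momentum term speeds up the centralized gradient method, can be checked on a small quadratic. The sketch below is only an illustration of that centralized baseline and is not the article's D-HB or D-DHB method, whose update rules this record does not give. The test problem (a diagonal quadratic with curvatures 1 and 100), the function names, and the textbook parameter choices for quadratics are all assumptions made here.

```python
import math
import numpy as np

def gradient_descent(A, b, x0, step, iters):
    """Plain gradient method on f(x) = 0.5 x^T A x - b^T x."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * (A @ x - b)
    return x

def heavy_ball(A, b, x0, step, momentum, iters):
    """Gradient method plus one momentum term (heavy-ball)."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        # gradient step plus momentum * (current iterate - previous iterate)
        x_next = x - step * (A @ x - b) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Hypothetical test problem: smallest curvature mu, largest curvature L.
mu, L = 1.0, 100.0
A = np.diag([mu, L])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)   # exact minimizer, used only to measure error
x0 = np.zeros(2)
iters = 100

# Textbook parameter choices for quadratics (assumed, not from the article).
gd_step = 2.0 / (mu + L)
hb_step = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
hb_momentum = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

gd_err = np.linalg.norm(gradient_descent(A, b, x0, gd_step, iters) - x_star)
hb_err = np.linalg.norm(heavy_ball(A, b, x0, hb_step, hb_momentum, iters) - x_star)

print(f"gradient method error after {iters} iterations: {gd_err:.3e}")
print(f"heavy-ball error after {iters} iterations:      {hb_err:.3e}")
```

On this ill-conditioned problem, the heavy-ball iterate should end up much closer to the minimizer than the plain gradient iterate after the same number of steps. Whether the same advantage carries over to the networked setting is the question the article itself addresses.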