Distributed Stochastic Consensus Optimization With Momentum for Nonconvex Nonsmooth Problems

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, Vol. 69, pp. 4486-4501
Main Authors: Wang, Zhiguo, Zhang, Jiawei, Chang, Tsung-Hui, Li, Jian, Luo, Zhi-Quan
Format: Journal Article
Language: English
Published: New York: IEEE, 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1053-587X, 1941-0476
Description
Summary: While many distributed optimization algorithms have been proposed for solving smooth or convex problems over networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, it is shown that the proposed algorithm can achieve an $\epsilon$-stationary solution under a constant step size with $\mathcal{O}(1/\epsilon^2)$ computation complexity and $\mathcal{O}(1/\epsilon)$ communication complexity when the epigraph of the non-smooth term is a polyhedral set. Compared to existing gradient-tracking-based methods, the proposed algorithm has the same order of computation complexity but a lower order of communication complexity. To the best of the authors' knowledge, this is the first stochastic algorithm with $\mathcal{O}(1/\epsilon)$ communication complexity for non-convex and non-smooth problems. Numerical experiments on a distributed non-convex regression problem and a deep-neural-network-based classification problem illustrate the effectiveness of the proposed algorithms.
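
Illustration (not from the paper): the abstract describes a proximal primal-dual scheme with Nesterov momentum for distributed consensus optimization. The Python sketch below shows only the generic ingredients such a method combines, i.e. neighbor averaging with a mixing matrix, a local (stochastic) gradient step, a proximal map for the non-smooth term, and a momentum extrapolation. It is not the authors' algorithm; the mixing matrix W, the L1 regularizer and its soft-thresholding prox, the step size, and the momentum coefficient are placeholder assumptions chosen for this toy example.

# Minimal sketch of distributed proximal-gradient consensus with Nesterov-style
# momentum (assumed setup; NOT the algorithm of Wang et al., 2021).
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def distributed_prox_momentum(grad_fns, W, x0, step=0.01, reg=0.1,
                              momentum=0.9, iters=500):
    """Each agent keeps a local copy, extrapolates with momentum, averages with
    its neighbors via W, takes a local gradient step, and applies the prox of
    the shared non-smooth term."""
    n_agents = len(grad_fns)
    x = np.tile(x0, (n_agents, 1))           # local copies, shape (n_agents, dim)
    x_prev = x.copy()
    for _ in range(iters):
        y = x + momentum * (x - x_prev)       # Nesterov-style extrapolation
        x_prev = x.copy()
        mixed = W @ y                         # one round of neighbor averaging
        grads = np.stack([grad_fns[i](y[i]) for i in range(n_agents)])
        x = soft_threshold(mixed - step * grads, step * reg)
    return x.mean(axis=0)                     # consensus estimate

# Toy usage: two agents, each with a quadratic local loss 0.5*||A_i x - b_i||^2.
rng = np.random.default_rng(0)
A = [rng.standard_normal((20, 5)) for _ in range(2)]
b = [Ai @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) for Ai in A]
grad_fns = [lambda x, Ai=Ai, bi=bi: Ai.T @ (Ai @ x - bi) for Ai, bi in zip(A, b)]
W = np.array([[0.6, 0.4], [0.4, 0.6]])        # doubly stochastic mixing matrix
print(np.round(distributed_prox_momentum(grad_fns, W, np.zeros(5)), 2))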
DOI: 10.1109/TSP.2021.3097211