Distributed Learning in Non-Convex Environments-Part I: Agreement at a Linear Rate

Bibliographic Details
Published in:	IEEE Transactions on Signal Processing, Vol. 69, pp. 1242-1256
Main Authors:	Vlaski, Stefan; Sayed, Ali H.
Format:	Journal Article
Language:	English
Published:	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2021
Subjects:	adaptation; Aggregates; Algorithms; Annealing; Centroids; Computational geometry; Convexity; Cost function; Descent; diffusion learning; distributed optimization; Eigenvalues and eigenfunctions; gradient noise; Heuristic algorithms; Machine learning; non-convex cost; Optimization; Polynomials; Saddle points; Signal processing; Signal processing algorithms; stationary points; Stochastic optimization; Stochastic processes
ISSN:	1053-587X, 1941-0476

Description
Summary:	Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. Most available works on distributed non-convex optimization problems focus on the deterministic setting, where exact gradients are available at each agent. In this work and its Part II, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning strategy continues to yield meaningful estimates in non-convex scenarios, in the sense that the iterates by the individual agents will cluster in a small region around the network centroid. We use this insight to motivate a short-term model for network evolution over a finite horizon. In Part II of this work, we leverage this model to establish descent of the diffusion strategy through saddle points in O(1/μ) steps, where μ denotes the step-size, and the return of approximately second-order stationary points in a polynomial number of iterations.
DOI:	10.1109/TSP.2021.3050858