A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, Vol. 72, pp. 1-15
Main authors: Yang, Shuoguang; Li, Fengpei
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024
ISSN: 1053-587X, 1941-0476
Online access: Full text
Description
Abstract: Recent literature shows a growing interest in the integration of federated learning (FL) and multilevel stochastic compositional optimization (MSCO), which arises in meta-learning and reinforcement learning. It is known that a bottleneck in FL is communication efficiency when compared to fully decentralized methods. Yet, it remains unclear whether communication-efficient algorithms exist for MSCO in distributed settings. Single-loop schemes, used in recent methods, structurally require a communication round for every fixed number of samples generated, so their communication complexity is no smaller than their sample complexity and is hence lower bounded by $\mathcal{O}(1/\epsilon)$ for reaching an $\epsilon$-accurate solution. This paper studies distributed MSCO of a smooth, strongly convex objective with smooth gradients. Based on a double-loop strategy, we propose Federated Stochastic Compositional Gradient Extrapolation (FedSCGE), a federated MSCO method that attains an optimal $\mathcal{O}(\log\frac{1}{\epsilon})$ communication complexity while maintaining an (almost) optimal $\tilde{\mathcal{O}}(1/\epsilon)$ sample complexity, both of which are independent of the number of clients, making the approach scalable. Our analysis leverages the random gradient extrapolation method (RGEM) in [19] and generalizes it by overcoming the biased gradients of MSCO. To the best of our knowledge, our work is the first to show that both complexity bounds are simultaneously attainable for distributed MSCO.
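For context on the problem class, a generic $T$-level MSCO problem can be sketched as follows; the notation here (level maps $f_i$, random variables $\xi_i$, sampling maps $G_i$) is illustrative and not taken from the paper itself:

$$\min_{x \in \mathbb{R}^d} \; F(x) := f_1\bigl(f_2(\cdots f_T(x)\cdots)\bigr), \qquad f_i(\cdot) := \mathbb{E}_{\xi_i}\bigl[G_i(\cdot\,;\xi_i)\bigr], \quad i = 1, \dots, T.$$

Each level map $f_i$ is accessible only through stochastic samples of $G_i$, so plugging sample averages into the chain-rule gradient (a product of the level Jacobians) yields biased estimates, because expectation does not commute with composition; an $\epsilon$-accurate solution is any $x$ with $F(x) - F(x^*) \le \epsilon$, where $x^*$ minimizes $F$.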
DOI: 10.1109/TSP.2024.3392351