A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization
| Published in: | IEEE Transactions on Signal Processing, Vol. 72, pp. 1-15 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024 |
| Subjects: | |
| ISSN: | 1053-587X, 1941-0476 |
| Online access: | Full text |
| Abstract: | Recent literature shows a growing interest in the integration of federated learning (FL) and multilevel stochastic compositional optimization (MSCO), which arises in meta-learning and reinforcement learning. It is known that communication efficiency is a bottleneck in FL when compared to fully decentralized methods. Yet, it remains unclear whether communication-efficient algorithms exist for MSCO in distributed settings. Single-loop schemes, used in recent methods, structurally require a communication round per fixed number of samples generated, so their communication complexity is no less than their sample complexity and hence lower bounded by $\mathcal{O}(1/\epsilon)$ for reaching an $\epsilon$-accurate solution. This paper studies distributed MSCO of a smooth, strongly convex objective with smooth gradients. Based on a double-loop strategy, we propose Federated Stochastic Compositional Gradient Extrapolation (FedSCGE), a federated MSCO method that attains an optimal $\mathcal{O}(\log\frac{1}{\epsilon})$ communication complexity while maintaining an (almost) optimal $\tilde{\mathcal{O}}(1/\epsilon)$ sample complexity, both of which are independent of the number of clients, making the approach scalable. Our analysis leverages the random gradient extrapolation method (RGEM) in [19] and generalizes it by overcoming the biased gradients of MSCO. To the best of our knowledge, our work is the first to show that both complexity bounds are simultaneously attainable for distributed MSCO. |
| DOI: | 10.1109/TSP.2024.3392351 |
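For context on the abstract above, the following is a minimal notational sketch of a distributed multilevel stochastic compositional problem and of why its gradient estimates are biased. The notation ($m$ clients, $T$ levels, maps $f_i^{(k)}$, samplers $G_i^{(k)}$, noise $\xi_i^{(k)}$, nested values $y_i$) is assumed here purely for illustration and is not taken from the paper.

```latex
% Sketch only: a generic T-level stochastic compositional objective averaged
% over m clients, followed by the chain-rule gradient of one client's
% composition. All symbols are illustrative placeholders, not the paper's.
\begin{equation*}
  \min_{x \in \mathbb{R}^d} \; F(x)
  \;=\; \frac{1}{m} \sum_{k=1}^{m}
        f_1^{(k)}\!\Bigl( f_2^{(k)}\bigl( \cdots f_T^{(k)}(x) \cdots \bigr) \Bigr),
  \qquad
  f_i^{(k)}(y) \;=\; \mathbb{E}_{\xi_i^{(k)}}\!\bigl[ G_i^{(k)}(y;\,\xi_i^{(k)}) \bigr].
\end{equation*}
% Chain rule for one client's composition (client index dropped). Because the
% nested values y_i are only available through stochastic estimates, plugging
% those estimates in yields a biased gradient estimator; this bias is the core
% difficulty of MSCO that the abstract refers to.
\begin{equation*}
  \nabla \bigl( f_1(f_2(\cdots f_T(x)\cdots)) \bigr)
  \;=\; \nabla f_T(x)^{\top}\, \nabla f_{T-1}(y_T)^{\top} \cdots \nabla f_1(y_2),
  \qquad y_i = f_i(y_{i+1}), \quad y_{T+1} = x.
\end{equation*}
```

Under this reading, a single-loop method ties each batch of samples to a communication round, so its communication count grows with its sample count, whereas the double-loop strategy described in the abstract draws many samples between successive communication rounds; that decoupling is what allows the communication complexity to fall to $\mathcal{O}(\log\frac{1}{\epsilon})$ while the sample complexity remains $\tilde{\mathcal{O}}(1/\epsilon)$.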