A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization
| Published in: | IEEE Transactions on Signal Processing, Volume 72; pp. 1 - 15 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024 |
| Subjects: | |
| ISSN: | 1053-587X, 1941-0476 |
| Online access: | Get full text |
| Summary: | Recent literature shows a growing interest in the integration of federated learning (FL) and multilevel stochastic compositional optimization (MSCO), which arises in meta-learning and reinforcement learning. It is known that a bottleneck in FL is communication efficiency when compared to fully decentralized methods. Yet, it remains unclear whether communication-efficient algorithms exist for MSCO in distributed settings. Single-loop schemes, used in recent methods, structurally require a communication round for each fixed batch of samples generated, so their communication complexity is no less than their sample complexity and hence lower bounded by $\mathcal O(1/\epsilon)$ for reaching an $\epsilon$-accurate solution. This paper studies distributed MSCO of a smooth, strongly convex objective with smooth gradients. Based on a double-loop strategy, we propose Federated Stochastic Compositional Gradient Extrapolation (FedSCGE), a federated MSCO method that attains an optimal $\mathcal O(\log\frac{1}{\epsilon})$ communication complexity while maintaining an (almost) optimal $\tilde{\mathcal O}(1/\epsilon)$ sample complexity, both of which are independent of the number of clients, making the approach scalable. Our analysis leverages the random gradient extrapolation method (RGEM) in [19] and generalizes it by overcoming the biased gradients of MSCO. To the best of our knowledge, our work is the first to show the simultaneous attainability of both complexity bounds for distributed MSCO. (A standard formulation of the MSCO objective and an illustrative sketch follow the record below.) |
|---|---|
| DOI: | 10.1109/TSP.2024.3392351 |
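
The abstract refers to the multilevel compositional objective without displaying it. For reference, the standard $T$-level MSCO formulation from the literature is the nested expectation below; the notation ($f_i$, $g_i$, $\xi_i$) is the conventional one and is an assumption here, since the record itself does not spell the formula out.

```latex
% Standard T-level stochastic compositional objective (conventional
% notation; this exact display does not appear in the record).
\min_{x \in \mathbb{R}^d} \; F(x)
  = f_1\bigl(f_2(\cdots f_T(x)\cdots)\bigr),
\qquad
f_i(y) = \mathbb{E}_{\xi_i}\bigl[g_i(y;\,\xi_i)\bigr],
\quad i = 1, \dots, T.
```

Because each layer $f_i$ is accessible only through samples $g_i(\cdot;\xi_i)$, plain stochastic estimates of the composed gradient are biased, which is the obstacle the abstract says the analysis overcomes. The complexity separation claimed in the abstract rests on the double-loop structure: samples are consumed in communication-free inner loops, while synchronization happens once per outer iteration. The sketch below is a minimal illustration of that accounting on a toy quadratic with generic local SGD and averaging; it is not the paper's FedSCGE algorithm, and every name and parameter in it is an illustrative assumption.

```python
import numpy as np

# Toy illustration (NOT the paper's FedSCGE): a generic double-loop
# federated scheme on a strongly convex quadratic, counting how many
# communication rounds versus stochastic samples are consumed.
# All names (n_clients, inner_steps, ...) are illustrative assumptions.

rng = np.random.default_rng(0)
d, n_clients = 5, 4
# Each client holds a strongly convex quadratic 0.5 * x^T A_i x.
A = [np.diag(rng.uniform(1.0, 2.0, d)) for _ in range(n_clients)]

def noisy_grad(i, x):
    """Stochastic gradient of client i's quadratic, with additive noise."""
    return A[i] @ x + 0.01 * rng.standard_normal(d)

x = np.ones(d)
comm_rounds, samples = 0, 0
outer_iters, inner_steps, lr = 20, 50, 0.05

for t in range(outer_iters):
    # Inner loop: each client runs many local stochastic steps;
    # samples accumulate with NO communication.
    local = []
    for i in range(n_clients):
        xi = x.copy()
        for _ in range(inner_steps):
            xi -= lr * noisy_grad(i, xi)
            samples += 1
        local.append(xi)
    # Outer loop: one synchronization (averaging) per outer iteration,
    # so communication grows only with the outer-loop count.
    x = np.mean(local, axis=0)
    comm_rounds += 1

print(f"communication rounds: {comm_rounds}, samples: {samples}")
# A single-loop method would instead communicate once per sampling
# step, forcing communication complexity >= sample complexity.
```

Running the sketch reports `comm_rounds` equal to the outer-loop count while `samples` grows as `outer_iters * inner_steps * n_clients`, mirroring the $\mathcal O(\log\frac{1}{\epsilon})$ versus $\tilde{\mathcal O}(1/\epsilon)$ separation the abstract claims for the double-loop approach.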