A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization
| Published in: | IEEE Transactions on Signal Processing, Vol. 72, pp. 1-15 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024 |
| ISSN: | 1053-587X, 1941-0476 |
| Summary: | Recent literature shows growing interest in the integration of federated learning (FL) and multilevel stochastic compositional optimization (MSCO), which arises in meta-learning and reinforcement learning. A known bottleneck of FL, compared with fully decentralized methods, is communication efficiency. Yet it remains unclear whether communication-efficient algorithms exist for MSCO in distributed settings. Single-loop schemes, used in recent methods, structurally require a communication round per fixed number of samples generated, so their communication complexity is no less than their sample complexity and is hence lower bounded by $\mathcal{O}(1/\epsilon)$ for reaching an $\epsilon$-accurate solution. This paper studies distributed MSCO of a smooth, strongly convex objective with smooth gradients. Based on a double-loop strategy, we propose Federated Stochastic Compositional Gradient Extrapolation (FedSCGE), a federated MSCO method that attains an optimal $\mathcal{O}(\log\frac{1}{\epsilon})$ communication complexity while maintaining an (almost) optimal $\tilde{\mathcal{O}}(1/\epsilon)$ sample complexity, both of which are independent of the number of clients, making the approach scalable. Our analysis leverages the random gradient extrapolation method (RGEM) in [19] and generalizes it by overcoming the biased gradients of MSCO. To the best of our knowledge, our work is the first to show the simultaneous attainability of both complexity bounds for distributed MSCO. |
|---|---|
| DOI: | 10.1109/TSP.2024.3392351 |
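For readers unfamiliar with MSCO, the block below sketches one standard formulation of a $T$-level stochastic compositional objective and its chain-rule gradient (with $\nabla f_t$ denoting the transposed Jacobian of $f_t$). This is a generic illustration, not the paper's exact setup: the symbols $f_t$, $g_t$, $\xi_t$, and $T$ are introduced here for exposition, and the federated version summarized above additionally distributes these expectations across clients.

```latex
% Generic (single-machine) T-level stochastic compositional problem -- an assumed
% illustration; the paper's federated setting distributes the expectations over clients.
\begin{aligned}
  \min_{x \in \mathbb{R}^{d}}\; F(x)
    &= f_1\!\bigl(f_2(\cdots f_T(x)\cdots)\bigr),
    \qquad f_t(y) = \mathbb{E}_{\xi_t}\!\bigl[g_t(y;\xi_t)\bigr], \quad t = 1,\dots,T, \\
  \nabla F(x)
    &= \nabla f_T(x)\,\nabla f_{T-1}\!\bigl(f_T(x)\bigr)\cdots
       \nabla f_1\!\bigl(f_2(\cdots f_T(x)\cdots)\bigr).
\end{aligned}
```

Because each inner value $f_t(\cdot)$ sits inside another expectation, plugging single-sample estimates of $g_t$ into this product yields a biased gradient estimator; this is the bias the summary refers to when it describes generalizing RGEM to MSCO.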