M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion

Detailed Bibliography
Published in: IEEE Transactions on Communications, Volume 72, Issue 2, pp. 845-860
Main Authors: Liu, Yangyi; Rini, Stefano; Salehkalaibar, Sadaf; Chen, Jun
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2024
ISSN: 0090-6778, 1558-0857
Description
Summary: In federated learning (FL), the communication constraint between the remote clients and the Parameter Server (PS) is a crucial bottleneck. For this reason, model updates must be compressed so as to minimize the loss in accuracy resulting from the communication constraint. This paper proposes the "$M$-magnitude weighted $L_2$ distortion + 2 degrees of freedom" (M22) algorithm, a rate-distortion inspired approach to gradient compression for federated training of deep neural networks (DNNs). In particular, we propose a family of distortion measures between the original gradient and its reconstruction, which we refer to as "$M$-magnitude weighted $L_2$" distortion, and we assume that gradient updates follow an i.i.d. distribution with two degrees of freedom, either generalized normal or Weibull. In both the distortion measure and the gradient distribution, there is one free parameter that can be fitted as a function of the iteration number. Given a choice of gradient distribution and distortion measure, we design the quantizer that minimizes the expected distortion in gradient reconstruction. To measure gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choices of gradient distribution and distortion measure. We provide substantial insights on the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.
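The abstract names the two modeling choices but not their exact functional forms. As a rough illustration only, the sketch below assumes the $M$-magnitude weighted $L_2$ distortion takes the form $\sum_i |g_i|^M (g_i - \hat{g}_i)^2$ and fits a Lloyd-style scalar quantizer to gradient samples drawn from a generalized normal distribution with scale $\alpha$ and shape $\beta$ (the two degrees of freedom). The distortion form, the sampling transform, and all function names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def m_weighted_l2(g, g_hat, M):
    # Assumed form of the M-magnitude weighted L2 distortion: per-coordinate
    # squared error weighted by |g_i|^M (M = 0 recovers plain L2 distortion).
    return np.sum(np.abs(g) ** M * (g - g_hat) ** 2)

def lloyd_quantizer(samples, n_levels, M, iters=50):
    # Lloyd-style alternation that locally minimizes the empirical
    # M-weighted L2 distortion over the quantization levels.
    levels = np.quantile(samples, np.linspace(0.05, 0.95, n_levels))
    weights = np.abs(samples) ** M + 1e-12
    for _ in range(iters):
        # Nearest-level assignment (optimal even under the weighting, since
        # each sample's weight does not depend on the level it is mapped to).
        idx = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)
        # Centroid step: each level moves to the weighted mean of its cell.
        for k in range(n_levels):
            cell = idx == k
            if cell.any():
                levels[k] = np.average(samples[cell], weights=weights[cell])
    return np.sort(levels)

# Gradient model with two degrees of freedom: a generalized normal with
# scale alpha and shape beta, sampled via the Gamma(1/beta) transform
# (if U ~ Gamma(1/beta, 1), then sign * alpha * U**(1/beta) is GN(alpha, beta)).
rng = np.random.default_rng(0)
alpha, beta = 1.0, 0.8
u = rng.gamma(shape=1.0 / beta, scale=1.0, size=100_000)
g = rng.choice([-1.0, 1.0], size=u.size) * alpha * u ** (1.0 / beta)

levels = lloyd_quantizer(g, n_levels=2 ** 3, M=1.0)  # a 3-bit quantizer
g_hat = levels[np.argmin(np.abs(g[:, None] - levels[None, :]), axis=1)]
print("avg M-weighted distortion:", m_weighted_l2(g, g_hat, M=1.0) / g.size)
```

Under the paper's per-bit accuracy metric, quantizers fitted this way would then be compared by the accuracy improvement they yield per communicated bit over the whole training run.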
DOI: 10.1109/TCOMM.2023.3327778