FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

Bibliographic Details
Published in: SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17
Main Authors: Chai, Zheng; Chen, Yujing; Anwar, Ali; Zhao, Liang; Cheng, Yue; Rangwala, Huzefa
Format: Conference Proceeding
Language: English
Published: ACM, 14 November 2021
ISSN: 2167-4337
Online Access: https://ieeexplore.ieee.org/document/9910131
Abstract
Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, and raises new challenges, including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only a single dimension of this tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect while improving convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement, and it compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm that minimizes the communication cost. Results show that FedAT improves prediction performance by up to 21.09% and reduces communication cost by up to 8.5×, compared to state-of-the-art FL methods.
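The straggler-aware, weighted aggregation described in the abstract can be pictured with a short sketch. The following is a minimal illustration, assuming the server keeps one model per latency tier together with a count of how many updates each tier has pushed; the rank-based weighting below is an illustrative stand-in for the paper's heuristic, not its exact formula, and the function and parameter names are hypothetical.

```python
import numpy as np

def aggregate_tiers(tier_models, tier_update_counts):
    """Blend per-tier models into a global model, weighting
    slower (less frequently updating) tiers more heavily to
    counter the bias toward fast tiers. Illustrative only."""
    counts = np.asarray(tier_update_counts, dtype=float)
    # Rank tiers by update count: the tier with the fewest
    # pushed updates (the slowest) receives the largest weight.
    order = counts.argsort()                         # slowest-updating first
    weights = np.empty_like(counts)
    weights[order] = np.arange(len(counts), 0, -1)   # descending rank weights
    weights /= weights.sum()                         # normalize to sum to 1
    return sum(w * m for w, m in zip(weights, tier_models))

# Example: three tiers; tier 1 has updated least, so it dominates the blend.
models = [np.ones(4) * k for k in (1.0, 2.0, 3.0)]
print(aggregate_tiers(models, tier_update_counts=[10, 3, 7]))
```

Within each tier, client updates are combined synchronously (FedAvg-style), while a cross-tier blend like the one above happens asynchronously whenever any tier finishes a round, so fast tiers never wait on stragglers.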
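The polyline-encoding-based compression mentioned in the abstract builds on the Encoded Polyline Algorithm Format (best known from Google Maps). As a rough illustration of that format applied to a vector of model weights, here is a minimal encoder; the precision constant and the delta step are assumptions for the sketch, not FedAT's exact codec.

```python
def polyline_encode(values, precision=1e5):
    """Quantize floats, delta-encode them, and pack each signed
    delta into printable ASCII using 5-bit chunks, as in the
    Encoded Polyline Algorithm Format. Illustrative sketch."""
    out, prev = [], 0
    for v in values:
        q = int(round(v * precision))          # quantize to an integer grid
        d, prev = q - prev, q                  # transmit deltas, not raw values
        d = ~(d << 1) if d < 0 else (d << 1)   # fold the sign into bit 0
        while d >= 0x20:                       # emit 5 bits at a time,
            out.append(chr((0x20 | (d & 0x1F)) + 63))  # continuation bit set
            d >>= 5
        out.append(chr(d + 63))                # final chunk, no continuation bit
    return "".join(out)

# Example: nearby weight values delta-encode into a short ASCII string.
print(polyline_encode([0.01023, 0.01025, 0.01019]))
```

Because successive quantized values tend to be close, the small deltas pack into very few ASCII characters, which is the property this style of compression exploits on both the uplink and the downlink.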
Authors and Affiliations
Zheng Chai, George Mason University, Fairfax, VA, USA (zchai2@gmu.edu)
Yujing Chen, George Mason University, Fairfax, VA, USA (ychen37@gmu.edu)
Ali Anwar, IBM Research - Almaden, San Jose, CA, USA (ali.anwar2@ibm.com)
Liang Zhao, Emory University, Atlanta, GA, USA (liang.zhao@emory.edu)
Yue Cheng, George Mason University, Fairfax, VA, USA (yuecheng@gmu.edu)
Huzefa Rangwala, George Mason University, Fairfax, VA, USA (rangwala@gmu.edu)
DOI: 10.1145/3458817.3476211
EISBN: 9781450384421; 1450384420
EISSN: 2167-4337
IEEE Xplore Document ID: 9910131
Genre: Original research
Funding: National Science Foundation (NSF), funder ID 10.13039/100000001; grants CCF-1919075, CCF-1919113, CMMI-2134689, IIS-1755850, CNS-1841520, IIS-2007716, OAC-2007976, IIS-1942594, IIS-1907805
Web of Science cited-by count: 102
Subject Terms: asynchronous distributed learning; communication efficiency; Computational modeling; Costs; Data models; Federated learning; Predictive models; tiering; Training; Training data; weighted aggregation