FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
| Published in: | SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-17 |
|---|---|
| Main authors: | Zheng Chai, Yujing Chen, Ali Anwar, Liang Zhao, Yue Cheng, Huzefa Rangwala |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 14 November 2021 |
| Subjects: | Federated learning, asynchronous distributed learning, communication efficiency, tiering, weighted aggregation |
| ISSN: | 2167-4337 |
| Online access: | Get full text (https://ieeexplore.ieee.org/document/9910131) |
| Abstract | Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) the straggler problem, where clients lag due to data or (computing and network) resource heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only a single dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous intra-tier training and asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5×, compared to state-of-the-art FL methods. |
|---|---|
| Author | Zheng Chai (George Mason University, Fairfax, VA, USA), Yujing Chen (George Mason University, Fairfax, VA, USA), Ali Anwar (IBM Research - Almaden, San Jose, CA, USA), Liang Zhao (Emory University, Atlanta, GA, USA), Yue Cheng (George Mason University, Fairfax, VA, USA), Huzefa Rangwala (George Mason University, Fairfax, VA, USA) |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3458817.3476211 |
| Discipline | Computer Science |
| EISBN | 9781450384421; 1450384420 |
| EISSN | 2167-4337 |
| EndPage | 17 |
| ExternalDocumentID | 9910131 |
| Genre | orig-research |
| GrantInformation | National Science Foundation (NSF); grant IDs CCF-1919075, CCF-1919113, CMMI-2134689, IIS-1755850, CNS-1841520, IIS-2007716, OAC-2007976, IIS-1942594, IIS-1907805; funder ID 10.13039/100000001 |
| ISICitedReferencesCount | 102 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 17 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Nov.-14 |
| PublicationDateYYYYMMDD | 2021-11-14 |
| PublicationDecade | 2020 |
| PublicationTitle | SC21: International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2021 |
| Publisher | ACM |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | asynchronous distributed learning; communication efficiency; Computational modeling; Costs; Data models; Federated learning; Predictive models; tiering; Training; Training data; weighted aggregation |
| Title | FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers |
| URI | https://ieeexplore.ieee.org/document/9910131 |
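The abstract above describes FedAT's two core mechanisms: synchronous, FedAvg-style training inside each tier and asynchronous, straggler-aware weighted merging across tiers. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation, and `straggler_aware_weights` is one plausible heuristic (favoring tiers that have contributed fewer updates so far), not necessarily the exact rule from the paper. All function and variable names are illustrative.

```python
# Illustrative sketch of FedAT-style tiered aggregation (not the authors' code).
# Assumptions: each model is a flat NumPy parameter vector; clients are already
# grouped into tiers by response latency; tiers finish rounds at different rates.
import numpy as np

def intra_tier_fedavg(client_models, client_sizes):
    """Synchronous step inside one tier: FedAvg weighted by local dataset size."""
    total = float(sum(client_sizes))
    return sum(m * (n / total) for m, n in zip(client_models, client_sizes))

def straggler_aware_weights(update_counts):
    """Hypothetical cross-tier weighting: tiers that have pushed fewer updates
    so far (slower tiers) get larger weights, so fast tiers do not dominate."""
    counts = np.asarray(update_counts, dtype=float)
    favor_slow = counts.sum() - counts + 1.0  # larger value for slower tiers
    return favor_slow / favor_slow.sum()

def cross_tier_aggregate(latest_tier_models, update_counts):
    """Asynchronous step on the server: whenever any tier finishes a round,
    merge the most recent model from every tier with straggler-aware weights."""
    weights = straggler_aware_weights(update_counts)
    return sum(w * m for w, m in zip(weights, latest_tier_models))

if __name__ == "__main__":
    # Toy run: 3 tiers, 4-parameter models, tier 0 fastest and tier 2 slowest.
    tier_models = [np.full(4, v) for v in (1.0, 2.0, 3.0)]
    # Tier 0 just finished a synchronous round over two of its clients.
    tier_models[0] = intra_tier_fedavg(
        [np.full(4, 0.5), np.full(4, 1.5)], client_sizes=[100, 300])
    update_counts = [10, 4, 1]  # how many rounds each tier has pushed so far
    print(cross_tier_aggregate(tier_models, update_counts))
```

The design point the sketch tries to capture follows the abstract: tiering keeps synchronization within groups of similarly fast clients, while the cross-tier weights counteract the bias toward fast tiers that a purely asynchronous or naively tiered scheme would introduce.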