Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning

Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottle...

Bibliographic Details
Published in: Proceedings of the International Conference on Distributed Computing Systems, pp. 680-691
Main Authors: Sajjadi Mohammadabadi, Seyed Mahmoud; Yang, Lei; Yan, Feng; Zhang, Junshan
Format: Conference Proceeding
Language: English
Published: IEEE, 23.07.2024
ISSN: 2575-8411
Abstract Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottleneck, lengthening the overall training time due to straggler effects and potentially wasting spare resources of faster agents. To minimize training time in heterogeneous environments, we present Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning (ComDML), which balances the workload among agents through a decentralized approach. Leveraging local-loss split training, ComDML enables parallel updates, where slower agents offload part of their workload to faster agents. To minimize the overall training time, ComDML optimizes the workload balancing by jointly considering the communication and computation capacities of agents, a problem formulated as an integer program. A dynamic decentralized pairing scheduler is developed to efficiently pair agents and determine optimal offloading amounts. We prove that in ComDML, both slower and faster agents' models converge, for convex and non-convex functions. Furthermore, extensive experimental results on popular datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants, with large models such as ResNet-56 and ResNet-110, demonstrate that ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods. ComDML demonstrates robustness in heterogeneous environments, and privacy measures can be seamlessly integrated for enhanced data protection.
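The offloading idea in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' scheduler or their integer program; it is a minimal illustration under assumed simplifications: every layer costs the same to train, a slow agent keeps the first k layers and offloads the rest to one faster agent, the two sides run in parallel, and communication is a single fixed transfer. All names (`Agent`, `sec_per_layer`, `link_mbps`, `act_mbit`, `best_split`, `greedy_pairs`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    sec_per_layer: float  # assumed: seconds to train one layer per round
    link_mbps: float      # assumed: usable link bandwidth in Mbit/s


def pair_time(slow, fast, num_layers, k, act_mbit):
    """Round time if `slow` keeps k layers and offloads the rest to `fast`."""
    offloaded = num_layers - k
    # Assumed cost model: one transfer of `act_mbit` when anything is offloaded.
    comm = 0.0 if offloaded == 0 else act_mbit / min(slow.link_mbps, fast.link_mbps)
    slow_t = k * slow.sec_per_layer + comm
    # The fast agent trains its own full model plus the offloaded layers.
    fast_t = (num_layers + offloaded) * fast.sec_per_layer
    # Both sides update in parallel, so the pair finishes with the slower side.
    return max(slow_t, fast_t)


def best_split(slow, fast, num_layers, act_mbit):
    """Enumerate integer split points; return (round_time, layers_kept)."""
    return min(
        (pair_time(slow, fast, num_layers, k, act_mbit), k)
        for k in range(1, num_layers + 1)
    )


def greedy_pairs(agents, num_layers, act_mbit):
    """Pair the slowest remaining agent with the fastest (a simple heuristic)."""
    ranked = sorted(agents, key=lambda a: a.sec_per_layer, reverse=True)
    plan = []
    while len(ranked) >= 2:
        slow, fast = ranked.pop(0), ranked.pop()
        t, k = best_split(slow, fast, num_layers, act_mbit)
        plan.append((slow.name, fast.name, k, t))
    if ranked:  # an unpaired agent trains the whole model alone
        a = ranked[0]
        plan.append((a.name, None, num_layers, num_layers * a.sec_per_layer))
    return plan


agents = [Agent("A", 4.0, 10.0), Agent("B", 1.0, 10.0), Agent("C", 2.0, 10.0)]
plan = greedy_pairs(agents, num_layers=10, act_mbit=20.0)
```

In this toy setting agent A would need 40 s alone, while keeping 3 layers and offloading 7 to agent B brings the pair's round time down to 17 s. The exhaustive search over k stands in for the integer program mentioned in the abstract only in spirit; the paper's actual formulation and pairing rules are not given in this record.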
Authors and affiliations:
1. Sajjadi Mohammadabadi, Seyed Mahmoud (mahmoud.sajjadi@unr.edu), University of Nevada, Reno, Department of Computer Science and Engineering, Reno, NV, USA
2. Yang, Lei (leiy@unr.edu), University of Nevada, Reno, Department of Computer Science and Engineering, Reno, NV, USA
3. Yan, Feng (fyan5@central.uh.edu), University of Houston, Department of Computer Science, Houston, TX, USA
4. Zhang, Junshan (jazh@ucdavis.edu), University of California, Davis, Department of Electrical and Computer Engineering, Davis, CA, USA
CODEN IEEPAD
DOI 10.1109/ICDCS60910.2024.00069
Discipline Computer Science
EISBN 9798350386059
EISSN 2575-8411
EndPage 691
ExternalDocumentID 10631026
Genre orig-research
Funding: National Science Foundation (funder ID 10.13039/100000001); grants OIA-2148788, CAREER-2305491, CNS-2203239, CNS-2203412, CCSS-2203238
ISICitedReferencesCount 5
IsPeerReviewed false
IsScholarly true
PageCount 12
PublicationTitle Proceedings of the International Conference on Distributed Computing Systems
PublicationTitleAbbrev ICDCS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 680
Subject Terms: Accuracy; communication-efficient training; Computational modeling; Data models; decentralized multi-agent learning; edge computing; Fasteners; federated learning; heterogeneous agents; Integer programming; Time measurement; Training; workload balancing
Title Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
URI https://ieeexplore.ieee.org/document/10631026