PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning

Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Wang, Yisu; Wu, Ruilong; Li, Xinjiao; Kutscher, Dirk
Format: Conference paper
Language: English
Published: IEEE, 22 June 2025
Abstract Large-scale deep neural networks (DNNs) exhibit excellent performance on a wide range of tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters, and a main bottleneck is the resulting gradient aggregation overhead. While gradient compression and sparse collective communication techniques are commonly employed to alleviate network load, many gradient compression schemes fail to accelerate training while also preserving accuracy. This paper introduces PacTrain, a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Actively pruning the neural network makes the model weights and gradients sparse. By ensuring that all distributed training workers share global knowledge of the gradient sparsity, we can perform lightweight compressed communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the allreduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25× to 8.72× compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
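The mechanism described in the abstract is that pruning yields a gradient sparsity pattern known globally by every worker, so each worker can pack only the non-zero gradient entries and aggregate them with an ordinary allreduce, with no per-worker index exchange. The short Python sketch below illustrates this idea under stated assumptions; the names dense_allreduce and sparse_allreduce_with_shared_mask, and the in-process simulation of four workers, are hypothetical choices for illustration and are not taken from the paper.

# Minimal illustrative sketch (not PacTrain's implementation): sparse gradient
# aggregation that stays compatible with a plain allreduce because every worker
# shares the same global sparsity mask produced by pruning.
import numpy as np

def dense_allreduce(vectors):
    # Stand-in for an allreduce sum across workers, simulated in-process here.
    return np.sum(vectors, axis=0)

def sparse_allreduce_with_shared_mask(gradients, mask):
    # With an identical mask on every worker, each worker packs only the
    # non-zero entries, allreduces that short dense vector, and scatters the
    # result back; no indices ever need to be communicated.
    nz = np.flatnonzero(mask)
    packed = [g[nz] for g in gradients]      # compress: keep non-zero positions only
    reduced = dense_allreduce(packed)        # ordinary allreduce on the packed values
    out = np.zeros_like(gradients[0])
    out[nz] = reduced                        # decompress into the full gradient shape
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random(10) > 0.7                                   # hypothetical global pruning mask
    grads = [rng.standard_normal(10) * mask for _ in range(4)]    # four simulated workers
    assert np.allclose(sparse_allreduce_with_shared_mask(grads, mask),
                       dense_allreduce(grads))

The contrast with per-worker top-k compression is that top-k selects different indices on each worker, which forces an index exchange or gather-style aggregation; a globally shared mask keeps the packed vectors aligned across workers, which is what makes the compressed exchange compatible with the allreduce primitive.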
Author Wang, Yisu
Wu, Ruilong
Kutscher, Dirk
Li, Xinjiao
Author_xml – sequence: 1
  givenname: Yisu
  surname: Wang
  fullname: Wang, Yisu
  organization: The Hong Kong University of Science and Technology (Guangzhou)
– sequence: 2
  givenname: Ruilong
  surname: Wu
  fullname: Wu, Ruilong
  organization: The Hong Kong University of Science and Technology (Guangzhou)
– sequence: 3
  givenname: Xinjiao
  surname: Li
  fullname: Li, Xinjiao
  organization: The Hong Kong University of Science and Technology (Guangzhou)
– sequence: 4
  givenname: Dirk
  surname: Kutscher
  fullname: Kutscher, Dirk
  email: dku@hkust-gz.edu.cn
  organization: The Hong Kong University of Science and Technology (Guangzhou)
ContentType Conference Proceeding
DOI 10.1109/DAC63849.2025.11133419
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133419
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 7
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Adaptation models
Artificial neural networks
Design automation
Distance learning
Graphics processing units
Load modeling
Machine vision
Throughput
Training
Title PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
URI https://ieeexplore.ieee.org/document/11133419