SparCML: High-Performance Sparse Communication for Machine Learning

Bibliographic Details
Published in: SC19: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15
Main authors: Renggli, Cedric; Ashkboos, Saleh; Aghagolzadeh, Mehdi; Alistarh, Dan; Hoefler, Torsten
Format: Conference Proceeding
Language: English
Published: ACM, 17 November 2019
ISSN: 2167-4337
Online access: Full text
Abstract Applying machine learning techniques to the quickly growing data in science and industry requires highly scalable algorithms. Large datasets are most commonly processed "data parallel", distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication step, and thus the scalability bottleneck, for most machine learning workloads. We observe that, frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML (Sparse Communication layer for Machine Learning, to be read as "sparse ML"), extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly scalable machine learning frameworks.
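To make the sparse-collective idea concrete, here is a minimal sketch, not SparCML's actual API: the mpi4py-based setup and all names in it are illustrative assumptions. Each rank contributes only its nonzero (index, value) pairs instead of a full dense gradient vector, and the result is checked against a standard dense allreduce.

```python
# Minimal sketch of the sparse-collective idea (illustrative only, not
# SparCML's actual interface). Assumes mpi4py and NumPy are installed;
# run with e.g.: mpiexec -n 4 python sparse_allreduce_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

dim = 1000                                   # dense gradient dimension
rng = np.random.default_rng(seed=rank)

# Simulate a sparse local gradient: most entries are (close to) zero.
local = np.zeros(dim)
nonzero_idx = rng.choice(dim, size=10, replace=False)
local[nonzero_idx] = rng.standard_normal(10)

# Dense baseline: every rank communicates all `dim` values.
dense_sum = np.empty_like(local)
comm.Allreduce(local, dense_sum, op=MPI.SUM)

# Sparse variant: communicate only the nonzero (index, value) pairs,
# then reduce locally. Traffic scales with the number of nonzeros.
pairs = [(int(i), float(local[i])) for i in np.flatnonzero(local)]
gathered = comm.allgather(pairs)             # one pair list per rank

sparse_sum = np.zeros(dim)
for contribution in gathered:
    for i, v in contribution:
        sparse_sum[i] += v

assert np.allclose(dense_sum, sparse_sum)
```

The allgather-of-nonzeros step mirrors the "Sparse AllGather" primitive named in the subject terms below. When the inputs are nearly dense the sketch degenerates to sending everything, which is one reason a practical library would choose between sparse and dense representations rather than fix one.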
Author Renggli, Cedric (ETH Zurich)
Ashkboos, Saleh (IST Austria)
Aghagolzadeh, Mehdi (Microsoft)
Alistarh, Dan (IST Austria)
Hoefler, Torsten (ETH Zurich)
ContentType Conference Proceeding
DOI 10.1145/3295500.3356222
Discipline Computer Science
EISBN 9781450362290
145036229X
EISSN 2167-4337
ExternalDocumentID 10902287
Genre orig-research
GrantInformation_xml – fundername: European Research Council
  funderid: 10.13039/501100000781
ISICitedReferencesCount 70
SubjectTerms Distributed databases
High performance computing
Industries
Libraries
Machine learning
Machine learning algorithms
Protocols
Scalability
Sparse AllGather
Sparse AllReduce
Sparse Input Vectors
Vectors
URI https://ieeexplore.ieee.org/document/10902287