SparCML: High-Performance Sparse Communication for Machine Learning
Saved in:
| Published in: | SC19: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15 |
|---|---|
| Main authors: | Renggli, Cedric; Ashkboos, Saleh; Aghagolzadeh, Mehdi; Alistarh, Dan; Hoefler, Torsten |
| Format: | Conference proceeding |
| Language: | English |
| Published: | ACM, 17 November 2019 |
| Subjects: | Distributed databases; High performance computing; Machine learning; Machine learning algorithms; Protocols; Scalability; Sparse AllGather; Sparse AllReduce; Sparse Input Vectors |
| ISSN: | 2167-4337 |
| Online access: | Full text |
| Abstract | Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel", distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifyable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML (Sparse Communication layer for Machine Learning, to be read as "sparse ML"), extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks. |
|---|---|
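To make the sparse-allreduce idea in the abstract concrete, here is a minimal sketch, not SparCML's actual implementation or API: it assumes each rank sparsifies its gradient to its k largest-magnitude (index, value) pairs, exchanges them with an MPI allgather via mpi4py, and sums all contributions into a dense result. The function name `sparse_allreduce` and the parameter `k` are illustrative assumptions.

```python
# Minimal sparse-allreduce sketch (hypothetical names; not SparCML's API).
import numpy as np
from mpi4py import MPI


def sparse_allreduce(dense_grad, k, comm=MPI.COMM_WORLD):
    # Sparsify: keep the k largest-magnitude entries of this rank's gradient.
    idx = np.argpartition(np.abs(dense_grad), -k)[-k:]
    pairs = [(int(i), float(dense_grad[i])) for i in idx]
    # Exchange: every rank receives every other rank's (index, value) pairs.
    all_pairs = comm.allgather(pairs)
    # Reduce: sum contributions at matching indices into a dense result.
    result = np.zeros_like(dense_grad)
    for contribution in all_pairs:
        for i, v in contribution:
            result[i] += v
    return result


if __name__ == "__main__":
    # Run with, e.g.: mpirun -n 4 python sparse_allreduce_sketch.py
    rng = np.random.default_rng(MPI.COMM_WORLD.rank)
    grad = rng.standard_normal(1000)
    summed = sparse_allreduce(grad, k=10)
```

Only k pairs per rank cross the network, so communication volume tracks the number of non-zero contributions rather than the full gradient dimension, which is exactly the bottleneck the abstract targets.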
| Authors | Renggli, Cedric (ETH Zurich); Ashkboos, Saleh (IST Austria); Aghagolzadeh, Mehdi (Microsoft); Alistarh, Dan (IST Austria); Hoefler, Torsten (ETH Zurich) |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3295500.3356222 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Computer Science |
| EISBN | 9781450362290 145036229X |
| EISSN | 2167-4337 |
| EndPage | 15 |
| ExternalDocumentID | 10902287 |
| Genre | orig-research |
| Funding | European Research Council (funder ID: 10.13039/501100000781) |
| ISICitedReferencesCount | 70 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 15 |
| PublicationDate | 2019-11-17 |
| PublicationTitle | SC19: International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2019 |
| Publisher | ACM |
| StartPage | 1 |
| SubjectTerms | Distributed databases; High performance computing; Industries; Libraries; Machine learning; Machine learning algorithms; Protocols; Scalability; Sparse AllGather; Sparse AllReduce; Sparse Input Vectors; Vectors |
| Title | SparCML: High-Performance Sparse Communication for Machine Learning |
| URI | https://ieeexplore.ieee.org/document/10902287 |