Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication
Multiplication of a sparse matrix with a dense matrix is a building block of an increasing number of applications in many areas such as machine learning and graph algorithms. However, most previous work on parallel matrix multiplication considered only both dense or both sparse matrix operands. This...
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 842 - 853 |
|---|---|
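| Main Authors: | Koanantakool, Penporn; Azad, Ariful; Buluc, Aydin; Morozov, Dmitriy; Oh, Sang-Yun; Oliker, Leonid; Yelick, Katherine |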
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.05.2016 |
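| Subjects: | Algorithm design and analysis; communication-avoiding algorithms; Covariance matrices; linear algebra; Machine learning algorithms; Parallel algorithms; Partitioning algorithms; Sparse matrices; sparse-dense matrix-matrix multiplication; Three-dimensional displays |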
| ISSN: | 1530-2075 |
| Online Access: | Get full text |
| Abstract | Multiplication of a sparse matrix with a dense matrix is a building block of a growing number of applications in areas such as machine learning and graph algorithms. However, most previous work on parallel matrix multiplication considered only the cases where both operands are dense or both are sparse. This paper analyzes the communication lower bounds and compares the communication costs of various classic parallel algorithms in the context of sparse-dense matrix-matrix multiplication. We also present new communication-avoiding algorithms based on a 1D decomposition, called 1.5D, which, while suboptimal in the dense-dense and sparse-sparse cases, outperform the 2D and 3D variants both theoretically and in practice for sparse-dense multiplication. Our analysis separates one-time costs from per-iteration costs in an iterative machine learning context. Experiments demonstrate speedups of up to 100x over a baseline 3D SUMMA implementation and show parallel scaling over 10 thousand cores. |
|---|---|
| Author | Azad, Ariful; Buluc, Aydin; Koanantakool, Penporn; Sang-Yun Oh; Morozov, Dmitriy; Yelick, Katherine; Oliker, Leonid |
| Author_xml | – sequence: 1 givenname: Penporn surname: Koanantakool fullname: Koanantakool, Penporn organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 2 givenname: Ariful surname: Azad fullname: Azad, Ariful organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 3 givenname: Aydin surname: Buluc fullname: Buluc, Aydin organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 4 givenname: Dmitriy surname: Morozov fullname: Morozov, Dmitriy organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 5 surname: Sang-Yun Oh fullname: Sang-Yun Oh organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 6 givenname: Leonid surname: Oliker fullname: Oliker, Leonid organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA – sequence: 7 givenname: Katherine surname: Yelick fullname: Yelick, Katherine organization: Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS.2016.117 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781509021406 150902140X |
| EndPage | 853 |
| ExternalDocumentID | 7516081 |
| Genre | orig-research |
| GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-i248t-50db9cb4330d2ef7fdd958881afac07d231a056771902597a9b71e0b2663cda63 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 32 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000391251800087&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1530-2075 |
| IngestDate | Wed Aug 27 02:11:36 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i248t-50db9cb4330d2ef7fdd958881afac07d231a056771902597a9b71e0b2663cda63 |
| OpenAccessLink | https://www.osti.gov/biblio/1769300 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_7516081 |
| PublicationCentury | 2000 |
| PublicationDate | 20160501 |
| PublicationDateYYYYMMDD | 2016-05-01 |
| PublicationDate_xml | – month: 05 year: 2016 text: 20160501 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - IEEE International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2016 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020349 |
| Score | 1.8562605 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 842 |
| SubjectTerms | Algorithm design and analysis; communication-avoiding algorithms; Covariance matrices; linear algebra; Machine learning algorithms; Parallel algorithms; Partitioning algorithms; Sparse matrices; sparse-dense matrix-matrix multiplication; Three-dimensional displays |
| Title | Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication |
| URI | https://ieeexplore.ieee.org/document/7516081 |
| WOSCitedRecordID | wos000391251800087&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
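
The abstract refers to a 1D decomposition as the basis of the 1.5D algorithms. As a rough illustration only, the following Python sketch simulates a plain 1D row-block decomposition of C = A·B with a sparse A (stored in CSR form) and a dense B, and counts how many dense words each simulated process would need from the others. It is not the 1.5D algorithm from the paper: the function names (`spmm_rows`, `spmm_1d`), the CSR tuple layout, and the idea of fetching only the rows of B that a process actually touches are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical CSR layout for an m x k sparse matrix A: (indptr, indices, data).
CSR = Tuple[List[int], List[int], List[float]]


def spmm_rows(a: CSR, b: List[List[float]], row_start: int, row_end: int) -> List[List[float]]:
    """Compute rows [row_start, row_end) of C = A @ B, with A in CSR form and B dense."""
    indptr, indices, data = a
    n = len(b[0])
    out = []
    for i in range(row_start, row_end):
        row = [0.0] * n
        # Only the stored nonzeros of row i contribute to the result.
        for p in range(indptr[i], indptr[i + 1]):
            k, v = indices[p], data[p]
            for j in range(n):
                row[j] += v * b[k][j]
        out.append(row)
    return out


def spmm_1d(a: CSR, b: List[List[float]], num_procs: int):
    """Simulate a 1D row-block decomposition (illustrative assumption, not the paper's 1.5D).

    Each simulated process owns a contiguous block of rows of A, B, and C.
    Returns C and, per process, the number of dense words of B it would have
    to fetch from other processes.
    """
    indptr, indices, _ = a
    m, k, n = len(indptr) - 1, len(b), len(b[0])
    a_bounds = [(m * p) // num_procs for p in range(num_procs + 1)]
    b_bounds = [(k * p) // num_procs for p in range(num_procs + 1)]
    c, words = [], []
    for p in range(num_procs):
        lo, hi = a_bounds[p], a_bounds[p + 1]
        # Rows of B referenced by the nonzeros in this process's rows of A.
        needed = {indices[q] for q in range(indptr[lo], indptr[hi])}
        remote = [r for r in needed if not (b_bounds[p] <= r < b_bounds[p + 1])]
        words.append(len(remote) * n)
        c.extend(spmm_rows(a, b, lo, hi))
    return c, words


# Small 4 x 4 sparse A (5 nonzeros) times a dense 4 x 2 B, on 2 simulated processes.
a = ([0, 2, 3, 4, 5], [0, 3, 1, 0, 3], [1.0, 2.0, 3.0, 4.0, 5.0])
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
c, words = spmm_1d(a, b, 2)
```

In this toy setting the per-process word count depends on the sparsity pattern of A rather than on the full size of B, which hints at why sparse-dense multiplication may favor different decompositions than the dense-dense case; the actual cost analysis is in the paper itself.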