Fast multiplication of random dense matrices with sparse matrices
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 52 - 62 |
|---|---|
| Main Authors: | Liang, Tianyu; Murray, Riley; Buluc, Aydin; Demmel, James |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 27.05.2024 |
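| Subjects: | Distributed processing; HPC; Instruction sets; Libraries; Memory architecture; Memory management; Numerical Linear Algebra; Scalability; Sketching algorithm; Sparse matrices |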
| ISSN: | 1530-2075 |
| Abstract | This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, an operation frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation (on-the-fly random number generation) to accelerate this operation. The techniques we propose decrease memory movement, thereby increasing the algorithm's parallel scalability in shared memory architectures. On the Intel Frontera architecture, our algorithm can achieve 2x speedups over libraries such as Eigen and Intel MKL on some examples. In addition, with 32 threads, we can obtain a parallel efficiency of up to 45%. We also present a theoretical analysis for the memory movement lower bound of our algorithm, showing that under mild assumptions, it is possible to beat the data movement lower bound of general matrix-matrix multiply (GEMM) by a factor of M, where M is the cache size. Finally, we incorporate our sketching method into a randomized algorithm for overdetermined least squares with sparse data matrices. Our results are competitive with SuiteSparse for highly overdetermined problems; in some cases, we obtain a speedup of 10x over SuiteSparse. |
|---|---|
| Author | Buluc, Aydin Demmel, James Liang, Tianyu Murray, Riley |
| Author_xml | 1. Liang, Tianyu (UC Berkeley, Electrical Engineering and Computer Science Department); 2. Murray, Riley (UC Berkeley, Electrical Engineering and Computer Science Department); 3. Buluc, Aydin (Lawrence Berkeley National Lab, Computational Research Division); 4. Demmel, James (UC Berkeley, Electrical Engineering and Computer Science Department) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS57955.2024.00014 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798350387117 |
| EISSN | 1530-2075 |
| EndPage | 62 |
| ExternalDocumentID | 10579199 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Science Foundation funderid: 10.13039/100000001 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001270389600052&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:04:48 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_10579199 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-May-27 |
| PublicationDateYYYYMMDD | 2024-05-27 |
| PublicationDate_xml | – month: 05 year: 2024 text: 2024-May-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings - IEEE International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020349 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 52 |
| SubjectTerms | Distributed processing; HPC; Instruction sets; Libraries; Memory architecture; Memory management; Numerical Linear Algebra; Scalability; Sketching algorithm; Sparse matrices |
| Title | Fast multiplication of random dense matrices with sparse matrices |
| URI | https://ieeexplore.ieee.org/document/10579199 |
| WOSCitedRecordID | wos001270389600052&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
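
The abstract describes multiplying a dense random matrix with a fixed sparse matrix using blocking and on-the-fly random number generation. The following is a minimal sketch of that general idea, not the authors' implementation. Several things here are assumptions made for illustration: the function names `rand_entry`, `sketch_csc`, and `dense_reference`, the hash-based generator, the uniform distribution on [-1, 1), the compressed sparse column (CSC) input layout, and the row-block loop order. The point it tries to show is that the random matrix S is never stored: each entry is recomputed from its indices when a nonzero of the sparse matrix needs it, so only the sparse input and one block of the output are touched at a time.

```python
import hashlib
import struct


def rand_entry(seed: int, i: int, k: int) -> float:
    """Recompute S[i, k] on demand as a pure function of (seed, i, k).

    Assumption for illustration: entries are uniform in [-1, 1), derived
    from a BLAKE2b hash. The source record does not specify a generator.
    """
    digest = hashlib.blake2b(struct.pack("<QQQ", seed, i, k), digest_size=8).digest()
    bits = struct.unpack("<Q", digest)[0] >> 11   # keep the top 53 bits
    return 2.0 * (bits / 2.0**53) - 1.0


def sketch_csc(d, indptr, indices, data, seed=0, block=64):
    """Return B = S @ A as a d x n_cols list of lists, with A given in CSC form.

    S (d x n_rows) is never stored: each entry is regenerated when needed.
    """
    n_cols = len(indptr) - 1
    B = [[0.0] * n_cols for _ in range(d)]
    for i0 in range(0, d, block):                 # one block of sketch rows
        i1 = min(i0 + block, d)
        for j in range(n_cols):                   # sweep the columns of A
            for p in range(indptr[j], indptr[j + 1]):
                k, a = indices[p], data[p]        # nonzero A[k, j] = a
                for i in range(i0, i1):
                    B[i][j] += rand_entry(seed, i, k) * a
    return B


def dense_reference(d, n_rows, indptr, indices, data, seed=0):
    """Materialize S and A densely and multiply them; used only for checking."""
    n_cols = len(indptr) - 1
    S = [[rand_entry(seed, i, k) for k in range(n_rows)] for i in range(d)]
    A = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for p in range(indptr[j], indptr[j + 1]):
            A[indices[p]][j] = data[p]
    return [[sum(S[i][k] * A[k][j] for k in range(n_rows)) for j in range(n_cols)]
            for i in range(d)]
```

Because every entry of S is a deterministic function of its indices and the seed, the blocked routine and the dense reference agree up to floating-point rounding, and the block size changes only the traversal order of the output. A production implementation would presumably use a fast counter-based generator and vectorized kernels in place of a cryptographic hash and Python loops.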