Fast multiplication of random dense matrices with sparse matrices

Bibliographic details
Published in: Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 52-62
Main authors: Liang, Tianyu; Murray, Riley; Buluc, Aydin; Demmel, James
Format: Conference paper
Language: English
Published: IEEE, 27 May 2024
ISSN: 1530-2075
Abstract This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation (on-the-fly random number generation) to accelerate this operation. The techniques we propose decrease memory movement, thereby increasing the algorithm's parallel scalability in shared memory architectures. On the Intel Frontera architecture, our algorithm can achieve 2x speedups over libraries such as Eigen and Intel MKL on some examples. In addition, with 32 threads, we can obtain a parallel efficiency of up to 45%. We also present a theoretical analysis for the memory movement lower bound of our algorithm, showing that under mild assumptions, it's possible to beat the data movement lower bound of general matrix-matrix multiply (GEMM) by a factor of √M, where M is the cache size. Finally, we incorporate our sketching method into a randomized algorithm for overdetermined least squares with sparse data matrices. Our results are competitive with SuiteSparse for highly overdetermined problems; in some cases, we obtain a speedup of 10x over SuiteSparse.
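The idea of combining blocking with recomputation can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal pure-Python illustration under several assumptions: the sparse matrix A is stored in compressed sparse column (CSC) form, the random matrix S has Rademacher (±1) entries, and a string-seeded `random.Random` stands in for a fast counter-based generator. All function and parameter names (`sketch_entries`, `sketch_sparse`, `materialize_sketch`, `block`) are hypothetical. The point it tries to show is that S is never stored: each needed piece of a column of S is regenerated from a key, and rows of the output are processed in blocks.

```python
import random


def sketch_entries(seed, row_block, k, count):
    """Regenerate `count` entries of column k of S for one row block.

    The same (seed, row_block, k) key always yields the same values,
    so S never has to be kept in memory.
    """
    rng = random.Random(f"{seed}:{row_block}:{k}")
    return [rng.choice((-1.0, 1.0)) for _ in range(count)]


def sketch_sparse(d, seed, indptr, indices, data, block=4):
    """Compute S @ A, where A is CSC and S (d rows) is generated on the fly."""
    n_cols = len(indptr) - 1
    out = [[0.0] * n_cols for _ in range(d)]
    # Outer loop over row blocks of S (and of the output).
    for b, i0 in enumerate(range(0, d, block)):
        count = min(block, d - i0)
        for j in range(n_cols):
            # Visit only the nonzeros A[k, j] of column j.
            for p in range(indptr[j], indptr[j + 1]):
                k, a_kj = indices[p], data[p]
                s = sketch_entries(seed, b, k, count)  # recomputed, not loaded
                for t in range(count):
                    out[i0 + t][j] += s[t] * a_kj
    return out


def materialize_sketch(d, seed, n_rows, block=4):
    """Build S explicitly (for checking only) from the same keyed generator."""
    S = [[0.0] * n_rows for _ in range(d)]
    for b, i0 in enumerate(range(0, d, block)):
        count = min(block, d - i0)
        for k in range(n_rows):
            s = sketch_entries(seed, b, k, count)
            for t in range(count):
                S[i0 + t][k] = s[t]
    return S
```

In this sketch the same piece of S may be regenerated once per column of A that has a nonzero in row k. That is the trade the abstract describes: extra random number generation in exchange for less memory traffic. Whether the trade pays off depends on the generator cost and the hardware, which this toy example does not measure.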
Authors and affiliations
1. Liang, Tianyu: UC Berkeley, Electrical Engineering and Computer Science Department
2. Murray, Riley: UC Berkeley, Electrical Engineering and Computer Science Department
3. Buluc, Aydin: Lawrence Berkeley National Lab, Computational Research Division
4. Demmel, James: UC Berkeley, Electrical Engineering and Computer Science Department
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/IPDPS57955.2024.00014
Discipline Computer Science
EISBN 9798350387117
EISSN 1530-2075
EndPage 62
ExternalDocumentID 10579199
Genre orig-research
GrantInformation National Science Foundation (funder ID: 10.13039/100000001)
IsPeerReviewed false
IsScholarly false
Language English
PageCount 11
PublicationDate 2024-May-27
PublicationTitle Proceedings - IEEE International Parallel and Distributed Processing Symposium
PublicationTitleAbbrev IPDPS
PublicationYear 2024
Publisher IEEE
StartPage 52
SubjectTerms Distributed processing
HPC
Instruction sets
Libraries
Memory architecture
Memory management
Numerical Linear Algebra
Scalability
Sketching algorithm
Sparse matrices
Title Fast multiplication of random dense matrices with sparse matrices
URI https://ieeexplore.ieee.org/document/10579199