Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation

Bibliographic details
Published in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 1696-1702
Main authors: Kreutzer, M., Hager, G., Wellein, G., Fehske, H., Basermann, A., Bishop, A. R.
Format: Conference paper
Language: English
Published: IEEE, 1 May 2012
ISBN: 1467309745, 9781467309745
Abstract Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme while making no assumptions about the matrix structure. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 91% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.
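The memory saving described in the abstract can be illustrated with a small storage count. The sketch below is not the authors' implementation. It assumes that an ELLPACK-style layout pads every row to the length of the longest row, and that a padded-jagged layout first sorts rows by length and then pads only within fixed-height blocks of rows. The function names and the `block_rows` parameter are hypothetical and chosen for illustration only.

```python
def ellpack_slots(row_lengths):
    """Stored entries when every row is padded to the longest row (assumed ELLPACK-style layout)."""
    if not row_lengths:
        return 0
    return len(row_lengths) * max(row_lengths)


def padded_jagged_slots(row_lengths, block_rows):
    """Stored entries when rows are sorted by length and padded per block.

    block_rows is a hypothetical block height (for example, one GPU warp);
    it is an assumption of this sketch, not a value taken from the abstract.
    """
    ordered = sorted(row_lengths, reverse=True)
    total = 0
    for start in range(0, len(ordered), block_rows):
        block = ordered[start:start + block_rows]
        # Rows are in descending order, so block[0] is the longest row of the block.
        total += block[0] * len(block)
    return total


# Toy matrix: non-zeros per row, two long rows among short ones.
lengths = [8, 1, 1, 1, 7, 2, 1, 1]
full = ellpack_slots(lengths)                          # 8 rows * 8 columns
jagged = padded_jagged_slots(lengths, block_rows=4)    # blocks [8,7,2,1] and [1,1,1,1]
saving = 1 - jagged / full
```

For this toy input the ELLPACK-style layout stores 64 entries and the block-padded layout stores 36, against 22 actual non-zeros. The numbers only show the mechanism and say nothing about the savings measured in the paper.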
Author_xml – sequence: 1
  givenname: M.
  surname: Kreutzer
  fullname: Kreutzer, M.
  organization: Erlangen Regional Comput. Center, Erlangen, Germany
– sequence: 2
  givenname: G.
  surname: Hager
  fullname: Hager, G.
  organization: Erlangen Regional Comput. Center, Erlangen, Germany
– sequence: 3
  givenname: G.
  surname: Wellein
  fullname: Wellein, G.
  organization: Erlangen Regional Comput. Center, Erlangen, Germany
– sequence: 4
  givenname: H.
  surname: Fehske
  fullname: Fehske, H.
  organization: Ernst-Moritz-Arndt Univ. of Greifswald, Greifswald, Germany
– sequence: 5
  givenname: A.
  surname: Basermann
  fullname: Basermann, A.
  organization: Simulation & Software Technol., German Aerosp. Center (DLR), Cologne, Germany
– sequence: 6
  givenname: A. R.
  surname: Bishop
  fullname: Bishop, A. R.
  organization: Theor., Simulation & Comput. Directorate, Los Alamos Nat. Lab., Los Alamos, NM, USA
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/IPDPSW.2012.211
EndPage 1702
ExternalDocumentID 6270844
Genre orig-research
ISBN 1467309745
9781467309745
ISICitedReferencesCount 41
IsPeerReviewed false
IsScholarly false
Language English
PageCount 7
PublicationCentury 2000
PublicationDate 2012-05
PublicationDateYYYYMMDD 2012-05-01
PublicationDate_xml – month: 05
  year: 2012
  text: 2012-05
PublicationDecade 2010
PublicationTitle 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops
PublicationTitleAbbrev ipdpsw
PublicationYear 2012
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1696
SubjectTerms Bandwidth
Computational modeling
CUDA
Error correction codes
GPGPU
Instruction sets
Kernel
Sparse matrices
Vectors
Title Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation
URI https://ieeexplore.ieee.org/document/6270844