Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation
| Published in: | 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 1696-1702 |
|---|---|
| Main authors: | Kreutzer, M.; Hager, G.; Wellein, G.; Fehske, H.; Basermann, A.; Bishop, A. R. |
| Medium: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.05.2012 |
| ISBN: | 1467309745, 9781467309745 |
| Abstract | Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme while making no assumptions about the matrix structure. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 91% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation. |
|---|---|
| Author | Kreutzer, M. (Erlangen Regional Computing Center, Erlangen, Germany); Hager, G. (Erlangen Regional Computing Center, Erlangen, Germany); Wellein, G. (Erlangen Regional Computing Center, Erlangen, Germany); Fehske, H. (Ernst-Moritz-Arndt University of Greifswald, Greifswald, Germany); Basermann, A. (Simulation and Software Technology, German Aerospace Center (DLR), Cologne, Germany); Bishop, A. R. (Theory, Simulation and Computation Directorate, Los Alamos National Laboratory, Los Alamos, NM, USA) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/IPDPSW.2012.211 |
| EndPage | 1702 |
| Genre | orig-research |
| ISICitedReferencesCount | 41 |
| PageCount | 7 |
| PublicationDate | 2012-05 |
| PublicationTitle | 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops |
| PublicationTitleAbbrev | ipdpsw |
| PublicationYear | 2012 |
| Publisher | IEEE |
| StartPage | 1696 |
| SubjectTerms | Bandwidth; Computational modeling; CUDA; Error correction codes; GPGPU; Instruction sets; Kernel; Sparse matrices; Vectors |
| Title | Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation |
| URI | https://ieeexplore.ieee.org/document/6270844 |
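
The abstract does not describe either storage layout in detail. The following is a minimal Python sketch, assuming the commonly published descriptions. ELLPACK-R pads every row to the longest row of the whole matrix and keeps a per-row length array. pJDS, as we understand it, sorts rows by descending nonzero count and pads only within blocks of `block` consecutive rows (32 here, an assumed warp-sized value) before storing the entries column-major as jagged diagonals. All names (`to_ellpack_r`, `to_pjds`, `spmv_pjds`, the `block` parameter, ...) are hypothetical, and the Python loops stand in for a CUDA kernel in which one thread handles one row. This is an illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = List[Tuple[int, float]]  # one matrix row as (column index, value) pairs


@dataclass
class EllpackR:
    n_rows: int
    val: List[float]    # column-major: entry j of row i is stored at j * n_rows + i
    col: List[int]
    row_len: List[int]  # true nonzero count per row, lets a thread skip padding


def to_ellpack_r(rows: List[Row]) -> EllpackR:
    n = len(rows)
    width = max((len(r) for r in rows), default=0)  # every row is padded to this
    val, col = [0.0] * (n * width), [0] * (n * width)
    for i, r in enumerate(rows):
        for j, (c, v) in enumerate(r):
            val[j * n + i], col[j * n + i] = v, c
    return EllpackR(n, val, col, [len(r) for r in rows])


def spmv_ellpack_r(m: EllpackR, x: List[float]) -> List[float]:
    y = [0.0] * m.n_rows
    for i in range(m.n_rows):          # one GPU thread per row in a real kernel
        for j in range(m.row_len[i]):  # stop at the true row length
            k = j * m.n_rows + i
            y[i] += m.val[k] * x[m.col[k]]
    return y


@dataclass
class PJDS:
    n_rows: int
    perm: List[int]       # perm[p] = original index of the p-th row after sorting
    row_len: List[int]    # padded length of each sorted row (non-increasing)
    col_start: List[int]  # offset of jagged diagonal j inside val/col
    val: List[float]
    col: List[int]


def to_pjds(rows: List[Row], block: int = 32) -> PJDS:
    n = len(rows)
    perm = sorted(range(n), key=lambda i: -len(rows[i]))  # descending nonzero count
    n_pad = -(-n // block) * block  # round the row count up to whole blocks
    row_len = [0] * n_pad
    for b in range(0, n, block):
        # The first row of each block is its longest, so pad the whole block to it.
        longest = len(rows[perm[b]])
        for p in range(b, b + block):
            row_len[p] = longest
    width = row_len[0] if n else 0
    col_start = [0]
    for j in range(width):
        # Diagonal j holds entry j of every padded row longer than j.
        col_start.append(col_start[-1] + sum(1 for length in row_len if length > j))
    val, col = [0.0] * col_start[-1], [0] * col_start[-1]
    for p, i in enumerate(perm):
        for j, (c, v) in enumerate(rows[i]):
            val[col_start[j] + p], col[col_start[j] + p] = v, c
    return PJDS(n, perm, row_len, col_start, val, col)


def spmv_pjds(m: PJDS, x: List[float]) -> List[float]:
    y = [0.0] * m.n_rows
    for p in range(m.n_rows):          # p indexes the sorted rows
        acc = 0.0
        for j in range(m.row_len[p]):  # padding entries add 0.0 * x[0]
            k = m.col_start[j] + p
            acc += m.val[k] * x[m.col[k]]
        y[m.perm[p]] = acc             # scatter back to the original row order
    return y


if __name__ == "__main__":
    # One dense row of 64 entries plus 63 rows with a single entry each.
    rows = [[(c, 1.0) for c in range(64)]] + [[(i, 2.0)] for i in range(1, 64)]
    x = [float(i) for i in range(64)]
    ell, pj = to_ellpack_r(rows), to_pjds(rows, block=32)
    print(len(ell.val), len(pj.val))                   # 4096 2080
    print(spmv_ellpack_r(ell, x) == spmv_pjds(pj, x))  # True
```

For this example matrix, ELLPACK-R stores 64 × 64 = 4096 entries, while block-wise padding stores 32 × 64 + 32 × 1 = 2080. This shows how sorting plus per-block padding can shrink the footprint when row lengths vary strongly. The actual savings depend on the sparsity pattern and on the index arrays each scheme keeps alongside the values.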

