A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices
We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination tree parallelism and trades off increased memory for reduced pe...
| Published in: | 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 908-919 |
|---|---|
| Main authors: | Sao, Piyush; Li, Xiaoye Sherry; Vuduc, Richard |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 May 2018 |
| Subjects: | |
| ISSN: | 1530-2075 |
| Abstract | We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination tree parallelism and trades off increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically for 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by O(n^(1/3)) times. In all cases, the memory needed to achieve these gains is a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30. |
|---|---|
| Author | Vuduc, Richard; Sao, Piyush; Li, Xiaoye Sherry |
| Author_xml | – sequence: 1 givenname: Piyush surname: Sao fullname: Sao, Piyush – sequence: 2 givenname: Xiaoye Sherry surname: Li fullname: Li, Xiaoye Sherry – sequence: 3 givenname: Richard surname: Vuduc fullname: Vuduc, Richard |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS.2018.00100 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEL url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781538643686 1538643685 |
| EISSN | 1530-2075 |
| EndPage | 919 |
| ExternalDocumentID | 8425244 |
| Genre | orig-research |
| GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-i202t-536c536d0ea8078379b423aa7c4dd7c7d2ee09d90916c18dfe0ae856d6677b853 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 12 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000444710900090&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:47:55 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i202t-536c536d0ea8078379b423aa7c4dd7c7d2ee09d90916c18dfe0ae856d6677b853 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_8425244 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-May |
| PublicationDateYYYYMMDD | 2018-05-01 |
| PublicationDate_xml | – month: 05 year: 2018 text: 2018-May |
| PublicationDecade | 2010 |
| PublicationTitle | 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2018 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0020349 ssj0002684650 |
| Score | 1.7353839 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 908 |
| SubjectTerms | communication avoiding algorithm; Matrices; nested dissection; Parallel processing; Particle separators; sparse direct solver; sparse Gaussian elimination; Sparse matrices; Three-dimensional displays; Transmission line matrix methods; Two dimensional displays |
| Title | A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices |
| URI | https://ieeexplore.ieee.org/document/8425244 |
| WOSCitedRecordID | wos000444710900090&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |