A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

Published in: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 908-919
Main authors: Sao, Piyush; Li, Xiaoye Sherry; Vuduc, Richard
Format: Conference paper
Language: English
Published: IEEE, 1 May 2018
ISSN: 1530-2075
Abstract We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination tree parallelism, and trades off increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically for 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^(1/3)). In all cases, the memory needed to achieve these gains is a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.
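The abstract mentions a three-dimensional process grid and elimination tree parallelism without giving details. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a flat rank numbering that fills one 2D layer at a time, a complete binary elimination tree numbered in heap order (root = 1), and a power-of-two number of layers `pz`. The names `GridCoord`, `rank_to_coord`, and `layers_for_node` are hypothetical. Under these assumptions, each layer works on its own disjoint subtree, and the shared ancestor nodes are held by several layers, which is one plausible reading of trading extra memory for less communication.

```python
from typing import List, NamedTuple


class GridCoord(NamedTuple):
    row: int    # row inside a 2D layer
    col: int    # column inside a 2D layer
    layer: int  # index of the 2D layer along the third dimension


def rank_to_coord(rank: int, pr: int, pc: int, pz: int) -> GridCoord:
    """Map a flat rank to (row, col, layer) in a pr x pc x pz grid.

    Assumed convention: ranks fill one 2D layer row by row, then move
    to the next layer. Real codes may order ranks differently.
    """
    if not 0 <= rank < pr * pc * pz:
        raise ValueError("rank outside the process grid")
    layer, within = divmod(rank, pr * pc)
    row, col = divmod(within, pc)
    return GridCoord(row, col, layer)


def layers_for_node(node: int, pz: int) -> List[int]:
    """Return the layers that hold a heap-indexed elimination-tree node.

    Assumption: pz = 2**k, and layer z owns the subtree rooted at the
    depth-k node with heap index pz + z. Nodes above depth k are shared
    by every layer whose subtree lies below them.
    """
    if pz < 1 or pz & (pz - 1):
        raise ValueError("pz must be a power of two")
    if node < 1:
        raise ValueError("heap indices start at 1")
    k = pz.bit_length() - 1         # depth at which subtrees split
    depth = node.bit_length() - 1   # depth of this node (root is 0)
    if depth >= k:
        # Inside a single layer's subtree: climb to the depth-k ancestor.
        return [(node >> (depth - k)) - pz]
    # Shared ancestor: collect all depth-k descendants.
    shift = k - depth
    return [n - pz for n in range(node << shift, (node + 1) << shift)]
```

With `pz = 4`, for example, the root is held by all four layers, while a node deep inside the last subtree is held by layer 3 only. Elimination trees produced by real orderings are generally irregular, so the complete binary tree here is a simplification.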
Author Vuduc, Richard
Sao, Piyush
Li, Xiaoye Sherry
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/IPDPS.2018.00100
Discipline Computer Science
EISBN 9781538643686
1538643685
EISSN 1530-2075
EndPage 919
ExternalDocumentID 8425244
Genre orig-research
ISICitedReferencesCount 12
IsPeerReviewed false
IsScholarly false
Language English
PageCount 12
PublicationCentury 2000
PublicationDate 2018-May
PublicationDateYYYYMMDD 2018-05-01
PublicationDecade 2010
PublicationTitle 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
PublicationTitleAbbrev IPDPS
PublicationYear 2018
Publisher IEEE
ssj0002684650
Score 1.7353839
Snippet We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D sparse LU algorithm...
SourceID ieee
SourceType Publisher
StartPage 908
SubjectTerms communication avoiding algorithm
Matrices
nested dissection
Parallel processing
Particle separators
sparse direct solver
sparse gaussian elimination
Sparse matrices
Three-dimensional displays
Transmission line matrix methods
Two dimensional displays
Title A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices
URI https://ieeexplore.ieee.org/document/8425244