A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

Detailed bibliography
Published in:	2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 908-919
Main authors:	Sao, Piyush; Li, Xiaoye Sherry; Vuduc, Richard
Medium:	Conference paper
Language:	English
Published:	IEEE, 1 May 2018
Subjects:	communication avoiding algorithm; Matrices; nested dissection; Parallel processing; Particle separators; sparse direct solver; sparse gaussian elimination; Sparse matrices; Three-dimensional displays; Transmission line matrix methods; Two dimensional displays
ISSN:	1530-2075

Abstract	We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination tree parallelism, and trades off increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., those arising from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces the communication volume asymptotically in n by a factor of O(√(log n)) and the latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and the latency by a factor of O(n^(1/3)). In all cases, the extra memory needed to achieve these gains is only a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.
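The following is a minimal Python sketch of how a 3D process grid could pair with elimination tree parallelism. It assumes a complete binary elimination tree from nested dissection, stored in heap order (root = 1, children 2i and 2i+1), and Pz = 2^levels stacked Px × Py grids. The function names `grid_coords`, `depth`, and `owning_grids` and the z-major rank ordering are hypothetical illustrations, not SuperLU's actual data layout or API. In this reading, each subtree below the cut at depth `levels` is factored independently by one 2D grid. Each ancestor above the cut is replicated on every grid beneath it, which is one plausible form of the memory-for-communication trade-off the abstract describes.

```python
# Hypothetical sketch (not SuperLU's real layout): map MPI ranks onto a
# Px x Py x Pz grid and assign elimination-tree nodes to 2D grids (z-layers).


def grid_coords(rank: int, px: int, py: int, pz: int) -> tuple[int, int, int]:
    """Return (x, y, z) for `rank`, assuming z-major rank ordering."""
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside the process grid")
    z, r = divmod(rank, px * py)   # which 2D grid (layer) the rank belongs to
    x, y = divmod(r, py)           # position inside that Px x Py grid
    return x, y, z


def depth(node: int) -> int:
    """Depth of a heap-indexed tree node (root = 1 has depth 0)."""
    return node.bit_length() - 1


def owning_grids(node: int, levels: int) -> range:
    """z-indices of the 2D grids holding `node` when Pz = 2**levels."""
    d = depth(node)
    if d >= levels:
        # Below the cut: the node lies in exactly one independent subtree.
        ancestor = node >> (d - levels)   # its ancestor at depth `levels`
        z = ancestor - (1 << levels)
        return range(z, z + 1)
    # Above the cut: replicated on every grid whose subtree lies beneath it.
    span = 1 << (levels - d)
    first = (node - (1 << d)) * span
    return range(first, first + span)


if __name__ == "__main__":
    levels = 2  # Pz = 4 grids
    for node in (1, 2, 3, 4, 7, 15):
        print(node, list(owning_grids(node, levels)))
```

With `levels = 2`, all four grids hold the root. Nodes 2 and 3 are held by grids [0, 1] and [2, 3], respectively. Node 4 and deeper leaves such as node 15 are held by a single grid. This shows the replication growing toward the top of the tree, while the lower subtrees proceed in parallel.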
Author	Vuduc, Richard; Sao, Piyush; Li, Xiaoye Sherry
Author_xml	– sequence: 1 givenname: Piyush surname: Sao fullname: Sao, Piyush – sequence: 2 givenname: Xiaoye Sherry surname: Li fullname: Li, Xiaoye Sherry – sequence: 3 givenname: Richard surname: Vuduc fullname: Vuduc, Richard
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/IPDPS.2018.00100
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEL url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	9781538643686 1538643685
EISSN	1530-2075
EndPage	919
ExternalDocumentID	8425244
Genre	orig-research
GroupedDBID	29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL
ID	FETCH-LOGICAL-i202t-536c536d0ea8078379b423aa7c4dd7c7d2ee09d90916c18dfe0ae856d6677b853
IEDL.DBID	RIE
ISICitedReferencesCount	12
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000444710900090&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:47:55 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i202t-536c536d0ea8078379b423aa7c4dd7c7d2ee09d90916c18dfe0ae856d6677b853
PageCount	12
ParticipantIDs	ieee_primary_8425244
PublicationCentury	2000
PublicationDate	2018-May
PublicationDateYYYYMMDD	2018-05-01
PublicationDate_xml	– month: 05 year: 2018 text: 2018-May
PublicationDecade	2010
PublicationTitle	2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
PublicationTitleAbbrev	IPDPS
PublicationYear	2018
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0020349 ssj0002684650
Score	1.7353839
SourceID	ieee
SourceType	Publisher
StartPage	908
SubjectTerms	communication avoiding algorithm; Matrices; nested dissection; Parallel processing; Particle separators; sparse direct solver; sparse gaussian elimination; Sparse matrices; Three-dimensional displays; Transmission line matrix methods; Two dimensional displays
Title	A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices
URI	https://ieeexplore.ieee.org/document/8425244
WOSCitedRecordID	wos000444710900090&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2018+IEEE+International+Parallel+and+Distributed+Processing+Symposium+%28IPDPS%29&rft.atitle=A+Communication-Avoiding+3D+LU+Factorization+Algorithm+for+Sparse+Matrices&rft.au=Sao%2C+Piyush&rft.au=Li%2C+Xiaoye+Sherry&rft.au=Vuduc%2C+Richard&rft.date=2018-05-01&rft.pub=IEEE&rft.eissn=1530-2075&rft.spage=908&rft.epage=919&rft_id=info:doi/10.1109%2FIPDPS.2018.00100&rft.externalDocID=8425244