A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm, called the...

Detailed bibliography
Published in:	Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 71 - 81
Main authors:	Sao, Piyush; Xing Liu; Vuduc, Richard; Xiaoye Li
Format:	Conference paper
Language:	English
Publication details:	IEEE 01.05.2015
Subjects:	Acceleration; Communication-avoiding algorithm; GPU; Graphics processing units; Heterogeneous computing; Memory management; Microwave integrated circuits; MPI; Multicore processing; OpenMP; Parallel processing; Sparse Direct Solver; Sparse matrices; Xeon-Phi acceleration
ISSN:	1530-2075

Abstract	This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm called HALO. The name is shorthand for highly asynchronous lazy offload; it refers to the way the algorithm combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (à la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. We further augment HALO with a model-driven autotuning heuristic that chooses the intra-node division of labor among CPU and Xeon Phi co-processor components. When integrated into SuperLU_DIST and evaluated on a variety of realistic test problems in both single-node and multi-node configurations, the resulting implementation achieves speedups of up to 2.5× over an already efficient multicore CPU implementation, and achieves up to 83% of a machine-specific upper bound that we have estimated. Our analysis quantifies how well our implementation performs and allows us to speculate on the potential speedups that might come from a variety of future improvements to the algorithm and system.
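Note: the abstract mentions a model-driven heuristic that divides work between the CPU and the co-processor but does not give the model. The Python sketch below is only an illustration of the general idea, not the authors' method. It assumes a hypothetical linear cost model in which the CPU and the accelerator each process flops at a fixed rate and every offloaded flop also pays a PCIe transfer cost that is not overlapped with computation. All names (`split_fraction`, `predicted_time`, `bytes_per_flop`, `pcie_bw`) are made up for this example.

```python
def split_fraction(cpu_rate, acc_rate, bytes_per_flop, pcie_bw):
    """Return the fraction of work to offload so both sides finish together.

    Hypothetical model (an assumption, not taken from the paper):
      CPU time         = (1 - f) * W / cpu_rate
      accelerator time = f * W * (1 / acc_rate + bytes_per_flop / pcie_bw)
    """
    # Cost in seconds per flop on each side.
    cpu_cost = 1.0 / cpu_rate
    acc_cost = 1.0 / acc_rate + bytes_per_flop / pcie_bw
    # Solve (1 - f) * cpu_cost == f * acc_cost for f.
    return cpu_cost / (cpu_cost + acc_cost)


def predicted_time(work_flops, f, cpu_rate, acc_rate, bytes_per_flop, pcie_bw):
    """Predicted step time: the slower of the two sides under the same model."""
    t_cpu = (1.0 - f) * work_flops / cpu_rate
    t_acc = f * work_flops * (1.0 / acc_rate + bytes_per_flop / pcie_bw)
    return max(t_cpu, t_acc)
```

Under this model the CPU time falls and the accelerator time rises as the offloaded fraction grows, so the predicted time is smallest where the two are equal. A scheme that overlaps transfers with computation, as the abstract describes for HALO, would need a different accelerator-side term.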
Author	Sao, Piyush; Vuduc, Richard; Xiaoye Li; Xing Liu
Author_xml	– sequence: 1 givenname: Piyush surname: Sao fullname: Sao, Piyush email: piyush3@gatech.edu organization: Georgia Inst. of Technol., Atlanta, GA, USA – sequence: 2 surname: Xing Liu fullname: Xing Liu email: xliu@us.ibm.com – sequence: 3 givenname: Richard surname: Vuduc fullname: Vuduc, Richard email: richie@gatech.edu organization: Georgia Inst. of Technol., Atlanta, GA, USA – sequence: 4 surname: Xiaoye Li fullname: Xiaoye Li email: xsli@lbl.gov
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/IPDPS.2015.104
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	1479986496 9781479986491
EndPage	81
ExternalDocumentID	7161497
Genre	orig-research
GroupedDBID	29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL
ID	FETCH-LOGICAL-i241t-3ffd6ce9db11d948400c2dadcef1f6a5d165bc8b3542282e95729bc877cb32eb3
IEDL.DBID	RIE
ISICitedReferencesCount	13
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000380545200008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	1530-2075
IngestDate	Wed Aug 27 01:42:25 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i241t-3ffd6ce9db11d948400c2dadcef1f6a5d165bc8b3542282e95729bc877cb32eb3
PageCount	11
ParticipantIDs	ieee_primary_7161497
PublicationCentury	2000
PublicationDate	20150501
PublicationDateYYYYMMDD	2015-05-01
PublicationDate_xml	– month: 05 year: 2015 text: 20150501 day: 01
PublicationDecade	2010
PublicationTitle	Proceedings - IEEE International Parallel and Distributed Processing Symposium
PublicationTitleAbbrev	IPDPS
PublicationYear	2015
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssib026764926 ssj0020349
Score	1.6317196
Snippet	This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds...
SourceID	ieee
SourceType	Publisher
StartPage	71
SubjectTerms	Acceleration; Communication-avoiding algorithm; GPU; Graphics processing units; Heterogeneous computing; Memory management; Microwave integrated circuits; MPI; Multicore processing; OpenMP; Parallel processing; Sparse Direct Solver; Sparse matrices; Xeon-Phi acceleration
Title	A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems
URI	https://ieeexplore.ieee.org/document/7161497
WOSCitedRecordID	wos000380545200008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint