A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm, called the...

Bibliographic details
Published in: Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 71-81
Main authors: Sao, Piyush; Xing Liu; Vuduc, Richard; Xiaoye Li
Format: Conference paper
Language: English
Publication details: IEEE, 2015-05-01
ISSN: 1530-2075
Abstract This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm called HALO. The name is shorthand for highly asynchronous lazy offload; it refers to the way the algorithm combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (a la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. We further augment HALO with a model-driven autotuning heuristic that chooses the intra-node division of labor among CPU and Xeon Phi co-processor components. When integrated into SuperLU_DIST and evaluated on a variety of realistic test problems in both single-node and multi-node configurations, the resulting implementation achieves speedups of up to 2.5× over an already efficient multicore CPU implementation, and achieves up to 83% of a machine-specific upper bound that we have estimated. Our analysis quantifies how well our implementation performs and allows us to speculate on the potential speedups that might come from a variety of future improvements to the algorithm and system.
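The abstract mentions a model-driven heuristic that divides work between the CPU and the co-processor, but this record does not describe the model itself. The sketch below is only an illustration of the general idea, not the authors' method: it assumes a simple linear cost model in which the co-processor pays an extra PCIe transfer cost, and it picks the offload fraction at which both sides would finish at the same time. All names (offload_fraction, predicted_time, cpu_rate, acc_rate, transfer_bytes, pcie_bw) are hypothetical.

```python
def offload_fraction(work_flops, cpu_rate, acc_rate, transfer_bytes, pcie_bw):
    """Fraction f of the work to offload so CPU and accelerator finish together.

    Assumed (hypothetical) linear cost model, not taken from the paper:
      CPU time         = (1 - f) * work_flops / cpu_rate
      accelerator time = f * (work_flops / acc_rate + transfer_bytes / pcie_bw)
    """
    if min(work_flops, cpu_rate, acc_rate, pcie_bw) <= 0 or transfer_bytes < 0:
        raise ValueError("rates and work must be positive, transfer non-negative")
    # Time each side would need if it had to process all of the work alone.
    cpu_time_all = work_flops / cpu_rate
    acc_time_all = work_flops / acc_rate + transfer_bytes / pcie_bw
    # Setting the two times equal and solving for f gives this ratio.
    return cpu_time_all / (cpu_time_all + acc_time_all)


def predicted_time(f, work_flops, cpu_rate, acc_rate, transfer_bytes, pcie_bw):
    """Predicted elapsed time under the same model: the slower side dominates."""
    cpu_time = (1.0 - f) * work_flops / cpu_rate
    acc_time = f * (work_flops / acc_rate + transfer_bytes / pcie_bw)
    return max(cpu_time, acc_time)
```

Under this model, a co-processor that is twice as fast as the CPU but spends as long on the PCIe transfer as on the computation would receive exactly half of the work. A real autotuner would presumably calibrate such rates from measurements rather than take them as fixed inputs.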
Author Sao, Piyush
Vuduc, Richard
Xiaoye Li
Xing Liu
Author_xml – sequence: 1
  givenname: Piyush
  surname: Sao
  fullname: Sao, Piyush
  email: piyush3@gatech.edu
  organization: Georgia Inst. of Technol., Atlanta, GA, USA
– sequence: 2
  surname: Xing Liu
  fullname: Xing Liu
  email: xliu@us.ibm.com
– sequence: 3
  givenname: Richard
  surname: Vuduc
  fullname: Vuduc, Richard
  email: richie@gatech.edu
  organization: Georgia Inst. of Technol., Atlanta, GA, USA
– sequence: 4
  surname: Xiaoye Li
  fullname: Xiaoye Li
  email: xsli@lbl.gov
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/IPDPS.2015.104
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1479986496
9781479986491
EndPage 81
ExternalDocumentID 7161497
Genre orig-research
ISICitedReferencesCount 13
ISSN 1530-2075
IngestDate Wed Aug 27 01:42:25 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 11
ParticipantIDs ieee_primary_7161497
PublicationCentury 2000
PublicationDate 20150501
PublicationDateYYYYMMDD 2015-05-01
PublicationDate_xml – month: 05
  year: 2015
  text: 20150501
  day: 01
PublicationDecade 2010
PublicationTitle Proceedings - IEEE International Parallel and Distributed Processing Symposium
PublicationTitleAbbrev IPDPS
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 71
SubjectTerms Acceleration
Communication-avoiding algorithm
GPU
Graphics processing units
Heterogeneous computing
Memory management
Microwave integrated circuits
MPI
Multicore processing
OpenMP
Parallel processing
Sparse Direct Solver
Sparse matrices
Xeon-Phi acceleration
Title A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems
URI https://ieeexplore.ieee.org/document/7161497