A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems
This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi coprocessors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm, called the...
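The description says SuperLU_DIST's factorization is right-looking and statically pivoted. To illustrate only those two terms, here is a minimal dense sketch in Python with NumPy. At each step the pivot is always the diagonal entry (no row interchanges), and the whole trailing submatrix is updated immediately. The function name `right_looking_lu_static` and the tiny-pivot threshold `sqrt(eps) * ||A||_1` are assumptions made for illustration. This is not SuperLU_DIST's sparse, supernodal, distributed code, and it is not the HALO algorithm.

```python
import numpy as np


def right_looking_lu_static(A, tiny=None):
    """Dense right-looking LU with static pivoting (no row swaps).

    Hypothetical illustration. Returns (L, U) with L unit lower triangular.
    L @ U equals A unless a tiny pivot had to be replaced.
    """
    A = np.array(A, dtype=float)  # work on a float copy of the input
    n = A.shape[0]
    if tiny is None:
        # Assumed safeguard: pivots smaller than sqrt(eps) * ||A||_1 are
        # replaced instead of triggering a row interchange.
        tiny = np.sqrt(np.finfo(float).eps) * np.linalg.norm(A, 1)
    for k in range(n):
        # Static pivoting: the pivot is the diagonal entry, never swapped.
        if abs(A[k, k]) < tiny:
            A[k, k] = tiny if A[k, k] >= 0 else -tiny
        # Column k of L: scale the entries below the pivot.
        A[k + 1:, k] /= A[k, k]
        # Right-looking step: update the entire trailing submatrix
        # (the Schur complement) right away with a rank-1 outer product.
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return L, U
```

For example, `right_looking_lu_static([[4, 3], [6, 3]])` gives L = [[1, 0], [1.5, 1]] and U = [[4, 3], [0, -1.5]]. Because no rows are swapped, numerical stability depends on preprocessing that places large entries on the diagonal, or on a well-conditioned matrix such as a diagonally dominant one. Replacing a tiny pivot perturbs the factors, so L @ U then only approximates A.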
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 71-81 |
|---|---|
| Main authors: | Sao, Piyush; Xing Liu; Vuduc, Richard; Xiaoye Li |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.05.2015 |
| Subject: | |
| ISSN: | 1530-2075 |
| Online access: | Get full text |
| Abstract | This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi coprocessors. It builds on the algorithmic approach of SuperLU_DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm called HALO. The name is shorthand for highly asynchronous lazy offload; it refers to the way the algorithm combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (à la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. We further augment HALO with a model-driven autotuning heuristic that chooses the intra-node division of labor between the CPU and Xeon Phi coprocessor components. When integrated into SuperLU_DIST and evaluated on a variety of realistic test problems in both single-node and multi-node configurations, the resulting implementation achieves speedups of up to 2.5× over an already efficient multicore CPU implementation, and achieves up to 83% of a machine-specific upper bound that we have estimated. Our analysis quantifies how well our implementation performs and allows us to speculate on the potential speedups that might come from a variety of future improvements to the algorithm and system. |
|---|---|
| Author | Sao, Piyush; Vuduc, Richard; Xiaoye Li; Xing Liu |
| Author_xml | – sequence: 1 givenname: Piyush surname: Sao fullname: Sao, Piyush email: piyush3@gatech.edu organization: Georgia Inst. of Technol., Atlanta, GA, USA – sequence: 2 surname: Xing Liu fullname: Xing Liu email: xliu@us.ibm.com – sequence: 3 givenname: Richard surname: Vuduc fullname: Vuduc, Richard email: richie@gatech.edu organization: Georgia Inst. of Technol., Atlanta, GA, USA – sequence: 4 surname: Xiaoye Li fullname: Xiaoye Li email: xsli@lbl.gov |
| BookMark | eNotjMtKAzEUQCNUsK3dunGTH5iam8ljshxaH5WKA6PgruRxBwNtpySj0L93RFcHzoEzI5Njf0RCboAtAZi52zTrpl1yBnIJTFyQGQhtTKWEURMyBVmygjMtr8gi5-gYV3pMXE3Jc03bk00Z6Tom9ANt-_03Jtr1aTR5SNF9DRjoCx76dKYf2B9p8xmL2nvcY7K_rT3nAQ_5mlx2dp9x8c85eX-4f1s9FdvXx82q3haRCxiKsuuC8miCAwhGVIIxz4MNHjvolJUBlHS-cqUUnFccjdTcjEJr70qOrpyT279vRMTdKcWDTeedBgXC6PIHoWBOyw |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPS.2015.104 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1479986496; 9781479986491 |
| EndPage | 81 |
| ExternalDocumentID | 7161497 |
| Genre | orig-research |
| GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-i241t-3ffd6ce9db11d948400c2dadcef1f6a5d165bc8b3542282e95729bc877cb32eb3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000380545200008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1530-2075 |
| IngestDate | Wed Aug 27 01:42:25 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i241t-3ffd6ce9db11d948400c2dadcef1f6a5d165bc8b3542282e95729bc877cb32eb3 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_7161497 |
| PublicationCentury | 2000 |
| PublicationDate | 20150501 |
| PublicationDateYYYYMMDD | 2015-05-01 |
| PublicationDate_xml | – month: 05 year: 2015 text: 20150501 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - IEEE International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationYear | 2015 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib026764926 ssj0020349 |
| Score | 1.6317196 |
| Snippet | This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi coprocessors. It builds... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 71 |
| SubjectTerms | Acceleration; Communication-avoiding algorithm; GPU; Graphics processing units; Heterogeneous computing; Memory management; Microwave integrated circuits; MPI; Multicore processing; OpenMP; Parallel processing; Sparse Direct Solver; Sparse matrices; Xeon-Phi acceleration |
| Title | A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems |
| URI | https://ieeexplore.ieee.org/document/7161497 |
| WOSCitedRecordID | wos000380545200008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LS8NAEB7a4sFT1Va0PtiDR9fmtbvJsahFBUugCr2VfWIvSelD8N87m6QVwYu3MKcw2c18s_vN9wHcJGEgjYw4NQl3NEmcoIp7fxOXGpcmOmVMVWYTYjJJZ7Msb8HtfhbGWluRz-ydf6zu8k2pt_6obIjYHgG9aENbCFHPau3WTsQF99p3-2bL667UWqkBrgTBGsHGMMiGz_lDPvWsLuZvOH_ZqlRVZdz93_scQf9nPI_k-8JzDC1bnEB3589Amu3ag5cRmS6xcbWk_rGRael50ARxKkbWtdeVNeTVs22_yMyWBck_FnSkNRYjryFhSKNo3of38ePb_RNtvBPoAmvyhsbOGa5tZlQYmizBNi7QkZFGWxc6LpkJOVM6VTHzGmCRzRiibAwIoVUcYYd9Cp2iLOwZEIMQRQqByAHhHYuddE4qqV0oU8Olis6h51MzX9byGPMmK4O_wxdw6DNfcwYvobNZbe0VHOjPzWK9uq6-6Tcs66D- |
| linkProvider | IEEE |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1JT8JAFH5BNNETKhh35-DRSpdZ2iNRCSiQJmDCjcwaubSExcR_70xbMCZevDXv1Mz2vjfzve8DuMeBzxUPqacwNR7GhnmCOn8TEysTYxkTIgqzCTYaxdNpktbgYdcLo7UuyGf60X0Wb_kqlxt3Vda22N4CerYH-wTjMCi7tbarJ6SMOvW7XbnllFdKtVTfrgVGKsnGwE_a_fQ5HTteF3FvnL-MVYq80m3874-OofXToIfSXeo5gZrOTqGxdWhA1YZtwmsHjRe2dNWoPNrQOHdMaGSRqo2sSrcrrdDQ8W2_0FTnGUo_5l5HSpuOnIqEQpWmeQveuy-Tp55XuSd4c5uV115kjKJSJ0oEgUqwLeR8GSqupDaBoZyogBIhYxERpwIW6oRYnG0DjEkRhbbGPoN6lmf6HJCyIIUzZrGDBXgkMtwYLrg0AY8V5SK8gKYbmtmiFMiYVaNy-Xf4Dg57k-FgNuiP3q7gyM1CySC8hvp6udE3cCA_1_PV8raY32-9k6RF |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=Proceedings+-+IEEE+International+Parallel+and+Distributed+Processing+Symposium&rft.atitle=A+Sparse+Direct+Solver+for+Distributed+Memory+Xeon+Phi-Accelerated+Systems&rft.au=Sao%2C+Piyush&rft.au=Xing+Liu&rft.au=Vuduc%2C+Richard&rft.au=Xiaoye+Li&rft.date=2015-05-01&rft.pub=IEEE&rft.issn=1530-2075&rft.spage=71&rft.epage=81&rft_id=info:doi/10.1109%2FIPDPS.2015.104&rft.externalDocID=7161497 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1530-2075&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1530-2075&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1530-2075&client=summon |
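The abstract also mentions a model-driven autotuning heuristic that chooses the intra-node division of labor between the CPU and the Xeon Phi. This record does not describe that model. The Python sketch below therefore uses a deliberately simple, hypothetical linear cost model. A fraction f of an update's floating-point work goes to the coprocessor, which also pays a PCIe transfer cost proportional to f. The CPU handles the rest concurrently. The function `offload_fraction` and all of its parameters are illustrative assumptions, not the paper's heuristic.

```python
def offload_fraction(work_flops, cpu_rate, phi_rate, bytes_all, pcie_bw):
    """Pick the coprocessor share f under a hypothetical linear cost model.

        t_cpu(f) = (1 - f) * work_flops / cpu_rate
        t_phi(f) = f * (work_flops / phi_rate + bytes_all / pcie_bw)

    Both devices run concurrently, so the finish time is max(t_cpu, t_phi).
    t_cpu decreases and t_phi increases with f, so the maximum is smallest
    where the two are equal. Returns (f, predicted finish time).
    """
    # Time if the CPU did all of the work.
    a = work_flops / cpu_rate
    # Time if the coprocessor did all of the work, including moving the data.
    b = work_flops / phi_rate + bytes_all / pcie_bw
    # Solve (1 - f) * a == f * b for f.
    f = a / (a + b)
    return f, (1.0 - f) * a


if __name__ == "__main__":
    # CPU alone: 1 time unit; coprocessor alone: 0.5 compute + 0.5 transfer.
    print(offload_fraction(10.0, 10.0, 20.0, 5.0, 10.0))
```

With `offload_fraction(10.0, 10.0, 20.0, 5.0, 10.0)`, the CPU alone would take 1 time unit and the coprocessor 0.5 + 0.5 = 1 unit, so the model splits the work evenly and returns `(0.5, 0.5)`. A practical heuristic would also need effects this model ignores, such as fixed per-transfer latencies and overlapping PCIe transfers with computation, which is the kind of communication hiding the abstract attributes to HALO.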