A Feasibility Study for MPI over HDFS
With the increasing prominence of integrating high-performance computing (HPC) with big-data (BIGDATA) processing, running MPI over the Hadoop Distributed File System (HDFS) offers a promising approach for delivering better scalability and fault tolerance to traditional HPC applications. However, it...
| Published in: | IEEE Conference on High Performance Extreme Computing (Online) pp. 1 - 7 |
|---|---|
| Main Authors: | Feng, W.; Zhang, D.; Zhang, J.; Hou, K.; Pumma, S.; Wang, H. |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.09.2020 |
| ISSN: | 2643-1971 |
| Abstract | With the increasing prominence of integrating high-performance computing (HPC) with big-data (BIGDATA) processing, running MPI over the Hadoop Distributed File System (HDFS) offers a promising approach for delivering better scalability and fault tolerance to traditional HPC applications. However, it comes with challenges that discourage such an approach: (1) two-sided MPI communication to support intermediate data processing, (2) a focus on enabling N-1 writes that is subject to the default HDFS block-placement policy, and (3) a pipelined writing mode in HDFS that cannot fully utilize the underlying HPC hardware. So, while directly integrating MPI with HDFS may deliver better scalability and fault tolerance to MPI applications, it will fall short of delivering competitive performance. Consequently, we present a performance study to evaluate the feasibility of integrating MPI applications to run over HDFS. Specifically, we show that by aggregating and reordering intermediate data and coordinating computation and I/O when running MPI over HDFS, we can deliver up to 1.92x and 1.78x speedup over MPI I/O and HDFS pipelined-write implementations, respectively. |
|---|---|
| Author | Zhang, D.; Hou, K.; Wang, H.; Pumma, S.; Zhang, J.; Feng, W. |
| Author_xml | 1. Feng, W. (wfeng@vt.edu); 2. Zhang, D. (daz3@vt.edu); 3. Zhang, J. (zjing14@vt.edu); 4. Hou, K. (kaixihou@vt.edu); 5. Pumma, S. (sarunya@vt.edu); 6. Wang, H. (hwang121@vt.edu). All authors: Department of Computer Science, Virginia Tech, Blacksburg, VA, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1109/HPEC43674.2020.9286250 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE/IET Electronic Library IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1728192196 9781728192192 |
| EISSN | 2643-1971 |
| EndPage | 7 |
| ExternalDocumentID | 9286250 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| IngestDate | Wed Aug 27 02:32:56 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_9286250 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Sept.-22 |
| PublicationDateYYYYMMDD | 2020-09-22 |
| PublicationDate_xml | – month: 09 year: 2020 text: 2020-Sept.-22 day: 22 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE Conference on High Performance Extreme Computing (Online) |
| PublicationTitleAbbrev | HPEC |
| PublicationYear | 2020 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| Title | A Feasibility Study for MPI over HDFS |
| URI | https://ieeexplore.ieee.org/document/9286250 |