PGX.D: a fast distributed graph processing engine
Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than using efficient single-machine implementations provided with enough memory. In this paper, we pr...
| Published in: | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis pp. 1 - 12 |
|---|---|
| Main Authors: | Hong, Sungpack; Depner, Siegfried; Manhardt, Thomas; Van Der Lugt, Jan; Verstraaten, Merijn; Chafi, Hassan |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | New York, NY, USA: ACM, 15.11.2015 |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 1450337236, 9781450337236 |
| ISSN: | 2167-4337 |
| Abstract | Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations given enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D significantly outperforms other distributed graph systems such as GraphLab (3x to 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is indeed crucial for the overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters where the aggregate sustained random DRAM-access bandwidth is matched with the bandwidth of the underlying interconnection fabric. |
|---|---|
| Author | Depner, Siegfried; Van Der Lugt, Jan; Manhardt, Thomas; Verstraaten, Merijn; Hong, Sungpack; Chafi, Hassan |
| Author_xml | 1. Hong, Sungpack (sungpack.hong@oracle.com, Oracle Labs); 2. Depner, Siegfried (siegfried.depner@oracle.com, Oracle Labs); 3. Manhardt, Thomas (thomas.manhardt@oracle.com, Oracle Labs); 4. Van Der Lugt, Jan (janlugt@gmail.com, Google); 5. Verstraaten, Merijn (M.E.Verstraaten@uva.nl, Univ. Amsterdam, Netherlands); 6. Chafi, Hassan (hassan.chafi@oracle.com, Oracle Labs) |
| ContentType | Conference Proceeding |
| Copyright | 2015 ACM |
| Copyright_xml | – notice: 2015 ACM |
| DOI | 10.1145/2807591.2807620 |
| Discipline | Computer Science |
| EISBN | 1450337236 9781450337236 |
| EISSN | 2167-4337 |
| EndPage | 12 |
| ExternalDocumentID | 7832832 |
| Genre | orig-research |
| ISBN | 1450337236 9781450337236 |
| ISICitedReferencesCount | 40 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org |
| MeetingName | SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 20151115 2015-November |
| PublicationDateYYYYMMDD | 2015-11-15 2015-11-01 |
| PublicationDate_xml | – month: 11 year: 2015 text: 20151115 day: 15 |
| PublicationDecade | 2010 |
| PublicationPlace | New York, NY, USA |
| PublicationPlace_xml | – name: New York, NY, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2015 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Algorithm design and analysis; Bandwidth; Clustering algorithms; Computational modeling; Computer systems organization -- Dependable and fault-tolerant systems and networks; Data models; General and reference -- Cross-computing tools and techniques -- Performance; Kernel; Mathematics of computing -- Discrete mathematics -- Graph theory -- Graph algorithms; Networks -- Network performance evaluation; Programming; Theory of computation -- Models of computation -- Concurrency; Theory of computation -- Models of computation -- Concurrency -- Parallel computing models |
| Subtitle | a fast distributed graph processing engine |
| Title | PGX.D |
| URI | https://ieeexplore.ieee.org/document/7832832 |
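
The abstract credits part of PGX.D's workload balance to edge partitioning, but this record does not describe how it is done. As a rough illustration only, the sketch below assumes one simple interpretation: assign contiguous vertex ranges to machines so that each range holds a similar number of edges rather than a similar number of vertices. The function name `partition_by_edges` and its parameters are hypothetical and are not taken from PGX.D.

```python
from typing import List


def partition_by_edges(degrees: List[int], num_machines: int) -> List[range]:
    """Split vertex IDs 0..n-1 into contiguous ranges with similar edge counts.

    Illustrative assumption, not PGX.D's actual algorithm: `degrees[v]` is the
    number of edges owned by vertex v, and a range is closed as soon as the
    running edge total reaches the next equal share of all edges.
    """
    total_edges = sum(degrees)
    parts: List[range] = []
    start = 0
    running = 0
    for v, degree in enumerate(degrees):
        running += degree
        # Edge total at which the current range should ideally end.
        boundary = total_edges * (len(parts) + 1) / num_machines
        # Keep the last range open so that it absorbs all remaining vertices.
        if running >= boundary and len(parts) < num_machines - 1:
            parts.append(range(start, v + 1))
            start = v + 1
    parts.append(range(start, len(degrees)))
    return parts


def edges_per_part(degrees: List[int], parts: List[range]) -> List[int]:
    """Return the number of edges that falls into each vertex range."""
    return [sum(degrees[v] for v in part) for part in parts]


if __name__ == "__main__":
    # One high-degree vertex followed by eight low-degree vertices.
    degrees = [8, 1, 1, 1, 1, 1, 1, 1, 1]
    parts = partition_by_edges(degrees, num_machines=2)
    print(parts, edges_per_part(degrees, parts))
```

With a skewed degree distribution like the one in the usage example, splitting by vertex count would give one machine most of the edges, whereas splitting by edge count gives the high-degree vertex a range of its own. A real system would also have to handle single vertices whose degree exceeds a fair share, which this sketch does not attempt.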

