PGX.D: a fast distributed graph processing engine


Detailed bibliography
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12
Main authors: Hong, Sungpack, Depner, Siegfried, Manhardt, Thomas, Van Der Lugt, Jan, Verstraaten, Merijn, Chafi, Hassan
Format: Conference paper
Language: English
Published: New York, NY, USA, ACM, 15 November 2015
Series: ACM Conferences
ISBN: 1450337236, 9781450337236
ISSN: 2167-4337
Abstract Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D significantly outperforms other distributed graph systems such as GraphLab (by 3x to 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is indeed crucial for the overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters in which the aggregate sustained random DRAM-access bandwidth matches the bandwidth of the underlying interconnection fabric.
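The abstract names edge partitioning, selective ghost nodes, and edge chunking without describing them. The Python sketch below illustrates the general ideas on a graph stored in CSR (compressed sparse row) form. It is not PGX.D's implementation: the function names, the CSR layout, and the degree-threshold rule for choosing ghost nodes are assumptions made for illustration only.

```python
from bisect import bisect_left
from typing import Iterator, List, Set, Tuple

# Illustrative sketch only; not PGX.D's code. The graph is assumed to be in
# CSR form: row_offsets has n + 1 entries and vertex v's edges occupy
# positions row_offsets[v] .. row_offsets[v + 1] - 1 of the edge array.


def edge_balanced_partition(row_offsets: List[int], num_machines: int) -> List[int]:
    """Split vertices into contiguous ranges with roughly equal edge counts.

    Returns boundaries b of length num_machines + 1; machine i owns
    vertices b[i] .. b[i + 1] - 1.
    """
    n = len(row_offsets) - 1
    total_edges = row_offsets[-1]
    bounds = [0]
    for i in range(1, num_machines):
        target = (total_edges * i) // num_machines
        # First vertex whose first edge sits at or beyond the target offset.
        v = bisect_left(row_offsets, target, 0, n)
        bounds.append(max(v, bounds[-1]))
    bounds.append(n)
    return bounds


def select_ghost_nodes(degree: List[int], threshold: int) -> Set[int]:
    """Hypothetical selection rule: replicate every vertex whose degree
    exceeds threshold on all machines, so reads of its value stay local."""
    return {v for v, d in enumerate(degree) if d > threshold}


def edge_chunks(row_offsets: List[int], chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield [start, end) edge ranges of at most chunk_size edges, so the
    adjacency list of one high-degree vertex can be split across tasks."""
    total_edges = row_offsets[-1]
    for start in range(0, total_edges, chunk_size):
        yield start, min(start + chunk_size, total_edges)


if __name__ == "__main__":
    # Vertex 0 has 5 edges; vertices 1..5 have 1 edge each.
    offsets = [0, 5, 6, 7, 8, 9, 10]
    print(edge_balanced_partition(offsets, 2))        # [0, 1, 6]
    print(select_ghost_nodes([5, 1, 1, 1, 1, 1], 2))  # {0}
    print(list(edge_chunks(offsets, 4)))              # [(0, 4), (4, 8), (8, 10)]
```

Because partition boundaries fall between vertices, a single very high-degree vertex can still dominate one machine's work; splitting edge ranges into fixed-size chunks, as `edge_chunks` does, is one way to spread that work across worker tasks.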
Author Depner, Siegfried
Van Der Lugt, Jan
Manhardt, Thomas
Verstraaten, Merijn
Hong, Sungpack
Chafi, Hassan
Author_xml – sequence: 1
  givenname: Sungpack
  surname: Hong
  fullname: Hong, Sungpack
  email: sungpack.hong@oracle.com
  organization: Oracle Labs
– sequence: 2
  givenname: Siegfried
  surname: Depner
  fullname: Depner, Siegfried
  email: siegfried.depner@oracle.com
  organization: Oracle Labs
– sequence: 3
  givenname: Thomas
  surname: Manhardt
  fullname: Manhardt, Thomas
  email: thomas.manhardt@oracle.com
  organization: Oracle Labs
– sequence: 4
  givenname: Jan
  surname: Van Der Lugt
  fullname: Van Der Lugt, Jan
  email: janlugt@gmail.com
  organization: Google
– sequence: 5
  givenname: Merijn
  surname: Verstraaten
  fullname: Verstraaten, Merijn
  email: M.E.Verstraaten@uva.nl
  organization: Univ. Amsterdam, Netherlands
– sequence: 6
  givenname: Hassan
  surname: Chafi
  fullname: Chafi, Hassan
  email: hassan.chafi@oracle.com
  organization: Oracle Labs
ContentType Conference Proceeding
Copyright 2015 ACM
Copyright_xml – notice: 2015 ACM
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/2807591.2807620
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1450337236
9781450337236
EISSN 2167-4337
EndPage 12
ExternalDocumentID 7832832
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
ACM
ADPZR
ALMA_UNASSIGNED_HOLDINGS
APO
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
GUFHI
IEGSK
IERZE
OCL
RIB
RIC
RIE
RIL
6IH
AAWTH
ABLEC
ADZIZ
CHZPO
IPLJI
ID FETCH-LOGICAL-a247t-70ab31ebb1e8dfb37a540df2d8a825f198db6ae6ba78b48ed1752c622e81ef6f3
IEDL.DBID RIE
ISBN 1450337236
9781450337236
ISICitedReferencesCount 40
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000382162500059&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:01:31 EDT 2025
Wed Jan 31 06:51:33 EST 2024
Wed Jan 31 06:51:02 EST 2024
IsPeerReviewed false
IsScholarly false
Language English
License Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
LinkModel DirectLink
MeetingName SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis
MergedId FETCHMERGED-LOGICAL-a247t-70ab31ebb1e8dfb37a540df2d8a825f198db6ae6ba78b48ed1752c622e81ef6f3
PageCount 12
ParticipantIDs ieee_primary_7832832
acm_books_10_1145_2807591_2807620
acm_books_10_1145_2807591_2807620_brief
PublicationCentury 2000
PublicationDate 20151115
2015-November
PublicationDateYYYYMMDD 2015-11-15
2015-11-01
PublicationDate_xml – month: 11
  year: 2015
  text: 20151115
  day: 15
PublicationDecade 2010
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev SC
PublicationYear 2015
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0001969116
ssj0003204180
Score 1.8884876
Snippet Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed...
SourceID ieee
acm
SourceType Publisher
StartPage 1
SubjectTerms Algorithm design and analysis
Bandwidth
Clustering algorithms
Computational modeling
Computer systems organization -- Dependable and fault-tolerant systems and networks
Data models
General and reference -- Cross-computing tools and techniques -- Performance
Kernel
Mathematics of computing -- Discrete mathematics -- Graph theory -- Graph algorithms
Networks -- Network performance evaluation
Programming
Theory of computation -- Models of computation -- Concurrency
Theory of computation -- Models of computation -- Concurrency -- Parallel computing models
Subtitle a fast distributed graph processing engine
Title PGX.D
URI https://ieeexplore.ieee.org/document/7832832
WOSCitedRecordID wos000382162500059&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=SC15%3A+International+Conference+for+High+Performance+Computing%2C+Networking%2C+Storage+and+Analysis&rft.atitle=PGX.D%3A+a+fast+distributed+graph+processing+engine&rft.au=Hong%2C+Sungpack&rft.au=Depner%2C+Siegfried&rft.au=Manhardt%2C+Thomas&rft.au=Van+Der+Lugt%2C+Jan&rft.date=2015-11-01&rft.pub=ACM&rft.eissn=2167-4337&rft.spage=1&rft.epage=12&rft_id=info:doi/10.1145%2F2807591.2807620&rft.externalDocID=7832832