Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency

Bibliographic details
Published in: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 378-389
Main authors: Chhugani, J., Satish, N., Changkyu Kim, Sewall, J., Dubey, P.
Format: Conference proceeding
Language: English
Published: IEEE, 1 May 2012
ISBN: 1467309753, 9781467309752
ISSN: 1530-2075
Online access: Full text
Abstract: Graph-based structures are being increasingly used to model data and relations among data in a number of fields. Graph-based databases are becoming more popular as a means to better represent such data. Graph traversal is a key component in graph algorithms such as reachability and graph matching. Since the scale of data stored and queried in these databases is increasing, it is important to obtain high performing implementations of graph traversal that can efficiently utilize the processing power of modern processors. In this work, we present a scalable Breadth-First Search Traversal algorithm for modern multi-socket, multi-core CPUs. Our algorithm uses lock- and atomic-free operations on a cache-resident structure for arbitrary sized graphs to filter out expensive main memory accesses, and completely and efficiently utilizes all available bandwidth resources. We propose a work distribution approach for multi-socket platforms that ensures load-balancing while keeping cross-socket communication low. We provide a detailed analytical model that accurately projects the performance of our single- and multi-socket traversal algorithms to within 5-10% of obtained performance. Our analytical model serves as a useful tool to analyze performance bottlenecks on modern CPUs. When measured on various synthetic and real-world graphs with a wide range of graph sizes, vertex degrees and graph diameters, our implementation on a dual-socket Intel® Xeon® X5570 (Intel microarchitecture code name Nehalem) system achieves 1.5X-13.2X performance speedup over the best reported numbers. We achieve around 1 Billion traversed edges per second on a scale-free R-MAT graph with 64M vertices and 2 Billion edges on a dual-socket Nehalem system. Our optimized algorithm is useful as a building block for efficient multi-node implementations and future exascale systems, thereby allowing them to ride the trend of increasing per-node compute and bandwidth resources.
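Illustrative sketch (not from the paper): the abstract mentions a cache-resident structure that filters out expensive main-memory accesses during breadth-first search. The Python code below is a minimal, single-threaded illustration of that general idea only, assuming a graph in CSR form and a one-bit-per-vertex bitmap as the filter. The function name, the CSR arrays `offsets` and `neighbors`, and the bitmap layout are all assumptions made for illustration; the sketch does not reproduce the authors' lock- and atomic-free parallel scheme or their multi-socket work distribution.

```python
from array import array


def bfs_with_bitmap_filter(offsets, neighbors, source):
    """Level-synchronous BFS over a CSR graph (hypothetical helper).

    offsets[u]..offsets[u + 1] delimits the neighbors of vertex u.
    Returns (depth list, number of edges examined); unreachable
    vertices keep depth -1.
    """
    n = len(offsets) - 1
    # One bit per vertex: small enough to stay cache-resident for
    # graphs whose per-vertex depth array would not fit in cache.
    visited = bytearray((n + 7) // 8)
    depth = array("i", [-1]) * n

    visited[source >> 3] |= 1 << (source & 7)
    depth[source] = 0
    frontier = [source]
    level = 0
    edges_examined = 0

    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            for v in neighbors[offsets[u]:offsets[u + 1]]:
                edges_examined += 1
                byte, mask = v >> 3, 1 << (v & 7)
                if visited[byte] & mask:
                    # Filtered by the bitmap: the larger depth array
                    # is not touched for already-visited vertices.
                    continue
                visited[byte] |= mask
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier

    return list(depth), edges_examined


# Tiny undirected example: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
example_offsets = [0, 2, 4, 6, 8, 8]
example_neighbors = [1, 2, 0, 3, 0, 3, 1, 2]
example_depth, example_edges = bfs_with_bitmap_filter(
    example_offsets, example_neighbors, 0
)
```

Dividing the returned edge count by the measured wall-clock time gives a traversed-edges-per-second figure of the kind quoted in the abstract, although a pure-Python loop will be far slower than the reported numbers.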
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/IPDPS.2012.43
Discipline Computer Science
EISBN 9780769546759
0769546757
EndPage 389
ExternalDocumentID 6267875
Genre orig-research
ISBN 1467309753
9781467309752
ISICitedReferencesCount 41
IsPeerReviewed false
IsScholarly false
Language English
PageCount 12
StartPage 378
SubjectTerms Arrays
Bandwidth
efficient
Graph traversal
Instruction sets
multi-socket
Partitioning algorithms
single node
Sockets
URI https://ieeexplore.ieee.org/document/6267875