Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics
Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported efficiently by current message transport layers...
| Published in: | Proceedings / International Conference on Parallel Architectures and Compilation Techniques, pp. 15-28 |
|---|---|
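| Main authors: | Roshan Dathathri, Gurbinder Gill, Loc Hoang, Vishwesh Jatala, Keshav Pingali, V. Krishna Nandivada, Hoang-Vu Dang, Marc Snir |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, September 1, 2019 |
| Subjects: | |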
| ISSN: | 2641-7936 |
| Online access: | Full text |
| Abstract | Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model. BSP permits bulk-communication and uses large messages which are supported efficiently by current message transport layers, but bulk-synchronization can exacerbate the performance impact of load imbalance because a round cannot be completed until every host has completed that round. Asynchronous distributed graph analytics systems circumvent this problem by permitting hosts to make progress at their own pace, but existing systems either use global locks and send small messages or send large messages but do not support general partitioning policies such as vertex-cuts. Consequently, they perform substantially worse than bulk-synchronous systems. Moreover, none of their programming or execution models can be easily adapted for heterogeneous devices like GPUs. In this paper, we design and implement a lock-free, non-blocking, bulk-asynchronous runtime called Gluon-Async for distributed and heterogeneous graph analytics. The runtime supports any partitioning policy and uses bulk-communication. We present the bulk-asynchronous parallel (BASP) model which allows the programmer to utilize the runtime by specifying only the abstract communication required. Applications written in this model are compared with the BSP programs written using (1) D-Galois and D-IrGL, the state-of-the-art distributed graph analytics systems (which are bulk-synchronous) for CPUs and GPUs, respectively, and (2) Lux, another (bulk-synchronous) distributed GPU graph analytical system. Our evaluation shows that programs written using BASP-style execution are on average ~1.5x faster than those in D-Galois and D-IrGL on real-world large-diameter graphs at scale. They are also on average ~12x faster than Lux. To the best of our knowledge, Gluon-Async is the first asynchronous distributed GPU graph analytics system. |
|---|---|
| Author | Gill, Gurbinder; Jatala, Vishwesh; Dang, Hoang-Vu; Nandivada, V. Krishna; Snir, Marc; Dathathri, Roshan; Hoang, Loc; Pingali, Keshav |
| Author_xml | 1. Dathathri, Roshan (University of Texas at Austin); 2. Gill, Gurbinder (University of Texas at Austin); 3. Hoang, Loc (University of Texas at Austin); 4. Jatala, Vishwesh (University of Texas at Austin); 5. Pingali, Keshav (University of Texas at Austin); 6. Nandivada, V. Krishna (Indian Institute of Technology Madras); 7. Dang, Hoang-Vu (University of Illinois at Urbana-Champaign); 8. Snir, Marc (University of Illinois at Urbana-Champaign) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/PACT.2019.00010 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 172813613X; 9781728136134 |
| EISSN | 2641-7936 |
| EndPage | 28 |
| ExternalDocumentID | 8891625 |
| Genre | orig-research |
| ISICitedReferencesCount | 20 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 14 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Sept. |
| PublicationDateYYYYMMDD | 2019-09-01 |
| PublicationDate_xml | – month: 09 year: 2019 text: 2019-Sept. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / International Conference on Parallel Architectures and Compilation Techniques |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 15 |
| SubjectTerms | Analytical models; asynchronous parallel execution models; BSP model; Computational modeling; distributed and heterogeneous graph analytics; Graphics processing units; Load modeling; Mirrors; Partitioning algorithms; Programming |
| Title | Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics |
| URI | https://ieeexplore.ieee.org/document/8891625 |