Efficient Scaling of Dynamic Graph Neural Networks

Published in: SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13
Main authors: Chakaravarthy, Venkatesan T.; Pandian, Shivmaran S.; Raje, Saurabh; Sabharwal, Yogish; Suzumura, Toyotaro; Ubaru, Shashanka
Medium: Conference paper
Language: English
Published: ACM, 14 Nov. 2021
ISSN: 2167-4337
Online access: full text via IEEE Xplore
Abstract
We present distributed algorithms for training dynamic Graph Neural Networks (GNNs) on large-scale graphs spanning multi-node, multi-GPU systems. To the best of our knowledge, this is the first scaling study on dynamic GNNs. We devise mechanisms for reducing GPU memory usage and identify two execution-time bottlenecks: CPU-GPU data transfer and communication volume. Exploiting properties of dynamic graphs, we design a graph-difference-based strategy that significantly reduces the transfer time. We develop a simple but effective data distribution technique under which the communication volume remains fixed and linear in the input size for any number of GPUs. Our experiments using billion-size graphs on a system of 128 GPUs show that: (i) the distribution scheme achieves up to 30x speedup on 128 GPUs; (ii) the graph-difference technique reduces the transfer time by a factor of up to 4.1x and the overall execution time by up to 40%.
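The core idea behind the graph-difference strategy in the abstract can be sketched as follows: when a dynamic graph is given as a sequence of snapshots, consecutive snapshots usually share most of their edges, so only the delta (added and removed edges) needs to cross the CPU-GPU boundary. This is a minimal illustrative sketch only; the function names (`edge_delta`, `transfer_snapshots`) and the set-based representation are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): transfer only edge deltas
# between consecutive snapshots of a dynamic graph, rather than each
# snapshot's full edge list.

def edge_delta(prev_edges, curr_edges):
    """Return (added, removed) edge sets between two snapshots."""
    prev, curr = set(prev_edges), set(curr_edges)
    return curr - prev, prev - curr

def transfer_snapshots(snapshots):
    """Simulate CPU->GPU transfer cost: a full copy for the first snapshot,
    then only deltas. Returns the edges 'resident on device' after the last
    snapshot and the total number of edges transferred."""
    device_edges = set()
    transferred = 0
    for i, snap in enumerate(snapshots):
        if i == 0:
            device_edges = set(snap)      # first snapshot: full transfer
            transferred += len(snap)
        else:
            added, removed = edge_delta(device_edges, snap)
            device_edges |= added         # ship only the changes
            device_edges -= removed
            transferred += len(added) + len(removed)
    return device_edges, transferred
```

For slowly evolving graphs the total transferred volume is far smaller than the sum of all snapshot sizes, which is consistent with the up-to-4.1x transfer-time reduction the abstract reports.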
Authors:
1. Venkatesan T. Chakaravarthy, IBM Research India (vechakra@in.ibm.com)
2. Shivmaran S. Pandian, IBM Research India (shivs017@in.ibm.com)
3. Saurabh Raje, IBM Research India (saurabh.mraje@gmail.com)
4. Yogish Sabharwal, IBM Research India (ysabharwal@in.ibm.com)
5. Toyotaro Suzumura, IBM T.J. Watson Research Center, USA (suzumura@acm.org)
6. Shashanka Ubaru, IBM T.J. Watson Research Center, USA (shashanka.ubaru@ibm.com)
DOI: 10.1145/3458817.3480858
EISBN: 9781450384421, 1450384420
EISSN: 2167-4337
Pages: 1-13
IEEE Xplore document ID: 9910072
Web of Science citations: 21
Subject terms: Data transfer; Distributed algorithms; Dynamic graphs; Graph neural networks; Graphics processing units; Heuristic algorithms; High performance computing; Learning; Training
URL: https://ieeexplore.ieee.org/document/9910072