Reducing Communication in Graph Neural Network Training

Bibliographic Details
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (Online), Vol. 2020, pp. 1-14
Main authors: Tripathy, Alok; Yelick, Katherine; Buluc, Aydin
Format: Conference Proceedings; Journal Article
Language: English
Published: United States: IEEE, 2020-11-01
ISSN: 2167-4329
Online access: Full text
Abstract: Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
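The core operation behind the abstract's claims is block-partitioned sparse-dense matrix multiplication (SpMM) for the GNN aggregation step Z = A H. For illustration only, below is a minimal sketch of the 1D (block-row) variant, assuming torch.distributed has already been initialized with P ranks and that the vertex count n is divisible by P; the function and variable names are ours, not taken from the authors' implementation.

# Hypothetical sketch of 1D block-row SpMM for GNN aggregation (Z = A @ H).
# Each of the P ranks owns one block row of the sparse adjacency matrix A
# (pre-split here into P sparse column blocks) and the matching block row
# of the dense feature matrix H. Assumes torch.distributed is initialized
# and n (the number of vertices) is divisible by P.
import torch
import torch.distributed as dist

def spmm_1d(A_blocks, H_local):
    """A_blocks[j]: sparse (n/P x n/P) column block j of this rank's block row of A.
       H_local: dense (n/P x f) block row of the features owned by this rank.
       Returns Z_local, this rank's (n/P x f) block row of A @ H."""
    P = dist.get_world_size()
    rank = dist.get_rank()
    n_local, f = H_local.shape
    Z_local = torch.zeros(n_local, f, device=H_local.device)
    recv = torch.empty(n_local, f, device=H_local.device)
    for j in range(P):
        # Rank j broadcasts its feature block; every rank then multiplies it
        # by the matching sparse column block of its local piece of A.
        block = H_local if rank == j else recv
        dist.broadcast(block, src=j)
        Z_local += torch.sparse.mm(A_blocks[j], block)
    return Z_local

The 1.5D, 2D, and 3D variants described in the abstract replicate or further partition these blocks to reduce broadcast volume; this sketch shows only the basic communication pattern, not the full training pipeline.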
Authors:
– Tripathy, Alok (University of California, Berkeley, Electrical Engineering and Computer Sciences)
– Yelick, Katherine (University of California, Berkeley, Electrical Engineering and Computer Sciences)
– Buluc, Aydin (University of California, Berkeley, Electrical Engineering and Computer Sciences)
CODEN: IEEPAD
Content type: Conference Proceeding; Journal Article
Corporate authors:
– Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States), Oak Ridge Leadership Computing Facility (OLCF)
– Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
DOI: 10.1109/SC41405.2020.00074
EISBN: 1728199980; 9781728199986
Pages: 1-14 (14 pages)
External document IDs: 1772909 (OSTI); 9355273 (IEEE Xplore)
Genre: Original research
Funding:
– Office of Science (funder ID: 10.13039/100006132)
– Advanced Scientific Computing Research (funder ID: 10.13039/100006192)
– Oak Ridge National Laboratory (funder ID: 10.13039/100006228)
– National Science Foundation (funder ID: 10.13039/100000001)
Notes: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); USDOE National Nuclear Security Administration (NNSA). Contract/grant numbers: AC02-05CH11231; DGE 1752814; 1823034; AC05-00OR22725
Open access link: https://www.osti.gov/servlets/purl/1772909
Subject terms: Clustering algorithms; communication-avoiding algorithms; distributed training; Graph neural networks; MATHEMATICS AND COMPUTING; Proteins; Sparse matrices; Three-dimensional displays; Training; Two dimensional displays
URI: https://ieeexplore.ieee.org/document/9355273
     https://www.osti.gov/servlets/purl/1772909