ParGNN: A Scalable Graph Neural Network Training Framework on multi-GPUs

Full-batch Graph Neural Network (GNN) training is indispensable for interdisciplinary applications. Although full-batch training has advantages in convergence accuracy and speed, it still faces challenges such as severe load imbalance and high communication traffic overhead. In order to address these...

Bibliographic details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Gu, Junyu; Li, Shunde; Cao, Rongqiang; Wang, Jue; Wang, Zijian; Liang, Zhiqiang; Liu, Fang; Li, Shigang; Zhou, Chunbao; Wang, Yangang; Chi, Xuebin
Format: Conference proceeding
Language: English
Published: IEEE, 22 June 2025
Online access: Full text
Abstract Full-batch Graph Neural Network (GNN) training is indispensable for interdisciplinary applications. Although full-batch training has advantages in convergence accuracy and speed, it still faces challenges such as severe load imbalance and high communication traffic overhead. To address these challenges, we propose ParGNN, an efficient full-batch training system for GNNs, which adopts a profiler-guided adaptive load balancing method along with graph over-partition to alleviate load imbalance. Based on the over-partition results, we present a subgraph pipeline algorithm to overlap communication and computation while maintaining the accuracy of GNN training. Extensive experiments demonstrate that ParGNN not only obtains the highest accuracy but also reaches the preset accuracy in the shortest time. In end-to-end experiments on four datasets, ParGNN outperforms two state-of-the-art full-batch GNN systems, PipeGCN and DGL, achieving speedups of up to 2.7× and 21.8×, respectively.
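The abstract does not spell out how profiled costs and over-partitioning are combined, so the following is only an illustrative sketch and not ParGNN's actual algorithm. It assumes the graph has already been split into more subgraphs than GPUs and that a profiler has produced one cost estimate per subgraph. It then applies a generic greedy heuristic (largest cost first, onto the least-loaded GPU). The function name `assign_subgraphs` and the cost values are hypothetical.

```python
import heapq


def assign_subgraphs(costs, num_gpus):
    """Greedy balancing sketch (not the paper's method).

    costs[i] is an assumed profiled cost of subgraph i. Subgraphs are
    visited from most to least expensive, and each one is placed on the
    GPU with the smallest accumulated load so far.
    """
    # Min-heap of (accumulated_load, gpu_id)
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}

    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for sub_id in order:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(sub_id)
        heapq.heappush(heap, (load + costs[sub_id], gpu))

    loads = [sum(costs[i] for i in assignment[g]) for g in range(num_gpus)]
    return assignment, loads


# Hypothetical profiled costs for 8 subgraphs spread over 2 GPUs
profiled_costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 2.0]
assignment, loads = assign_subgraphs(profiled_costs, num_gpus=2)
print(assignment, loads)
```

Having more subgraphs than GPUs is what gives a balancer of this kind room to even out the load: with exactly one partition per GPU there would be nothing to reassign. The same finer-grained subgraphs are also the natural units for a pipeline in which communication for one subgraph overlaps with computation on another, as the abstract describes.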
Author Chi, Xuebin
Wang, Jue
Wang, Zijian
Li, Shigang
Gu, Junyu
Zhou, Chunbao
Liang, Zhiqiang
Cao, Rongqiang
Li, Shunde
Wang, Yangang
Liu, Fang
Author_xml – sequence: 1
  givenname: Junyu
  surname: Gu
  fullname: Gu, Junyu
  email: jygu@cnic.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 2
  givenname: Shunde
  surname: Li
  fullname: Li, Shunde
  email: lishunde@cnic.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 3
  givenname: Rongqiang
  surname: Cao
  fullname: Cao, Rongqiang
  email: caorq@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 4
  givenname: Jue
  surname: Wang
  fullname: Wang, Jue
  email: wangjue@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 5
  givenname: Zijian
  surname: Wang
  fullname: Wang, Zijian
  email: wangzj@cnic.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 6
  givenname: Zhiqiang
  surname: Liang
  fullname: Liang, Zhiqiang
  email: zqliang@cnic.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 7
  givenname: Fang
  surname: Liu
  fullname: Liu, Fang
  email: liufang@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 8
  givenname: Shigang
  surname: Li
  fullname: Li, Shigang
  email: shigangli.cs@gmail.com
  organization: Beijing University of Posts and Telecommunications,School of Computer Science,Beijing,China
– sequence: 9
  givenname: Chunbao
  surname: Zhou
  fullname: Zhou, Chunbao
  email: zhoucb@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 10
  givenname: Yangang
  surname: Wang
  fullname: Wang, Yangang
  email: wangyg@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
– sequence: 11
  givenname: Xuebin
  surname: Chi
  fullname: Chi, Xuebin
  email: chi@sccas.cn
  organization: Chinese Academy of Sciences,Computer Network Information Center,Beijing,China
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11133102
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133102
Genre orig-research
GrantInformation_xml – fundername: Chinese Academy of Sciences
  funderid: 10.13039/501100002367
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-2d95aca2da6906247b465bc14aaf988b442c9df69aee68e79e6aa997379142253
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-2d95aca2da6906247b465bc14aaf988b442c9df69aee68e79e6aa997379142253
PageCount 7
ParticipantIDs ieee_primary_11133102
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.295383
Snippet Full-batch Graph Neural Network (GNN) training is indispensable for interdisciplinary applications. Although full-batch training has advantages in convergence...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Computation and communication overlapping
Convergence
Design automation
Faces
Full-batch distributed training
Graph neural network
Graph neural networks
Graphics processing units
Load balancing
Load management
Partitioning algorithms
Pipelines
Training
Title ParGNN: A Scalable Graph Neural Network Training Framework on multi-GPUs
URI https://ieeexplore.ieee.org/document/11133102
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LSwMxEA5aPHhSseKbHLym3U3z9FaqbU_Lgi30VvKYgCBbqa2_3yTdKh48eEoIgcDkMcPk--ZD6EFxI5wLBSk9CMKCD0T5YIjlVqjAoBgom8UmZFWpxULXLVk9c2EAIIPPoJe6-S_fr9w2pcr6SRY9hiPxxT2UUu7IWi3rtyx0_2k4iqeJJfoJ5b395F-yKdlrjE_-ud4p6v7w73D97VnO0AE052ham_Wkqh7xEL9EwybKE56kctM4Fdgwb7HJiG48a1Uf8HgPvMKrBmfkIJnU848umo-fZ6MpaXUQiInXZUOo19w4Q73JVYWZtExw60pmTNBKWcao0z4IbQCEAqlBGKO1HEidMjx8cIE6zaqBS4Sl0FbFCKHUkjMApuPGhOjFZWELKTW9Qt1khuX7rtTFcm-B6z_Gb9BxMnbCTlF6izqb9Rbu0JH73Lx-rO_zBn0Bqb6RbA
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3LSgMxFA1SBV2pWPFtFm7TzkzzdFeqbcU6DNhCdyXJ3IAgU-nD7zdJp4oLF64SQiBw8rghOecehO4k09xal5C0BE6oKx2RpdPEMMOlo5B0pIlmEyLP5XSqilqsHrUwABDJZ9AK1fiXX87tOjyVtYMtur-O-BN3l1GapRu5Vq37TRPVfuj2_HqiQYCSsda2-y_jlBg3-of_HPEINX8UeLj4ji3HaAeqEzQs9GKQ5_e4i189tEH0hAch4TQOKTb0uy8ipxuPa98H3N9Sr_C8wpE7SAbFZNlEk_7juDcktRMC0X7DrEhWKqatzkod8wpTYShnxqZUa6ekNB4Mq0rHlQbgEoQCrrVSoiNUeONhnVPUqOYVnCEsuDLS3xFSJRgFoMpPjfNxXCQmEUJl56gZYJh9bJJdzLYIXPzRfov2h-OX0Wz0lD9fooMAfGBSZdkVaqwWa7hGe_Zz9bZc3MTJ-gJu55Sz
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2025+62nd+ACM%2FIEEE+Design+Automation+Conference+%28DAC%29&rft.atitle=ParGNN%3A+A+Scalable+Graph+Neural+Network+Training+Framework+on+multi-GPUs&rft.au=Gu%2C+Junyu&rft.au=Li%2C+Shunde&rft.au=Cao%2C+Rongqiang&rft.au=Wang%2C+Jue&rft.date=2025-06-22&rft.pub=IEEE&rft.spage=1&rft.epage=7&rft_id=info:doi/10.1109%2FDAC63849.2025.11133102&rft.externalDocID=11133102