DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Bibliographic details
Published in:	SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14
Main authors:	Md, Vasimuddin; Misra, Sanchit; Ma, Guixiang; Mohanty, Ramanarayan; Georganas, Evangelos; Heinecke, Alexander; Kalamkar, Dhiraj; Ahmed, Nesreen K.; Avancha, Sasikanth
Format:	Conference proceeding
Language:	English
Published:	ACM, 14 November 2021
Subjects:	Clustering algorithms; Deep Graph Library; Deep Learning; Distributed Algorithm; Graph Neural Networks; Graph Partition; High performance computing; Memory management; Proteins; Social networking (online); Sockets; Training
ISSN:	2167-4337
Online access:	Full text

Description
Abstract:	Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible. It is challenging due to large memory capacity and bandwidth requirements on a single compute node and high communication volumes across multiple nodes. In this paper, we present DistGNN that optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters via an efficient shared memory implementation, communication reduction using a minimum vertex-cut graph partitioning algorithm and communication avoidance using a family of delayed-update algorithms. Our results on four common GNN benchmark datasets: Reddit, OGB-Products, OGB-Papers and Proteins, show up to 3.7× speed-up using a single CPU socket and up to 97× speed-up using 128 CPU sockets, respectively, over baseline DGL implementations running on a single CPU socket.
DOI:	10.1145/3458817.3480856