DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Bibliographic Details
Published in:	SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14
Main Authors:	Md, Vasimuddin; Misra, Sanchit; Ma, Guixiang; Mohanty, Ramanarayan; Georganas, Evangelos; Heinecke, Alexander; Kalamkar, Dhiraj; Ahmed, Nesreen K.; Avancha, Sasikanth
Format:	Conference Proceeding
Language:	English
Published:	ACM 14.11.2021
Subjects:	Clustering algorithms; Deep Graph Library; Deep Learning; Distributed Algorithm; Graph Neural Networks; Graph Partition; High performance computing; Memory management; Proteins; Social networking (online); Sockets; Training
ISSN:	2167-4337

Description
Summary:	Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that must scale to hundreds of compute nodes to be feasible. It is challenging because of the large memory capacity and bandwidth required on a single compute node and the high communication volumes across multiple nodes. In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters via an efficient shared-memory implementation, communication reduction using a minimum vertex-cut graph partitioning algorithm, and communication avoidance using a family of delayed-update algorithms. Our results on four common GNN benchmark datasets (Reddit, OGB-Products, OGB-Papers, and Proteins) show up to 3.7× speed-up using a single CPU socket and up to 97× speed-up using 128 CPU sockets, both over baseline DGL implementations running on a single CPU socket.
DOI:	10.1145/3458817.3480856