DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Bibliographic details
Published in:	SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14
Main authors:	Md, Vasimuddin; Misra, Sanchit; Ma, Guixiang; Mohanty, Ramanarayan; Georganas, Evangelos; Heinecke, Alexander; Kalamkar, Dhiraj; Ahmed, Nesreen K.; Avancha, Sasikanth
Format:	Conference paper
Language:	English
Published:	ACM, 14 November 2021
Subjects:	Clustering algorithms; Deep Graph Library; Deep Learning; Distributed Algorithm; Graph Neural Networks; Graph Partition; High performance computing; Memory management; Proteins; Social networking (online); Sockets; Training
ISSN:	2167-4337

Description
Summary:	Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible. It is challenging due to large memory capacity and bandwidth requirements on a single compute node and high communication volumes across multiple nodes. In this paper, we present DistGNN that optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters via an efficient shared memory implementation, communication reduction using a minimum vertex-cut graph partitioning algorithm and communication avoidance using a family of delayed-update algorithms. Our results on four common GNN benchmark datasets: Reddit, OGB-Products, OGB-Papers and Proteins, show up to 3.7× speed-up using a single CPU socket and up to 97× speed-up using 128 CPU sockets, respectively, over baseline DGL implementations running on a single CPU socket.
DOI:	10.1145/3458817.3480856