Reducing Communication in Graph Neural Network Training

Bibliographic Details
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), Vol. 2020, pp. 1-14
Main Authors: Tripathy, Alok; Yelick, Katherine; Buluc, Aydin
Format: Conference Proceeding; Journal Article
Language: English
Published: IEEE, United States, 01.11.2020
ISSN: 2167-4329
Description
Summary: Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
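
To make the distributed sparse-dense multiply (SpMM) in the summary concrete, below is a minimal sketch of the simplest variant, a 1D block-row partitioning of the GNN aggregation step Z = A H, written with torch.distributed. The function and variable names (spmm_1d, A_col_blocks, H_local) and the equal-sized partitioning are illustrative assumptions, not the authors' code; the paper's algorithms also cover communication-reducing 1.5D, 2D, and 3D partitionings.

import torch
import torch.distributed as dist

def spmm_1d(A_col_blocks, H_local):
    """Compute this rank's block row of Z = A @ H.

    Assumes P ranks and n divisible by P. A_col_blocks[j] is this rank's
    sparse (n/P x n/P) block of A whose columns align with rank j's block
    row of H; H_local is this rank's dense (n/P x f) feature block.
    """
    rank, P = dist.get_rank(), dist.get_world_size()
    Z_local = torch.zeros_like(H_local)
    for j in range(P):
        # Stage rank j's feature block on every rank via broadcast.
        H_j = H_local.clone() if rank == j else torch.empty_like(H_local)
        dist.broadcast(H_j, src=j)
        # Local sparse-dense multiply against the matching column block of A.
        Z_local += torch.sparse.mm(A_col_blocks[j], H_j)
    return Z_local

In this 1D scheme each of the P broadcast stages moves a dense n/P x f feature block, so communication grows with the feature matrix size; the higher-dimensional variants described in the paper trade replication across a process grid for an asymptotic reduction in this communication volume.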
Bibliography: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Numbers: AC02-05CH11231; DGE 1752814; 1823034; AC05-00OR22725
DOI: 10.1109/SC41405.2020.00074