Scaling deep learning on GPU and Knights Landing clusters

Bibliographic Details
Published in:	International Conference for High Performance Computing, Networking, Storage and Analysis (Online) pp. 1 - 12
Main Authors:	You, Yang; Buluç, Aydın; Demmel, James
Format:	Conference Proceeding
Language:	English
Published:	New York, NY, USA: ACM, 12.11.2017
Series:	ACM Conferences
Subjects:	Clustering algorithms; Computing methodologies > Parallel computing methodologies > Parallel algorithms > Massively parallel algorithms; Deep learning; Distributed deep learning; Graphics processing units; High performance computing; Knights Landing; Machine learning algorithms; Neural networks; Scalable algorithm; Training
ISBN:	9781450351140, 145035114X
ISSN:	2167-4337

Description
Summary:	Training neural networks has become a major bottleneck. For example, training on the ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3X speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
DOI:	10.1145/3126908.3126912