Scaling deep learning on GPU and Knights Landing clusters

Bibliographic details
Published in:	International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-12
Main authors:	You, Yang; Buluç, Aydın; Demmel, James
Format:	Conference paper
Language:	English
Publication details:	New York, NY, USA: ACM, 12 November 2017
Series:	ACM Conferences
Subjects:	Clustering algorithms; Computing methodologies > Parallel computing methodologies > Parallel algorithms > Massively parallel algorithms; Deep learning; Distributed deep learning; Graphics processing units; High performance computing; Knights Landing; Machine learning algorithms; Neural networks; Scalable algorithm; Training
ISBN:	9781450351140, 145035114X
ISSN:	2167-4337

Description
Summary:	Training neural networks has become a major bottleneck. For example, training on the ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3x speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
DOI:	10.1145/3126908.3126912
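
Note:	The summary refers to Elastic Averaging SGD (EASGD) without stating its update rule. As a rough illustration only, the sketch below shows one synchronous round of the basic EASGD update as it is commonly described: each worker takes a gradient step plus an elastic pull toward a shared center variable, and the center moves toward the workers. It is not the paper's redesigned Sync/Async/Hogwild variants. The function names `easgd_round` and `demo`, the quadratic toy loss, and the values of `lr` and `rho` are all assumptions made for this example.

```python
import numpy as np


def easgd_round(workers, center, grads, lr, rho):
    """One synchronous round of a basic EASGD update (illustrative sketch).

    workers: list of per-worker parameter vectors
    center:  shared center parameter vector
    grads:   list of per-worker gradients evaluated at `workers`
    lr, rho: learning rate and elastic coupling strength (assumed values)
    """
    # Elastic difference between each worker and the center.
    diffs = [w - center for w in workers]
    # Worker step: gradient descent plus a pull toward the center.
    new_workers = [w - lr * (g + rho * d)
                   for w, g, d in zip(workers, grads, diffs)]
    # Center step: move toward the workers by the summed elastic force.
    new_center = center + lr * rho * sum(diffs)
    return new_workers, new_center


def demo(rounds=200, num_workers=4, lr=0.1, rho=0.5):
    """Toy run on the loss 0.5 * ||w - target||^2 (hypothetical setup)."""
    target = np.array([1.0, -2.0])
    rng = np.random.default_rng(0)
    workers = [rng.normal(size=2) for _ in range(num_workers)]
    center = np.zeros(2)
    for _ in range(rounds):
        # Gradient of the toy quadratic loss at each worker.
        grads = [w - target for w in workers]
        workers, center = easgd_round(workers, center, grads, lr, rho)
    return workers, center, target


if __name__ == "__main__":
    _, center, target = demo()
    print("center:", center, "target:", target)
```

In this toy setting both the workers and the center approach the minimizer; the coupling `rho` controls how tightly workers are held to the center.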