Scaling deep learning on GPU and Knights Landing clusters

Detailed bibliography
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-12
Main authors: You, Yang; Buluç, Aydın; Demmel, James
Medium: Conference paper
Language: English
Published: New York, NY, USA: ACM, November 12, 2017
Series: ACM Conferences
ISBN: 9781450351140, 145035114X
ISSN: 2167-4337
Description
Summary: Training neural networks has become a major bottleneck; for example, training on the ImageNet dataset with a single Nvidia K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators, but these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithmic side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3X speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
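
As a rough illustration of the elastic averaging scheme the summary refers to, below is a minimal sketch of one synchronous EASGD round following the general formulation of Zhang et al. (2015). The gradient_fn, learning rate, elastic coefficient alpha, and the toy quadratic problem are illustrative assumptions, not the authors' HPC implementation or its communication layer.

    # Minimal sketch of a synchronous EASGD round (general scheme, not the
    # paper's HPC redesign): each worker takes a local SGD step plus an
    # elastic pull toward a shared center variable, and the center moves
    # toward the average of the workers.
    import numpy as np

    def easgd_step(workers, center, gradient_fn, lr=0.01, alpha=0.001):
        # lr and alpha are illustrative hyperparameters (alpha = lr * rho
        # in the usual EASGD notation).
        diffs = [w - center for w in workers]
        new_workers = [w - lr * gradient_fn(w) - alpha * d
                       for w, d in zip(workers, diffs)]
        new_center = center + alpha * sum(diffs)
        return new_workers, new_center

    # Toy usage: minimize f(x) = ||x||^2 with 4 workers.
    rng = np.random.default_rng(0)
    grad = lambda x: 2.0 * x                 # gradient of ||x||^2
    workers = [rng.normal(size=3) for _ in range(4)]
    center = np.zeros(3)
    for _ in range(1000):
        workers, center = easgd_step(workers, center, grad)
    print(center)  # approaches the minimizer at the origin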
DOI: 10.1145/3126908.3126912