Scaling deep learning on GPU and Knights Landing clusters

Bibliographic Details
Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1–12
Main Authors: You, Yang; Buluç, Aydın; Demmel, James
Format: Conference Proceeding
Language:English
Published: New York, NY, USA: ACM, 12.11.2017
Series:ACM Conferences
Subjects:
ISBN:9781450351140, 145035114X
ISSN:2167-4337
Abstract Training neural networks has become a major bottleneck: for example, training on the ImageNet dataset with one NVIDIA K20 GPU takes 21 days. To speed up training, current deep learning systems rely heavily on hardware accelerators, but these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) and design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3x speedup over the original EASGD on the same platform. We achieve 91.5% weak-scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
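The abstract centers on Elastic Averaging SGD. As background, here is a minimal sketch of one synchronous EASGD round following the general formulation the method is known for: each worker takes a local gradient step plus an elastic pull toward a shared center variable, and the center moves toward the workers' average. The toy quadratic objective, function names, and all parameter values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def easgd_round(workers, center, grads, lr=0.1, alpha=0.05):
    """One synchronous EASGD round (illustrative sketch).

    Each worker x_i is updated as x_i - lr*g_i - alpha*(x_i - center);
    the center accumulates the elastic pulls alpha*(x_i - center).
    """
    new_workers = []
    center_delta = np.zeros_like(center)
    for x, g in zip(workers, grads):
        elastic = alpha * (x - center)          # elastic coupling term
        new_workers.append(x - lr * g - elastic)
        center_delta += elastic                 # center pulled toward workers
    return new_workers, center + center_delta

# Toy usage: 4 workers jointly minimizing f(x) = 0.5 * ||x||^2,
# whose gradient at x is simply x.
rng = np.random.default_rng(0)
workers = [rng.normal(size=3) for _ in range(4)]
center = np.zeros(3)
for _ in range(300):
    grads = [x.copy() for x in workers]         # grad of 0.5*||x||^2
    workers, center = easgd_round(workers, center, grads)
```

The elastic term lets workers explore locally while the center tracks their consensus; the paper's Sync/Async/Hogwild variants differ in how this communication round is scheduled across HPC nodes.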
CCS Concepts: Computing methodologies → Massively parallel algorithms
Author_xml – sequence: 1
  givenname: Yang
  surname: You
  fullname: You, Yang
  email: youyang@cs.berkeley.edu
  organization: Computer Science Division
– sequence: 2
  givenname: Aydın
  surname: Buluç
  fullname: Buluç, Aydın
  email: abuluc@lbl.gov
  organization: Computer Science Division
– sequence: 3
  givenname: James
  surname: Demmel
  fullname: Demmel, James
  email: demmel@cs.berkeley.edu
  organization: Computer Science Division
ContentType Conference Proceeding
Copyright 2017 ACM
DOI 10.1145/3126908.3126912
Discipline Computer Science
EISBN 9781450351140
145035114X
EISSN 2167-4337
EndPage 12
ExternalDocumentID 9926271
Genre orig-research
GrantInformation_xml – fundername: Office of Science
  funderid: 10.13039/100006132
– fundername: Advanced Scientific Computing Research
  funderid: 10.13039/100006192
ISBN 9781450351140
145035114X
ISICitedReferencesCount 42
Keywords distributed deep learning
scalable algorithm
knights landing
Language English
License 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
MeetingName SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis
PageCount 12
PublicationCentury 2000
PublicationDate 2017-11-12
PublicationDecade 2010
PublicationPlace New York, NY, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle International Conference for High Performance Computing, Networking, Storage and Analysis (Online)
PublicationTitleAbbrev SC
PublicationYear 2017
Publisher ACM
StartPage 1
SubjectTerms Clustering algorithms
Computing methodologies → Parallel computing methodologies → Parallel algorithms → Massively parallel algorithms
Deep learning
Distributed Deep Learning
Graphics processing units
High performance computing
Knights Landing
Machine learning algorithms
Neural networks
Scalable Algorithm
Training
Title Scaling deep learning on GPU and Knights Landing clusters
URI https://ieeexplore.ieee.org/document/9926271