Scaling deep learning on GPU and Knights Landing clusters
Training neural networks has become a big bottleneck. For example, training the ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs.
| Published in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1 - 12 |
|---|---|
| Main Authors: | You, Yang; Buluç, Aydın; Demmel, James |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | New York, NY, USA: ACM, 12.11.2017 |
| Series: | ACM Conferences |
| ISBN: | 9781450351140; 145035114X |
| ISSN: | 2167-4337 |
| Online Access: | Get full text |
| Abstract | Training neural networks has become a big bottleneck. For example, training the ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs.
We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters.
We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterpart methods (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3X speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation. |
|---|---|
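The elastic-averaging update that the EASGD variants in the abstract build on can be sketched in a few lines. The following is a minimal single-process simulation of the basic EASGD rule (local SGD step plus a symmetric elastic pull between each worker and a shared center variable); the quadratic loss, step size, elastic coefficient, and iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers = 4, 3
eta, rho = 0.05, 0.5      # learning rate and elastic penalty (illustrative)
alpha = eta * rho         # "moving rate" coupling workers to the center

A = np.diag([1.0, 2.0, 3.0, 4.0])   # toy quadratic loss f(x) = 0.5 x^T A x
grad = lambda x: A @ x               # its gradient; minimizer is x = 0

workers = [rng.standard_normal(dim) for _ in range(n_workers)]
center = np.zeros(dim)               # the shared center variable

for step in range(500):
    for i in range(n_workers):
        # worker i: local gradient step plus elastic pull toward the center
        workers[i] = (workers[i]
                      - eta * grad(workers[i])
                      - alpha * (workers[i] - center))
        # center: symmetric elastic pull toward worker i
        center = center + alpha * (workers[i] - center)

# the center and all workers converge toward the minimizer (the origin)
print(np.linalg.norm(center))
```

The round-robin sweep over workers here mimics the sequential communication pattern of the original EASGD; the paper's Async/Hogwild/Sync variants change how and when these worker-center exchanges happen on an HPC cluster, not the elastic update itself.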
| Author | Yang You (youyang@cs.berkeley.edu), Aydın Buluç (abuluc@lbl.gov), James Demmel (demmel@cs.berkeley.edu); Computer Science Division |
| Copyright | 2017 ACM |
| DOI | 10.1145/3126908.3126912 |
| Discipline | Computer Science |
| Funding | Office of Science (funder ID 10.13039/100006132); Advanced Scientific Computing Research (funder ID 10.13039/100006192) |
| Web of Science citations | 42 |
| Keywords | distributed deep learning; scalable algorithm; Knights Landing |
| License | 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. |
| MeetingName | SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis |
| SubjectTerms | Clustering algorithms; Computing methodologies → Parallel computing methodologies → Parallel algorithms → Massively parallel algorithms; Deep learning; Distributed deep learning; Graphics processing units; High performance computing; Knights Landing; Machine learning algorithms; Neural networks; Scalable algorithm; Training |
| URI | https://ieeexplore.ieee.org/document/9926271 |

