DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission
| Published in: | MICRO-50 : the 50th annual IEEE/ACM International Symposium on Microarchitecture : proceedings : October 14-18, 2017, Cambridge, MA, pp. 786-799 |
|---|---|
| Main authors: | Parker Hill, Animesh Jain, Mason Hill, Babak Zamirai, Chang-Hong Hsu, Michael A. Laurenzano, Scott Mahlke, Lingjia Tang, Jason Mars |
| Format: | Conference paper |
| Language: | English |
| Publication details: | New York, NY, USA : ACM, October 14, 2017 |
| Series: | ACM Conferences |
| Subjects: | deep neural networks; GPU architecture; memory bandwidth; performance optimization |
| ISBN: | 1450349528, 9781450349529 |
| ISSN: | 2379-3155 |
| DOI: | 10.1145/3123939.3123970 |
| Online access: | Get full text |
Abstract

Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, image, and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled data centers and the compute needed to service demand.

The state-of-the-art techniques to improve DNN performance have significant limitations in bridging this gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck.

To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques: (1) synapse vector elimination, a technique that identifies non-contributing synapses in the DNN and carefully transforms data to remove the computation and data movement of these synapses while fully utilizing the GPU, and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN achieves an average speedup of 2.1× on real GPU hardware. We also introduce a small additional hardware unit per GPU core to facilitate efficient data fission operations, increasing the speedup achieved by DeftNN to 2.6×.
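The abstract's description of synapse vector elimination can be made concrete with a small sketch. The key contrast with conventional pruning is that whole synapse vectors are dropped and the survivors are compacted, so what remains is a smaller dense GEMM rather than a sparse computation that maps poorly to GPUs. Below is a minimal NumPy illustration; the column-norm scoring is a hypothetical stand-in for DeftNN's actual criterion for identifying non-contributing synapses, and the real framework performs this transformation on the GPU inside the DNN execution pipeline.

```python
import numpy as np

def synapse_vector_elimination(W, X, keep_ratio=0.8):
    """Dense-preserving pruning sketch.

    W: (out_features, in_features) weight matrix; column j holds the
       synapses attached to input neuron j.
    X: (in_features, batch) input activations.

    Instead of zeroing individual weights (fine-grained sparsity that
    GPUs execute poorly), drop whole synapse vectors (columns of W and
    the matching rows of X) and compact the survivors, leaving one
    smaller but still fully dense GEMM.
    """
    # Hypothetical contribution score: column L2 norm. DeftNN's real
    # selection criterion differs; this is only a stand-in.
    scores = np.linalg.norm(W, axis=0)
    k = max(1, int(keep_ratio * W.shape[1]))
    keep = np.sort(np.argsort(scores)[-k:])  # surviving vector indices

    W_small = W[:, keep]      # compacted dense weights
    X_small = X[keep, :]      # matching compacted inputs
    return W_small @ X_small  # smaller dense GEMM

# Example: a 1024x1024 layer shrinks to 1024x819 while staying dense.
W = np.random.randn(1024, 1024).astype(np.float32)
X = np.random.randn(1024, 64).astype(np.float32)
Y = synapse_vector_elimination(W, X)
print(Y.shape)  # (1024, 64)
```

Because the compacted operands stay dense and contiguous, the remaining GEMM can still saturate the GPU's wide SIMD lanes, which is why this style of elimination can yield real speedups where fine-grained pruning does not.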
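Near-compute data fission can likewise be sketched at a high level: data is kept in a packed, narrower format while it moves through the on-chip memory hierarchy and is only expanded ("fissioned") back to full precision adjacent to the arithmetic units. The sketch below is a host-side NumPy analogy assuming a simple fp32-to-fp16 narrowing as the packing scheme; the actual mechanism unpacks in GPU registers, and the small per-core hardware unit mentioned in the abstract exists to make exactly this unpacking step cheap.

```python
import numpy as np

def fuse(values):
    """Pack data into a narrow format for storage and on-chip movement.

    Assumed packing scheme: fp32 -> fp16, halving the bytes in flight.
    """
    return values.astype(np.float16)

def fission(packed):
    """Expand packed data back to full precision right before compute."""
    return packed.astype(np.float32)

W = np.random.randn(256, 256).astype(np.float32)
X = np.random.randn(256, 32).astype(np.float32)

# Operands travel through the memory hierarchy in packed form...
W_packed, X_packed = fuse(W), fuse(X)
# ...and are fissioned adjacent to the arithmetic units.
Y = fission(W_packed) @ fission(X_packed)

# The result closely tracks the full-precision computation.
full = W @ X
print(np.max(np.abs(Y - full) / (np.abs(full) + 1e-6)))
```

The design point is that bandwidth, not arithmetic, is the on-chip bottleneck the abstract identifies: moving half-width data and paying a small unpacking cost near the compute units trades cheap ALU work for scarce on-chip bandwidth.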

