DeftNN: Addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission

Published in: MICRO-50: the 50th annual IEEE/ACM International Symposium on Microarchitecture: proceedings, October 14-18, 2017, Cambridge, MA, pp. 786-799
Main authors: Hill, Parker; Jain, Animesh; Hill, Mason; Zamirai, Babak; Hsu, Chang-Hong; Laurenzano, Michael A.; Mahlke, Scott; Tang, Lingjia; Mars, Jason
Medium: Conference paper
Language: English
Published: New York, NY, USA: ACM, October 14, 2017
Series: ACM Conferences
ISBN: 1450349528, 9781450349529
ISSN: 2379-3155
Abstract: Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, image, and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled data centers and the compute needed to service demand. State-of-the-art techniques for improving DNN performance have significant limitations in bridging this gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck. To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques: (1) synapse vector elimination, which identifies non-contributing synapses in the DNN and carefully transforms data to remove the computation and data movement of these synapses while keeping the GPU fully utilized, and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN achieves an average speedup of 2.1× on real GPU hardware. We also introduce a small additional hardware unit per GPU core to facilitate efficient data fission operations, increasing the speedup achieved by DeftNN to 2.6×.
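The two optimizations lend themselves to short illustrations. Below is a minimal NumPy sketch of the core idea behind synapse vector elimination: rather than zeroing individual weights as unstructured pruning does (which yields sparse matrices that map poorly to GPUs), whole input columns ("synapse vectors") are dropped, so the remaining work is a smaller dense GEMM. The L2-norm contribution score, the fixed keep ratio, and all function names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def eliminate_synapse_vectors(W, X, keep_ratio=0.8):
    """Drop the lowest-scoring input columns of W and the matching rows of X."""
    # Score each synapse vector (column of W); the L2 norm is an assumed
    # stand-in for the paper's contribution analysis.
    scores = np.linalg.norm(W, axis=0)
    k = max(1, int(keep_ratio * W.shape[1]))
    keep = np.sort(np.argsort(scores)[-k:])  # indices of columns to keep
    # The compacted operands form a *smaller dense* GEMM, keeping the GPU
    # fully utilized, unlike the sparse matrices of unstructured pruning.
    return W[:, keep], X[keep, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 1024)).astype(np.float32)  # (out, in) weights
X = rng.standard_normal((1024, 64)).astype(np.float32)   # (in, batch) inputs

W_k, X_k = eliminate_synapse_vectors(W, X)
Y_approx = W_k @ X_k                     # ~20% fewer FLOPs and bytes moved
print(np.abs(Y_approx - W @ X).mean())   # accuracy cost of the elimination
```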
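Near-compute data fission admits a similarly small sketch: pack two values into one machine word so on-chip data movement is halved, then split ("fission") the word back into usable operands immediately before the arithmetic. The bfloat16-style 16-bit truncation used as the packing format here is an assumption for illustration, and the function names are hypothetical; the paper implements fission inside GPU kernels and proposes a small per-core hardware unit to make the unpacking cheap.

```python
import numpy as np

def fuse(a, b):
    """Pack two float32 arrays into one uint32 array, top 16 bits of each."""
    hi = a.view(np.uint32) & np.uint32(0xFFFF0000)  # keep sign/exponent/top mantissa
    lo = b.view(np.uint32) >> np.uint32(16)         # second value in the low half
    return hi | lo                                  # half the bytes moved on-chip

def fission(packed):
    """Unpack near the compute: recover two approximate float32 arrays."""
    a = (packed & np.uint32(0xFFFF0000)).view(np.float32)
    b = ((packed & np.uint32(0x0000FFFF)) << np.uint32(16)).view(np.float32)
    return a, b

x = np.float32([3.14159, -2.71828])
a, b = fission(fuse(x[:1], x[1:]))
print(a[0], b[0])   # ~3.1406 and ~-2.7031: small truncation error
```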
Authors:
1. Parker Hill, University of Michigan (parkerhh@umich.edu)
2. Animesh Jain, University of Michigan (anijain@umich.edu)
3. Mason Hill, University of Michigan and University of Nevada (hillm3@unlv.nevada.edu)
4. Babak Zamirai, University of Michigan (zamirai@umich.edu)
5. Chang-Hong Hsu, University of Michigan (hsuch@umich.edu)
6. Michael A. Laurenzano, University of Michigan (mlaurenz@umich.edu)
7. Scott Mahlke, University of Michigan (mahlke@umich.edu)
8. Lingjia Tang, University of Michigan (lingjia@umich.edu)
9. Jason Mars, University of Michigan (profmars@umich.edu)
ContentType: Conference Proceeding
Copyright: 2017 ACM
DOI: 10.1145/3123939.3123970
Discipline: Computer Science
EISSN: 2379-3155
EndPage: 799
ExternalDocumentID: 8686533
Genre: orig-research
ISBN: 1450349528, 9781450349529
ISICitedReferencesCount: 44
Keywords: memory bandwidth
GPU architecture
performance optimization
deep neural networks
Language: English
License: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
MeetingName: MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture
PageCount: 14
PublicationDate: 2017-10-14
PublicationPlace: New York, NY, USA
PublicationSeriesTitle: ACM Conferences
PublicationTitle: MICRO-50: the 50th annual IEEE/ACM International Symposium on Microarchitecture: proceedings, October 14-18, 2017, Cambridge, MA
PublicationTitleAbbrev: MICRO
PublicationYear: 2017
Publisher: ACM
StartPage: 786
SubjectTerms: Bandwidth
Computer systems organization -- Architectures -- Parallel architectures -- Single instruction, multiple data
Computing methodologies -- Machine learning -- Machine learning approaches -- Neural networks
Deep Neural Networks
General and reference -- Cross-computing tools and techniques -- Performance
GPU Architecture
Graphics processing units
Hardware
Memory Bandwidth
Memory management
Optimization
Performance Optimization
Synapses
System-on-chip
Subtitle: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission
Title: DeftNN
URI: https://ieeexplore.ieee.org/document/8686533