ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization



Detailed Bibliography
Published in: IEEE Transactions on Neural Networks and Learning Systems, Volume 34, Issue 10, pp. 7660-7674
Main authors: Lu, Jinming; Ni, Chao; Wang, Zhongfeng
Format: Journal Article
Language: English
Publication details: United States: IEEE, 01.10.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subject:
ISSN: 2162-237X, 2162-2388
Online access: Get full text
Abstract Recently, the efficient training of deep neural networks (DNNs) on resource-constrained platforms has attracted increasing attention for protecting user privacy. However, it remains a severe challenge because DNN training involves intensive computation and a large amount of data access. To deal with these issues, in this work we implement an efficient training accelerator (ETA) on a field-programmable gate array (FPGA) by adopting a hardware-algorithm co-optimization approach. A novel training scheme is proposed to effectively train DNNs using 8-bit precision with arbitrary batch sizes, in which a compact but powerful data format and a hardware-oriented normalization layer are introduced. Thus, the computational complexity and memory accesses are significantly reduced. In the ETA, a reconfigurable processing element (PE) is designed to support the various computational patterns that arise during training while avoiding the redundant calculations introduced by nonunit-stride convolutional layers. With a flexible network-on-chip (NoC) and a hierarchical PE array, computational parallelism and data reuse can be fully exploited, and memory accesses are further reduced. In addition, a unified computing core is developed to execute auxiliary layers such as normalization and weight update (WU); it works in a time-multiplexed manner and consumes only a small amount of hardware resources. Experiments show that our training scheme achieves state-of-the-art accuracy across multiple models, including CIFAR-VGG16, CIFAR-ResNet20, CIFAR-InceptionV3, ResNet18, and ResNet50. Evaluated on three networks (CIFAR-VGG16, CIFAR-ResNet20, and ResNet18), our ETA on a Xilinx VC709 FPGA achieves throughputs of 610.98, 658.64, and 811.24 GOPS, respectively. Compared with the prior art, our design demonstrates a 3.65× speedup and an 8.54× energy efficiency improvement on CIFAR-ResNet20.
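The abstract's central algorithmic claim is that DNNs can be trained end to end in 8-bit precision using a compact, hardware-friendly data format. The record does not reproduce the paper's actual format, so the following is a minimal, hypothetical sketch in Python/NumPy: a symmetric int8 quantizer with a power-of-two scale, chosen because power-of-two scaling reduces to bit shifts on an FPGA. The function names and the toy tensor are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumption, not the paper's exact scheme): symmetric int8
# quantization with a power-of-two scale, the kind of rounding an FPGA training
# accelerator can realize with shifts instead of dividers.
import numpy as np

def quantize_int8_pow2(x: np.ndarray):
    """Quantize a tensor to int8 codes plus a power-of-two scale."""
    max_abs = float(np.max(np.abs(x))) + 1e-12       # avoid log2(0) on all-zero tensors
    exp = int(np.ceil(np.log2(max_abs / 127.0)))     # smallest 2**exp covering the range
    scale = 2.0 ** exp
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to float32 for the next layer's computation."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(64, 64)).astype(np.float32)   # toy weight tensor
    q, scale = quantize_int8_pow2(w)
    err = float(np.max(np.abs(dequantize(q, scale) - w)))
    print(f"scale = 2**{int(np.log2(scale))}, max abs quantization error = {err:.5f}")

In a typical low-precision training flow, such a quantizer would be applied to weights, activations, and gradients before the convolution kernels execute on the PE array, while a higher-precision copy of the weights is usually kept for the weight-update step; the paper's scheme additionally introduces a hardware-oriented normalization layer, which this sketch does not model.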
Author Ni, Chao
Wang, Zhongfeng
Lu, Jinming
Author_xml – sequence: 1
  givenname: Jinming
  orcidid: 0000-0002-7134-6514
  surname: Lu
  fullname: Lu, Jinming
  email: jmlu@smail.nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
– sequence: 2
  givenname: Chao
  orcidid: 0000-0002-5139-4607
  surname: Ni
  fullname: Ni, Chao
  email: nichao@smail.nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
– sequence: 3
  givenname: Zhongfeng
  orcidid: 0000-0002-7227-4786
  surname: Wang
  fullname: Wang, Zhongfeng
  email: zfwang@nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35133969 (View this record in MEDLINE/PubMed)
CODEN ITNNAL
CitedBy_id crossref_primary_10_3390_s24072145
crossref_primary_10_1109_JETCAS_2025_3561330
crossref_primary_10_1109_TCAD_2023_3317789
crossref_primary_10_1109_TNNLS_2023_3323302
crossref_primary_10_1109_TCSI_2023_3338471
crossref_primary_10_1109_JETCAS_2025_3555970
crossref_primary_10_3390_electronics12244943
crossref_primary_10_1109_TCAD_2024_3445264
crossref_primary_10_3390_electronics14061182
crossref_primary_10_1109_TNNLS_2024_3430028
crossref_primary_10_1109_TVLSI_2022_3175582
crossref_primary_10_1109_TVLSI_2025_3561000
Cites_doi 10.18653/v1/2020.findings-emnlp.372
10.1109/CVPR.2016.308
10.1109/CVPR.2019.00065
10.1109/RECONFIG.2018.8641739
10.1109/JSSC.2020.3043870
10.1109/CVPR.2018.00474
10.1109/MSP.2020.2975749
10.1109/TC.2015.2479623
10.1109/CVPR42600.2020.00807
10.1109/CVPR42600.2020.01125
10.1016/j.neunet.2019.01.012
10.1109/JSSC.2016.2616357
10.1109/CVPR42600.2020.00852
10.1007/978-981-15-8135-9_9
10.1145/2847263.2847265
10.1145/3289602.3293977
10.1109/TCSI.2017.2767204
10.1145/2644865.2541967
10.1109/JSSC.2020.3021661
10.1109/FCCM.2018.00021
10.1109/TC.2020.2985971
10.1109/CVPR42600.2020.01318
10.1109/JPROC.2017.2761740
10.1109/TCSI.2020.3030663
10.1145/3400302.3415643
10.1109/CVPR42600.2020.00204
10.1016/j.neunet.2019.12.027
10.1109/ICFPT47387.2019.00009
10.1109/CVPR.2016.90
10.1109/JSSC.2020.3005786
10.1109/FPT.2017.8280142
10.1609/aaai.v35i4.16462
10.1109/FPL.2019.00034
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7QF
7QO
7QP
7QQ
7QR
7SC
7SE
7SP
7SR
7TA
7TB
7TK
7U5
8BQ
8FD
F28
FR3
H8D
JG9
JQ2
KR7
L7M
L~C
L~D
P64
7X8
DOI 10.1109/TNNLS.2022.3145850
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Aluminium Industry Abstracts
Biotechnology Research Abstracts
Calcium & Calcified Tissue Abstracts
Ceramic Abstracts
Chemoreception Abstracts
Computer and Information Systems Abstracts
Corrosion Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
Materials Business File
Mechanical & Transportation Engineering Abstracts
Neurosciences Abstracts
Solid State and Superconductivity Abstracts
METADEX
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
Materials Research Database
ProQuest Computer Science Collection
Civil Engineering Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Materials Research Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Materials Business File
Aerospace Database
Engineered Materials Abstracts
Biotechnology Research Abstracts
Chemoreception Abstracts
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Civil Engineering Abstracts
Aluminium Industry Abstracts
Electronics & Communications Abstracts
Ceramic Abstracts
Neurosciences Abstracts
METADEX
Biotechnology and BioEngineering Abstracts
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
Engineering Research Database
Calcium & Calcified Tissue Abstracts
Corrosion Abstracts
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic

Materials Research Database
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
– sequence: 3
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2162-2388
EndPage 7674
ExternalDocumentID 35133969
10_1109_TNNLS_2022_3145850
9707608
Genre orig-research
Journal Article
GrantInformation_xml – fundername: Key Research Plan of Jiangsu Province of China
  grantid: BE2019003-4
– fundername: National Natural Science Foundation of China
  grantid: 61774082
  funderid: 10.13039/501100001809
– fundername: Fundamental Research Funds for the Central Universities
  grantid: 021014380065
  funderid: 10.13039/501100012226
GroupedDBID 0R~
4.4
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACIWK
ACPRK
AENEX
AFRAH
AGQYO
AGSQL
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
IFIPE
IPLJI
JAVBF
M43
MS~
O9-
OCL
PQQKQ
RIA
RIE
RNS
AAYXX
CITATION
NPM
7QF
7QO
7QP
7QQ
7QR
7SC
7SE
7SP
7SR
7TA
7TB
7TK
7U5
8BQ
8FD
F28
FR3
H8D
JG9
JQ2
KR7
L7M
L~C
L~D
P64
7X8
ID FETCH-LOGICAL-c351t-3ee872eed976517883aebde4d5a0aa3d86013d6987d5a814dcfe0efb9c6c4fd93
IEDL.DBID RIE
ISICitedReferencesCount 15
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000754288200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2162-237X
2162-2388
IngestDate Mon Sep 29 04:53:57 EDT 2025
Sun Jun 29 15:41:26 EDT 2025
Mon Jul 21 05:56:36 EDT 2025
Sat Nov 29 01:40:18 EST 2025
Tue Nov 18 20:47:34 EST 2025
Wed Aug 27 02:50:36 EDT 2025
IsPeerReviewed false
IsScholarly true
Issue 10
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c351t-3ee872eed976517883aebde4d5a0aa3d86013d6987d5a814dcfe0efb9c6c4fd93
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-7227-4786
0000-0002-5139-4607
0000-0002-7134-6514
PMID 35133969
PQID 2873588590
PQPubID 85436
PageCount 15
ParticipantIDs pubmed_primary_35133969
crossref_citationtrail_10_1109_TNNLS_2022_3145850
crossref_primary_10_1109_TNNLS_2022_3145850
ieee_primary_9707608
proquest_journals_2873588590
proquest_miscellaneous_2627133837
PublicationCentury 2000
PublicationDate 2023-10-01
PublicationDateYYYYMMDD 2023-10-01
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-10-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev TNNLS
PublicationTitleAlternate IEEE Trans Neural Netw Learn Syst
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref15
ref14
ref11
ref10
li (ref8) 2021
zhao (ref38) 2016
ref17
ref16
zhang (ref7) 2021
ref19
ref18
micikevicius (ref24) 2018
ref51
ref50
devlin (ref3) 2019; 1
ref46
ref48
li (ref12) 2021; 17
ref42
ref41
ref44
ref43
ni (ref9) 2020
ref49
gustafson (ref45) 2017; 4
ref5
ref40
gupta (ref35) 2015
ref34
wang (ref47) 2020
ref31
ref33
zhou (ref37) 2016
ref32
ref1
ref39
miyashita (ref36) 2016
yang (ref28) 2019; 97
amodei (ref2) 2016
li (ref6) 2020
ioffe (ref30) 2015
ref23
ref26
ref20
ref22
bojarski (ref4) 2016
ref21
ref27
ref29
kalamkar (ref25) 2019
References_xml – ident: ref13
  doi: 10.18653/v1/2020.findings-emnlp.372
– ident: ref33
  doi: 10.1109/CVPR.2016.308
– ident: ref48
  doi: 10.1109/CVPR.2019.00065
– ident: ref41
  doi: 10.1109/RECONFIG.2018.8641739
– ident: ref19
  doi: 10.1109/JSSC.2020.3043870
– year: 2020
  ident: ref9
  article-title: WrapNet: Neural net inference with ultra-low-resolution arithmetic
  publication-title: arXiv:2007.13242
– ident: ref34
  doi: 10.1109/CVPR.2018.00474
– start-page: 173
  year: 2016
  ident: ref2
  article-title: Deep speech 2: End-to-end speech recognition in English and mandarin
  publication-title: Proc 33rd Int Conf Mach Learn (ICML)
– volume: 97
  start-page: 7015
  year: 2019
  ident: ref28
  article-title: SWALP: Stochastic weight averaging in low precision training
  publication-title: Proc 36th Int Conf Mach Learn (ICML)
– ident: ref20
  doi: 10.1109/MSP.2020.2975749
– ident: ref32
  doi: 10.1109/TC.2015.2479623
– ident: ref11
  doi: 10.1109/CVPR42600.2020.00807
– ident: ref46
  doi: 10.1109/CVPR42600.2020.01125
– ident: ref22
  doi: 10.1016/j.neunet.2019.01.012
– ident: ref15
  doi: 10.1109/JSSC.2016.2616357
– year: 2016
  ident: ref4
  article-title: End to end learning for self-driving cars
  publication-title: arXiv:1604.07316 [cs]
– ident: ref14
  doi: 10.1109/CVPR42600.2020.00852
– ident: ref49
  doi: 10.1007/978-981-15-8135-9_9
– start-page: 448
  year: 2015
  ident: ref30
  article-title: Batch normalization: Accelerating deep network training by reducing internal covariate shift
  publication-title: Proc 32nd Int Conf Mach Learn (ICML)
– ident: ref51
  doi: 10.1145/2847263.2847265
– ident: ref42
  doi: 10.1145/3289602.3293977
– ident: ref17
  doi: 10.1109/TCSI.2017.2767204
– ident: ref16
  doi: 10.1145/2644865.2541967
– start-page: 639
  year: 2020
  ident: ref6
  article-title: EagleEye: Fast sub-net evaluation for efficient neural network pruning
  publication-title: Eur Conf Comput Vis (ECCV)
– ident: ref23
  doi: 10.1109/JSSC.2020.3021661
– volume: 1
  start-page: 4171
  year: 2019
  ident: ref3
  article-title: BERT: Pre-training of deep bidirectional transformers for language understanding
  publication-title: Proc Conf North Amer Chapter Assoc Comput Linguistics Hum Lang Technol
– year: 2019
  ident: ref25
  article-title: A study of BFLOAT16 for deep learning training
  publication-title: arXiv:1905.12322
– year: 2016
  ident: ref36
  article-title: Convolutional neural networks using logarithmic data representation
  publication-title: arXiv:1603.01025
– year: 2021
  ident: ref7
  article-title: StructADMM: Achieving ultrahigh efficiency in structured pruning for DNNs
  publication-title: IEEE Trans Neural Netw Learn Syst
– ident: ref40
  doi: 10.1109/FCCM.2018.00021
– ident: ref29
  doi: 10.1109/TC.2020.2985971
– ident: ref10
  doi: 10.1109/CVPR42600.2020.01318
– year: 2020
  ident: ref47
  article-title: NITI: Training integer neural networks using integer-only arithmetic
  publication-title: arXiv:2009.13108
– ident: ref5
  doi: 10.1109/JPROC.2017.2761740
– ident: ref18
  doi: 10.1109/TCSI.2020.3030663
– ident: ref44
  doi: 10.1145/3400302.3415643
– year: 2021
  ident: ref8
  article-title: BRECQ: Pushing the limit of post-training quantization by block reconstruction
  publication-title: arXiv:2102.05426
– volume: 17
  start-page: 1
  year: 2021
  ident: ref12
  article-title: Heuristic rank selection with progressively searching tensor ring network
  publication-title: Complex Intell Syst
– start-page: 1
  year: 2018
  ident: ref24
  article-title: Mixed precision training
  publication-title: Proc 6th Int Conf Learn Represent (ICLR)
– ident: ref31
  doi: 10.1109/CVPR42600.2020.00204
– ident: ref26
  doi: 10.1016/j.neunet.2019.12.027
– ident: ref50
  doi: 10.1109/ICFPT47387.2019.00009
– volume: 4
  start-page: 71
  year: 2017
  ident: ref45
  article-title: Beating floating point at its own game: Posit arithmetic
  publication-title: Supercomputing Frontiers and Innovations
– ident: ref1
  doi: 10.1109/CVPR.2016.90
– ident: ref21
  doi: 10.1109/JSSC.2020.3005786
– start-page: 107
  year: 2016
  ident: ref38
  article-title: F-CNN: An FPGA-based framework for training convolutional neural networks
  publication-title: Proc IEEE 27th Int Conf Appl-Specific Syst Archit Processors (ASAP)
– ident: ref39
  doi: 10.1109/FPT.2017.8280142
– ident: ref27
  doi: 10.1609/aaai.v35i4.16462
– ident: ref43
  doi: 10.1109/FPL.2019.00034
– start-page: 1737
  year: 2015
  ident: ref35
  article-title: Deep learning with limited numerical precision
  publication-title: Proc 32nd Int Conf Mach Learn (ICML)
– year: 2016
  ident: ref37
  article-title: DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients
  publication-title: arXiv:1606.06160 [cs]
SSID ssj0000605649
Score 2.5213408
Snippet Recently, the efficient training of deep neural networks (DNNs) on resource-constrained platforms has attracted increasing attention for protecting user...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 7660
SubjectTerms Algorithms
Arrays
Artificial neural networks
Computational modeling
Computer applications
Data models
Deep neural networks (DNNs)
Energy efficiency
Field programmable gate arrays
field-programmable gate array (FPGA)
Hardware
hardware accelerator
Neural networks
normalization
Optimization
Quantization (signal)
System on chip
Throughput
Training
Title ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization
URI https://ieeexplore.ieee.org/document/9707608
https://www.ncbi.nlm.nih.gov/pubmed/35133969
https://www.proquest.com/docview/2873588590
https://www.proquest.com/docview/2627133837
Volume 34
WOSCitedRecordID wos000754288200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE/IET Electronic Library
  customDbUrl:
  eissn: 2162-2388
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000605649
  issn: 2162-237X
  databaseCode: RIE
  dateStart: 20120101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Nb9QwELXaigMXChTollIZiRuYJnac2NxC2YpDFZBYpJU4RI49gUptUu1H-_c7dpyIAyBxixLbiTzjzBuPZx4hb6SwRjRFwbw9YVnTAGsyLZgwGqyWzrRtKJl_UVSVWi711x3ybsqFAYBw-Aze-8sQy3e93fqtslNd-DiS2iW7RZEPuVrTfkqCuDwPaJenOWdcFMsxRybRp4uquviG3iDn6KRmCJE9A5zw3CbaH3X-zSQFjpW_w81gds73_--DH5NHEV7SctCHJ2QHuqdkf6RuoHElH5Af80X5gZYdnYcSEjgKXUSyCFpai7YohN8pQlr6qarW9CNaO0f7jvpQ_51ZASuvfvary82va3rWsy_457mOKZ3PyPfz-eLsM4s8C8ziFGyYAFAFR2OJ0ESm6BMLA42DzEmTGCOcQqdNuFyrAu-oNHO2hQTaRtvcZq3T4jnZ6_oODgl1qUUE6LjkrcqMBz9J1igtG8g4JK2ckXSc6trGIuSeC-OqDs5IousgqdpLqo6SmpG3U5-boQTHP1sfeDlMLaMIZuR4lGgdV-m6Rm9RSKWkxl6vp8e4vnzQxHTQb7FNzqMfPyMvBk2Yxh4V6OjP73xJHnpy-uHo3zHZ26y28Io8sLeby_XqBJV4qU6CEt8DpofqOg
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwELZKQYILBQp0oQUjcQO3iZ2HzS0tWxWxBCSCtBKHyLEntFKboH3A32fsdSIOLRK3KLGdyDPOfOPxzEfI61QYLZo8Z86esKRpgDWJEkxoBUalVretL5k_y8tSzufqyxZ5O-bCAIA_fAaH7tLH8m1v1m6r7EjlLo4kb5HbjjkrZGuNOyoRIvPM410eZ5xxkc-HLJlIHVVlOfuK_iDn6KYmCJIdB5xw7CbKHXb-yyh5lpWbAac3PKc7__fJD8j9ADBpsdGIh2QLukdkZyBvoGEt75Lv06p4R4uOTn0RCRyFVoEughbGoDXyAXiKoJa-L8slPUZ7Z2nfURfs_60XwIrLH_3iYnV-RU969hn_PVchqfMx-XY6rU7OWGBaYAanYMUEgMw5mksEJ2mMXrHQ0FhIbKojrYWV6LYJmymZ4x0ZJ9a0EEHbKJOZpLVKPCHbXd_BHqE2NogBLU95KxPt4E-UNFKlDSQcojadkHiY6tqEMuSODeOy9u5IpGovqdpJqg6SmpA3Y5-fmyIc_2y96-QwtgwimJD9QaJ1WKfLGv1FkUqZKuz1anyMK8yFTXQH_RrbZDx48hPydKMJ49iDAj27_p0vyd2z6tOsnn0oPz4n9xxV_eYg4D7ZXi3WcEDumF-ri-XihVflP31d7Js
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=ETA%3A+An+Efficient+Training+Accelerator+for+DNNs+Based+on+Hardware-Algorithm+Co-Optimization&rft.jtitle=IEEE+transaction+on+neural+networks+and+learning+systems&rft.au=Lu%2C+Jinming&rft.au=Ni%2C+Chao&rft.au=Wang%2C+Zhongfeng&rft.date=2023-10-01&rft.issn=2162-237X&rft.eissn=2162-2388&rft.volume=34&rft.issue=10&rft.spage=7660&rft.epage=7674&rft_id=info:doi/10.1109%2FTNNLS.2022.3145850&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TNNLS_2022_3145850
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2162-237X&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2162-237X&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2162-237X&client=summon