ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 10, pp. 7660–7674 |
|---|---|
| Main Authors: | Lu, Jinming; Ni, Chao; Wang, Zhongfeng |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: IEEE, 01.10.2023 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388 |
| Online Access: | Get full text |
| Abstract | Recently, the efficient training of deep neural networks (DNNs) on resource-constrained platforms has attracted increasing attention for protecting user privacy. However, it is still a severe challenge since the DNN training involves intensive computations and a large amount of data access. To deal with these issues, in this work, we implement an efficient training accelerator (ETA) on field-programmable gate array (FPGA) by adopting a hardware-algorithm co-optimization approach. A novel training scheme is proposed to effectively train DNNs using 8-bit precision with arbitrary batch sizes, in which a compact but powerful data format and a hardware-oriented normalization layer are introduced. Thus the computational complexity and memory accesses are significantly reduced. In the ETA, a reconfigurable processing element (PE) is designed to support various computational patterns during training while avoiding redundant calculations from nonunit-stride convolutional layers. With a flexible network-on-chip (NoC) and a hierarchical PE array, computational parallelism and data reuse can be fully exploited, and memory accesses are further reduced. In addition, a unified computing core is developed to execute auxiliary layers such as normalization and weight update (WU), which works in a time-multiplexed manner and consumes only a small amount of hardware resources. The experiments show that our training scheme achieves the state-of-the-art accuracy across multiple models, including CIFAR-VGG16, CIFAR-ResNet20, CIFAR-InceptionV3, ResNet18, and ResNet50. Evaluated on three networks (CIFAR-VGG16, CIFAR-ResNet20, and ResNet18), our ETA on Xilinx VC709 FPGA achieves 610.98, 658.64, and 811.24 GOPS in terms of throughput, respectively. Compared with the prior art, our design demonstrates a speedup of 3.65× and an energy efficiency improvement of 8.54× on CIFAR-ResNet20. (An illustrative sketch of the 8-bit quantization and hardware-friendly normalization ideas follows the record fields below.) |
|---|---|
| Author | Lu, Jinming; Ni, Chao; Wang, Zhongfeng |
| Author_xml | – sequence: 1; fullname: Lu, Jinming; ORCID: 0000-0002-7134-6514; email: jmlu@smail.nju.edu.cn; organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China – sequence: 2; fullname: Ni, Chao; ORCID: 0000-0002-5139-4607; email: nichao@smail.nju.edu.cn; organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China – sequence: 3; fullname: Wang, Zhongfeng; ORCID: 0000-0002-7227-4786; email: zfwang@nju.edu.cn; organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35133969 (View this record in MEDLINE/PubMed) |
| CODEN | ITNNAL |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DOI | 10.1109/TNNLS.2022.3145850 |
| Discipline | Computer Science |
| EISSN | 2162-2388 |
| EndPage | 7674 |
| ExternalDocumentID | 35133969 10_1109_TNNLS_2022_3145850 9707608 |
| Genre | orig-research Journal Article |
| GrantInformation_xml | – fundername: Key Research Plan of Jiangsu Province of China grantid: BE2019003-4 – fundername: National Natural Science Foundation of China grantid: 61774082 funderid: 10.13039/501100001809 – fundername: Fundamental Research Funds for the Central Universities grantid: 021014380065 funderid: 10.13039/501100012226 |
| ISICitedReferencesCount | 15 |
| ISSN | 2162-237X 2162-2388 |
| Issue | 10 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-7227-4786 0000-0002-5139-4607 0000-0002-7134-6514 |
| PMID | 35133969 |
| PageCount | 15 |
| PublicationDate | 2023-10-01 |
| PublicationPlace | United States |
| PublicationTitle | IEEE Transactions on Neural Networks and Learning Systems |
| PublicationTitleAbbrev | TNNLS |
| PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst |
| PublicationYear | 2023 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 7660 |
| SubjectTerms | Algorithms Arrays Artificial neural networks Computational modeling Computer applications Data models Deep neural networks (DNNs) Energy efficiency Field programmable gate arrays field-programmable gate array (FPGA) Hardware hardware accelerator Neural networks normalization Optimization Quantization (signal) System on chip Throughput Training |
| Title | ETA: An Efficient Training Accelerator for DNNs Based on Hardware-Algorithm Co-Optimization |
| URI | https://ieeexplore.ieee.org/document/9707608 https://www.ncbi.nlm.nih.gov/pubmed/35133969 https://www.proquest.com/docview/2873588590 https://www.proquest.com/docview/2627133837 |
| Volume | 34 |
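The abstract above mentions an 8-bit training scheme with a compact data format and a hardware-oriented normalization layer, but this record contains no implementation details. The NumPy sketch below is only a generic illustration of two of those ideas, symmetric per-tensor int8 quantization with integer accumulation and a batch-size-independent, L1-style normalization; every name, shape, and parameter here (quantize_int8, l1_group_norm, the group count, the toy matmul) is an assumption made for illustration and is not taken from the paper's actual data format, normalization layer, or PE/NoC design.

```python
# Illustrative sketch only: NOT the ETA paper's data format, normalization layer,
# or PE array. It shows the generic flavor of 8-bit quantized compute and a
# batch-size-independent, hardware-friendly normalization, in plain NumPy.
import numpy as np


def quantize_int8(x, eps=1e-8):
    """Symmetric per-tensor quantization: map floats to int8 in [-127, 127]
    and return the float scale needed to dequantize."""
    scale = float(np.max(np.abs(x))) / 127.0 + eps
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def l1_group_norm(x, num_groups=4, eps=1e-5):
    """Per-sample group normalization using an L1 (mean absolute deviation)
    scale instead of variance plus square root, so it needs no batch
    statistics. Input shape: (N, C, H, W)."""
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, -1)
    mean = xg.mean(axis=2, keepdims=True)
    mad = np.abs(xg - mean).mean(axis=2, keepdims=True)
    return ((xg - mean) / (mad + eps)).reshape(n, c, h, w)


rng = np.random.default_rng(0)
act = rng.standard_normal((8, 16, 8, 8)).astype(np.float32)      # activations (N, C, H, W)
wgt = rng.standard_normal((16 * 8 * 8, 32)).astype(np.float32)   # flattened layer weights

act_n = l1_group_norm(act)                    # normalize without batch statistics
qa, sa = quantize_int8(act_n.reshape(8, -1))  # 8-bit activations + scale
qw, sw = quantize_int8(wgt)                   # 8-bit weights + scale

# Integer multiply-accumulate (what a PE array would execute), accumulated in
# int32, then rescaled back to float with the two per-tensor scales.
acc = qa.astype(np.int32) @ qw.astype(np.int32)
out = acc.astype(np.float32) * (sa * sw)
print(out.shape)  # (8, 32)
```

Accumulating in int32 and rescaling with the product of the two per-tensor scales reflects how a fixed-point MAC array typically hands results back at higher precision, which is the general reason 8-bit formats reduce both compute cost and memory traffic; the paper's own format and normalization may differ substantially from this sketch.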