Enhancing Neural Network Reliability: Insights From Hardware/Software Collaboration With Neuron Vulnerability Quantization
Ensuring the reliability of deep neural networks (DNNs) is paramount in safety-critical applications. Although introducing supplementary fault-tolerant mechanisms can augment the reliability of DNNs, an efficiency tradeoff may be introduced. This study reveals the inherent fault tolerance of neural...
| Published in: | IEEE Transactions on Computers, Vol. 73, no. 8, pp. 1953-1966 |
|---|---|
| Main Authors: | Wang, Jing; Zhu, Jinbin; Fu, Xin; Zang, Di; Li, Keyao; Zhang, Weigong |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024-08-01 |
| ISSN: | 0018-9340, 1557-9956 |
| Abstract | Ensuring the reliability of deep neural networks (DNNs) is paramount in safety-critical applications. Although introducing supplementary fault-tolerant mechanisms can augment the reliability of DNNs, an efficiency tradeoff may be introduced. This study reveals the inherent fault tolerance of neural networks, where individual neurons exhibit varying degrees of fault tolerance, by thoroughly exploring the structural attributes of DNNs. We thereby develop a hardware/software collaborative method that guarantees the reliability of DNNs while minimizing performance degradation. We introduce the neuron vulnerability factor (NVF) to quantify the susceptibility to soft errors. We propose two efficient methods that leverage the NVF to minimize the negative effects of soft errors on neurons. First, we present a novel computational scheduling scheme. By prioritizing error-prone neurons, the expedited completion of their computations is facilitated to mitigate the risk of neural computing errors that arise from soft errors without sacrificing efficiency. Second, we propose the NVF-guided heterogeneous memory system. We employ variable-strength error-correcting codes and tailor their error-correction mechanisms to the vulnerability profile of specific neurons to ensure a highly targeted approach for error mitigation. Our experimental results demonstrate that the proposed scheme enhances the neural network accuracy by 18% on average, while significantly reducing the fault-tolerance overhead. |
|---|---|
| Author | Zhang, Weigong; Zang, Di; Zhu, Jinbin; Wang, Jing; Fu, Xin; Li, Keyao |
| Author_xml | 1. Wang, Jing (ORCID 0000-0003-3653-7013; jwang@ruc.edu.cn), School of Information, Renmin University of China, Beijing, China; 2. Zhu, Jinbin (ORCID 0000-0003-4955-7718; zhujinbin@sdust.edu.cn), College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China; 3. Fu, Xin (ORCID 0000-0002-9458-4769; xfu8@central.uh.edu), Electrical and Computer Engineering Department, University of Houston, Houston, TX, USA; 4. Zang, Di (zd@cnu.edu.cn), College of Information Engineering, Capital Normal University, Beijing, China; 5. Li, Keyao (lky@cnu.edu.cn), College of Information Engineering, Capital Normal University, Beijing, China; 6. Zhang, Weigong (ORCID 0000-0003-3969-5607; zwg771@cnu.edu.cn), College of Information Engineering, Capital Normal University, Beijing, China |
| CODEN | ITCOB4 |
| CitedBy_id | crossref_primary_10_3390_electronics14051042 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
| DOI | 10.1109/TC.2024.3398492 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Xplore CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Computer Science |
| EISSN | 1557-9956 |
| EndPage | 1966 |
| ExternalDocumentID | 10_1109_TC_2024_3398492 10527392 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China (NSFC), grant 62076168 (funder ID 10.13039/501100001809) |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISSN | 0018-9340 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0000-0003-4955-7718 0000-0003-3969-5607 0000-0003-3653-7013 0000-0002-9458-4769 |
| PQID | 3078087151 |
| PQPubID | 85452 |
| PageCount | 14 |
| ParticipantIDs | proquest_journals_3078087151 ieee_primary_10527392 crossref_citationtrail_10_1109_TC_2024_3398492 crossref_primary_10_1109_TC_2024_3398492 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-08-01 |
| PublicationDateYYYYMMDD | 2024-08-01 |
| PublicationDate_xml | – month: 08 year: 2024 text: 2024-08-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on computers |
| PublicationTitleAbbrev | TC |
| PublicationYear | 2024 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0006209 |
| Score | 2.4291906 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 1953 |
| SubjectTerms | Artificial neural networks; Collaboration; Computer network reliability; Computing time; Error correcting codes; Error correction; Fault tolerance; Fault tolerant systems; Hardware; memory protection; Network reliability; neural network; Neural networks; neuron vulnerability factor; Neurons; Performance degradation; Reliability; Safety critical; soft error; Soft errors; Software; Software reliability; Structural reliability |
| Title | Enhancing Neural Network Reliability: Insights From Hardware/Software Collaboration With Neuron Vulnerability Quantization |
| URI | https://ieeexplore.ieee.org/document/10527392 https://www.proquest.com/docview/3078087151 |
| Volume | 73 |
| WOSCitedRecordID | wos001270596400003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1557-9956 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0006209 issn: 0018-9340 databaseCode: RIE dateStart: 19680101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |