Enhancing Neural Network Reliability: Insights From Hardware/Software Collaboration With Neuron Vulnerability Quantization

Bibliographic Details
Published in:IEEE transactions on computers Vol. 73; no. 8; pp. 1953 - 1966
Main Authors: Wang, Jing, Zhu, Jinbin, Fu, Xin, Zang, Di, Li, Keyao, Zhang, Weigong
Format: Journal Article
Language:English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024
ISSN:0018-9340, 1557-9956
Description
Summary:Ensuring the reliability of deep neural networks (DNNs) is paramount in safety-critical applications. Although supplementary fault-tolerant mechanisms can augment the reliability of DNNs, they may introduce an efficiency tradeoff. By thoroughly exploring the structural attributes of DNNs, this study reveals the inherent fault tolerance of neural networks, in which individual neurons exhibit varying degrees of fault tolerance. We thereby develop a hardware/software collaborative method that guarantees the reliability of DNNs while minimizing performance degradation. We introduce the neuron vulnerability factor (NVF) to quantify each neuron's susceptibility to soft errors, and we propose two efficient methods that leverage the NVF to minimize the negative effects of soft errors on neurons. First, we present a novel computational scheduling scheme: by prioritizing error-prone neurons, their computations complete earlier, mitigating the risk of neural computing errors caused by soft errors without sacrificing efficiency. Second, we propose an NVF-guided heterogeneous memory system that employs variable-strength error-correcting codes, tailoring the error-correction mechanism to the vulnerability profile of each neuron for highly targeted error mitigation. Our experimental results demonstrate that the proposed scheme enhances neural network accuracy by 18% on average while significantly reducing the fault-tolerance overhead.
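
The following is a minimal, illustrative Python sketch of the two NVF-guided mechanisms summarized above; it is not the authors' implementation, and all names, vulnerability values, and ECC thresholds are hypothetical assumptions introduced here only to make the idea concrete.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Neuron:
        idx: int
        nvf: float  # hypothetical neuron vulnerability factor in [0, 1]

    def schedule_by_vulnerability(neurons: List[Neuron]) -> List[int]:
        # Order computations so the most vulnerable neurons finish earliest,
        # shrinking the window in which a soft error can corrupt their state.
        return [n.idx for n in sorted(neurons, key=lambda n: n.nvf, reverse=True)]

    def assign_ecc_strength(neuron: Neuron) -> str:
        # Map NVF to a protection level in a heterogeneous memory system;
        # the thresholds and code choices below are placeholders.
        if neuron.nvf >= 0.7:
            return "SEC-DED"      # strong correction for highly vulnerable neurons
        if neuron.nvf >= 0.3:
            return "parity"       # lightweight detection for moderately vulnerable neurons
        return "unprotected"      # avoid overhead where errors rarely change the output

    if __name__ == "__main__":
        layer = [Neuron(0, 0.82), Neuron(1, 0.15), Neuron(2, 0.47)]
        print(schedule_by_vulnerability(layer))          # [0, 2, 1]
        print([assign_ecc_strength(n) for n in layer])   # ['SEC-DED', 'unprotected', 'parity']

In this sketch, both prioritization and memory protection are driven by the same per-neuron score, mirroring the paper's idea that fault-tolerance effort should follow neuron vulnerability rather than being applied uniformly.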
DOI:10.1109/TC.2024.3398492