Special Session: Fault Criticality Assessment in AI Accelerators

Bibliographic Details
Published in:	Proceedings - IEEE VLSI Test Symposium, pp. 1-4
Main authors:	Chaudhuri, Arjun; Talukdar, Jonti; Chakrabarty, Krishnendu
Format:	Conference paper
Language:	English
Publication details:	IEEE, 25 April 2022
Subjects:	AI accelerators; Deep learning; Fault diagnosis; Neural networks; Supervised learning; Training; Very large scale integration
ISSN:	2375-1053

Description
Summary:	The ubiquitous application of deep neural networks (DNNs) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. The problem of minimizing misclassification is addressed by utilizing generative adversarial networks (GANs). The two-tier ML/GAN-based criticality assessment method leads to less than 1% test escapes during functional criticality evaluation of structural faults. While supervised learning techniques can be used to accurately estimate fault criticality, they require a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. A recently proposed misclassification-driven training algorithm is used to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
DOI:	10.1109/VTS52500.2021.9794215