Special Session: Fault Criticality Assessment in AI Accelerators

The ubiquitous application of deep neural networks (DNN) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings - IEEE VLSI Test Symposium S. 1 - 4
Hauptverfasser: Chaudhuri, Arjun, Talukdar, Jonti, Chakrabarty, Krishnendu
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 25.04.2022
Schlagworte:
ISSN:2375-1053
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The ubiquitous application of deep neural networks (DNN) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. The problem of minimizing misclassification is addressed by utilizing generative adversarial networks (GANs). The two-tier ML/GAN-based criticality assessment method leads to less than 1% test escapes during functional criticality evaluation of structural faults. While supervised learning techniques can be used to accurately estimate fault criticality, it requires a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. A recently proposed misclassification-driven training algorithm is used to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
ISSN:2375-1053
DOI:10.1109/VTS52500.2021.9794215