Special Session: Fault Criticality Assessment in AI Accelerators


Bibliographic details
Published in: Proceedings - IEEE VLSI Test Symposium, pp. 1-4
Main authors: Chaudhuri, Arjun; Talukdar, Jonti; Chakrabarty, Krishnendu
Format: Conference paper
Language: English
Published: IEEE, 25 April 2022
ISSN: 2375-1053
Description
Summary: The ubiquitous application of deep neural networks (DNNs) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements, such as inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. The problem of minimizing misclassification is addressed by utilizing generative adversarial networks (GANs). The two-tier ML/GAN-based criticality assessment method leads to less than 1% test escapes during functional criticality evaluation of structural faults. While supervised learning techniques can accurately estimate fault criticality, they require a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. A recently proposed misclassification-driven training algorithm is used to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
DOI: 10.1109/VTS52500.2021.9794215