Special Session: Fault Criticality Assessment in AI Accelerators

The ubiquitous application of deep neural networks (DNN) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings - IEEE VLSI Test Symposium pp. 1 - 4
Main Authors: Chaudhuri, Arjun, Talukdar, Jonti, Chakrabarty, Krishnendu
Format: Conference Proceeding
Language:English
Published: IEEE 25.04.2022
Subjects:
ISSN:2375-1053
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The ubiquitous application of deep neural networks (DNN) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. The problem of minimizing misclassification is addressed by utilizing generative adversarial networks (GANs). The two-tier ML/GAN-based criticality assessment method leads to less than 1% test escapes during functional criticality evaluation of structural faults. While supervised learning techniques can be used to accurately estimate fault criticality, it requires a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. A recently proposed misclassification-driven training algorithm is used to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.
ISSN:2375-1053
DOI:10.1109/VTS52500.2021.9794215