Robin: A Novel Method to Produce Robust Interpreters for Deep Learning-Based Code Classifiers

Bibliographic Details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 27-39
Main Authors: Li, Zhen; Zhang, Ruqian; Zou, Deqing; Wang, Ning; Li, Yating; Xu, Shouhuai; Chen, Chen; Jin, Hai
Format: Conference paper
Language: English
Published: IEEE, 11.09.2023
ISSN: 2643-1572
Description
Summary: Deep learning has been widely used in source code classification tasks, such as classifying code according to its functionality, code authorship attribution, and vulnerability detection. Unfortunately, the black-box nature of deep learning makes it hard to interpret and understand why a classifier (i.e., classification model) makes a particular prediction on a given example. This lack of interpretability (or explainability) may have hindered the adoption of such classifiers by practitioners, because it is not clear when they should or should not trust a classifier's prediction. The lack of interpretability has motivated a number of studies in recent years. However, existing methods are neither robust nor able to cope with out-of-distribution examples. In this paper, we propose a novel method to produce Robust interpreters for a given deep learning-based code classifier; the method is dubbed Robin. The key idea behind Robin is a novel hybrid structure combining an interpreter and two approximators, while leveraging the ideas of adversarial training and data augmentation. Experimental results show that, on average, the interpreter produced by Robin achieves 6.11% higher fidelity (evaluated on the classifier), 67.22% higher fidelity (evaluated on the approximator), and 15.87x higher robustness than the three existing interpreters we evaluated. Moreover, the interpreter is 47.31% less affected by out-of-distribution examples than that of LEMNA.
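The summary describes the approach only at a high level. As a rough illustration of the interpreter-plus-approximator idea it mentions, a minimal PyTorch sketch follows; it is an assumption-laden simplification (one approximator instead of the two used by Robin, plain noise in place of the paper's adversarial training and data augmentation, and all class names, shapes, and losses are invented here for illustration), not the paper's actual architecture.

import torch
import torch.nn as nn

class Interpreter(nn.Module):
    # Scores how important each token of a code example is for the classifier's prediction.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                     # x: (batch, tokens, dim) token embeddings
        return torch.sigmoid(self.score(x))   # per-token importance in [0, 1]

class Approximator(nn.Module):
    # Predicts the classifier's output from the importance-masked input, so the
    # interpreter is pushed to keep the features the classifier actually relies on.
    def __init__(self, dim, n_classes):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, mask):
        pooled = (x * mask).mean(dim=1)       # keep only the "important" features
        return self.head(pooled)

def fidelity_loss(approx_logits, classifier_logits):
    # Fidelity: the approximator should reproduce the classifier's behaviour.
    return nn.functional.kl_div(approx_logits.log_softmax(-1),
                                classifier_logits.softmax(-1),
                                reduction="batchmean")

def train_step(x, classifier_logits, interpreter, approximator, optimizer):
    # Hypothetical training step: perturbing the embeddings stands in for the
    # adversarial training / data augmentation the summary refers to.
    x_aug = x + 0.01 * torch.randn_like(x)
    mask = interpreter(x_aug)
    loss = fidelity_loss(approximator(x_aug, mask), classifier_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()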
DOI: 10.1109/ASE56229.2023.00164