A Method and Experiment to evaluate Deep Neural Networks as Test Oracles for Scientific Software

Testing scientific software is challenging because usually such type of systems have non-deterministic behaviours and, in addition, they generate non-trivial outputs such as images. Artificial intelligence (AI) is now a reality which is also helping in the development of the software testing activit...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test S. 40 - 51
1. Verfasser: de Santiago Junior, Valdivino Alexandre
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: ACM 01.05.2022
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Testing scientific software is challenging because usually such type of systems have non-deterministic behaviours and, in addition, they generate non-trivial outputs such as images. Artificial intelligence (AI) is now a reality which is also helping in the development of the software testing activity. In this article, we evaluate seven deep neural networks (DNNs), precisely deep convolutional neural networks (CNNs) with up to 161layers, playing the role of test oracle procedures for testing scientific models. Firstly, we propose a method, TOrC, which starts by generating training, validation, and test image datasets via combinatorial interaction testing applied to the original codes and second-order mutants. Within TOrC we also have classical steps such as transfer learning, a technique recommended for DNNs. Then, we verified the performance of the oracles (CNNs). The main conclusions of this research are: i) not necessarily a greater number of layers means that a CNN will present better performance; ii) transfer learning is a valuable technique but eventually we may need extended solutions to get better performances; iii) data-centric AI is an interesting path to follow; and iv) there is not a clear correlation between the software bugs, in the scientific models, and the errors (image misclassifications) presented by the CNNs. CCS CONCEPTS * Software and its engineering → Software testing and debugging;. Computing methodologies → Neural networks; Supervised learning by classification; Computer vision.
DOI:10.1145/3524481.3527232