A Method and Experiment to evaluate Deep Neural Networks as Test Oracles for Scientific Software

Testing scientific software is challenging because usually such type of systems have non-deterministic behaviours and, in addition, they generate non-trivial outputs such as images. Artificial intelligence (AI) is now a reality which is also helping in the development of the software testing activit...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test s. 40 - 51
Hlavný autor: de Santiago Junior, Valdivino Alexandre
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: ACM 01.05.2022
Predmet:
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Testing scientific software is challenging because usually such type of systems have non-deterministic behaviours and, in addition, they generate non-trivial outputs such as images. Artificial intelligence (AI) is now a reality which is also helping in the development of the software testing activity. In this article, we evaluate seven deep neural networks (DNNs), precisely deep convolutional neural networks (CNNs) with up to 161layers, playing the role of test oracle procedures for testing scientific models. Firstly, we propose a method, TOrC, which starts by generating training, validation, and test image datasets via combinatorial interaction testing applied to the original codes and second-order mutants. Within TOrC we also have classical steps such as transfer learning, a technique recommended for DNNs. Then, we verified the performance of the oracles (CNNs). The main conclusions of this research are: i) not necessarily a greater number of layers means that a CNN will present better performance; ii) transfer learning is a valuable technique but eventually we may need extended solutions to get better performances; iii) data-centric AI is an interesting path to follow; and iv) there is not a clear correlation between the software bugs, in the scientific models, and the errors (image misclassifications) presented by the CNNs. CCS CONCEPTS * Software and its engineering → Software testing and debugging;. Computing methodologies → Neural networks; Supervised learning by classification; Computer vision.
DOI:10.1145/3524481.3527232