Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining

With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) s. 410 - 422
Hlavní autori: Shen, Weijun, Li, Yanhui, Chen, Lin, Han, Yuanlei, Zhou, Yuming, Xu, Baowen
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: ACM 01.09.2020
Predmet:
ISSN:2643-1572
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the testing data to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models.To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique to cluster test samples into the boundary areas of multiple boundaries for DL models and specify the priority to select samples evenly from all boundary areas, to make sure enough useful samples for each boundary reconstruction. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experiment results show that, compared with state-of-the-art baseline methods, on effectiveness, our approach MCP has a significantly better performance by evaluating the improved quality of retrained DL models; on efficiency, MCP also has the advantages in time costs.
ISSN:2643-1572
DOI:10.1145/3324884.3416621