Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining

With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the...

Published in: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 410-422
Main authors: Shen, Weijun; Li, Yanhui; Chen, Lin; Han, Yuanlei; Zhou, Yuming; Xu, Baowen
Format: Conference paper
Language: English
Publication details: ACM, 2020-09-01
ISSN:2643-1572
Abstract With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data used to construct the models and the testing data used to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models. To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique that clusters test samples into the boundary areas of the multiple boundaries of a DL model and assigns priorities so that samples are selected evenly from all boundary areas, ensuring that enough useful samples are available to reconstruct each boundary. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experimental results show that, compared with state-of-the-art baseline methods, MCP is significantly more effective, as measured by the improved quality of the retrained DL models, and is also more efficient in terms of time cost.
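The selection idea in the abstract (group test samples by boundary area, then draw evenly from every area) can be illustrated with a minimal Python sketch. The abstract does not say how boundary areas or priorities are computed, so everything concrete here is an assumption made for illustration: a sample's boundary area is taken to be the ordered pair of its two most probable classes, its priority is the ratio of the second-highest to the highest probability, and the function names `boundary_area`, `closeness`, and `select_evenly` are hypothetical. This is not the authors' implementation.

```python
from collections import defaultdict


def boundary_area(probs):
    """Assumed definition: the (top-1, top-2) class pair of one softmax row."""
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return order[0], order[1]


def closeness(probs):
    """Assumed priority score: near 1 when the top two classes are nearly tied."""
    top1, top2 = boundary_area(probs)
    return probs[top2] / probs[top1]


def select_evenly(prob_rows, budget):
    """Pick up to `budget` sample indices, cycling over all boundary areas."""
    areas = defaultdict(list)
    for idx, probs in enumerate(prob_rows):
        areas[boundary_area(probs)].append(idx)

    # Within each area, samples closest to the boundary come first.
    for members in areas.values():
        members.sort(key=lambda i: closeness(prob_rows[i]), reverse=True)

    queues = [areas[key] for key in sorted(areas)]
    selected = []
    rank = 0
    # Round-robin: take the rank-th best sample of every area per pass.
    while len(selected) < budget and any(rank < len(q) for q in queues):
        for queue in queues:
            if rank < len(queue) and len(selected) < budget:
                selected.append(queue[rank])
        rank += 1
    return selected


# Toy softmax outputs for five unlabeled test samples over three classes.
example_rows = [
    [0.90, 0.10, 0.00],
    [0.55, 0.45, 0.00],
    [0.50, 0.10, 0.40],
    [0.10, 0.60, 0.30],
    [0.60, 0.00, 0.40],
]
```

With a labeling budget of three, `select_evenly(example_rows, 3)` takes one sample from each of the three boundary areas present in the toy data rather than three samples from the most uncertain area, which is the "select evenly" behavior the abstract describes.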
Author_xml – sequence: 1
  givenname: Weijun
  surname: Shen
  fullname: Shen, Weijun
  email: shenweijun@smail.nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
– sequence: 2
  givenname: Yanhui
  surname: Li
  fullname: Li, Yanhui
  email: yanhuili@nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
– sequence: 3
  givenname: Lin
  surname: Chen
  fullname: Chen, Lin
  email: lchen@nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
– sequence: 4
  givenname: Yuanlei
  surname: Han
  fullname: Han, Yuanlei
  email: mg1833022@smail.nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
– sequence: 5
  givenname: Yuming
  surname: Zhou
  fullname: Zhou, Yuming
  email: zhouyuming@nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
– sequence: 6
  givenname: Baowen
  surname: Xu
  fullname: Xu, Baowen
  email: bwxu@nju.edu.cn
  organization: State Key Laboratory for Novel Software Technology, Nanjing University, China
DOI 10.1145/3324884.3416621
Discipline Computer Science
EISBN 9781450367684
1450367682
EISSN 2643-1572
EndPage 422
ExternalDocumentID 9286133
Genre orig-research
ISICitedReferencesCount 44
IsPeerReviewed false
IsScholarly true
Language English
PageCount 13
PublicationCentury 2000
PublicationDate 2020-Sept.
PublicationDateYYYYMMDD 2020-09-01
PublicationDate_xml – month: 09
  year: 2020
  text: 2020-Sept.
PublicationDecade 2020
PublicationTitle 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)
PublicationTitleAbbrev ASE
PublicationYear 2020
Publisher ACM
Publisher_xml – name: ACM
StartPage 410
SubjectTerms Context modeling
Data models
Deep learning
Labeling
Multiple-Boundary
Neural network
Retraining
Software testing
Task analysis
Testing
Training
Training data
URI https://ieeexplore.ieee.org/document/9286133