Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining
With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the...
| Published in: | 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 410-422 |
|---|---|
| Main authors: | Shen, Weijun; Li, Yanhui; Chen, Lin; Han, Yuanlei; Zhou, Yuming; Xu, Baowen |
| Format: | Conference paper |
| Language: | English |
| Publication details: | ACM, 01.09.2020 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Online access: | Get full text |
| Abstract | With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data used to construct the models and the testing data used to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models. To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique that clusters test samples into the boundary areas of multiple boundaries for DL models and assigns priorities so that samples are selected evenly from all boundary areas, ensuring enough useful samples for reconstructing each boundary. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experimental results show that, compared with state-of-the-art baseline methods, MCP is significantly more effective, as measured by the improved quality of the retrained DL models, and is also more efficient in terms of time cost. |
|---|---|
| Author | Zhou, Yuming; Shen, Weijun; Xu, Baowen; Li, Yanhui; Chen, Lin; Han, Yuanlei |
| Author_xml | – sequence: 1 givenname: Weijun surname: Shen fullname: Shen, Weijun email: shenweijun@smail.nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China – sequence: 2 givenname: Yanhui surname: Li fullname: Li, Yanhui email: yanhuili@nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China – sequence: 3 givenname: Lin surname: Chen fullname: Chen, Lin email: lchen@nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China – sequence: 4 givenname: Yuanlei surname: Han fullname: Han, Yuanlei email: mg1833022@smail.nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China – sequence: 5 givenname: Yuming surname: Zhou fullname: Zhou, Yuming email: zhouyuming@nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China – sequence: 6 givenname: Baowen surname: Xu fullname: Xu, Baowen email: bwxu@nju.edu.cn organization: State Key Laboratory for Novel Software Technology, Nanjing University, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3324884.3416621 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781450367684; 1450367682 |
| EISSN | 2643-1572 |
| EndPage | 422 |
| ExternalDocumentID | 9286133 |
| Genre | orig-research |
| GroupedDBID | 29I 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH APO BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a247t-3618efa9ae7c84151c8bc63d07cb4d7a5a377b8989598101cc0750e323bf83b23 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 44 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000651313500036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:33:27 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a247t-3618efa9ae7c84151c8bc63d07cb4d7a5a377b8989598101cc0750e323bf83b23 |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_9286133 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Sept. |
| PublicationDateYYYYMMDD | 2020-09-01 |
| PublicationDate_xml | – month: 09 year: 2020 text: 2020-Sept. |
| PublicationDecade | 2020 |
| PublicationTitle | 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2020 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0051577 ssj0002871035 |
| Score | 2.4094944 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 410 |
| SubjectTerms | Context modeling; Data models; Deep learning; Labeling; Multiple-Boundary; Neural network; Retraining; Software testing; Task analysis; Testing; Training; Training data |
| Title | Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining |
| URI | https://ieeexplore.ieee.org/document/9286133 |
| WOSCitedRecordID | wos000651313500036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2020+35th+IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%28ASE%29&rft.atitle=Multiple-Boundary+Clustering+and+Prioritization+to+Promote+Neural+Network+Retraining&rft.au=Shen%2C+Weijun&rft.au=Li%2C+Yanhui&rft.au=Chen%2C+Lin&rft.au=Han%2C+Yuanlei&rft.date=2020-09-01&rft.pub=ACM&rft.eissn=2643-1572&rft.spage=410&rft.epage=422&rft_id=info:doi/10.1145%2F3324884.3416621&rft.externalDocID=9286133 |
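
Illustrative sketch (not part of the catalog record, and not taken from the paper): the abstract describes clustering test samples into boundary areas and then selecting samples evenly from all areas. The Python below shows one way such a selection could look. It rests on labeled assumptions that the record does not confirm: a sample's boundary area is taken to be the pair of its two most probable classes, its priority is the ratio of the second-highest to the highest class probability, and the function name `select_evenly` is hypothetical.

```python
from collections import defaultdict


def select_evenly(probs, budget):
    """Pick up to `budget` sample indices, spread across boundary areas.

    probs: list of per-sample class-probability lists (at least 2 classes).
    ASSUMPTION: boundary area = (top-1 class, top-2 class) of a sample.
    ASSUMPTION: priority = p_top2 / p_top1 (closer to 1 means the sample
    lies closer to the decision boundary between the two classes).
    """
    areas = defaultdict(list)
    for idx, p in enumerate(probs):
        # Class indices ordered from most to least probable.
        order = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        top1, top2 = order[0], order[1]
        score = p[top2] / p[top1] if p[top1] > 0 else 0.0
        areas[(top1, top2)].append((score, idx))

    # Ascending sort puts the highest-priority sample last, so pop() takes it.
    for members in areas.values():
        members.sort()

    selected = []
    # Round-robin over areas: one sample per area per pass.
    while len(selected) < budget and areas:
        for key in sorted(areas):
            if len(selected) >= budget:
                break
            _, idx = areas[key].pop()
            selected.append(idx)
            if not areas[key]:
                del areas[key]
    return selected
```

The round-robin loop is what keeps the selection even: no area contributes a second sample until every non-empty area has contributed one. If the budget exceeds the number of samples, all indices are returned.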