On-the-fly Improving Performance of Deep Code Models via Input Denoising
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 560 - 572 |
|---|---|
| Main Authors: | Tian, Zhao; Chen, Junjie; Zhang, Xiangyu |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 11.09.2023 |
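| Subjects: | Code Model; Codes; Deep Learning; Input Denoising; Location awareness; Noise measurement; Noise reduction; Predictive models; Semantics; Syntactics |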
| ISSN: | 2643-1572 |
| Abstract | Deep learning has been widely adopted to tackle various code-based tasks by building deep code models from large amounts of code snippets. While these deep code models have achieved great success, even state-of-the-art models suffer from noise in their inputs, which leads to erroneous predictions. Although models can be enhanced through retraining or fine-tuning, this is not a once-and-for-all approach and incurs significant overhead. In particular, these techniques cannot improve the performance of (deployed) models on the fly. Input denoising techniques exist in other domains (such as image processing), but because code input is discrete and must strictly abide by complex syntactic and semantic constraints, those techniques are hardly applicable to code. In this work, we propose the first input denoising technique (i.e., CodeDenoise) for deep code models. Its key idea is to localize noisy identifiers in (likely) mispredicted inputs and to denoise such inputs by cleansing the located identifiers. It does not need to retrain or reconstruct the model; it only cleanses inputs on the fly to improve performance. Our experiments on 18 deep code models (i.e., three pre-trained models with six code-based datasets) demonstrate the effectiveness and efficiency of CodeDenoise. For example, on average, CodeDenoise successfully denoises 21.91% of mispredicted inputs and improves the original models by 2.04% in terms of model accuracy across all subjects, spending an average of 0.48 seconds on each input, substantially outperforming the widely used fine-tuning strategy. |
|---|---|
| Author | Zhang, Xiangyu; Tian, Zhao; Chen, Junjie |
| Author_xml | – sequence: 1; givenname: Zhao; surname: Tian; fullname: Tian, Zhao; email: tianzhao@tju.edu.cn; organization: College of Intelligence and Computing, Tianjin University, China – sequence: 2; givenname: Junjie; surname: Chen; fullname: Chen, Junjie; email: junjiechen@tju.edu.cn; organization: College of Intelligence and Computing, Tianjin University, China – sequence: 3; givenname: Xiangyu; surname: Zhang; fullname: Zhang, Xiangyu; email: xyzhang@cs.purdue.edu; organization: Purdue University, Department of Computer Science, USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ASE56229.2023.00166 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore Digital Library IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798350329964 |
| EISSN | 2643-1572 |
| EndPage | 572 |
| ExternalDocumentID | 10298345 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: NSF grantid: 1901242,1910300 funderid: 10.13039/100000001 – fundername: National Natural Science Foundation of China grantid: 62322208,62002256 funderid: 10.13039/501100001809 – fundername: CCF funderid: 10.13039/100000143 – fundername: CAST funderid: 10.13039/100010097 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 6 |
| IngestDate | Wed Aug 27 02:32:28 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_10298345 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Sept.-11 |
| PublicationDateYYYYMMDD | 2023-09-11 |
| PublicationDate_xml | – month: 09 year: 2023 text: 2023-Sept.-11 day: 11 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0051577 ssib057256115 |
| Score | 2.313577 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 560 |
| SubjectTerms | Code Model; Codes; Deep Learning; Input Denoising; Location awareness; Noise measurement; Noise reduction; Predictive models; Semantics; Syntactics |
| Title | On-the-fly Improving Performance of Deep Code Models via Input Denoising |
| URI | https://ieeexplore.ieee.org/document/10298345 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
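
The abstract describes the key idea only at a high level: localize noisy identifiers in likely mispredicted inputs, then cleanse them without touching the model. The following Python sketch illustrates that general shape under loud assumptions and is not the authors' CodeDenoise implementation. Low prediction confidence stands in for "likely mispredicted", a greedy confidence gain stands in for identifier localization, and the names `denoise`, `identifiers`, `rename`, `toy_predict`, the `candidates` list, and the `threshold` value are all hypothetical.

```python
import builtins
import io
import keyword
import re
import tokenize
from typing import Callable, List, Tuple

# Assumed model interface: source code in, (label, confidence) out.
Predict = Callable[[str], Tuple[str, float]]


def identifiers(code: str) -> List[str]:
    """User-chosen names in order of first appearance (keywords, builtins skipped)."""
    seen: List[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        name = tok.string
        if (tok.type == tokenize.NAME and not keyword.iskeyword(name)
                and not hasattr(builtins, name) and name not in seen):
            seen.append(name)
    return seen


def rename(code: str, old: str, new: str) -> str:
    """Whole-word rename. Toy version: it also rewrites strings and comments."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def denoise(code: str, predict: Predict, candidates: List[str],
            threshold: float = 0.5) -> Tuple[str, str]:
    """Rename identifiers greedily while the model's confidence improves."""
    label, conf = predict(code)
    if conf >= threshold:
        return code, label  # confident prediction: leave the input untouched
    for name in identifiers(code):
        for cand in candidates:
            if cand in identifiers(code):
                continue  # avoid colliding with a name already in use
            trial = rename(code, name, cand)
            trial_label, trial_conf = predict(trial)
            if trial_conf > conf:  # this identifier looked noisy: keep the rename
                code, label, conf = trial, trial_label, trial_conf
                break
    return code, label


def toy_predict(code: str) -> Tuple[str, float]:
    """Hypothetical stand-in model that is confused by the vague name `x1`."""
    if re.search(r"\bx1\b", code):
        return "search", 0.4
    return "sort", 0.9


cleaned, label = denoise("def f(x1):\n    return sorted(x1)\n",
                         toy_predict, ["items"])
print(cleaned, label)
```

The model is only queried, never retrained, which mirrors the on-the-fly property claimed in the abstract. Renaming an identifier consistently keeps the program's behavior intact in this toy case, but a real tool would need scope-aware renaming rather than a regular expression.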