On-the-fly Improving Performance of Deep Code Models via Input Denoising

Bibliographic Details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 560-572
Main Authors: Tian, Zhao; Chen, Junjie; Zhang, Xiangyu
Format: Conference Proceeding
Language: English
Published: IEEE, 11.09.2023
Subjects:
ISSN: 2643-1572
Abstract Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large amount of code snippets. While these deep code models have achieved great success, even state-of-the-art models suffer from noise present in inputs, leading to erroneous predictions. While it is possible to enhance models through retraining/fine-tuning, this is not a once-and-for-all approach and incurs significant overhead. In particular, these techniques cannot on-the-fly improve the performance of (deployed) models. There are currently some techniques for input denoising in other domains (such as image processing), but since code input is discrete and must strictly abide by complex syntactic and semantic constraints, input denoising techniques from other fields are largely inapplicable. In this work, we propose the first input denoising technique (i.e., CodeDenoise) for deep code models. Its key idea is to localize noisy identifiers in (likely) mispredicted inputs, and denoise such inputs by cleansing the located identifiers. It does not need to retrain or reconstruct the model, but only needs to cleanse inputs on-the-fly to improve performance. Our experiments on 18 deep code models (i.e., three pre-trained models with six code-based datasets) demonstrate the effectiveness and efficiency of CodeDenoise. For example, on average, CodeDenoise successfully denoises 21.91% of mispredicted inputs and improves the original models by 2.04% in terms of model accuracy across all the subjects, spending an average of 0.48 seconds on each input, substantially outperforming the widely-used fine-tuning strategy.
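The key idea stated in the abstract (localize noisy identifiers, then cleanse the input by renaming them) can be sketched as a greedy search over identifier renamings. Everything below is a hypothetical toy illustration, not the paper's implementation: the `model_confidence` scorer stands in for a real deep code model, and the candidate-name list and regex-based renaming are illustrative assumptions.

```python
import re

KEYWORDS = {"def", "return", "for", "in"}

# Hypothetical stand-in "model": scores a snippet by the fraction of its
# identifiers drawn from a small vocabulary of descriptive names.
# CodeDenoise would instead query a real deep code model's prediction.
COMMON_NAMES = {"count", "total", "index", "result", "value"}

def model_confidence(code: str) -> float:
    idents = set(re.findall(r"\b[a-zA-Z_]\w*\b", code)) - KEYWORDS
    if not idents:
        return 0.0
    return sum(1 for i in idents if i in COMMON_NAMES) / len(idents)

def denoise(code: str, candidates=("count", "total", "result")) -> str:
    """Greedy identifier cleansing: for each identifier, try renaming it
    to each candidate and keep the rename that raises the toy model's
    confidence. A sketch of the localize-and-cleanse idea only."""
    idents = set(re.findall(r"\b[a-zA-Z_]\w*\b", code)) - KEYWORDS
    best = code
    for ident in sorted(idents):
        for cand in candidates:
            trial = re.sub(rf"\b{re.escape(ident)}\b", cand, best)
            if model_confidence(trial) > model_confidence(best):
                best = trial
    return best
```

For example, `denoise("def f(xqz): return xqz")` rewrites the opaque identifiers into vocabulary names, raising the toy score; the real technique analogously accepts a rename only when it flips or strengthens the deployed model's prediction, requiring no retraining.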
Author Zhang, Xiangyu
Tian, Zhao
Chen, Junjie
Author_xml – sequence: 1
  givenname: Zhao
  surname: Tian
  fullname: Tian, Zhao
  email: tianzhao@tju.edu.cn
  organization: College of Intelligence and Computing, Tianjin University,China
– sequence: 2
  givenname: Junjie
  surname: Chen
  fullname: Chen, Junjie
  email: junjiechen@tju.edu.cn
  organization: College of Intelligence and Computing, Tianjin University,China
– sequence: 3
  givenname: Xiangyu
  surname: Zhang
  fullname: Zhang, Xiangyu
  email: xyzhang@cs.purdue.edu
  organization: Purdue University,Department of Computer Science,USA
BookMark eNotj91Kw0AUhFdRsK19Ar3YF0g8e_Yn2csSqw1UKqjXZZOc1UiyCUks9O0N6M0Mw8cMzJJdhS4QY3cCYiHAPmzettog2hgBZQwgjLlga5vYVGqQaK1Rl2yBRslI6ARv2HIcvwH0HJIF2x1CNH1R5Jszz9t-6E51-OSvNPhuaF0oiXeePxL1POsq4i-zNCM_1Y7nof-ZZhS6epw7t-zau2ak9b-v2MfT9j3bRfvDc55t9pHDVE1RWYGplPXeS-kKJ9Kq0IW3BQE5cpoQARJMElei8paEsZJUhV5rQ5aMkCt2_7dbE9GxH-rWDeejAJz_Ki1_Ae5ZTr0
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASE56229.2023.00166
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 572
ExternalDocumentID 10298345
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: 1901242,1910300
  funderid: 10.13039/100000001
– fundername: National Natural Science Foundation of China
  grantid: 62322208,62002256
  funderid: 10.13039/501100001809
– fundername: CCF
  funderid: 10.13039/100000143
– fundername: CAST
  funderid: 10.13039/100010097
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-a284t-cd06d49fff33aba18db5bf9be0eaea5e22007277ac24f9e1693e4d2f556e9e613
IEDL.DBID RIE
ISICitedReferencesCount 6
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001103357200045&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:32:28 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a284t-cd06d49fff33aba18db5bf9be0eaea5e22007277ac24f9e1693e4d2f556e9e613
PageCount 13
ParticipantIDs ieee_primary_10298345
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0051577
ssib057256115
Score 2.313577
Snippet Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large amount of code snippets. While these...
SourceID ieee
SourceType Publisher
StartPage 560
SubjectTerms Code Model
Codes
Deep Learning
Input Denoising
Location awareness
Noise measurement
Noise reduction
Predictive models
Semantics
Syntactics
Title On-the-fly Improving Performance of Deep Code Models via Input Denoising
URI https://ieeexplore.ieee.org/document/10298345
WOSCitedRecordID wos001103357200045&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NawMhEB2a0ENP6UdKv_HQq-3q6roeS5qQXtJAW8gtuOsIgbAbkk2g_7662SS99NCbKIiMI09G33sAj8opbq0RNLYmpkJmmqYiT2hijct9VsvImNpsQo1G6WSixw1ZvebCIGL9-QyfQrN-y7dlvg6lMn_CuU5jIVvQUkptyVq75JHKgzdj-7uvx2mlGpkhFunnl4--h3oeuCk8iJqyoIv4y1ClxpNB558rOYXugZlHxnvMOYMjLM6hs7NmIM1JvYDhe0H91Y66-TfZ1w3I-MASIKUjr4gL0istkuCINl-RzcyQt8JP5YeKchbKCF34GvQ_e0PamCZQ45GmormNEiu0cy6OTWZYajOZOZ1hhAaNRB6Kk1wpk3PhNAYtFhSWOykT1OjB_RLaRVngFRCN2qJhLkozFHEidM4jw7UIml6cK34N3RCZ6WKrizHdBeXmj_5bOAnBD78tGLuDdrVc4z0c55tqtlo-1Lv5A42CnyU
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NawIxFAytLbQn-2Hpd3PoNe0mm2w2x2IVpdYKteBNspsXEGRXdBX675usq_bSQ28hgRBeEubxkplB6FFayYzRnIRGh4SLRJGYpxGJjLapO9Ui0Lo0m5D9fjwaqUFFVi-5MABQfj6DJ98s3_JNni59qczdcKbikIt9dCA4Z3RN19ocHyEdfFO6zX4dUktZCQ3RQD2_fLYc2DPPTmFe1pR6ZcRfliolorTr_1zLCWrsuHl4sEWdU7QH2Rmqb8wZcHVXz1HnIyMuuSN2-o23lQM82PEEcG7xK8AMN3MD2HuiTRd4NdG4m7mp3FCWT3whoYG-2q1hs0Mq2wSiHdYUJDVBZLiy1oahTjSNTSISqxIIQIMWwHx5kkmpU8atAq_GAtwwK0QEChy8X6BalmdwibACZUBTG8QJ8DDiKmWBZop7VS_GJLtCDR-Z8WytjDHeBOX6j_4HdNQZvvfGvW7_7QYd-43wfy8ovUW1Yr6EO3SYrorJYn5f7uwPBsOibA
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%3A+%5Bproceedings%5D&rft.atitle=On-the-fly+Improving+Performance+of+Deep+Code+Models+via+Input+Denoising&rft.au=Tian%2C+Zhao&rft.au=Chen%2C+Junjie&rft.au=Zhang%2C+Xiangyu&rft.date=2023-09-11&rft.pub=IEEE&rft.eissn=2643-1572&rft.spage=560&rft.epage=572&rft_id=info:doi/10.1109%2FASE56229.2023.00166&rft.externalDocID=10298345