On-the-fly Improving Performance of Deep Code Models via Input Denoising

Bibliographic Details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 560-572
Main Authors: Tian, Zhao; Chen, Junjie; Zhang, Xiangyu
Format: Conference Proceeding
Language: English
Published: IEEE, 11.09.2023
Subjects:
ISSN: 2643-1572
Abstract Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large amount of code snippets. While these deep code models have achieved great success, even state-of-the-art models suffer from noise present in inputs, leading to erroneous predictions. While it is possible to enhance models through retraining/fine-tuning, this is not a once-and-for-all approach and incurs significant overhead. In particular, these techniques cannot on-the-fly improve the performance of (deployed) models. There are currently some techniques for input denoising in other domains (such as image processing), but since code input is discrete and must strictly abide by complex syntactic and semantic constraints, input denoising techniques from other fields are largely inapplicable. In this work, we propose the first input denoising technique (i.e., CodeDenoise) for deep code models. Its key idea is to localize noisy identifiers in (likely) mispredicted inputs, and denoise such inputs by cleansing the located identifiers. It does not need to retrain or reconstruct the model, but only needs to cleanse inputs on-the-fly to improve performance. Our experiments on 18 deep code models (i.e., three pre-trained models with six code-based datasets) demonstrate the effectiveness and efficiency of CodeDenoise. For example, on average, CodeDenoise successfully denoises 21.91% of mispredicted inputs and improves the original models by 2.04% in terms of model accuracy across all the subjects, spending an average of 0.48 seconds on each input, substantially outperforming the widely-used fine-tuning strategy.
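The key idea stated in the abstract (localize noisy identifiers, then cleanse the input by renaming them) can be sketched as a greedy search over identifier renamings. Everything below is a hypothetical toy illustration, not the paper's implementation: the `model_confidence` scorer stands in for a real deep code model, and the candidate-name list and regex-based renaming are illustrative assumptions.

```python
import re

KEYWORDS = {"def", "return", "for", "in"}

# Hypothetical stand-in "model": scores a snippet by the fraction of its
# identifiers drawn from a small vocabulary of descriptive names.
# CodeDenoise would instead query a real deep code model's prediction.
COMMON_NAMES = {"count", "total", "index", "result", "value"}

def model_confidence(code: str) -> float:
    idents = set(re.findall(r"\b[a-zA-Z_]\w*\b", code)) - KEYWORDS
    if not idents:
        return 0.0
    return sum(1 for i in idents if i in COMMON_NAMES) / len(idents)

def denoise(code: str, candidates=("count", "total", "result")) -> str:
    """Greedy identifier cleansing: for each identifier, try renaming it
    to each candidate and keep the rename that raises the toy model's
    confidence. A sketch of the localize-and-cleanse idea only."""
    idents = set(re.findall(r"\b[a-zA-Z_]\w*\b", code)) - KEYWORDS
    best = code
    for ident in sorted(idents):
        for cand in candidates:
            trial = re.sub(rf"\b{re.escape(ident)}\b", cand, best)
            if model_confidence(trial) > model_confidence(best):
                best = trial
    return best
```

For example, `denoise("def f(xqz): return xqz")` rewrites the opaque identifiers into vocabulary names, raising the toy score; the real technique analogously accepts a rename only when it flips or strengthens the deployed model's prediction, requiring no retraining.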
Author Zhang, Xiangyu
Tian, Zhao
Chen, Junjie
Author_xml – sequence: 1
  givenname: Zhao
  surname: Tian
  fullname: Tian, Zhao
  email: tianzhao@tju.edu.cn
  organization: College of Intelligence and Computing, Tianjin University,China
– sequence: 2
  givenname: Junjie
  surname: Chen
  fullname: Chen, Junjie
  email: junjiechen@tju.edu.cn
  organization: College of Intelligence and Computing, Tianjin University,China
– sequence: 3
  givenname: Xiangyu
  surname: Zhang
  fullname: Zhang, Xiangyu
  email: xyzhang@cs.purdue.edu
  organization: Purdue University,Department of Computer Science,USA
BookMark eNotj91Kw0AUhFdRsK19Ar3YF0g8e_Yn2csSqw1UKqjXZZOc1UiyCUks9O0N6M0Mw8cMzJJdhS4QY3cCYiHAPmzettog2hgBZQwgjLlga5vYVGqQaK1Rl2yBRslI6ARv2HIcvwH0HJIF2x1CNH1R5Jszz9t-6E51-OSvNPhuaF0oiXeePxL1POsq4i-zNCM_1Y7nof-ZZhS6epw7t-zau2ak9b-v2MfT9j3bRfvDc55t9pHDVE1RWYGplPXeS-kKJ9Kq0IW3BQE5cpoQARJMElei8paEsZJUhV5rQ5aMkCt2_7dbE9GxH-rWDeejAJz_Ki1_Ae5ZTr0
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASE56229.2023.00166
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 572
ExternalDocumentID 10298345
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: 1901242,1910300
  funderid: 10.13039/100000001
– fundername: National Natural Science Foundation of China
  grantid: 62322208,62002256
  funderid: 10.13039/501100001809
– fundername: CCF
  funderid: 10.13039/100000143
– fundername: CAST
  funderid: 10.13039/100010097
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-a284t-cd06d49fff33aba18db5bf9be0eaea5e22007277ac24f9e1693e4d2f556e9e613
IEDL.DBID RIE
ISICitedReferencesCount 6
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001103357200045&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:32:28 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a284t-cd06d49fff33aba18db5bf9be0eaea5e22007277ac24f9e1693e4d2f556e9e613
PageCount 13
ParticipantIDs ieee_primary_10298345
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0051577
ssib057256115
Score 2.313577
Snippet Deep learning has been widely adopted to tackle various code-based tasks by building deep code models based on a large amount of code snippets. While these...
SourceID ieee
SourceType Publisher
StartPage 560
SubjectTerms Code Model
Codes
Deep Learning
Input Denoising
Location awareness
Noise measurement
Noise reduction
Predictive models
Semantics
Syntactics
Title On-the-fly Improving Performance of Deep Code Models via Input Denoising
URI https://ieeexplore.ieee.org/document/10298345
WOSCitedRecordID wos001103357200045&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NawMhEB2a0ENP6UdKv_HQq-3q6roeS5qQXtJAW8gtuOsIgbAbkk2g_7662SS99NCbKIiMI09G33sAj8opbq0RNLYmpkJmmqYiT2hijct9VsvImNpsQo1G6WSixw1ZvebCIGL9-QyfQrN-y7dlvg6lMn_CuU5jIVvQUkptyVq75JHKgzdj-7uvx2mlGpkhFunnl4--h3oeuCk8iJqyoIv4y1ClxpNB558rOYXugZlHxnvMOYMjLM6hs7NmIM1JvYDhe0H91Y66-TfZ1w3I-MASIKUjr4gL0istkuCINl-RzcyQt8JP5YeKchbKCF34GvQ_e0PamCZQ45GmormNEiu0cy6OTWZYajOZOZ1hhAaNRB6Kk1wpk3PhNAYtFhSWOykT1OjB_RLaRVngFRCN2qJhLkozFHEidM4jw7UIml6cK34N3RCZ6WKrizHdBeXmj_5bOAnBD78tGLuDdrVc4z0c55tqtlo-1Lv5A42CnyU
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NawIxFAytLbQn-2Hpd3PoNe0mm2w2x2IVpdYKteBNspsXEGRXdBX675usq_bSQ28hgRBeEubxkplB6FFayYzRnIRGh4SLRJGYpxGJjLapO9Ui0Lo0m5D9fjwaqUFFVi-5MABQfj6DJ98s3_JNni59qczdcKbikIt9dCA4Z3RN19ocHyEdfFO6zX4dUktZCQ3RQD2_fLYc2DPPTmFe1pR6ZcRfliolorTr_1zLCWrsuHl4sEWdU7QH2Rmqb8wZcHVXz1HnIyMuuSN2-o23lQM82PEEcG7xK8AMN3MD2HuiTRd4NdG4m7mp3FCWT3whoYG-2q1hs0Mq2wSiHdYUJDVBZLiy1oahTjSNTSISqxIIQIMWwHx5kkmpU8atAq_GAtwwK0QEChy8X6BalmdwibACZUBTG8QJ8DDiKmWBZop7VS_GJLtCDR-Z8WytjDHeBOX6j_4HdNQZvvfGvW7_7QYd-43wfy8ovUW1Yr6EO3SYrorJYn5f7uwPBsOibA
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%3A+%5Bproceedings%5D&rft.atitle=On-the-fly+Improving+Performance+of+Deep+Code+Models+via+Input+Denoising&rft.au=Tian%2C+Zhao&rft.au=Chen%2C+Junjie&rft.au=Zhang%2C+Xiangyu&rft.date=2023-09-11&rft.pub=IEEE&rft.eissn=2643-1572&rft.spage=560&rft.epage=572&rft_id=info:doi/10.1109%2FASE56229.2023.00166&rft.externalDocID=10298345