Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework

Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code language models (CLMs) have exhibited impressive capabilities in code generation. However, for complex programming tasks, especially program repa...

Detailed description

Bibliographic details
Published in: Proceedings - Conference on Software Maintenance (1987), pp. 136-146
Main authors: Hao, Sichong; Shi, Xianjun; Liu, Hongwei; Shu, Yanjun
Format: Conference proceeding
Language: English
Published: IEEE, 2023-10-01
Subjects:
ISSN: 2576-3148
Online access: Full text
Abstract Automated program repair (APR) is a key technique for improving software maintenance productivity by fixing buggy code automatically. Recently, large code language models (CLMs) have shown impressive capabilities in code generation. However, for complex programming tasks, especially program repair, the success rate of CLMs remains low. One reason is that CLMs are typically developed for general-purpose use, and their potential for APR has yet to be fully explored. In this paper, we propose APRFiT, a general curricular fine-tuning framework that improves the success rate of CLMs for APR. First, APRFiT automatically generates syntactically diverse but semantically equivalent bug-fixing programs via code augmentation operators, enriching the diversity of the bug-fixing dataset. Second, APRFiT applies a curriculum learning-based mechanism that helps CLMs develop a deep understanding of program semantics from these augmented bug-fixing code variants and improves the effectiveness of fine-tuning for APR tasks. We implement APRFiT on different CLMs and evaluate them on the Bugs2Fix small and medium datasets. Extensive experiments demonstrate that existing CLMs equipped with APRFiT substantially outperform the original models and generate 2.5 to 14.5 percent more correct patches than the baselines, both effectively and efficiently.
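The two steps named in the abstract, semantics-preserving augmentation and curriculum ordering, can be illustrated with a minimal sketch. This is not the APRFiT implementation: the record does not specify the paper's augmentation operators or its difficulty measure, so the identifier-renaming operator, the token-similarity difficulty proxy, and the names `RenameIdentifiers`, `augment`, `difficulty`, and `curriculum` are all assumptions made for illustration. Python is used only for brevity; the Bugs2Fix datasets contain Java methods.

```python
import ast
import difflib


class RenameIdentifiers(ast.NodeTransformer):
    """Hypothetical augmentation operator: rename identifiers via a mapping.

    The result is semantically equivalent only if the new names do not
    collide with names already used in the program (assumed here).
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)
        return node


def augment(source, mapping):
    """Return a syntactic variant of `source` with identifiers renamed."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


def difficulty(buggy, fixed):
    """Assumed difficulty proxy: token-level dissimilarity of bug and fix."""
    matcher = difflib.SequenceMatcher(None, buggy.split(), fixed.split())
    return 1.0 - matcher.ratio()


def curriculum(pairs):
    """Order (buggy, fixed) pairs from easiest to hardest."""
    return sorted(pairs, key=lambda pair: difficulty(*pair))


buggy = "def add(a, b):\n    return a - b"
fixed = "def add(a, b):\n    return a + b"
renaming = {"a": "x", "b": "y"}

# One augmented bug-fix pair: same bug and same fix, different surface form.
augmented_pair = (augment(buggy, renaming), augment(fixed, renaming))
```

Under these assumptions, a fine-tuning loop would draw batches from `curriculum(original_pairs + augmented_pairs)` in order, so the model sees small edits before large ones. Any other difficulty measure can be swapped in by changing `difficulty` alone.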
Author Shi, Xianjun
Liu, Hongwei
Hao, Sichong
Shu, Yanjun
Author_xml – sequence: 1
  givenname: Sichong
  surname: Hao
  fullname: Hao, Sichong
  email: schao@stu.hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,China
– sequence: 2
  givenname: Xianjun
  surname: Shi
  fullname: Shi, Xianjun
  email: shixianjun@hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,China
– sequence: 3
  givenname: Hongwei
  surname: Liu
  fullname: Liu, Hongwei
  email: liuhw@hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,China
– sequence: 4
  givenname: Yanjun
  surname: Shu
  fullname: Shu, Yanjun
  email: yjshu@hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICSME58846.2023.00024
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350327830
EISSN 2576-3148
EndPage 146
ExternalDocumentID 10336339
Genre orig-research
GrantInformation_xml – fundername: National Key Research and Development Program of China
  funderid: 10.13039/501100012166
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
AAJGR
AAWTH
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IPLJI
OCL
RIE
RIL
RIO
RNS
ID FETCH-LOGICAL-i204t-46a4b8b22fb12e3212cdefa8d76e67c041c6d27a8dba56c04331c32dd96ecb0d3
IEDL.DBID RIE
ISICitedReferencesCount 6
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001125977500012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:23:03 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-46a4b8b22fb12e3212cdefa8d76e67c041c6d27a8dba56c04331c32dd96ecb0d3
PageCount 11
ParticipantIDs ieee_primary_10336339
PublicationCentury 2000
PublicationDate 2023-Oct.-1
PublicationDateYYYYMMDD 2023-10-01
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-Oct.-1
  day: 01
PublicationDecade 2020
PublicationTitle Proceedings - Conference on Software Maintenance (1987)
PublicationTitleAbbrev ICSME
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0054734
Score 1.9350325
Snippet Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code...
SourceID ieee
SourceType Publisher
StartPage 136
SubjectTerms Codes
Computer architecture
Computer bugs
Curriculum Learning
Large Language Models of Code
Productivity
Program Repair
Semantics
Software maintenance
Training
Title Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
URI https://ieeexplore.ieee.org/document/10336339
WOSCitedRecordID wos001125977500012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE