Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code language models (CLMs) have exhibited impressive capabilities in code generation. However, for complex programming tasks, especially program repa...
| Published in: | Proceedings - Conference on Software Maintenance (1987) pp. 136 - 146 |
|---|---|
| Main Authors: | Hao, Sichong; Shi, Xianjun; Liu, Hongwei; Shu, Yanjun |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.10.2023 |
| Subjects: | |
| ISSN: | 2576-3148 |
| Abstract | Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code language models (CLMs) have exhibited impressive capabilities in code generation. However, for complex programming tasks, especially program repair, the success rate of CLMs is still low. One reason is that CLMs are typically developed for general purposes, and their potential for APR has yet to be fully explored. In this paper, we propose APRFiT, a general curricular fine-tuning framework that improves the success rate of CLMs for APR. First, APRFiT uses code augmentation operators to automatically generate syntactically diverse but semantically equivalent bug-fixing programs, enriching the diversity of the bug-fixing dataset. Second, APRFiT uses a curriculum learning-based mechanism that helps CLMs develop a deep understanding of program semantics from these augmented bug-fixing code variants and improves the effectiveness of fine-tuning for APR tasks. We implement APRFiT on different CLMs and evaluate them on the Bugs2Fix small and medium datasets. Extensive experiments demonstrate that existing CLMs implemented with APRFiT substantially outperform the original models, generating 2.5 to 14.5 percent more correct patches than the baselines, both effectively and efficiently. |
|---|---|
| Author | Shi, Xianjun; Liu, Hongwei; Hao, Sichong; Shu, Yanjun |
| Author_xml | – sequence: 1 givenname: Sichong surname: Hao fullname: Hao, Sichong email: schao@stu.hit.edu.cn organization: Harbin Institute of Technology, Faculty of Computing, China – sequence: 2 givenname: Xianjun surname: Shi fullname: Shi, Xianjun email: shixianjun@hit.edu.cn organization: Harbin Institute of Technology, Faculty of Computing, China – sequence: 3 givenname: Hongwei surname: Liu fullname: Liu, Hongwei email: liuhw@hit.edu.cn organization: Harbin Institute of Technology, Faculty of Computing, China – sequence: 4 givenname: Yanjun surname: Shu fullname: Shu, Yanjun email: yjshu@hit.edu.cn organization: Harbin Institute of Technology, Faculty of Computing, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICSME58846.2023.00024 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798350327830 |
| EISSN | 2576-3148 |
| EndPage | 146 |
| ExternalDocumentID | 10336339 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Key Research and Development Program of China funderid: 10.13039/501100012166 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IPLJI OCL RIE RIL RIO RNS |
| ID | FETCH-LOGICAL-i204t-46a4b8b22fb12e3212cdefa8d76e67c041c6d27a8dba56c04331c32dd96ecb0d3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 6 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001125977500012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:23:03 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i204t-46a4b8b22fb12e3212cdefa8d76e67c041c6d27a8dba56c04331c32dd96ecb0d3 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_10336339 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Oct.-1 |
| PublicationDateYYYYMMDD | 2023-10-01 |
| PublicationDate_xml | – month: 10 year: 2023 text: 2023-Oct.-1 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings - Conference on Software Maintenance (1987) |
| PublicationTitleAbbrev | ICSME |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0054734 |
| Score | 1.9350325 |
| Snippet | Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 136 |
| SubjectTerms | Codes; Computer architecture; Computer bugs; Curriculum Learning; Large Language Models of Code; Productivity; Program Repair; Semantics; Software maintenance; Training |
| Title | Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework |
| URI | https://ieeexplore.ieee.org/document/10336339 |
| WOSCitedRecordID | wos001125977500012&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+-+Conference+on+Software+Maintenance+%281987%29&rft.atitle=Enhancing+Code+Language+Models+for+Program+Repair+by+Curricular+Fine-tuning+Framework&rft.au=Hao%2C+Sichong&rft.au=Shi%2C+Xianjun&rft.au=Liu%2C+Hongwei&rft.au=Shu%2C+Yanjun&rft.date=2023-10-01&rft.pub=IEEE&rft.eissn=2576-3148&rft.spage=136&rft.epage=146&rft_id=info:doi/10.1109%2FICSME58846.2023.00024&rft.externalDocID=10336339 |
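The abstract describes APRFiT in two steps: semantics-preserving code augmentation of bug-fixing pairs, followed by curriculum learning-based fine-tuning. This record does not specify the actual augmentation operators or the curriculum criterion, so the Python sketch below is only a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes (1) consistent variable renaming as one example of a semantics-preserving operator, applied identically to the buggy and fixed versions so the repair edit is unchanged; (2) token-level dissimilarity between buggy and fixed code as a stand-in difficulty score; and (3) a cumulative easy-to-hard staging schedule. All names (`BugFixPair`, `RenameVariables`, `augment`, `difficulty`, `curriculum_stages`) are hypothetical. Python's standard `ast` module is used for convenience, even though Bugs2Fix consists of Java methods.

```python
import ast
import difflib
from dataclasses import dataclass


@dataclass
class BugFixPair:
    """Hypothetical container for one (buggy, fixed) training example."""
    buggy: str
    fixed: str


class RenameVariables(ast.NodeTransformer):
    """Hypothetical semantics-preserving operator: consistently renames
    variables and function parameters according to a fixed mapping."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Variable reads and writes inside expressions and statements.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters are ast.arg nodes, not ast.Name nodes.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def augment(pair: BugFixPair, mapping: dict[str, str]) -> BugFixPair:
    """Apply the same renaming to both sides so the bug-fixing edit is preserved."""

    def rename(source: str) -> str:
        tree = RenameVariables(mapping).visit(ast.parse(source))
        return ast.unparse(tree)  # requires Python 3.9+

    return BugFixPair(rename(pair.buggy), rename(pair.fixed))


def difficulty(pair: BugFixPair) -> float:
    """Assumed difficulty proxy: 1 minus the token-level similarity ratio
    between the buggy and fixed code (0.0 means identical)."""
    matcher = difflib.SequenceMatcher(None, pair.buggy.split(), pair.fixed.split())
    return 1.0 - matcher.ratio()


def curriculum_stages(pairs: list[BugFixPair], n_stages: int) -> list[list[BugFixPair]]:
    """Sort examples easy-to-hard and return n_stages cumulative training sets."""
    ordered = sorted(pairs, key=difficulty)
    return [ordered[: round(len(ordered) * k / n_stages)] for k in range(1, n_stages + 1)]


if __name__ == "__main__":
    original = BugFixPair(
        buggy="def area(w, h):\n    return w + h",
        fixed="def area(w, h):\n    return w * h",
    )
    # Syntactically different but semantically equivalent variant of the pair.
    variant = augment(original, {"w": "width", "h": "height"})
    print(variant.fixed)

    pairs = [BugFixPair("a = 1", "b = 2"), original, BugFixPair("x = 1", "x = 1")]
    for i, stage in enumerate(curriculum_stages(pairs, n_stages=3)):
        print(i, [round(difficulty(p), 3) for p in stage])
```

Each stage in this sketch contains all easier examples plus the next slice of harder ones, so a fine-tuning loop would train on `stages[0]`, then `stages[1]`, and so on. Whether APRFiT uses a cumulative or a disjoint schedule, and how it measures difficulty, is not stated in this record.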