Large Language Models for Test-Free Fault Localization
| Published in: | Proceedings / International Conference on Software Engineering, pp. 165-176 |
|---|---|
| Main authors: | Yang, Aidan Z.H.; Goues, Claire Le; Martins, Ruben; Hellendoorn, Vincent J. |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 14 April 2024 |
| ISSN: | 1558-1225 |
| Abstract | Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for automated program repair (APR) struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line-level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMAO, the first language-model-based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on the larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%. LLMAO is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level. |
|---|---|
| Author | Goues, Claire Le; Yang, Aidan Z.H.; Martins, Ruben; Hellendoorn, Vincent J. |
| Author_xml | 1. Yang, Aidan Z.H. (email: aidan@cmu.edu; organization: Carnegie Mellon University, Pittsburgh, United States); 2. Goues, Claire Le (email: clegoues@cs.cmu.edu; organization: Carnegie Mellon University, Pittsburgh, United States); 3. Martins, Ruben (email: rubenm@cs.cmu.edu; organization: Carnegie Mellon University, Pittsburgh, United States); 4. Hellendoorn, Vincent J. (email: vhellendoorn@cmu.edu; organization: Carnegie Mellon University, Pittsburgh, United States) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3597503.3623342 |
| Discipline | Computer Science |
| EISBN | 9798400702174 |
| EISSN | 1558-1225 |
| EndPage | 176 |
| ExternalDocumentID | 10548193 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: ANI grantid: 045917 funderid: 10.13039/501100007434 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10548193 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-April-14 |
| PublicationDateYYYYMMDD | 2024-04-14 |
| PublicationDate_xml | – month: 04 year: 2024 text: 2024-April-14 day: 14 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / International Conference on Software Engineering |
| PublicationTitleAbbrev | ICSE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| StartPage | 165 |
| SubjectTerms | Adaptation models; Codes; Computer bugs; Computing methodologies → Neural networks; Deep learning; Location awareness; Manuals; Software and its engineering → Software functional properties; Training |
| Title | Large Language Models for Test-Free Fault Localization |
| URI | https://ieeexplore.ieee.org/document/10548193 |
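
The abstract reports Top-1 and Top-5 results without spelling out the metric. As a minimal sketch, assuming the common fault-localization convention that a bug counts as a Top-N hit when at least one of its truly buggy lines is among the N lines ranked most suspicious, the count could be computed as below. The function name `top_n_hits`, the input layout, and the toy scores are illustrative assumptions, not taken from the paper or from LLMAO's code.

```python
from typing import Dict, List, Set


def top_n_hits(
    scores_per_bug: List[Dict[int, float]],
    buggy_lines_per_bug: List[Set[int]],
    n: int,
) -> int:
    """Count bugs with at least one buggy line among the n highest-scored lines.

    scores_per_bug[i] maps line number -> suspiciousness score for bug i
    (hypothetical model output); buggy_lines_per_bug[i] is the ground truth.
    """
    hits = 0
    for scores, buggy in zip(scores_per_bug, buggy_lines_per_bug):
        # Rank lines by descending score; break ties by line number so the
        # ranking is deterministic.
        ranked = sorted(scores, key=lambda line: (-scores[line], line))
        if any(line in buggy for line in ranked[:n]):
            hits += 1
    return hits


# Toy data (invented for illustration): two bugs with per-line scores.
SCORES = [
    {10: 0.9, 11: 0.2, 12: 0.5},  # bug 1: line 12 is the real fault
    {3: 0.1, 4: 0.8},             # bug 2: line 4 is the real fault
]
BUGGY = [{12}, {4}]

if __name__ == "__main__":
    for n in (1, 2):
        print(f"Top-{n} hits:", top_n_hits(SCORES, BUGGY, n))
```

In the toy data, the first bug's faulty line is ranked second, so it counts toward Top-2 but not Top-1; the second bug's faulty line is ranked first and counts toward both.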