Large Language Models for Test-Free Fault Localization

Bibliographic details
Published in: Proceedings / International Conference on Software Engineering, pp. 165-176
Main authors: Yang, Aidan Z.H., Le Goues, Claire, Martins, Ruben, Hellendoorn, Vincent J.
Format: Conference paper
Language: English
Published: ACM, 14 April 2024
ISSN: 1558-1225
Abstract Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line-level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMAO, the first language-model-based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on the larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%. LLMAO is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level.
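The abstract reports results as Top-1 and Top-5 counts. As a minimal sketch of how a Top-N count is commonly computed in fault localization, assuming each program comes with one suspiciousness score per line and a set of known buggy line indices: the function names `top_n_hit` and `top_n_count` and the example scores are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import List, Set, Tuple


def top_n_hit(line_scores: List[float], buggy_lines: Set[int], n: int) -> bool:
    """Return True if any known buggy line is among the n highest-scored lines."""
    # Rank line indices from most to least suspicious (ties keep source order).
    ranked = sorted(range(len(line_scores)),
                    key=lambda i: line_scores[i], reverse=True)
    return any(i in buggy_lines for i in ranked[:n])


def top_n_count(programs: List[Tuple[List[float], Set[int]]], n: int) -> int:
    """Count the programs whose bug is localized within the top n lines."""
    return sum(top_n_hit(scores, buggy, n) for scores, buggy in programs)


# Made-up scores for three small programs (0-based line indices).
programs = [
    ([0.1, 0.9, 0.3], {1}),       # buggy line is ranked first
    ([0.8, 0.2, 0.7, 0.1], {2}),  # buggy line is ranked second
    ([0.2, 0.1, 0.6], {1}),       # buggy line is ranked last
]

top1 = top_n_count(programs, 1)
top2 = top_n_count(programs, 2)
top3 = top_n_count(programs, 3)
```

Under this definition a larger N can only keep or increase the count, which is why Top-5 figures are at least as high as Top-1 figures for the same technique.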
Author Le Goues, Claire
Yang, Aidan Z.H.
Martins, Ruben
Hellendoorn, Vincent J.
Author_xml – sequence: 1
  givenname: Aidan Z.H.
  surname: Yang
  fullname: Yang, Aidan Z.H.
  email: aidan@cmu.edu
  organization: Carnegie Mellon University, Pittsburgh, United States
– sequence: 2
  givenname: Claire
  surname: Le Goues
  fullname: Le Goues, Claire
  email: clegoues@cs.cmu.edu
  organization: Carnegie Mellon University, Pittsburgh, United States
– sequence: 3
  givenname: Ruben
  surname: Martins
  fullname: Martins, Ruben
  email: rubenm@cs.cmu.edu
  organization: Carnegie Mellon University, Pittsburgh, United States
– sequence: 4
  givenname: Vincent J.
  surname: Hellendoorn
  fullname: Hellendoorn, Vincent J.
  email: vhellendoorn@cmu.edu
  organization: Carnegie Mellon University, Pittsburgh, United States
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3597503.3623342
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Open Access Journals
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Computer Science
EISBN 9798400702174
EISSN 1558-1225
EndPage 176
ExternalDocumentID 10548193
Genre orig-research
GrantInformation_xml – fundername: ANI
  grantid: 045917
  funderid: 10.13039/501100007434
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink https://ieeexplore.ieee.org/document/10548193
PageCount 12
PublicationCentury 2000
PublicationDate 2024-April-14
PublicationDateYYYYMMDD 2024-04-14
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
StartPage 165
SubjectTerms Adaptation models
Codes
Computer bugs
Computing methodologies → Neural networks
Deep learning
Location awareness
Manuals
Software and its engineering → Software functional properties
Training
Title Large Language Models for Test-Free Fault Localization
URI https://ieeexplore.ieee.org/document/10548193