Language Models for Code Completion: A Practical Evaluation


Saved in:
Detailed bibliography
Published in: Proceedings / International Conference on Software Engineering, pp. 956-968
Main authors: Izadi, Maliheh, Katzy, Jonathan, van Dam, Tim, Otten, Marc, Popescu, Razvan Mihai, van Deursen, Arie
Format: Conference paper
Language: English
Published: ACM, 14 April 2024
Subjects:
ISSN: 1558-1225
Online access: Get full text
Abstract Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public code language models when completing real-world code. We first developed an open-source IDE extension, Code4Me, for the online evaluation of the models. We collected real auto-completion usage data for over a year from more than 1200 users, resulting in over 600K valid completions. These models were then evaluated using six standard metrics across twelve programming languages. Next, we conducted a qualitative study of 1690 real-world completion requests to identify the reasons behind the poor model performance. A comparative analysis of the models' performance in online and offline settings was also performed, using benchmark synthetic datasets and two masking strategies. Our findings suggest that while developers utilize code completion across various languages, the best results are achieved for mainstream languages such as Python and Java. InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives. Our study also revealed that offline evaluations do not accurately reflect real-world scenarios. Upon qualitative analysis of the models' predictions, we found that 66.3% of failures were due to models' limitations, 24.4% occurred due to inappropriate model usage in a development context, and 9.3% were valid requests that developers overwrote. Given these findings, we propose several strategies to overcome the current limitations. These include refining training objectives, improving resilience to typographical errors, adopting hybrid approaches, and enhancing implementations and usability.
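The record does not name the six evaluation metrics. As an illustration only (the specific metrics here are an assumption, not taken from this record), the sketch below computes two measures commonly reported for code-completion evaluation, exact match and edit similarity, over hypothetical prediction/ground-truth pairs.

```python
# Hedged sketch: illustrates exact match and edit similarity, two metrics
# commonly used when scoring code completions; the paper's actual six
# metrics are not listed in this record.
from difflib import SequenceMatcher


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the completion matches the reference code exactly (whitespace-trimmed)."""
    return prediction.strip() == ground_truth.strip()


def edit_similarity(prediction: str, ground_truth: str) -> float:
    """Similarity ratio in [0, 1] based on matching subsequences of the two strings."""
    return SequenceMatcher(None, prediction, ground_truth).ratio()


if __name__ == "__main__":
    # Hypothetical (prediction, ground truth) pairs standing in for logged completions.
    pairs = [
        ("return sorted(items)", "return sorted(items)"),
        ("for i in range(len(xs)):", "for i, x in enumerate(xs):"),
    ]
    em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
    es = sum(edit_similarity(p, g) for p, g in pairs) / len(pairs)
    print(f"Exact match: {em:.2f}, edit similarity: {es:.2f}")
```

In an online setting such as the Code4Me extension described in the abstract, the ground truth would presumably be the code the developer ultimately kept after a completion was shown.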
Author Popescu, Razvan Mihai
van Deursen, Arie
Otten, Marc
Katzy, Jonathan
Izadi, Maliheh
van Dam, Tim
Author_xml – sequence: 1
  givenname: Maliheh
  surname: Izadi
  fullname: Izadi, Maliheh
  email: m.izadi@tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
– sequence: 2
  givenname: Jonathan
  surname: Katzy
  fullname: Katzy, Jonathan
  email: j.b.katzy@tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
– sequence: 3
  givenname: Tim
  surname: van Dam
  fullname: van Dam, Tim
  email: t.o.vandam@student.tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
– sequence: 4
  givenname: Marc
  surname: Otten
  fullname: Otten, Marc
  email: m.j.c.otten@student.tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
– sequence: 5
  givenname: Razvan Mihai
  surname: Popescu
  fullname: Popescu, Razvan Mihai
  email: r.popescu-3@student.tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
– sequence: 6
  givenname: Arie
  surname: van Deursen
  fullname: van Deursen, Arie
  email: arie.vandeursen@tudelft.nl
  organization: Delft University of Technology, Delft, Netherlands
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3597503.3639138
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Open Access Journals
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Computer Science
EISBN 9798400702174
EISSN 1558-1225
EndPage 968
ExternalDocumentID 10548973
Genre orig-research
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink https://ieeexplore.ieee.org/document/10548973
PageCount 13
PublicationDate 2024-April-14
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2024
Publisher ACM
SourceID ieee
SourceType Publisher
StartPage 956
SubjectTerms Analytical models
Automatic Code Completion
CodeGPT
Codes
Data models
Evaluation
IDE
InCoder
Language Models
Open Source
Predictive models
Training
Training data
Transformers
UniXcoder
Title Language Models for Code Completion: A Practical Evaluation
URI https://ieeexplore.ieee.org/document/10548973