Language Models for Code Completion: A Practical Evaluation
Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public code language models when completing real-world code. We first...
| Published in: | Proceedings / International Conference on Software Engineering, pp. 956-968 |
|---|---|
| Main authors: | Izadi, Maliheh; Katzy, Jonathan; van Dam, Tim; Otten, Marc; Popescu, Razvan Mihai; van Deursen, Arie |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 14 April 2024 |
| Subjects: | |
| ISSN: | 1558-1225 |
| Abstract | Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qualitative assessments of three public code language models when completing real-world code. We first developed an open-source IDE extension, Code4Me, for the online evaluation of the models. We collected real auto-completion usage data for over a year from more than 1200 users, resulting in over 600K valid completions. These models were then evaluated using six standard metrics across twelve programming languages. Next, we conducted a qualitative study of 1690 real-world completion requests to identify the reasons behind the poor model performance. A comparative analysis of the models' performance in online and offline settings was also performed, using benchmark synthetic datasets and two masking strategies. Our findings suggest that while developers utilize code completion across various languages, the best results are achieved for mainstream languages such as Python and Java. InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives. Our study also revealed that offline evaluations do not accurately reflect real-world scenarios. Upon qualitative analysis of the models' predictions, we found that 66.3% of failures were due to models' limitations, 24.4% occurred due to inappropriate model usage in a development context, and 9.3% were valid requests that developers overwrote. Given these findings, we propose several strategies to overcome the current limitations. These include refining training objectives, improving resilience to typographical errors, adopting hybrid approaches, and enhancing implementations and usability. |
|---|---|
| Author | Popescu, Razvan Mihai; van Deursen, Arie; Otten, Marc; Katzy, Jonathan; Izadi, Maliheh; van Dam, Tim |
| Author_xml | – sequence: 1 givenname: Maliheh surname: Izadi fullname: Izadi, Maliheh email: m.izadi@tudelft.nl organization: Delft University of Technology,Delft,Netherlands – sequence: 2 givenname: Jonathan surname: Katzy fullname: Katzy, Jonathan email: j.b.katzy@tudelft.nl organization: Delft University of Technology,Delft,Netherlands – sequence: 3 givenname: Tim surname: van Dam fullname: van Dam, Tim email: t.o.vandam@student.tudelft.nl organization: Delft University of Technology,Delft,Netherlands – sequence: 4 givenname: Marc surname: Otten fullname: Otten, Marc email: m.j.c.otten@student.tudelft.nl organization: Delft University of Technology,Delft,Netherlands – sequence: 5 givenname: Razvan Mihai surname: Popescu fullname: Popescu, Razvan Mihai email: r.popescu-3@student.tudelft.nl organization: Delft University of Technology,Delft,Netherlands – sequence: 6 givenname: Arie surname: van Deursen fullname: van Deursen, Arie email: arie.vandeursen@tudelft.nl organization: Delft University of Technology,Delft,Netherlands |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK ESBDL RIE RIO |
| DOI | 10.1145/3597503.3639138 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore Open Access Journals; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400702174 |
| EISSN | 1558-1225 |
| EndPage | 968 |
| ExternalDocumentID | 10548973 |
| Genre | orig-research |
| GroupedDBID | -~X .4S .DC 29O 5VS 6IE 6IF 6IH 6IK 6IL 6IM 6IN 8US AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS ARCSS AVWKF BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO EDO ESBDL FEDTE I-F IEGSK IJVOP IPLJI M43 OCL RIE RIL RIO |
| ID | FETCH-LOGICAL-a2328-4d93aabffd62dbf8465272f17e3eeddf1a72d0516aa01c58dd4b759e53847ca83 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 01:53:12 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a2328-4d93aabffd62dbf8465272f17e3eeddf1a72d0516aa01c58dd4b759e53847ca83 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10548973 |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_10548973 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-April-14 |
| PublicationDateYYYYMMDD | 2024-04-14 |
| PublicationDate_xml | – month: 04 year: 2024 text: 2024-April-14 day: 14 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / International Conference on Software Engineering |
| PublicationTitleAbbrev | ICSE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib054357643 ssib055306466 ssj0006499 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 956 |
| SubjectTerms | Analytical models; Automatic Code Completion; CodeGPT; Codes; Data models; Evaluation; IDE; InCoder; Language Models; Open Source; Predictive models; Training; Training data; Transformers; UniXcoder |
| Title | Language Models for Code Completion: A Practical Evaluation |
| URI | https://ieeexplore.ieee.org/document/10548973 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07T8MwED7RioGpPIp4ywNrShy_YphQ1YqhqjoA6lY5OVtCQinqg9_POU1LFwY2K8pgne98X-x83wdw72OiKKTkFUomknpSYlNVJqlBHwpdqLy2b3sfmfE4n07tpCGr11wY733985nvxWF9l4_zch2PyqjCCV9bI1rQMkZvyFrb5FHU982etlS0w9EyYpVmW9aE7RttHy7VgyAkrVLRE9SieWSn7Jmr1L1l2PnnrI6h-8vSY5Nd_zmBA1-dQmdr08Caqj2Dp1FzJsmi8dnnkhFOZX0asvhuFN-eV4_smW2ki2jN2GAnAd6Ft-Hgtf-SNJ4JiSNslCcSrXCuCAF1hkUgdKEykwVuvKDZYODOZEiFqJ1LealyRFkYZT3te9KULhfn0K7mlb8ApnWgeJYYqazSoqMPM2Wc4zpg5vPCXkI3BmP2tZHFmG3jcPXH82s4yggRxKsYLm-gvVqs_S0clt-rj-Xirl7MHyXRnB0 |
| linkProvider | IEEE |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07T8MwED5BQYKpPIp444E1JY5fMUwItSoiVB0K6lY5sS0hoRT1we_nnKalCwObFWWwzne-L3a-7wO4dSFRhMXkZYJHHHtSpGNRRLGyzucyF2ll3_aeqX4_HY30oCarV1wY51z185lrh2F1l28nxSIclWGFI77Wim3DjuA8iZd0rVX6COz8akNdKhjiSB7QSr0xS0T3tboP5eKOIZYWMWszbNI08FM27FWq7tJt_nNeB9D65emRwboDHcKWK4-guTJqIHXdHsNDVp9KkmB99jkjiFTJEw5JeDfIb0_Ke_JIluJFuGqksxYBb8FbtzN86kW1a0JkEB2lEbeaGZN7b2Vic4_4QiQq8VQ5hrOxnhqVWCxFaUxMC5Fay3MltMOdj6vCpOwEGuWkdKdApPQYz8IGMivX1uCnmVDGUOlt4tJcn0ErBGP8tRTGGK_icP7H8xvY6w1fs3H23H-5gP0E8UG4mKH8Ehrz6cJdwW7xPf-YTa-rhf0BeCyfZA |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%2F+International+Conference+on+Software+Engineering&rft.atitle=Language+Models+for+Code+Completion%3A+A+Practical+Evaluation&rft.au=Izadi%2C+Maliheh&rft.au=Katzy%2C+Jonathan&rft.au=van+Dam%2C+Tim&rft.au=Otten%2C+Marc&rft.date=2024-04-14&rft.pub=ACM&rft.eissn=1558-1225&rft.spage=956&rft.epage=968&rft_id=info:doi/10.1145%2F3597503.3639138&rft.externalDocID=10548973 |