GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph
The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion...
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 570-581 |
|---|---|
| Main authors: | Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 27 October 2024 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Online access: | Get full text |
| Abstract | The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches; based on CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate context-similar code snippets with the completion target from the current repository. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: Compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space. CCS Concepts: Software and its engineering → Search-based software engineering; Information systems → Language models; Query representation; Mathematics of computing → Graph algorithms. |
|---|---|
| Author | Wang, Qianxiang; Liu, Wei; Zhang, Wei; Shen, Bo; Yu, Ailun; Zhao, Haiyan; Jin, Zhi; Zan, Daoguang |
| Author_xml | – sequence: 1 givenname: Wei surname: Liu fullname: Liu, Wei email: weiliu@stu.pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 2 givenname: Ailun surname: Yu fullname: Yu, Ailun email: yuailun@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 3 givenname: Daoguang surname: Zan fullname: Zan, Daoguang email: daoguang@iscas.ac.cn organization: Chinese Academy of Sciences,Institute of Software,Beijing,China – sequence: 4 givenname: Bo surname: Shen fullname: Shen, Bo email: shenbo21@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China – sequence: 5 givenname: Wei surname: Zhang fullname: Zhang, Wei email: zhangw.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 6 givenname: Haiyan surname: Zhao fullname: Zhao, Haiyan email: zhhy.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 7 givenname: Zhi surname: Jin fullname: Jin, Zhi email: zhijin@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 8 givenname: Qianxiang surname: Wang fullname: Wang, Qianxiang email: wangqianxiang@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3691620.3695054 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400712487 |
| EISSN | 2643-1572 |
| EndPage | 581 |
| ExternalDocumentID | 10765009 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jan 15 06:20:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_10765009 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-27 |
| PublicationDateYYYYMMDD | 2024-10-27 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256116 ssj0051577 |
| Score | 2.2926023 |
| Snippet | The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 570 |
| SubjectTerms | Accuracy; Code completion; Code graphs; Codes; Computational modeling; Filtering; Information systems; Large language model; Mathematical models; Process control; Retrieval augmented generation; Software; Software algorithms; Software engineering |
| Title | GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph |
| URI | https://ieeexplore.ieee.org/document/10765009 |
| WOSCitedRecordID | wos001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%3A+%5Bproceedings%5D&rft.atitle=GraphCoder%3A+Enhancing+Repository-Level+Code+Completion+via+Coarse-to-fine+Retrieval+Based+on+Code+Context+Graph&rft.au=Liu%2C+Wei&rft.au=Yu%2C+Ailun&rft.au=Zan%2C+Daoguang&rft.au=Shen%2C+Bo&rft.date=2024-10-27&rft.pub=ACM&rft.eissn=2643-1572&rft.spage=570&rft.epage=581&rft_id=info:doi/10.1145%2F3691620.3695054&rft.externalDocID=10765009 |
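
The coarse-to-fine retrieval over a code context graph that the abstract describes can be illustrated with a small Python sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the `Snippet` type, the function names, and both scoring functions (token-set Jaccard for the coarse pass, Jaccard over dependence edges for the fine pass) are assumptions chosen only to show the two-stage shape of such a pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical stand-in for a graph slice: statements plus
    dependence edges (i, j) between statement indices."""
    statements: tuple[str, ...]
    edges: frozenset[tuple[int, int]]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def token_bag(snippet: Snippet) -> frozenset[str]:
    """Coarse view: the set of whitespace-separated tokens, ignoring structure."""
    return frozenset(tok for stmt in snippet.statements for tok in stmt.split())


def labelled_edges(snippet: Snippet) -> frozenset[tuple[str, str]]:
    """Fine view: each dependence edge as a (source statement, target statement) pair."""
    return frozenset(
        (snippet.statements[i], snippet.statements[j]) for i, j in snippet.edges
    )


def retrieve(query: Snippet, candidates: list[Snippet],
             coarse_k: int = 2, top_n: int = 1) -> list[Snippet]:
    """Assumed two-stage retrieval: cheap token filter, then structural re-rank."""
    # Coarse stage: keep the coarse_k candidates with the most token overlap.
    q_bag = token_bag(query)
    shortlist = sorted(
        candidates, key=lambda c: jaccard(q_bag, token_bag(c)), reverse=True
    )[:coarse_k]
    # Fine stage: re-rank the shortlist by overlap of dependence edges.
    q_edges = labelled_edges(query)
    reranked = sorted(
        shortlist, key=lambda c: jaccard(q_edges, labelled_edges(c)), reverse=True
    )
    return reranked[:top_n]
```

In this sketch, a candidate that shares all tokens with the query but none of its dependence edges survives the coarse filter and is then ranked below a candidate whose edges match, which is the kind of distinction a purely sequence-based similarity would miss.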