GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph
The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion...
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 570 - 581 |
|---|---|
| Main Authors: | Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 27.10.2024 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Abstract | The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches; based on CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate context-similar code snippets with the completion target from the current repository. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: Compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space. CCS Concepts: Software and its engineering → Search-based software engineering; Information systems → Language models; Query representation; Mathematics of computing → Graph algorithms. |
|---|---|
| Author | Wang, Qianxiang; Liu, Wei; Zhang, Wei; Shen, Bo; Yu, Ailun; Zhao, Haiyan; Jin, Zhi; Zan, Daoguang |
| Author_xml | – sequence: 1 givenname: Wei surname: Liu fullname: Liu, Wei email: weiliu@stu.pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 2 givenname: Ailun surname: Yu fullname: Yu, Ailun email: yuailun@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 3 givenname: Daoguang surname: Zan fullname: Zan, Daoguang email: daoguang@iscas.ac.cn organization: Chinese Academy of Sciences,Institute of Software,Beijing,China – sequence: 4 givenname: Bo surname: Shen fullname: Shen, Bo email: shenbo21@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China – sequence: 5 givenname: Wei surname: Zhang fullname: Zhang, Wei email: zhangw.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 6 givenname: Haiyan surname: Zhao fullname: Zhao, Haiyan email: zhhy.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 7 givenname: Zhi surname: Jin fullname: Jin, Zhi email: zhijin@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 8 givenname: Qianxiang surname: Wang fullname: Wang, Qianxiang email: wangqianxiang@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3691620.3695054 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400712487 |
| EISSN | 2643-1572 |
| EndPage | 581 |
| ExternalDocumentID | 10765009 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| IngestDate | Wed Jan 15 06:20:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_10765009 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-27 |
| PublicationDateYYYYMMDD | 2024-10-27 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256116 ssj0051577 |
| Score | 2.2927208 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 570 |
| SubjectTerms | Accuracy; Code completion; Code graphs; Codes; Computational modeling; Filtering; Information systems; Large language model; Mathematical models; Process control; Retrieval augmented generation; Software; Software algorithms; Software engineering |
| Title | GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph |
| URI | https://ieeexplore.ieee.org/document/10765009 |
| hasFullText | 1 |
| inHoldings | 1 |
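
The abstract in this record describes a coarse-to-fine retrieval process over a code context graph (CCG) but does not spell out the similarity measures. The following is a minimal illustrative sketch, not the authors' implementation. It assumes a cheap token-overlap score for the coarse step and a Jaccard score over labeled dependence edges for the fine step. The names `Snippet`, `coarse_filter`, `fine_rerank`, and `retrieve`, and the edge labels `"cf"`, `"dd"`, `"cd"`, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    """A candidate snippet plus a toy stand-in for its code context graph.

    edges holds (source, target, kind) triples; kind is an assumed label:
    "cf" (control flow), "dd" (data dependence), "cd" (control dependence).
    """
    text: str
    edges: frozenset = frozenset()


def tokens(text):
    """Identifier-like tokens of a snippet, as a set."""
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def coarse_filter(query, candidates, k):
    """Coarse step (assumed): keep the k candidates with the best token overlap."""
    q = tokens(query.text)
    ranked = sorted(candidates, key=lambda c: jaccard(q, tokens(c.text)), reverse=True)
    return ranked[:k]


def fine_rerank(query, shortlist, top_n):
    """Fine step (assumed): re-rank the shortlist by overlap of graph edges."""
    q = set(query.edges)
    ranked = sorted(shortlist, key=lambda c: jaccard(q, set(c.edges)), reverse=True)
    return ranked[:top_n]


def retrieve(query, candidates, k=10, top_n=3):
    """Run the coarse filter first, then the structure-aware re-ranking."""
    return fine_rerank(query, coarse_filter(query, candidates, k), top_n)
```

The two-stage split keeps the more expensive structural comparison off most of the repository: only the `k` survivors of the cheap filter are compared by graph structure. In a real system the retrieved snippets would then be placed in the LLM prompt ahead of the completion target.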