GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph
The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion...
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 570-581 |
|---|---|
| Main authors: | Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 27.10.2024 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Online access: | Get full text |
| Abstract | The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of completion target more accurately through code context graph (CCG) that consists of control-flow, data- and control-dependence between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches; based on CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate context-similar code snippets with the completion target from the current repository. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: Compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space. CCS Concepts: • Software and its engineering → Search-based software engineering; • Information systems → Language models; Query representation; • Mathematics of computing → Graph algorithms. |
|---|---|
| Author | Wang, Qianxiang; Liu, Wei; Zhang, Wei; Shen, Bo; Yu, Ailun; Zhao, Haiyan; Jin, Zhi; Zan, Daoguang |
| Author_xml | – sequence: 1 givenname: Wei surname: Liu fullname: Liu, Wei email: weiliu@stu.pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 2 givenname: Ailun surname: Yu fullname: Yu, Ailun email: yuailun@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 3 givenname: Daoguang surname: Zan fullname: Zan, Daoguang email: daoguang@iscas.ac.cn organization: Chinese Academy of Sciences,Institute of Software,Beijing,China – sequence: 4 givenname: Bo surname: Shen fullname: Shen, Bo email: shenbo21@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China – sequence: 5 givenname: Wei surname: Zhang fullname: Zhang, Wei email: zhangw.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 6 givenname: Haiyan surname: Zhao fullname: Zhao, Haiyan email: zhhy.sei@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 7 givenname: Zhi surname: Jin fullname: Jin, Zhi email: zhijin@pku.edu.cn organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China – sequence: 8 givenname: Qianxiang surname: Wang fullname: Wang, Qianxiang email: wangqianxiang@huawei.com organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3691620.3695054 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400712487 |
| EISSN | 2643-1572 |
| EndPage | 581 |
| ExternalDocumentID | 10765009 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jan 15 06:20:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_10765009 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-27 |
| PublicationDateYYYYMMDD | 2024-10-27 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256116 ssj0051577 |
| Score | 2.2926986 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 570 |
| SubjectTerms | Accuracy; Code completion; Code graphs; Codes; Computational modeling; Filtering; Information systems; Large language model; Mathematical models; Process control; Retrieval augmented generation; Software; Software algorithms; Software engineering |
| Title | GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph |
| URI | https://ieeexplore.ieee.org/document/10765009 |
| WOSCitedRecordID | wos001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%3A+%5Bproceedings%5D&rft.atitle=GraphCoder%3A+Enhancing+Repository-Level+Code+Completion+via+Coarse-to-fine+Retrieval+Based+on+Code+Context+Graph&rft.au=Liu%2C+Wei&rft.au=Yu%2C+Ailun&rft.au=Zan%2C+Daoguang&rft.au=Shen%2C+Bo&rft.date=2024-10-27&rft.pub=ACM&rft.eissn=2643-1572&rft.spage=570&rft.epage=581&rft_id=info:doi/10.1145%2F3691620.3695054&rft.externalDocID=10765009 |
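
The abstract describes a coarse-to-fine retrieval over a code context graph built from control-flow, data-dependence, and control-dependence edges between statements. The record gives no algorithmic detail, so the following Python sketch is only an illustration of that general idea under stated assumptions, not the authors' implementation: the function names (`context_slice`, `coarse_score`, `fine_score`, `retrieve`), the token-set Jaccard similarity, the hop-distance decay weighting, and the parameters `max_hops`, `decay`, `k_coarse`, and `k_fine` are all hypothetical choices made for this example.

```python
import re
from collections import deque


def tokens(stmt):
    """Split a statement into a set of identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", stmt))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_slice(stmts, edges, target, max_hops=2):
    """Toy graph slice: {hop distance: token set} for statements that
    reach `target` within `max_hops` edges.

    `edges` holds (src, dst, kind) triples, where kind could be "CF", "DD"
    or "CD". This sketch treats all edge kinds alike (an assumption).
    """
    preds = {}
    for src, dst, _kind in edges:
        preds.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:  # backward breadth-first search from the target
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for p in preds.get(node, []):
            if p not in dist:
                dist[p] = dist[node] + 1
                queue.append(p)
    by_hop = {}
    for idx, hop in dist.items():
        by_hop.setdefault(hop, set()).update(tokens(stmts[idx]))
    return by_hop


def coarse_score(q, c):
    """Coarse step: structure-blind similarity over all tokens in a slice."""
    return jaccard(set().union(*q.values()), set().union(*c.values()))


def fine_score(q, c, decay=0.5):
    """Fine step: compare slices hop by hop, down-weighting distant hops."""
    hops = sorted(set(q) | set(c))
    weights = [decay ** h for h in hops]
    if not weights:
        return 0.0
    total = sum(
        w * jaccard(q.get(h, set()), c.get(h, set()))
        for h, w in zip(hops, weights)
    )
    return total / sum(weights)


def retrieve(query, candidates, k_coarse=2, k_fine=1):
    """Shortlist by the cheap coarse score, then re-rank by the fine score."""
    shortlist = sorted(
        candidates, key=lambda n: coarse_score(query, candidates[n]), reverse=True
    )[:k_coarse]
    return sorted(
        shortlist, key=lambda n: fine_score(query, candidates[n]), reverse=True
    )[:k_fine]
```

In this sketch, `candidates` maps a snippet name to a slice produced by `context_slice`. The cheap coarse score prunes the candidate pool, and the hop-aware fine score then prefers snippets whose statements sit at a similar distance from the completion point, which a flat bag of tokens cannot distinguish.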