GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 570 - 581 |
|---|---|
| Main Authors: | Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 27.10.2024 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Abstract | The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of the completion target more accurately through a code context graph (CCG) that captures control-flow, data-dependence, and control-dependence relations between code statements, a more structured representation of the completion target's context than the sequence-based context used in existing retrieval-augmented approaches. Based on the CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate code snippets in the current repository whose context is similar to that of the completion target. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space. CCS Concepts: Software and its engineering → Search-based software engineering; Information systems → Language models; Query representation; Mathematics of computing → Graph algorithms. |
|---|---|
| Author | Wang, Qianxiang; Liu, Wei; Zhang, Wei; Shen, Bo; Yu, Ailun; Zhao, Haiyan; Jin, Zhi; Zan, Daoguang |
| Author_xml | – 1. Liu, Wei (weiliu@stu.pku.edu.cn), School of Computer Science, PKU, Key Lab of High Confidence Software Technologies (PKU), MoE, Beijing, China – 2. Yu, Ailun (yuailun@pku.edu.cn), School of Computer Science, PKU, Key Lab of High Confidence Software Technologies (PKU), MoE, Beijing, China – 3. Zan, Daoguang (daoguang@iscas.ac.cn), Chinese Academy of Sciences, Institute of Software, Beijing, China – 4. Shen, Bo (shenbo21@huawei.com), Huawei Cloud Computing Technologies Co., Ltd., Beijing, China – 5. Zhang, Wei (zhangw.sei@pku.edu.cn), School of Computer Science, PKU, Key Lab of High Confidence Software Technologies (PKU), MoE, Beijing, China – 6. Zhao, Haiyan (zhhy.sei@pku.edu.cn), School of Computer Science, PKU, Key Lab of High Confidence Software Technologies (PKU), MoE, Beijing, China – 7. Jin, Zhi (zhijin@pku.edu.cn), School of Computer Science, PKU, Key Lab of High Confidence Software Technologies (PKU), MoE, Beijing, China – 8. Wang, Qianxiang (wangqianxiang@huawei.com), Huawei Cloud Computing Technologies Co., Ltd., Beijing, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3691620.3695054 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400712487 |
| EISSN | 2643-1572 |
| EndPage | 581 |
| ExternalDocumentID | 10765009 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jan 15 06:20:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_10765009 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-27 |
| PublicationDateYYYYMMDD | 2024-10-27 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256116 ssj0051577 |
| Score | 2.2926023 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 570 |
| SubjectTerms | Accuracy; Code completion; Code graphs; Codes; Computational modeling; Filtering; Information systems; Large language model; Mathematical models; Process control; Retrieval augmented generation; Software; Software algorithms; Software engineering |
| Title | GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph |
| URI | https://ieeexplore.ieee.org/document/10765009 |
| WOSCitedRecordID | wos001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE%2FACM+International+Conference+on+Automated+Software+Engineering+%3A+%5Bproceedings%5D&rft.atitle=GraphCoder%3A+Enhancing+Repository-Level+Code+Completion+via+Coarse-to-fine+Retrieval+Based+on+Code+Context+Graph&rft.au=Liu%2C+Wei&rft.au=Yu%2C+Ailun&rft.au=Zan%2C+Daoguang&rft.au=Shen%2C+Bo&rft.date=2024-10-27&rft.pub=ACM&rft.eissn=2643-1572&rft.spage=570&rft.epage=581&rft_id=info:doi/10.1145%2F3691620.3695054&rft.externalDocID=10765009 |
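
The abstract names two mechanisms: a code context graph (CCG) built from control-flow and dependence relations between statements, and a coarse-to-fine retrieval that first filters candidate snippets cheaply and then re-ranks them by context similarity. The Python sketch below is a toy illustration of that general idea under loudly stated assumptions, not GraphCoder's implementation: statements are plain strings, only simple `name = expr` assignments create data-dependence edges, control dependence is not modelled, the context is a backward k-hop slice from the line just before the completion target, and the two similarity measures (token Jaccard for the coarse stage, distance-decayed weighted Jaccard for the fine stage) are stand-ins chosen for illustration. Every function name and parameter (`build_ccg`, `context`, `profile`, `coarse`, `retrieve`, `decay`, `max_hops`, `coarse_k`) is hypothetical.

```python
"""Toy coarse-to-fine retrieval over a simplified context graph.

Hypothetical illustration only; not GraphCoder's actual algorithm.
"""
import re
from collections import defaultdict

# Identifier-like tokens; numeric literals are ignored in this toy version.
TOKEN = re.compile(r"[A-Za-z_]\w*")
# Simple single-target assignment `name = expr` (the lookahead excludes `==`).
ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*=(?!=)\s*(.*)$")


def tokens(stmt: str) -> set[str]:
    return set(TOKEN.findall(stmt))


def build_ccg(stmts: list[str]) -> dict[int, set[int]]:
    """Map each statement index to its predecessors: control-flow edges
    (i-1 -> i, straight-line code only) plus data-dependence edges from the
    latest assignment of a name to a later statement that reads it.
    Control dependence (branches, loops) is NOT modelled here."""
    preds: dict[int, set[int]] = defaultdict(set)
    last_def: dict[str, int] = {}
    for i, stmt in enumerate(stmts):
        if i > 0:
            preds[i].add(i - 1)
        m = ASSIGN.match(stmt)
        used = tokens(m.group(2)) if m else tokens(stmt)
        for name in used:
            if name in last_def:
                preds[i].add(last_def[name])
        if m:
            last_def[m.group(1)] = i
    return dict(preds)


def context(stmts: list[str], max_hops: int = 2) -> dict[int, int]:
    """Backward breadth-first slice from the last statement (the line just
    before the completion target); returns {statement index: hop distance}."""
    if not stmts:
        return {}
    preds = build_ccg(stmts)
    start = len(stmts) - 1
    dist = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        nxt = []
        for node in frontier:
            for p in preds.get(node, ()):
                if p not in dist:
                    dist[p] = hop
                    nxt.append(p)
        frontier = nxt
    return dist


def profile(stmts: list[str], decay: float = 0.5, max_hops: int = 2) -> dict[str, float]:
    """Token weights that decay with graph distance from the completion point."""
    weights: dict[str, float] = {}
    for i, d in context(stmts, max_hops).items():
        for t in tokens(stmts[i]):
            weights[t] = max(weights.get(t, 0.0), decay ** d)
    return weights


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_jaccard(a: dict[str, float], b: dict[str, float]) -> float:
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def bag(stmts: list[str]) -> set[str]:
    return set().union(*(tokens(s) for s in stmts))


def coarse(query: list[str], corpus: dict[str, list[str]], k: int) -> list[str]:
    """Cheap, sequence-style filter: Jaccard over all tokens of each snippet."""
    q = bag(query)
    return sorted(corpus, key=lambda name: jaccard(q, bag(corpus[name])), reverse=True)[:k]


def retrieve(query: list[str], corpus: dict[str, list[str]], coarse_k: int = 3) -> list[str]:
    """Coarse filter first, then re-rank survivors by graph-weighted similarity."""
    q = profile(query)
    return sorted(coarse(query, corpus, coarse_k),
                  key=lambda name: weighted_jaccard(q, profile(corpus[name])),
                  reverse=True)


if __name__ == "__main__":
    query = ["x = load()", "y = 1", "z = 2", "w = x + z"]
    corpus = {
        "A": ["x = load()", "w = x + z"],
        "B": ["y = 1", "z = 2", "w = y + z"],
        "D": ["w = 0", "x = load()", "y = 1", "z = 2", "v = 3"],
    }
    print(coarse(query, corpus, k=2))           # ['D', 'A']
    print(retrieve(query, corpus, coarse_k=2))  # ['A', 'D']
```

In this toy corpus, snippet `D` has the highest plain token overlap with the query and therefore wins the coarse stage, but its final lines do not depend on `x` or `load()`. Because the fine stage weights tokens by their hop distance in the graph rather than by line adjacency, snippet `A`, whose last statement shares the query's data dependences, is ranked first after re-ranking.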