GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph


Bibliographic Details
Published in:	IEEE/ACM International Conference on Automated Software Engineering [proceedings], pp. 570-581
Main Authors:	Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang
Format:	Conference Proceeding
Language:	English
Published:	ACM 27.10.2024
Subjects:	Accuracy; Code completion; Code graphs; Codes; Computational modeling; Filtering; Information systems; Large language model; Mathematical models; Process control; Retrieval augmented generation; Software; Software algorithms; Software engineering
ISSN:	2643-1572

Abstract	The performance of repository-level code completion depends on effectively leveraging both general and repository-specific knowledge. Despite the impressive capability of code LLMs on general code completion tasks, they often perform less satisfactorily on repository-level completion because they lack repository-specific knowledge. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages both the general code knowledge of LLMs and repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of the completion target more accurately through a code context graph (CCG), which consists of the control-flow, data-dependence, and control-dependence relations between code statements; this is a more structured representation of the completion target's context than the sequence-based context used in existing retrieval-augmented approaches. Based on the CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate code snippets in the current repository whose context is similar to that of the completion target. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: compared to baseline retrieval-augmented methods, GraphCoder achieves a higher average exact match (EM), with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.
CCS Concepts	Software and its engineering → Search-based software engineering; Information systems → Language models; Information systems → Query representation; Mathematics of computing → Graph algorithms.
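The abstract names two mechanisms: a statement-level context graph and a coarse-to-fine retrieval over it. The Python sketch below is a hypothetical, heavily simplified illustration of that two-stage idea, not GraphCoder's implementation. The helper names (`data_dependence_edges`, `context_slice`, `retrieve`), the regex-based assignment detection, the `hops`, `k`, and `window` parameters, and the use of Jaccard token overlap at both stages are all assumptions made for illustration. Its graph keeps only naive data-dependence edges and omits the control-flow and control-dependence edges that a CCG also contains.

```python
import re
from typing import Dict, List, Set, Tuple

IDENT = re.compile(r"[A-Za-z_]\w*")
# Naive assignment detector (assumption): matches "lhs = rhs" at line start,
# but not "==", "<=", ">=", "!="; augmented forms like "+=" are not recognized.
ASSIGN = re.compile(r"\s*([\w\s,]+?)\s*=(?!=)")


def tokens(text: str) -> Set[str]:
    """Identifier-like tokens in a piece of code."""
    return set(IDENT.findall(text))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def data_dependence_edges(stmts: List[str]) -> Dict[int, Set[int]]:
    """Map statement j to the earlier statements i whose assignments it reads."""
    last_def: Dict[str, int] = {}
    deps: Dict[int, Set[int]] = {}
    for j, stmt in enumerate(stmts):
        m = ASSIGN.match(stmt)
        rhs = stmt[m.end():] if m else stmt
        # Record uses before updating definitions, so "x = x + 1" depends on
        # the previous definition of x rather than on itself.
        deps[j] = {last_def[v] for v in tokens(rhs) if v in last_def}
        if m:
            for v in tokens(m.group(1)):
                last_def[v] = j
    return deps


def context_slice(stmts: List[str], hops: int = 2) -> Set[str]:
    """Tokens of the last statement plus the statements it reaches by walking
    data-dependence edges backward for at most `hops` steps."""
    if not stmts:
        return set()
    deps = data_dependence_edges(stmts)
    frontier = {len(stmts) - 1}
    seen = set(frontier)
    for _ in range(hops):
        frontier = {i for j in frontier for i in deps[j]} - seen
        seen |= frontier
    return set().union(*(tokens(stmts[i]) for i in seen))


def retrieve(query: List[str], candidates: List[List[str]],
             k: int = 3, window: int = 4) -> Tuple[int, float]:
    """Coarse stage: keep the top-k candidates by token overlap of the last
    `window` lines. Fine stage: re-rank them by overlap of dependence-based
    context slices. Assumes `candidates` is non-empty."""
    q_window = tokens("\n".join(query[-window:]))
    coarse = sorted(
        range(len(candidates)),
        key=lambda c: jaccard(q_window, tokens("\n".join(candidates[c][-window:]))),
        reverse=True,
    )[:k]
    q_slice = context_slice(query)
    scored = [(c, jaccard(q_slice, context_slice(candidates[c]))) for c in coarse]
    return max(scored, key=lambda t: t[1])
```

The coarse stage is a cheap filter that narrows the candidate set to `k` snippets using only surface tokens near the completion point. The dependence-aware comparison then runs only on those survivors. For example, with `query = ["rows = load(path)", "total = 0", "for r in rows:", "    total = total + r.price"]`, `context_slice(query)` evaluates to the set `{"total", "r", "price"}`: the last statement reads `total`, which was last assigned on the second line. Because the naive parser does not treat `for` targets such as `r` as definitions, `rows` does not enter the slice.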
Author affiliations	– Liu, Wei (weiliu@stu.pku.edu.cn): School of Computer Science, PKU; Key Lab of High Confidence Software Technologies (PKU), MoE; Beijing, China
– Yu, Ailun (yuailun@pku.edu.cn): School of Computer Science, PKU; Key Lab of High Confidence Software Technologies (PKU), MoE; Beijing, China
– Zan, Daoguang (daoguang@iscas.ac.cn): Institute of Software, Chinese Academy of Sciences; Beijing, China
– Shen, Bo (shenbo21@huawei.com): Huawei Cloud Computing Technologies Co., Ltd.; Beijing, China
– Zhang, Wei (zhangw.sei@pku.edu.cn): School of Computer Science, PKU; Key Lab of High Confidence Software Technologies (PKU), MoE; Beijing, China
– Zhao, Haiyan (zhhy.sei@pku.edu.cn): School of Computer Science, PKU; Key Lab of High Confidence Software Technologies (PKU), MoE; Beijing, China
– Jin, Zhi (zhijin@pku.edu.cn): School of Computer Science, PKU; Key Lab of High Confidence Software Technologies (PKU), MoE; Beijing, China
– Wang, Qianxiang (wangqianxiang@huawei.com): Huawei Cloud Computing Technologies Co., Ltd.; Beijing, China
CODEN	IEEPAD
DOI	10.1145/3691620.3695054
Discipline	Computer Science
EISBN	9798400712487
EISSN	2643-1572
EndPage	581
ExternalDocumentID	10765009
Genre	orig-research
Grant information	National Natural Science Foundation of China (funder ID: 10.13039/501100001809)
ISICitedReferencesCount	1
IsPeerReviewed	false
IsScholarly	true
PageCount	12
PublicationDate	2024-Oct.-27
PublicationDateYYYYMMDD	2024-10-27
PublicationTitle	IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev	ASE
PublicationYear	2024
Publisher	ACM
StartPage	570
URI	https://ieeexplore.ieee.org/document/10765009
WOSCitedRecordID	wos001353105400046