GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph

Bibliographic Details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 570-581
Main Authors: Liu, Wei; Yu, Ailun; Zan, Daoguang; Shen, Bo; Zhang, Wei; Zhao, Haiyan; Jin, Zhi; Wang, Qianxiang
Format: Conference Proceeding
Language: English
Published: ACM 27.10.2024
ISSN:2643-1572
Abstract The performance of repository-level code completion depends on effectively leveraging both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often perform less well on repository-level completion because they lack repository-specific knowledge. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that combines LLMs' general code knowledge with repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of the completion target more accurately through a code context graph (CCG) that consists of control-flow, data-dependence, and control-dependence relations between code statements, a more structured representation of the completion target's context than the sequence-based context used in existing retrieval-augmented approaches. Based on the CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate, in the current repository, code snippets whose context is similar to that of the completion target. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.
CCS Concepts: • Software and its engineering → Search-based software engineering; • Information systems → Language models; Query representation; • Mathematics of computing → Graph algorithms.
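The coarse-to-fine retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordered list of statements in place of a real code context graph (no control-flow or dependence edges), whitespace tokenization, Jaccard similarity for both stages, and a distance-based decay weight in the fine stage. All names (`Snippet`, `coarse_score`, `fine_score`, `retrieve`) and parameter values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical repository candidate: context statements plus the code that followed."""
    context: tuple[str, ...]  # ordered from farthest to nearest to the completion point
    continuation: str


def tokens(statement: str) -> set[str]:
    # Assumption: whitespace tokenization stands in for a real lexer.
    return set(statement.split())


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coarse_score(query: tuple[str, ...], candidate: tuple[str, ...]) -> float:
    """Coarse stage: order-insensitive bag-of-tokens similarity of the two contexts."""
    q = set().union(*(tokens(s) for s in query))
    c = set().union(*(tokens(s) for s in candidate))
    return jaccard(q, c)


def fine_score(query: tuple[str, ...], candidate: tuple[str, ...], decay: float = 0.5) -> float:
    """Fine stage: align statements from the completion point outward,
    weighting each pair by decay**distance so nearer statements matter more."""
    pairs = zip(reversed(query), reversed(candidate))
    return sum(decay ** d * jaccard(tokens(q), tokens(c)) for d, (q, c) in enumerate(pairs))


def retrieve(query: tuple[str, ...], repo: list[Snippet], coarse_k: int = 2, top_k: int = 1) -> list[Snippet]:
    # Cheap filter first, then the structure-aware re-ranking on the shortlist only.
    shortlist = sorted(repo, key=lambda s: coarse_score(query, s.context), reverse=True)[:coarse_k]
    return sorted(shortlist, key=lambda s: fine_score(query, s.context), reverse=True)[:top_k]


# Toy data, invented for illustration only.
query = ("f = open(path)", "data = f.read()")
repo = [
    Snippet(("data = f.read()", "f = open(path)"), "pass"),       # same tokens, wrong order
    Snippet(("f = open(name)", "data = f.read()"), "f.close()"),  # similar structure
    Snippet(("x = 1", "y = 2"), "z = x + y"),                     # unrelated
]
best = retrieve(query, repo)[0]
print(best.continuation)
```

In this toy data the first candidate wins the coarse stage because it shares every token with the query, but the fine stage prefers the second candidate because its statement nearest the completion point matches exactly. The retrieved continuation would then be placed in the LLM prompt alongside the unfinished code.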
Author Wang, Qianxiang
Liu, Wei
Zhang, Wei
Shen, Bo
Yu, Ailun
Zhao, Haiyan
Jin, Zhi
Zan, Daoguang
Author_xml – sequence: 1
  givenname: Wei
  surname: Liu
  fullname: Liu, Wei
  email: weiliu@stu.pku.edu.cn
  organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China
– sequence: 2
  givenname: Ailun
  surname: Yu
  fullname: Yu, Ailun
  email: yuailun@pku.edu.cn
  organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China
– sequence: 3
  givenname: Daoguang
  surname: Zan
  fullname: Zan, Daoguang
  email: daoguang@iscas.ac.cn
  organization: Chinese Academy of Sciences,Institute of Software,Beijing,China
– sequence: 4
  givenname: Bo
  surname: Shen
  fullname: Shen, Bo
  email: shenbo21@huawei.com
  organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China
– sequence: 5
  givenname: Wei
  surname: Zhang
  fullname: Zhang, Wei
  email: zhangw.sei@pku.edu.cn
  organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China
– sequence: 6
  givenname: Haiyan
  surname: Zhao
  fullname: Zhao, Haiyan
  email: zhhy.sei@pku.edu.cn
  organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China
– sequence: 7
  givenname: Zhi
  surname: Jin
  fullname: Jin, Zhi
  email: zhijin@pku.edu.cn
  organization: China School of Computer Science, PKU,Key Lab of High Confidence Software Technologies (PKU), MoE,Beijing,China
– sequence: 8
  givenname: Qianxiang
  surname: Wang
  fullname: Wang, Qianxiang
  email: wangqianxiang@huawei.com
  organization: Huawei Cloud Computing Technologies Co., Ltd.,Beijing,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3691620.3695054
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400712487
EISSN 2643-1572
EndPage 581
ExternalDocumentID 10765009
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63
IEDL.DBID RIE
ISICitedReferencesCount 1
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001353105400046&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Jan 15 06:20:43 EST 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a248t-d64ee0f981dfad908fa21f5bddcb28b7c4603013a062f3e4b37d4d2264788dc63
PageCount 12
ParticipantIDs ieee_primary_10765009
PublicationCentury 2000
PublicationDate 2024-Oct.-27
PublicationDateYYYYMMDD 2024-10-27
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-27
  day: 27
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib057256116
ssj0051577
Score 2.2927208
SourceID ieee
SourceType Publisher
StartPage 570
SubjectTerms Accuracy
Code completion
Code graphs
Codes
Computational modeling
Filtering
Information systems
Large language model
Mathematical models
Process control
Retrieval augmented generation
Software
Software algorithms
Software engineering
Title GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph
URI https://ieeexplore.ieee.org/document/10765009