Learning to Represent Programs with Heterogeneous Graphs


Bibliographic Details
Published in: 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC), pp. 378-389
Main Authors: Zhang, Kechi; Wang, Wenhan; Zhang, Huangzhao; Li, Ge; Jin, Zhi
Format: Conference Proceeding
Language: English
Published: ACM 01.05.2022
Subjects:
ISSN: 2643-7171
Abstract Code representation, which transforms programs into vectors that capture their semantics, is essential for source code processing. In recent years, incorporating structural information (i.e., graphs) into code representations has proven effective. Specifically, the abstract syntax tree (AST) and the AST-augmented graph of a program contain rich structural and semantic information, and most existing studies use them for code representation. However, the graphs adopted by existing approaches are homogeneous, i.e., they discard the type information of the nodes and edges within the AST, which may hinder the representation model. In this paper, we propose to leverage the type information in the graph for code representation. Specifically, we propose the heterogeneous program graph (HPG), which provides the types of the nodes and the edges explicitly. Furthermore, we employ the heterogeneous graph transformer (HGT) architecture to generate representations based on HPG, taking the type information into account during processing. With the additional types in HPG, our approach can capture complex structural information, produce accurate and fine-grained representations, and finally perform well on certain tasks. Our in-depth evaluations on four classic datasets for two typical tasks (i.e., method name prediction and code classification) demonstrate that the heterogeneous types in HPG benefit the representation models. Our proposed HPG+HGT also outperforms the SOTA baselines on the subject tasks and datasets.
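Illustration (not part of the catalog record): the abstract's central idea, a program graph whose nodes and edges both carry explicit types, can be sketched with Python's standard `ast` module. This is a minimal sketch under stated assumptions, not the authors' HPG construction, which this record does not describe. The function name `build_typed_graph`, the use of AST class names as node types, and the use of AST field names as edge types are all choices made here for illustration.

```python
import ast


def build_typed_graph(source):
    """Return (nodes, edges) for a small typed graph derived from a Python AST.

    nodes: list of (node_id, node_type)         -- node_type is the AST class name
    edges: list of (src_id, dst_id, edge_type)  -- edge_type is the parent's field name

    Assumption for illustration only: these type vocabularies are NOT taken
    from the paper; they are simply what Python's ast module exposes.
    """
    tree = ast.parse(source)
    ids = {}     # maps id(ast_node) -> integer node id
    nodes = []

    # Assign one id per distinct AST object; the root is visited first.
    for node in ast.walk(tree):
        if id(node) not in ids:
            ids[id(node)] = len(nodes)
            nodes.append((ids[id(node)], type(node).__name__))

    # Emit one typed edge per parent-field-child relation.
    edges = []
    for parent in ast.walk(tree):
        for field, value in ast.iter_fields(parent):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    edges.append((ids[id(parent)], ids[id(child)], field))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_typed_graph("def add(a, b):\n    return a + b\n")
    node_type = dict(nodes)
    for src, dst, edge_type in edges:
        print(f"{node_type[src]} --{edge_type}--> {node_type[dst]}")
```

A homogeneous graph would keep only the `(src, dst)` pairs; the extra node-type and edge-type labels are the information the abstract says existing approaches discard. A typed graph in this form could then be fed to a type-aware model such as a heterogeneous graph transformer.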
Author Li, Ge
Zhang, Kechi
Wang, Wenhan
Jin, Zhi
Zhang, Huangzhao
Author_xml – sequence: 1
  givenname: Kechi
  surname: Zhang
  fullname: Zhang, Kechi
  email: zhangkechi@pku.edu.cn
  organization: Peking University,China
– sequence: 2
  givenname: Wenhan
  surname: Wang
  fullname: Wang, Wenhan
  email: wwhjacob@hotmail.com
  organization: Nanyang Technological University,Singapore
– sequence: 3
  givenname: Huangzhao
  surname: Zhang
  fullname: Zhang, Huangzhao
  email: zhang_hz@pku.edu.cn
  organization: Peking University,China
– sequence: 4
  givenname: Ge
  surname: Li
  fullname: Li, Ge
  email: lige@pku.edu.cn
  organization: Peking University,China
– sequence: 5
  givenname: Zhi
  surname: Jin
  fullname: Jin, Zhi
  email: zhijin@pku.edu.cn
  organization: Peking University,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3524610.3527905
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1450392989
9781450392983
EISSN 2643-7171
EndPage 389
ExternalDocumentID 9796361
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62072007,62192733,61832009,62192730
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IF
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-a275t-a327551c9abee756e2b4fc144c6d37ad3e2ce7aa67c6a87cfb1fe62c500c70bf3
IEDL.DBID RIE
ISICitedReferencesCount 39
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000850204200036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:23:39 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a275t-a327551c9abee756e2b4fc144c6d37ad3e2ce7aa67c6a87cfb1fe62c500c70bf3
PageCount 12
ParticipantIDs ieee_primary_9796361
PublicationCentury 2000
PublicationDate 2022-May
PublicationDateYYYYMMDD 2022-05-01
PublicationDate_xml – month: 05
  year: 2022
  text: 2022-May
PublicationDecade 2020
PublicationTitle 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)
PublicationTitleAbbrev ICPC
PublicationYear 2022
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0003203477
ssj0002870470
Score 2.122619
SourceID ieee
SourceType Publisher
StartPage 378
SubjectTerms code representation
Codes
Grammar
graph neural networks
heterogeneous graphs
Predictive models
Semantics
Syntactics
Transformers
Transforms
Title Learning to Represent Programs with Heterogeneous Graphs
URI https://ieeexplore.ieee.org/document/9796361
WOSCitedRecordID wos000850204200036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV09T8MwELVKxcBUoEV8ywMjaRPb8cUzonRAVYUAdats54JYGtSm_H58bihCYmFyYilRZF90fnfv3jF2Q17RlpRkDe4yUVlpkiKggMQLDeF0XGgTM_ivjzCdFvO5mXXY7a4WBhEj-QyHdBlz-WXtNxQqGxkI5kJYZw8AtrVau3gKJexUa3t0L0UqFUCr5pOpfBSOGiQuPgwjqVL9aqcSvcm497_vOGSDn7I8Pts5nCPWweUx6333ZeDtb9pnRSua-sabmj9Fqmt4Hz1KVKw1p9grnxAPpg7mgwH78wfSrV4P2Mv4_vlukrQdEhIrIG8SK8OQZ95Yhwi5RuFU5QNG8rqUYQ8kCo9grQavbQG-clmFWvg8TT2krpInrLusl3jKuAQR0KoDZZRTUmubZRYQC-lEmRrvzlifFmLxsRXBWLRrcP739AU7EFQnEJmBl6zbrDZ4xfb9Z_O-Xl3HnfsCWqWXfg
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NT8MwDLWmgQSnARvimxw40q1N0qQ9I8YQY5rQQLtNSeoiLuu0dfx-4q4MIXHhlDZSqypx5Tz7-RnghryiySjJ6t1lIKMsDRKPAgLHlfan40SlVQb_bahHo2Q6TccNuN3WwiBiRT7DLl1WufyscGsKlfVS7c2FsM5OLCWPNtVa24gKpexkbX10L3gopNa1nk8k454_bJC8eNePpEv1q6FK5U_6rf99yQF0fgrz2Hjrcg6hgfMjaH13ZmD1j9qGpJZNfWdlwV4qsqt_Hz1KZKwVo-grGxATpvAGhB79swdSrl514LV_P7kbBHWPhMBwHZeBEX6II5cai6hjhdzK3HmU5FQm_C4I5A61MUo7ZRLtchvlqLiLw9Dp0ObiGJrzYo4nwITmHq9aLVNppVDKRJHRiImwPAtTZ0-hTQsxW2xkMGb1Gpz9PX0Ne4PJ83A2fBw9ncM-p6qBiid4Ac1yucZL2HWf5cdqeVXt4hdtCZrF
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2022+IEEE%2FACM+30th+International+Conference+on+Program+Comprehension+%28ICPC%29&rft.atitle=Learning+to+Represent+Programs+with+Heterogeneous+Graphs&rft.au=Zhang%2C+Kechi&rft.au=Wang%2C+Wenhan&rft.au=Zhang%2C+Huangzhao&rft.au=Li%2C+Ge&rft.date=2022-05-01&rft.pub=ACM&rft.eissn=2643-7171&rft.spage=378&rft.epage=389&rft_id=info:doi/10.1145%2F3524610.3527905&rft.externalDocID=9796361