Learning to Represent Programs with Heterogeneous Graphs
Code representation, which transforms programs into vectors with semantics, is essential for source code processing. We have witnessed the effectiveness of incorporating structural information (i.e., graph) into code representations in recent years. Specifically, the abstract syntax tree (AST) and t...
| Published in: | 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC), pp. 378-389 |
|---|---|
| Main authors: | Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin |
| Format: | Conference proceeding |
| Language: | English |
| Published: | ACM, 2022-05-01 |
| Subjects: | |
| ISSN: | 2643-7171 |
| Online access: | Full text |
| Abstract | Code representation, which transforms programs into semantically meaningful vectors, is essential for source code processing. Recent years have shown the effectiveness of incorporating structural information (i.e., graphs) into code representations. Specifically, the abstract syntax tree (AST) and the AST-augmented graph of the program contain rich structural and semantic information, and most existing studies apply them for code representation. The graphs adopted by existing approaches are homogeneous, i.e., they discard the type information of the nodes and edges in the AST. This loss may hinder the representation model. In this paper, we propose to leverage the type information in the graph for code representation. Specifically, we propose the heterogeneous program graph (HPG), which provides the types of the nodes and the edges explicitly. Furthermore, we employ the heterogeneous graph transformer (HGT) architecture to generate representations based on HPG, taking the type information into account during processing. With the additional types in HPG, our approach can capture complex structural information, produce accurate and fine-grained representations, and ultimately perform well on certain tasks. Our in-depth evaluations on four classic datasets for two typical tasks (i.e., method name prediction and code classification) demonstrate that the heterogeneous types in HPG benefit the representation models. Our proposed HPG+HGT also outperforms the state-of-the-art (SOTA) baselines on these tasks and datasets. |
|---|---|
| Author | Li, Ge; Zhang, Kechi; Wang, Wenhan; Jin, Zhi; Zhang, Huangzhao |
| Author_xml | – sequence: 1 givenname: Kechi surname: Zhang fullname: Zhang, Kechi email: zhangkechi@pku.edu.cn organization: Peking University,China – sequence: 2 givenname: Wenhan surname: Wang fullname: Wang, Wenhan email: wwhjacob@hotmail.com organization: Nanyang Technological University,Singapore – sequence: 3 givenname: Huangzhao surname: Zhang fullname: Zhang, Huangzhao email: zhang_hz@pku.edu.cn organization: Peking University,China – sequence: 4 givenname: Ge surname: Li fullname: Li, Ge email: lige@pku.edu.cn organization: Peking University,China – sequence: 5 givenname: Zhi surname: Jin fullname: Jin, Zhi email: zhijin@pku.edu.cn organization: Peking University,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3524610.3527905 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1450392989 9781450392983 |
| EISSN | 2643-7171 |
| EndPage | 389 |
| ExternalDocumentID | 9796361 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 62072007,62192733,61832009,62192730 funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IF 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-a275t-a327551c9abee756e2b4fc144c6d37ad3e2ce7aa67c6a87cfb1fe62c500c70bf3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 39 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000850204200036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:23:39 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a275t-a327551c9abee756e2b4fc144c6d37ad3e2ce7aa67c6a87cfb1fe62c500c70bf3 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_9796361 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-May |
| PublicationDateYYYYMMDD | 2022-05-01 |
| PublicationDate_xml | – month: 05 year: 2022 text: 2022-May |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC) |
| PublicationTitleAbbrev | ICPC |
| PublicationYear | 2022 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0003203477 ssj0002870470 |
| Score | 2.122619 |
| Snippet | Code representation, which transforms programs into vectors with semantics, is essential for source code processing. We have witnessed the effectiveness of... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 378 |
| SubjectTerms | code representation; Codes; Grammar; graph neural networks; heterogeneous graphs; Predictive models; Semantics; Syntactics; Transformers; Transforms |
| Title | Learning to Represent Programs with Heterogeneous Graphs |
| URI | https://ieeexplore.ieee.org/document/9796361 |
| WOSCitedRecordID | wos000850204200036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2022+IEEE%2FACM+30th+International+Conference+on+Program+Comprehension+%28ICPC%29&rft.atitle=Learning+to+Represent+Programs+with+Heterogeneous+Graphs&rft.au=Zhang%2C+Kechi&rft.au=Wang%2C+Wenhan&rft.au=Zhang%2C+Huangzhao&rft.au=Li%2C+Ge&rft.date=2022-05-01&rft.pub=ACM&rft.eissn=2643-7171&rft.spage=378&rft.epage=389&rft_id=info:doi/10.1145%2F3524610.3527905&rft.externalDocID=9796361 |
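
The abstract contrasts homogeneous graphs, which discard node and edge types, with a heterogeneous program graph that keeps them explicit. The sketch below illustrates that contrast only; it is not the authors' HPG construction. It assumes Python's standard `ast` module as the parser, uses the AST class name as a stand-in node type and the AST field name as a stand-in edge type, and the helper name `build_typed_graph` is hypothetical.

```python
import ast
from collections import Counter


def build_typed_graph(source):
    """Build a small typed graph from Python source (illustrative only).

    Returns (nodes, edges):
      nodes: list of node-type strings, indexed by node id
      edges: list of (parent_id, edge_type, child_id) triples
    Assumption: node type = AST class name, edge type = AST field name.
    """
    nodes = []
    edges = []

    def visit(node):
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        for field, value in ast.iter_fields(node):
            # A field holds either a single child or a list of children.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    child_id = visit(child)
                    edges.append((node_id, field, child_id))
        return node_id

    visit(ast.parse(source))
    return nodes, edges


nodes, edges = build_typed_graph("def add(a, b):\n    return a + b\n")

# Heterogeneous view: every edge keeps its type label.
edge_type_counts = Counter(edge_type for _, edge_type, _ in edges)

# Homogeneous view: the same edges with the type information dropped.
homogeneous_edges = [(parent, child) for parent, _, child in edges]
```

In the homogeneous view, the `left` and `right` operands of `a + b` become indistinguishable edges; the typed triples preserve that distinction, which is the kind of information a type-aware model such as HGT can condition on.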