What Makes Good In-Context Demonstrations for Code Intelligence Tasks with LLMs?
Saved in:

| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 761 - 773 |
|---|---|
| Main authors: | Gao, Shuzheng; Wen, Xin-Cheng; Gao, Cuiyun; Wang, Wenxuan; Zhang, Hongyu; Lyu, Michael R. |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 11.09.2023 |
| Subjects: | Codes; Computer bugs; Natural language processing; Predictive models; Software engineering; Source coding; Task analysis |
| ISSN: | 2643-1572 |
| Online access: | Full text |
| Abstract | Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of the model and corpus size, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then inputs the demonstrations to the language models for making predictions. This new learning paradigm is training-free and has shown impressive performance in various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, e.g., the selected examples. It is important to systematically investigate how to construct a good demonstration for code-related tasks. In this paper, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection, order, and number of demonstration examples. We conduct extensive experiments on three code intelligence tasks including code summarization, bug fixing, and program synthesis. Our experimental results demonstrate that all the above three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations, taking into account these three perspectives. We also show that a carefully-designed demonstration based on our findings can lead to substantial improvements over widely-used demonstration construction methods, e.g., improving BLEU-4, EM, and EM by at least 9.90%, 175.96%, and 50.81% on code summarization, bug fixing, and program synthesis, respectively. |
|---|---|
| Author | Wen, Xin-Cheng; Gao, Cuiyun; Gao, Shuzheng; Lyu, Michael R.; Wang, Wenxuan; Zhang, Hongyu |
| Author_xml | 1. Gao, Shuzheng (szgao98@gmail.com), School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; 2. Wen, Xin-Cheng (xiamenwxc@foxmail.com), School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; 3. Gao, Cuiyun (gaocuiyun@hit.edu.cn), School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China; 4. Wang, Wenxuan (wxwang@cse.cuhk.edu.hk), Department of Computer Science and Engineering, The Chinese University of Hong Kong, China; 5. Zhang, Hongyu (hyzhang@cqu.edu.cn), School of Big Data and Software Engineering, Chongqing University, China; 6. Lyu, Michael R. (lyu@cse.cuhk.edu.hk), Department of Computer Science and Engineering, The Chinese University of Hong Kong, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ASE56229.2023.00109 |
| Discipline | Computer Science |
| EISBN | 9798350329964 |
| EISSN | 2643-1572 |
| EndPage | 773 |
| ExternalDocumentID | 10298329 |
| Genre | orig-research |
| GrantInformation_xml | Natural Science Foundation of Guangdong Province, grant 2023A1515011959 (funder ID 10.13039/501100003453); National Natural Science Foundation of China, grant 62002084 (funder ID 10.13039/501100001809) |
| ISICitedReferencesCount | 56 |
| PageCount | 13 |
| PublicationDate | 2023-Sept.-11 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2023 |
| Publisher | IEEE |
| StartPage | 761 |
| SubjectTerms | Codes Computer bugs Natural language processing Predictive models Software engineering Source coding Task analysis |
| Title | What Makes Good In-Context Demonstrations for Code Intelligence Tasks with LLMs? |
| URI | https://ieeexplore.ieee.org/document/10298329 |
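The abstract describes in-context learning prompts built from a task instruction plus a few selected, ordered demonstration examples, and studies three factors: which examples are selected, in what order, and how many. As a minimal illustrative sketch only (the token-overlap selection metric, function names, and toy example pool below are invented assumptions, not the paper's actual method), a demonstration-prompt builder touching all three factors might look like:

```python
def token_overlap(a: str, b: str) -> float:
    """Crude lexical similarity: Jaccard overlap of whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_icl_prompt(instruction: str, pool: list, query: str, k: int = 2) -> str:
    """Select the k pool examples most similar to the query (selection),
    arrange them in ascending similarity so the most similar sits nearest
    the query (order), and concatenate instruction + k demonstrations +
    query (number). All three choices are the factors the paper varies."""
    ranked = sorted(pool, key=lambda ex: token_overlap(ex["input"], query))
    demos = ranked[-k:]  # the k most similar, least-similar first
    parts = [instruction]
    for ex in demos:
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(parts)

# Hypothetical demonstration pool for a code-summarization-style task.
pool = [
    {"input": "def add(a, b): return a + b", "output": "Adds two numbers."},
    {"input": "def mul(a, b): return a * b", "output": "Multiplies two numbers."},
    {"input": "def read_file(p): return open(p).read()", "output": "Reads a file."},
]
prompt = build_icl_prompt("Summarize the function.", pool,
                          "def sub(a, b): return a - b", k=2)
```

The same skeleton applies to the paper's other tasks (bug fixing, program synthesis) by swapping the instruction and the input/output pairs; only the demonstration-construction policy changes.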