A syntax-guided multi-task learning approach for Turducken-style code generation
| Published in: | Empirical software engineering : an international journal, Vol. 28, Issue 6, Article 141 |
|---|---|
| Main authors: | Yang, Guang; Zhou, Yu; Chen, Xiang; Zhang, Xiangyu; Xu, Yiran; Han, Tingting; Chen, Taolue |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.11.2023 (Springer Nature B.V) |
| ISSN: | 1382-3256, 1573-7616 |
| Online access: | Full text |
| Abstract | Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code does not always adhere to the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize three significant challenges with regard to syntactic constraints: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) the scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. We then formalize code generation with syntactic constraint representation as an auxiliary task, enabling the model to learn the syntactic constraints of the code. Finally, the syntactically correct code is selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis against six state-of-the-art baselines on two Turducken-style code datasets demonstrate the effectiveness and general applicability of our approach. A human study further found that the code generated by our approach is better than that of the baselines in terms of code readability and semantic similarity. |
|---|---|
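Two of the steps named in the abstract can be illustrated with a minimal, self-contained Python sketch. This is illustrative only: the function names are hypothetical, and Python's tokenizer and built-in `compile()` merely stand in for the paper's type annotation of Turducken-style code tokens and its compiler-feedback selection.

```python
import io
import tokenize

def type_tagged_tokens(code):
    """Pair each code token with its token-type name, a rough analogue of
    appending type information to code tokens (step 1 in the abstract).
    Uses Python's tokenizer; the paper derives types for Turducken-style code."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokens if t.string.strip()]

def select_syntactically_valid(candidates):
    """Keep only candidates that pass a syntax check, preserving rank order;
    a stand-in for the final compiler-feedback selection step."""
    valid = []
    for code in candidates:
        try:
            compile(code, "<candidate>", "exec")  # raises SyntaxError if invalid
            valid.append(code)
        except SyntaxError:
            pass
    return valid

print(type_tagged_tokens("x = 1"))
print(select_syntactically_valid(["x = 1", "x = = 1"]))
```

Pairing each token with a type name mirrors the syntactic-constraint representation, while filtering candidates through a syntax check mirrors how the approach picks the syntactically correct output from multiple generations.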
| ArticleNumber | 141 |
| Author | Yang, Guang; Zhou, Yu; Chen, Xiang; Zhang, Xiangyu; Xu, Yiran; Han, Tingting; Chen, Taolue |
| Affiliations | Yang, Guang; Zhou, Yu (corresponding author, zhouyu@nuaa.edu.cn, ORCID 0000-0002-3723-7584); Zhang, Xiangyu; Xu, Yiran: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. Chen, Xiang: School of Information Science and Technology, Nantong University. Han, Tingting; Chen, Taolue: Department of Computer Science, Birkbeck, University of London |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
| DOI | 10.1007/s10664-023-10372-1 |
| Discipline | Computer Science |
| EISSN | 1573-7616 |
| GrantInformation | National Natural Science Foundation of China (No. 61972197); Natural Science Foundation of Jiangsu Province (No. BK20201292); State Key Laboratory of Novel Software Technology (KFKT2022A03; No. 62272397); Collaborative Innovation Center of Novel Software Technology and Industrialization, and the Open Project of Key Laboratory of Safety-Critical Software for Nanjing University of Aeronautics and Astronautics, Ministry of Industry and Information Technology (No. NJ2020022); Postgraduate Research & Practice Innovation Program of Jiangsu Province |
| ISICitedReferencesCount | 4 |
| ISSN | 1382-3256 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Keywords | Turducken-style code; CodeT5; Multi-task learning; Abstract syntax tree; Syntactically-constrained code generation |
| Language | English |
| ORCID | 0000-0002-3723-7584 |
| PublicationDate | 2023-11-01 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal |
| PublicationTitle | Empirical software engineering : an international journal |
| PublicationTitleAbbrev | Empir Software Eng |
| PublicationYear | 2023 |
| Publisher | Springer US Springer Nature B.V |
https://doi.org/10.1109/ASE51524.2021.9678552 – reference: Liu Q, Chen Y, Chen B, Lou JG, Chen Z, Zhou B, Zhang D (2020b) You impress me: Dialogue generation via mutual persona perception. In: Proceedings of the 58th annual meeting of the association for computational linguistics. pp 1417–1427 – reference: Fernandes S, Bernardino J (2015) What is bigquery? In: Proceedings of the 19th International Database Engineering & Applications Symposium. pp 202–203 – reference: Longpre S, Hou L, Vu T, Webson A, Chung HW, Tay Y, Zhou D, Le QV, Zoph B, Wei J, et al (2023) The flan collection: Designing data and methods for effective instruction tuning. arXiv:2301.13688 – reference: YangGZhouYChenXZhangXHanTChenTExploitgen: Template-augmented exploit code generation based on codebertJ Syst Softw202319710.1016/j.jss.2022.111577 – reference: RaffelCShazeerNRobertsALeeKNarangSMatenaMZhouYLiWLiuPJExploring the limits of transfer learning with a unified text-to-text transformerJ Mach Learn Res2020211548555514138124 – reference: Allamanis M, Sutton C (2013) Why, when, and what: analyzing stack overflow questions by topic, type, and code. In: 2013 10th Working conference on mining software repositories (MSR). IEEE, pp 53–56 – reference: Zelle JM, Mooney RJ (1996) Learning to parse database queries using inductive logic programming. In: Proceedings of the national conference on artificial intelligence. pp 1050–1055 – reference: Liu Y, Tantithamthavorn C, Liu Y, Li L (2023c) On the reliability and explainability of automated code generation approaches. arXiv:2302.09587 – reference: Yang G, Chen X, Zhou Y, Yu C (2022a) Dualsc: Automatic generation and summarization of shellcode via transformer and dual learning. arXiv:2202.09785 – reference: Mahmud T, Hasan KA, Ahmed M, Chak THC (2015) A rule based approach for nlp based query processing. In: 2015 2nd International conference on electrical information and communication technologies (EICT). 
IEEE, pp 78–82 – reference: Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S et al (2021) Graphcodebert: Pre-training code representations with data flow. In: ICLR – reference: HussainYHuangZZhouYWangSDeep transfer learning for source code modelingInt J Softw Eng Knowl Eng2020300564966810.1142/S0218194020500230 – reference: Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D et al. (2021a) Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv:2102.04664 – reference: Yang G, Chen X, Cao J, Xu S, Cui Z, Yu C, Liu K (2021a) Comformer: Code comment generation via transformer and fusion method-based hybrid code representation. In: 2021 8th International conference on dependable systems and their applications (DSA). IEEE, pp 30–41 – reference: Mou L, Men R, Li G, Zhang L, Jin Z (2015) On end-to-end program generation from user intention by deep neural networks. arXiv:1510.07211 – reference: Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30 – reference: Huang J, Wang Y, Wang Y, Dong Y, Xiao Y (2021) Relation aware semi-autoregressive semantic parsing for nl2sql. arXiv:2108.00804 – reference: Bogin B, Berant J, Gardner M (2019) Representing schema structure with graph neural networks for text-to-sql parsing. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp 4560–4565 – reference: Wilcoxon F (1992) Individual comparisons by ranking methods. In: Breakthroughs in statistics. Springer, pp 196–202 – reference: Hayati SA, Olivier R, Avvaru P, Yin P, Tomasic A, Neubig G (2018) Retrieval-based neural code generation. In: Proceedings of the 2018 conference on empirical methods in natural language processing. 
pp 925–930 – reference: Le H, Wang Y, Gotmare AD, Savarese S, Hoi SC (2022) Coderl: Mastering code generation through pretrained models and deep reinforcement learning. arXiv:2207.01780 – reference: Yang G, Zhou Y, Chen X, Yu C (2021b) Fine-grained pseudo-code generation method via code feature extraction and transformer. In: 2021 28th Asia-pacific software engineering conference (APSEC). IEEE, pp 213–222 – reference: Lloyd JW (1994) Practical advtanages of declarative programming. In: GULP-PRODE (1). pp 18–30 – reference: Wang C, Yang Y, Gao C, Peng Y, Zhang H, Lyu MR (2022a) No more fine-tuning? an experimental evaluation of prompt tuning in code intelligence. In: Proceedings of the 30th ACM joint European software engineering conference and symposium on the foundations of software engineering. pp 382–394 – reference: Xie R, Ye W, Sun J, Zhang S (2021) Exploiting method names to improve code summarization: A deliberation multi-task learning approach. In: 2021 IEEE/ACM 29th international conference on program comprehension (ICPC). IEEE, pp 138–148 – reference: Dahl DA, Bates M, Brown MK, Fisher WM, Hunicke-Smith K, Pallett DS, Pao C, Rudnicky A, Shriberg E (1994) Expanding the scope of the atis task: The atis-3 corpus. In: Human language technology: proceedings of a workshop held at Plainsboro, New Jersey, March 8-11, 1994 – reference: SunZZhuQXiongYSunYMouLZhangLTreegen: A tree-based transformer architecture for code generationProc AAAI Conf Art Intell20203489848991 – reference: Ahmad W, Chakraborty S, Ray B, Chang KW (2021) Unified pre-training for program understanding and generation. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp 2655–2668 – reference: Yu T, Zhang R, Yang K, Yasunaga M, Wang D, Li Z, Ma J, Li I, Yao Q, Roman S et al. (2018b) Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. 
In: Proceedings of the 2018 conference on empirical methods in natural language processing. pp 3911–3921 – reference: Popescu AM, Etzioni O, Kautz H (2003) Towards a theory of natural language interfaces to databases. In: Proceedings of the 8th international conference on intelligent user interfaces. pp 149–157 – reference: Yang G, Zhou Y, Chen X, Zhang X, Han T, Chen T (2022c) Exploitgen: Template-augmented exploit code generation based on codebert. J Syst Softw 111577 – reference: Yu T, Zhang R, Er H, Li S, Xue E, Pang B, Lin XV, Tan YC, Shi T, Li Z et al. (2019a) Cosql: A conversational text-to-sql challenge towards cross-domain natural language interfaces to databases. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). pp 1962–1979 – reference: Liu F, Li J, Zhang L (2023a) Syntax and domain aware model for unsupervised program translation. arXiv:2302.03908 – reference: LinXVSocherRXiongCBridging textual and tabular data for cross-domain text-to-sql semantic parsingFindings of the Association for Computational Linguistics: EMNLP2020202048704888 – reference: RadfordAWuJChildRLuanDAmodeiDSutskeverILanguage models are unsupervised multitask learnersOpenAI blog2019189 – reference: Yu T, Zhang R, Yasunaga M, Tan YC, Lin XV, Li S, Er H, Li I, Pang B, Chen T et al (2019b) Sparc: Cross-domain semantic parsing in context. In: Proceedings of the 57th annual meeting of the association for computational linguistics. pp 4511–4523 – reference: Eghbali A, Pradel M (2022) Crystalbleu: precisely and efficiently measuring the similarity of code. In: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 
pp 341–342 – reference: HussainYHuangZZhouYImproving source code suggestion with code embedding and enhanced convolutional long short-term memoryIET Softw202115319921310.1049/sfw2.12017 – reference: Gifford DK, Lucassen JM (1986) Integrating functional and imperative programming. In: Proceedings of the 1986 ACM conference on LISP and functional programming. pp 28–38 – reference: Wang B, Shin R, Liu X, Polozov O, Richardson M (2020) Rat-sql: Relation-aware schema encoding and linking for text-to-sql parsers. In: Proceedings of the 58th annual meeting of the association for computational linguistics. pp 7567–7578 – reference: Yang G, Chen X, Zhou Y, Yu C (2022b) Dualsc: Automatic generation and summarization of shellcode via transformer and dual learning. In: IEEE international conference on software analysis, evolution and reengineering, SANER 2022, Honolulu, HI, USA, March 15-18, 2022. IEEE, pp 361–372. https://doi.org/10.1109/SANER53432.2022.00052 – ident: 10372_CR23 – ident: 10372_CR48 doi: 10.1145/3510003.3510096 – ident: 10372_CR5 doi: 10.1145/3540250.3549162 – ident: 10372_CR65 – ident: 10372_CR78 doi: 10.18653/v1/N18-2093 – ident: 10372_CR6 doi: 10.3115/1075812.1075823 – ident: 10372_CR42 – ident: 10372_CR49 doi: 10.3115/1073083.1073135 – ident: 10372_CR55 doi: 10.18653/v1/2021.emnlp-main.669 – ident: 10372_CR47 – ident: 10372_CR4 doi: 10.18653/v1/P19-1448 – ident: 10372_CR22 – ident: 10372_CR7 – ident: 10372_CR32 doi: 10.24963/ijcai.2022/588 – ident: 10372_CR53 – ident: 10372_CR43 – volume: 2022 start-page: 9 year: 2022 ident: 10372_CR63 publication-title: Findings of the Association for Computational Linguistics: ACL – ident: 10372_CR80 doi: 10.18653/v1/D19-1204 – volume: 30 start-page: 649 issue: 05 year: 2020 ident: 10372_CR25 publication-title: Int J Softw Eng Knowl Eng doi: 10.1142/S0218194020500230 – ident: 10372_CR50 doi: 10.1145/604045.604120 – volume: 55 start-page: 1 issue: 9 year: 2023 ident: 10372_CR39 publication-title: ACM Comput Surv doi: 
10.1145/3560815 – volume: 15 start-page: 199 issue: 3 year: 2021 ident: 10372_CR26 publication-title: IET Softw doi: 10.1049/sfw2.12017 – ident: 10372_CR70 doi: 10.18653/v1/2022.findings-naacl.141 – ident: 10372_CR75 doi: 10.1016/j.jss.2022.111577 – ident: 10372_CR38 doi: 10.1109/ICSE48619.2023.00072 – ident: 10372_CR81 doi: 10.18653/v1/P19-1443 – ident: 10372_CR15 – ident: 10372_CR16 doi: 10.18653/v1/2022.acl-long.499 – volume: 34 start-page: 8984 year: 2020 ident: 10372_CR58 publication-title: Proc AAAI Conf Art Intell – ident: 10372_CR9 doi: 10.1145/3510454.3528648 – volume: 33 start-page: 7055 year: 2019 ident: 10372_CR57 publication-title: Proceedings of the AAAI conference on artificial intelligence doi: 10.1609/aaai.v33i01.33017055 – ident: 10372_CR64 doi: 10.18653/v1/2021.emnlp-main.685 – ident: 10372_CR72 doi: 10.1109/APSEC53868.2021.00029 – volume: 2020 start-page: 4870 year: 2020 ident: 10372_CR34 publication-title: Findings of the Association for Computational Linguistics: EMNLP – ident: 10372_CR60 doi: 10.18653/v1/2020.acl-main.677 – volume: 25 start-page: 2179 issue: 3 year: 2020 ident: 10372_CR18 publication-title: Empir Softw Eng doi: 10.1007/s10664-019-09730-9 – ident: 10372_CR21 – ident: 10372_CR17 doi: 10.18653/v1/D18-1111 – ident: 10372_CR44 – ident: 10372_CR29 – ident: 10372_CR11 doi: 10.1145/2790755.2790797 – volume: 21 start-page: 5485 issue: 1 year: 2020 ident: 10372_CR52 publication-title: J Mach Learn Res – volume: 125 year: 2020 ident: 10372_CR24 publication-title: Inf Softw Technol doi: 10.1016/j.infsof.2020.106309 – ident: 10372_CR13 doi: 10.1145/319838.319848 – ident: 10372_CR46 doi: 10.1109/EICT.2015.7391926 – volume: 31 start-page: 1 issue: 2 year: 2022 ident: 10372_CR69 publication-title: ACM Trans Softw Eng Methodol (TOSEM) doi: 10.1145/3487569 – ident: 10372_CR77 doi: 10.18653/v1/D18-2002 – ident: 10372_CR8 doi: 10.48550/ARXIV.2211.00818 – ident: 10372_CR71 doi: 10.1109/DSA52907.2021.00013 – ident: 10372_CR62 doi: 
10.1109/SANER50967.2021.00014 – ident: 10372_CR82 – ident: 10372_CR3 – ident: 10372_CR14 doi: 10.18653/v1/2022.acl-long.576 – ident: 10372_CR31 doi: 10.18653/v1/2021.acl-long.353 – ident: 10372_CR66 doi: 10.1007/978-1-4612-4380-9_16 – ident: 10372_CR28 doi: 10.18653/v1/P17-4012 – ident: 10372_CR79 doi: 10.18653/v1/D18-1425 – ident: 10372_CR1 doi: 10.18653/v1/2021.naacl-main.211 – ident: 10372_CR73 doi: 10.1109/SANER53432.2022.00052 – ident: 10372_CR45 – ident: 10372_CR33 doi: 10.1109/ISSRE52982.2021.00042 – ident: 10372_CR20 doi: 10.1145/3510003.3510152 – ident: 10372_CR19 doi: 10.1109/ASE51524.2021.9678552 – volume: 197 year: 2023 ident: 10372_CR76 publication-title: J Syst Softw doi: 10.1016/j.jss.2022.111577 – ident: 10372_CR40 doi: 10.18653/v1/2020.acl-main.131 – ident: 10372_CR74 doi: 10.1109/SANER53432.2022.00052 – ident: 10372_CR68 doi: 10.1109/ICPC52881.2021.00022 – volume: 10 start-page: 226 issue: 2 year: 2005 ident: 10372_CR30 publication-title: J Agric Biol Environ Stat doi: 10.1198/108571105X46642 – ident: 10372_CR35 doi: 10.18653/v1/P16-1057 – ident: 10372_CR54 doi: 10.18653/v1/2021.spnlp-1.2 – ident: 10372_CR83 – ident: 10372_CR56 doi: 10.18653/v1/2021.emnlp-main.779 – ident: 10372_CR59 – ident: 10372_CR2 doi: 10.1109/MSR.2013.6624004 – ident: 10372_CR36 doi: 10.1145/3324884.3416591 – volume: 27 start-page: 1 issue: 4 year: 2022 ident: 10372_CR37 publication-title: Emp Softw Eng – volume: 2020 start-page: 1536 year: 2020 ident: 10372_CR10 publication-title: Findings of the Association for Computational Linguistics: EMNLP – ident: 10372_CR27 doi: 10.18653/v1/P17-1089 – ident: 10372_CR41 – ident: 10372_CR61 doi: 10.1145/3540250.3549113 – ident: 10372_CR12 doi: 10.18653/v1/2021.acl-long.295 – ident: 10372_CR67 doi: 10.18653/v1/D16-1137 – volume: 1 start-page: 9 issue: 8 year: 2019 ident: 10372_CR51 publication-title: OpenAI blog |
| SSID | ssj0009745 |
| Score | 2.4111724 |
| Snippet | Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated... |
| StartPage | 141 |
| SubjectTerms | Algorithms; Compilers; Computer Science; Constraint modelling; Decoding; Interpreters; Programming Languages; Representations; Software Engineering/Programming and Operating Systems; Syntax |
| Title | A syntax-guided multi-task learning approach for Turducken-style code generation |
| URI | https://link.springer.com/article/10.1007/s10664-023-10372-1 https://www.proquest.com/docview/2877034240 |
| Volume | 28 |