Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks

Bibliographic Details
Published in:	Proceedings of the ACM on software engineering Vol. 1; no. FSE; pp. 699 - 721
Main Authors:	Wang, Wei; Ning, Huilong; Zhang, Gaowei; Liu, Libo; Wang, Yi
Format:	Journal Article
Language:	English
Published:	New York, NY, USA ACM 12.07.2024
Subjects:	Human-centered computing; Laboratory experiments; Software and its engineering; Software development methods; Software development techniques; controlled experiment; large language models; human-AI collaboration; software development task
ISSN:	2994-970X

Abstract	Generative AI based on large language models (LLMs) has recently gained momentum for its impressive performance in multiple domains, particularly since the release of ChatGPT. Many believe that these models have the potential to perform general-purpose problem-solving in software development and to replace human software developers. Nevertheless, there has been a lack of serious investigation into the capability of these LLM techniques to fulfill software development tasks. In a controlled 2 × 2 between-subject experiment with 109 participants, we examined whether and to what degree working with ChatGPT was helpful in a coding task and in a typical software development task, and how people work with ChatGPT. We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was not as good. We also observed the interactions between participants and ChatGPT and found relations between those interactions and the outcomes. Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers and motivates the need for novel interaction mechanisms that help developers work effectively with large language models to achieve desired outcomes.
ArticleNumber	32
Author	Wang, Yi; Wang, Wei; Ning, Huilong; Zhang, Gaowei; Liu, Libo
Author_xml	– sequence: 1 givenname: Wei orcidid: 0000-0003-3240-343X surname: Wang fullname: Wang, Wei email: weiwang@bupt.edu.cn organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 2 givenname: Huilong orcidid: 0009-0001-9393-6507 surname: Ning fullname: Ning, Huilong email: nulogn@bupt.edu.cn organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 3 givenname: Gaowei orcidid: 0009-0006-5767-2280 surname: Zhang fullname: Zhang, Gaowei email: zhanggaowei@bupt.edu.cn organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 4 givenname: Libo orcidid: 0000-0002-0136-8902 surname: Liu fullname: Liu, Libo email: libo.liu@unimelb.edu.au organization: University of Melbourne, Melbourne, Australia
– sequence: 5 givenname: Yi orcidid: 0000-0003-1321-4035 surname: Wang fullname: Wang, Yi email: wang@cocolabs.org organization: Beijing University of Posts and Telecommunications, Beijing, China
CitedBy_id	crossref_primary_10_3390_app15126836 crossref_primary_10_14500_aro_12159 crossref_primary_10_3390_computers14050185 crossref_primary_10_1080_0144929X_2025_2478278
Cites_doi	10.1007/978-3-642-04898-2_616 10.1145/3411764.3445734 10.1145/3334480.3381069 10.1146/annurev-soc-071811-145443 10.1007/s10664-022-10160-3 10.1145/3491102.3502030 10.1145/3313831.3376442 10.1145/3480027 10.1145/3597503.3608128 10.2307/2287653 10.1145/3570220 10.1038/d41586-023-00288-7 10.1145/3490099.3511157 10.18653/v1 10.3102/10769986025002101 10.1145/3544548.3580919 10.1145/3411764.3445432 10.1007/978-1-84800-044-5_8 10.1007/978-1-4757-3304-4 10.1109/THFE2.1960.4503259 10.1177/154193120605000909 10.1145/3422622 10.4135/9781483384733 10.1109/MC.1987.1663532 10.1162/neco.1997.9.8.1735 10.1145/3600211.3604712 10.1145/3487569 10.1145/3491101.3519665 10.1145/3491102.3517582 10.1109/TSE.1986.6312975 10.1109/MC.2020.2996587 10.1145/3576915.3623157 10.1109/MC.2016.200 10.5555/2818754.2818836 10.1145/3397481.3450656 10.1145/3359313 10.1145/325737.325845 10.1145/2145204.2145329 10.1145/3613904.3641936 10.1109/ESEM.2017.27 10.1145/3520312.3534864 10.1007/978-3-642-29044-2 10.1145/3544548.3580817 10.1145/2568225.2568266 10.1109/TSE.2022.3156071 10.1007/s10664-017-9523-3 10.21606/drs.2020.282 10.1145/3387111 10.21437/Interspeech.2010-343 10.1109/TSE.2005.97 10.1145/3555212
ContentType	Journal Article
Copyright	Owner/Author
Copyright_xml	– notice: Owner/Author
DOI	10.1145/3643758
Discipline	Computer Science
EISSN	2994-970X
EndPage	721
ExternalDocumentID	10_1145_3643758 3643758
GrantInformation_xml	– fundername: National Natural Science Foundation of China grantid: 62076232, 62172049 funderid: https://doi.org/10.13039/501100001809
ISSN	2994-970X
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	FSE
Keywords	controlled experiment; large language models; human-AI collaboration; software development task
Language	English
License	This work is licensed under a Creative Commons Attribution International 4.0 License. https://creativecommons.org/licenses/by/4.0
ORCID	0000-0002-0136-8902 0000-0003-3240-343X 0009-0001-9393-6507 0009-0006-5767-2280 0000-0003-1321-4035
OpenAccessLink	https://dl.acm.org/doi/10.1145/3643758
PageCount	23
PublicationCentury	2000
PublicationDate	20240712 2024-07-12
PublicationDateYYYYMMDD	2024-07-12
PublicationDate_xml	– month: 07 year: 2024 text: 20240712 day: 12
PublicationDecade	2020
PublicationPlace	New York, NY, USA
PublicationPlace_xml	– name: New York, NY, USA
PublicationTitle	Proceedings of the ACM on software engineering
PublicationTitleAbbrev	ACM PACMSE
PublicationYear	2024
Publisher	ACM
Publisher_xml	– name: ACM
Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3 (2. ed.). Packt Publishing Ltd. https://www.packtpub.com/en-PT/product/transformers-for-natural-language-processing-second-edition-9781803247335 – reference: Davide Falessi, Natalia Juristo, Claes Wohlin, Burak Turhan, Jürgen Münch, Andreas Jedlitschka, and Markku Oivo. 2018. Empirical software engineering experts on the use of students and professionals in experiments. Empirical Softw. Engg., 23, 1 (2018), 452–489. issn:1382-3256 https://doi.org/10.1007/s10664-017-9523-3 10.1007/s10664-017-9523-3 – reference: Ben James Winer, Donald R Brown, and Kenneth M Michels. 1971. Statistical Principles in Experimental Design. 2, Mcgraw-Hill, New York, NY, USA. – reference: Thomas Fritz, Andrew Begel, Sebastian C. Müller, Serap Yigit-Elliott, and Manuela Züger. 2014. Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). Association for Computing Machinery, New York, NY, USA. 402–413. isbn:9781450327565 https://doi.org/10.1145/2568225.2568266 10.1145/2568225.2568266 – reference: Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, and Saleema Amershi. 2023. Supporting human-AI collaboration in auditing LLMs with LLMs. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23). Association for Computing Machinery, New York, NY, USA. 913–926. isbn:9798400702310 https://doi.org/10.1145/3600211.3604712 10.1145/3600211.3604712 – reference: Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada. 440–450. 
https://doi.org/10.18653/v1/P17-1041 10.18653/v1/P17-1041 – reference: Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR, abs/1412.3555 (2014), arXiv:1412.3555. arxiv:1412.3555 – reference: Donald B Rubin. 1980. Randomization analysis of experimental data: The Fisher randomization test comment. J. Amer. Statist. Assoc., 75, 371 (1980), 591–593. https://doi.org/10.2307/2287653 10.2307/2287653 – reference: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) (ICML ’21, Vol. 139). PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.html – reference: Dag I. K. Sjoberg, Jo E. Hannay, Ove Hansen, Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg, and Anette C. Rekdal. 2005. A survey of controlled experiments in software engineering. IEEE Trans. Softw. Eng., 31, 9 (2005), 733–753. issn:0098-5589 https://doi.org/10.1109/TSE.2005.97 10.1109/TSE.2005.97 – reference: Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the usability of code generation tools powered by large language models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA ’22). Association for Computing Machinery, New York, NY, USA. Article 332, 7 pages. isbn:9781450391566 https://doi.org/10.1145/3491101.3519665 10.1145/3491101.3519665 – reference: Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, and John Grundy. 2021. Opportunities and Challenges in Code Search Tools. ACM Comput. Surv., 54, 9 (2021), Article 196, 40 pages. 
issn:0360-0300 https://doi.org/10.1145/3480027 10.1145/3480027 – reference: Zhendong Wang, Yang Feng, Yi Wang, James A. Jones, and David Redmiles. 2020. Unveiling Elite Developers’ Activities in Open Source Projects. ACM Trans. Softw. Eng. Methodol., 29, 3 (2020), Article 16, 35 pages. issn:1049-331X https://doi.org/10.1145/3387111 10.1145/3387111 – reference: Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech ’10), Makuhari, Chiba, Japan, September 26-30, 2010, Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura (Eds.). ISCA, 1045–1048. https://doi.org/10.21437/Interspeech.2010-343 10.21437/Interspeech.2010-343 – reference: Jenny T. Liang, Chenyang Yang, and Brad A. Myers. 2024. A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE ’24). Association for Computing Machinery, New York, NY, USA. Article 52, 13 pages. isbn:9798400702174 https://doi.org/10.1145/3597503.3608128 10.1145/3597503.3608128 – reference: Ian Drosos, Titus Barik, Philip J. Guo, Robert DeLine, and Sumit Gulwani. 2020. Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA. 1–12. isbn:9781450367080 https://doi.org/10.1145/3313831.3376442 10.1145/3313831.3376442 – reference: Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a human-AI collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA. 
Article 388, 19 pages. isbn:9781450391573 https://doi.org/10.1145/3491102.3502030 10.1145/3491102.3502030 – reference: Joseph CR Licklider. 1960. Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, 4–11. https://doi.org/10.1109/THFE2.1960.4503259 10.1109/THFE2.1960.4503259 – reference: Iflaah Salman, Ayse Tosun Misirli, and Natalia Juristo. 2015. Are students representatives of professionals in software engineering experiments? In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, 666–676. isbn:9781479919345 https://doi.org/10.5555/2818754.2818836 – reference: Matt Welsh. 2022. The End of Programming. Commun. ACM, 66, 1 (2022), dec, 34–35. issn:0001-0782 https://doi.org/10.1145/3570220 10.1145/3570220 – reference: Jessica Pater, Amanda Coupe, Rachel Pfafman, Chanda Phelan, Tammy Toscos, and Maia Jacobs. 2021. Standardizing Reporting of Participant Compensation in HCI: A Systematic Literature Review and Recommendations for the Field. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA. Article 141, 16 pages. isbn:9781450380966 https://doi.org/10.1145/3411764.3445734 10.1145/3411764.3445734 – reference: Denise Rey and Markus Neuhäuser. 2011. Wilcoxon-Signed-Rank Test. In International Encyclopedia of Statistical Science, Miodrag Lovric (Ed.). Springer Berlin, Berlin, Heidelberg. 1658–1659. 
isbn:978-3-642-04898-2 https://doi.org/10.1007/978-3-642-04898-2_616 10.1007/978-3-642-04898-2_616 – volume-title: Proceedings of the 2005 Australasian Language Technology Association Workshop (ATLA ’05) ident: e_1_2_1_48_1 – ident: e_1_2_1_40_1 doi: 10.1007/978-3-642-04898-2_616 – ident: e_1_2_1_35_1 doi: 10.1145/3411764.3445734 – ident: e_1_2_1_53_1 doi: 10.1145/3334480.3381069 – ident: e_1_2_1_43_1 – volume-title: Schmidt year: 2023 ident: e_1_2_1_62_1 – ident: e_1_2_1_19_1 doi: 10.1146/annurev-soc-071811-145443 – ident: e_1_2_1_12_1 doi: 10.1007/s10664-022-10160-3 – ident: e_1_2_1_24_1 doi: 10.1145/3491102.3502030 – ident: e_1_2_1_10_1 doi: 10.1145/3313831.3376442 – ident: e_1_2_1_29_1 doi: 10.1145/3480027 – volume: 31 start-page: 1 year: 2017 ident: e_1_2_1_52_1 article-title: Attention is all you need publication-title: Advances in Neural Information Processing Systems – ident: e_1_2_1_26_1 doi: 10.1145/3597503.3608128 – ident: e_1_2_1_42_1 doi: 10.2307/2287653 – ident: e_1_2_1_61_1 doi: 10.1145/3570220 – ident: e_1_2_1_50_1 doi: 10.1038/d41586-023-00288-7 – ident: e_1_2_1_60_1 doi: 10.1145/3490099.3511157 – ident: e_1_2_1_28_1 doi: 10.18653/v1 – ident: e_1_2_1_51_1 doi: 10.3102/10769986025002101 – volume-title: Proceedings of the International Conference on Learning Representations (ICLR ’21) year: 2021 ident: e_1_2_1_16_1 – ident: e_1_2_1_22_1 doi: 10.1145/3544548.3580919 – ident: e_1_2_1_54_1 doi: 10.1145/3411764.3445432 – ident: e_1_2_1_20_1 doi: 10.1007/978-1-84800-044-5_8 – ident: e_1_2_1_8_1 – ident: e_1_2_1_5_1 – ident: e_1_2_1_21_1 doi: 10.1007/978-1-4757-3304-4 – ident: e_1_2_1_27_1 doi: 10.1109/THFE2.1960.4503259 – ident: e_1_2_1_1_1 doi: 10.18653/v1 – volume-title: Statistical Principles in Experimental Design. 2 ident: e_1_2_1_63_1 – ident: e_1_2_1_17_1 doi: 10.1177/154193120605000909 – ident: e_1_2_1_15_1 doi: 10.1145/3422622 – volume-title: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. 
CoRR, abs/1412.3555 year: 2014 ident: e_1_2_1_9_1 – ident: e_1_2_1_23_1 doi: 10.4135/9781483384733 – ident: e_1_2_1_7_1 doi: 10.1109/MC.1987.1663532 – ident: e_1_2_1_18_1 doi: 10.1162/neco.1997.9.8.1735 – ident: e_1_2_1_39_1 doi: 10.1145/3600211.3604712 – ident: e_1_2_1_66_1 doi: 10.1145/3487569 – ident: e_1_2_1_49_1 doi: 10.1145/3491101.3519665 – ident: e_1_2_1_25_1 doi: 10.18653/v1 – ident: e_1_2_1_65_1 doi: 10.1145/3491102.3517582 – ident: e_1_2_1_4_1 doi: 10.1109/TSE.1986.6312975 – ident: e_1_2_1_2_1 doi: 10.1109/MC.2020.2996587 – ident: e_1_2_1_36_1 doi: 10.1145/3576915.3623157 – ident: e_1_2_1_33_1 doi: 10.1109/MC.2016.200 – ident: e_1_2_1_44_1 doi: 10.5555/2818754.2818836 – ident: e_1_2_1_59_1 doi: 10.1145/3397481.3450656 – volume-title: Proceedings of the 32nd USENIX Security Symposium (USENIX Security ’23) year: 2023 ident: e_1_2_1_45_1 – ident: e_1_2_1_55_1 doi: 10.1145/3359313 – volume: 35 start-page: 27730 year: 2022 ident: e_1_2_1_34_1 article-title: Training language models to follow instructions with human feedback publication-title: Advances in Neural Information Processing Systems – ident: e_1_2_1_37_1 doi: 10.1145/325737.325845 – volume-title: PyTorch, TensorFlow, BERT, and GPT-3 (2. ed.) year: 1803 ident: e_1_2_1_41_1 – ident: e_1_2_1_57_1 doi: 10.1145/2145204.2145329 – volume: 8763 volume-title: Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) 
(ICML ’21 year: 2021 ident: e_1_2_1_38_1 – ident: e_1_2_1_13_1 doi: 10.18653/v1 – ident: e_1_2_1_32_1 doi: 10.1145/3613904.3641936 – ident: e_1_2_1_67_1 doi: 10.18653/v1 – volume-title: Shao Kun Deng, and Neel Sundaresan year: 2020 ident: e_1_2_1_47_1 – ident: e_1_2_1_56_1 doi: 10.1109/ESEM.2017.27 – ident: e_1_2_1_69_1 doi: 10.1145/3520312.3534864 – ident: e_1_2_1_64_1 doi: 10.1007/978-3-642-29044-2 – ident: e_1_2_1_30_1 doi: 10.1145/3544548.3580817 – ident: e_1_2_1_14_1 doi: 10.1145/2568225.2568266 – ident: e_1_2_1_68_1 doi: 10.1109/TSE.2022.3156071 – ident: e_1_2_1_11_1 doi: 10.1007/s10664-017-9523-3 – ident: e_1_2_1_3_1 doi: 10.21606/drs.2020.282 – ident: e_1_2_1_58_1 doi: 10.1145/3387111 – ident: e_1_2_1_31_1 doi: 10.21437/Interspeech.2010-343 – ident: e_1_2_1_46_1 doi: 10.1109/TSE.2005.97 – ident: e_1_2_1_6_1 doi: 10.1145/3555212
StartPage	699
SubjectTerms	Human-centered computing; Laboratory experiments; Software and its engineering; Software development methods; Software development techniques
SubjectTermsDisplay	Human-centered computing -- Laboratory experiments; Software and its engineering -- Software development methods; Software and its engineering -- Software development techniques
Title	Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks
URI	https://dl.acm.org/doi/10.1145/3643758
Volume	1