Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks

Bibliographic Details
Published in: Proceedings of the ACM on software engineering, Vol. 1, no. FSE, pp. 699–721
Main Authors: Wang, Wei; Ning, Huilong; Zhang, Gaowei; Liu, Libo; Wang, Yi
Format: Journal Article
Language: English
Published: New York, NY, USA: ACM, 12.07.2024
ISSN: 2994-970X
Abstract Recently, generative AI based on large language models (LLMs) has been gaining momentum for its impressive, high-quality performance in multiple domains, particularly after the release of ChatGPT. Many believe that these models have the potential to perform general-purpose problem-solving in software development and replace human software developers. Nevertheless, there is a lack of serious investigation into the capability of these LLM techniques in fulfilling software development tasks. In a controlled 2 × 2 between-subject experiment with 109 participants, we examined whether and to what degree working with ChatGPT was helpful in a coding task and a typical software development task, and how people work with ChatGPT. We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was not that good. We also observed the interactions between participants and ChatGPT and found relations between the interactions and the outcomes. Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers and motivates the need for novel interaction mechanisms that help developers effectively work with large language models to achieve desired outcomes.
ArticleNumber 32
Author Wang, Yi
Wang, Wei
Ning, Huilong
Zhang, Gaowei
Liu, Libo
Author_xml – sequence: 1
  givenname: Wei
  orcidid: 0000-0003-3240-343X
  surname: Wang
  fullname: Wang, Wei
  email: weiwang@bupt.edu.cn
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 2
  givenname: Huilong
  orcidid: 0009-0001-9393-6507
  surname: Ning
  fullname: Ning, Huilong
  email: nulogn@bupt.edu.cn
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 3
  givenname: Gaowei
  orcidid: 0009-0006-5767-2280
  surname: Zhang
  fullname: Zhang, Gaowei
  email: zhanggaowei@bupt.edu.cn
  organization: Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 4
  givenname: Libo
  orcidid: 0000-0002-0136-8902
  surname: Liu
  fullname: Liu, Libo
  email: libo.liu@unimelb.edu.au
  organization: University of Melbourne, Melbourne, Australia
– sequence: 5
  givenname: Yi
  orcidid: 0000-0003-1321-4035
  surname: Wang
  fullname: Wang, Yi
  email: wang@cocolabs.org
  organization: Beijing University of Posts and Telecommunications, Beijing, China
CitedBy_id crossref_primary_10_3390_app15126836
crossref_primary_10_14500_aro_12159
crossref_primary_10_3390_computers14050185
crossref_primary_10_1080_0144929X_2025_2478278
Cites_doi 10.1007/978-3-642-04898-2_616
10.1145/3411764.3445734
10.1145/3334480.3381069
10.1146/annurev-soc-071811-145443
10.1007/s10664-022-10160-3
10.1145/3491102.3502030
10.1145/3313831.3376442
10.1145/3480027
10.1145/3597503.3608128
10.2307/2287653
10.1145/3570220
10.1038/d41586-023-00288-7
10.1145/3490099.3511157
10.18653/v1
10.3102/10769986025002101
10.1145/3544548.3580919
10.1145/3411764.3445432
10.1007/978-1-84800-044-5_8
10.1007/978-1-4757-3304-4
10.1109/THFE2.1960.4503259
10.1177/154193120605000909
10.1145/3422622
10.4135/9781483384733
10.1109/MC.1987.1663532
10.1162/neco.1997.9.8.1735
10.1145/3600211.3604712
10.1145/3487569
10.1145/3491101.3519665
10.1145/3491102.3517582
10.1109/TSE.1986.6312975
10.1109/MC.2020.2996587
10.1145/3576915.3623157
10.1109/MC.2016.200
10.5555/2818754.2818836
10.1145/3397481.3450656
10.1145/3359313
10.1145/325737.325845
10.1145/2145204.2145329
10.1145/3613904.3641936
10.1109/ESEM.2017.27
10.1145/3520312.3534864
10.1007/978-3-642-29044-2
10.1145/3544548.3580817
10.1145/2568225.2568266
10.1109/TSE.2022.3156071
10.1007/s10664-017-9523-3
10.21606/drs.2020.282
10.1145/3387111
10.21437/Interspeech.2010-343
10.1109/TSE.2005.97
10.1145/3555212
ContentType Journal Article
Copyright Owner/Author
Copyright_xml – notice: Owner/Author
DBID AAYXX
CITATION
DOI 10.1145/3643758
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
CrossRef
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2994-970X
EndPage 721
ExternalDocumentID 10_1145_3643758
3643758
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62076232, 62172049
  funderid: https://doi.org/10.13039/501100001809
GroupedDBID AAKMM
ACM
AEJOY
AKRVB
ALMA_UNASSIGNED_HOLDINGS
LHSKQ
M~E
AAYXX
CITATION
ROL
ID FETCH-LOGICAL-a1228-ce45ffeaf83a1aa9d6247f9ad7aaf0c6a191978c04dff7fc02f89abc1f1ed4903
ISSN 2994-970X
IngestDate Sat Nov 29 07:50:29 EST 2025
Tue Nov 18 22:23:45 EST 2025
Mon Jul 14 20:49:06 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue FSE
Keywords controlled experiment
large language models
human-AI collaboration
software development task
Language English
License This work is licensed under a Creative Commons Attribution International 4.0 License.
https://creativecommons.org/licenses/by/4.0
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-a1228-ce45ffeaf83a1aa9d6247f9ad7aaf0c6a191978c04dff7fc02f89abc1f1ed4903
ORCID 0000-0002-0136-8902
0000-0003-3240-343X
0009-0001-9393-6507
0009-0006-5767-2280
0000-0003-1321-4035
OpenAccessLink https://dl.acm.org/doi/10.1145/3643758
PageCount 23
ParticipantIDs crossref_citationtrail_10_1145_3643758
crossref_primary_10_1145_3643758
acm_primary_3643758
PublicationCentury 2000
PublicationDate 20240712
2024-07-12
PublicationDateYYYYMMDD 2024-07-12
PublicationDate_xml – month: 07
  year: 2024
  text: 20240712
  day: 12
PublicationDecade 2020
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationTitle Proceedings of the ACM on software engineering
PublicationTitleAbbrev ACM PACMSE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
– reference: Frederik P Brooks. 1987. No Silver Bullet–Essence and accidents of software engineering. IEEE computer, 20, 4 (1987), 10–19. https://doi.org/10.1109/MC.1987.1663532 10.1109/MC.1987.1663532
– reference: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 31 (2017), 1–15. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
– reference: Dakuo Wang, Justin D. Weisz, Michael Muller, Parikshit Ram, Werner Geyer, Casey Dugan, Yla Tausczik, Horst Samulowitz, and Alexander Gray. 2019. Human-AI collaboration in data science: Exploring data scientists’ perceptions of automated AI. Proc. ACM Hum.-Comput. Interact., 3, CSCW (2019), Article 211, nov, 24 pages. https://doi.org/10.1145/3359313 10.1145/3359313
– reference: Michelle Jackson and David R Cox. 2013. The principles of experimental design and their application in sociology. Annual Review of Sociology, 39 (2013), 27–49. https://doi.org/10.1146/annurev-soc-071811-145443 10.1146/annurev-soc-071811-145443
– reference: Tongshuang Wu, Michael Terry, and Carrie Jun Cai. 2022. AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA. Article 385, 22 pages. isbn:9781450391573 https://doi.org/10.1145/3491102.3517582 10.1145/3491102.3517582
– reference: Eva AM van Dis, Johan Bollen, Willem Zuidema, Robert van Rooij, and Claudi L Bockting. 2023. ChatGPT: five priorities for research. Nature, 614, 7947 (2023), 224–226. https://doi.org/10.1038/d41586-023-00288-7 10.1038/d41586-023-00288-7
– reference: David Price, Ellen Rilofff, Joseph Zachary, and Brandon Harvey. 2000. NaturalJava: a natural language interface for programming in Java. In Proceedings of the 5th International Conference on Intelligent User Interfaces (IUI ’00). Association for Computing Machinery, New York, NY, USA. 207–211. isbn:1581131348 https://doi.org/10.1145/325737.325845 10.1145/325737.325845
– reference: Denis Rothman and Antonio Gulli. 2022. Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3 (2. ed.). Packt Publishing Ltd. https://www.packtpub.com/en-PT/product/transformers-for-natural-language-processing-second-edition-9781803247335
– reference: Davide Falessi, Natalia Juristo, Claes Wohlin, Burak Turhan, Jürgen Münch, Andreas Jedlitschka, and Markku Oivo. 2018. Empirical software engineering experts on the use of students and professionals in experiments. Empirical Softw. Engg., 23, 1 (2018), 452–489. issn:1382-3256 https://doi.org/10.1007/s10664-017-9523-3 10.1007/s10664-017-9523-3
– reference: Ben James Winer, Donald R Brown, and Kenneth M Michels. 1971. Statistical Principles in Experimental Design. 2, Mcgraw-Hill, New York, NY, USA.
– reference: Thomas Fritz, Andrew Begel, Sebastian C. Müller, Serap Yigit-Elliott, and Manuela Züger. 2014. Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). Association for Computing Machinery, New York, NY, USA. 402–413. isbn:9781450327565 https://doi.org/10.1145/2568225.2568266 10.1145/2568225.2568266
– reference: Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori, and Saleema Amershi. 2023. Supporting human-AI collaboration in auditing LLMs with LLMs. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23). Association for Computing Machinery, New York, NY, USA. 913–926. isbn:9798400702310 https://doi.org/10.1145/3600211.3604712 10.1145/3600211.3604712
– reference: Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada. 440–450. https://doi.org/10.18653/v1/P17-1041 10.18653/v1/P17-1041
– reference: Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR, abs/1412.3555 (2014), arXiv:1412.3555. arxiv:1412.3555
– reference: Donald B Rubin. 1980. Randomization analysis of experimental data: The Fisher randomization test comment. J. Amer. Statist. Assoc., 75, 371 (1980), 591–593. https://doi.org/10.2307/2287653 10.2307/2287653
– reference: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) (ICML ’21, Vol. 139). PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.html
– reference: Dag I. K. Sjoberg, Jo E. Hannay, Ove Hansen, Vigdis By Kampenes, Amela Karahasanovic, Nils-Kristian Liborg, and Anette C. Rekdal. 2005. A survey of controlled experiments in software engineering. IEEE Trans. Softw. Eng., 31, 9 (2005), 733–753. issn:0098-5589 https://doi.org/10.1109/TSE.2005.97 10.1109/TSE.2005.97
– reference: Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation vs. Experience: Evaluating the usability of code generation tools powered by large language models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA ’22). Association for Computing Machinery, New York, NY, USA. Article 332, 7 pages. isbn:9781450391566 https://doi.org/10.1145/3491101.3519665 10.1145/3491101.3519665
– reference: Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, and John Grundy. 2021. Opportunities and Challenges in Code Search Tools. ACM Comput. Surv., 54, 9 (2021), Article 196, 40 pages. issn:0360-0300 https://doi.org/10.1145/3480027 10.1145/3480027
– reference: Zhendong Wang, Yang Feng, Yi Wang, James A. Jones, and David Redmiles. 2020. Unveiling Elite Developers’ Activities in Open Source Projects. ACM Trans. Softw. Eng. Methodol., 29, 3 (2020), Article 16, 35 pages. issn:1049-331X https://doi.org/10.1145/3387111 10.1145/3387111
– reference: Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech ’10), Makuhari, Chiba, Japan, September 26-30, 2010, Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura (Eds.). ISCA, 1045–1048. https://doi.org/10.21437/Interspeech.2010-343 10.21437/Interspeech.2010-343
– reference: Jenny T. Liang, Chenyang Yang, and Brad A. Myers. 2024. A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE ’24). Association for Computing Machinery, New York, NY, USA. Article 52, 13 pages. isbn:9798400702174 https://doi.org/10.1145/3597503.3608128 10.1145/3597503.3608128
– reference: Ian Drosos, Titus Barik, Philip J. Guo, Robert DeLine, and Sumit Gulwani. 2020. Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA. 1–12. isbn:9781450367080 https://doi.org/10.1145/3313831.3376442 10.1145/3313831.3376442
– reference: Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a human-AI collaborative writing dataset for exploring language model capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA. Article 388, 19 pages. isbn:9781450391573 https://doi.org/10.1145/3491102.3502030 10.1145/3491102.3502030
– reference: Joseph CR Licklider. 1960. Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, 4–11. https://doi.org/10.1109/THFE2.1960.4503259 10.1109/THFE2.1960.4503259
– reference: Iflaah Salman, Ayse Tosun Misirli, and Natalia Juristo. 2015. Are students representatives of professionals in software engineering experiments? In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, 666–676. isbn:9781479919345 https://doi.org/10.5555/2818754.2818836
– reference: Matt Welsh. 2022. The End of Programming. Commun. ACM, 66, 1 (2022), dec, 34–35. issn:0001-0782 https://doi.org/10.1145/3570220 10.1145/3570220
– reference: Jessica Pater, Amanda Coupe, Rachel Pfafman, Chanda Phelan, Tammy Toscos, and Maia Jacobs. 2021. Standardizing Reporting of Participant Compensation in HCI: A Systematic Literature Review and Recommendations for the Field. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA. Article 141, 16 pages. isbn:9781450380966 https://doi.org/10.1145/3411764.3445734 10.1145/3411764.3445734
– reference: Denise Rey and Markus Neuhäuser. 2011. Wilcoxon-Signed-Rank Test. In International Encyclopedia of Statistical Science, Miodrag Lovric (Ed.). Springer Berlin, Berlin, Heidelberg. 1658–1659. isbn:978-3-642-04898-2 https://doi.org/10.1007/978-3-642-04898-2_616 10.1007/978-3-642-04898-2_616
StartPage 699
SubjectTerms Human-centered computing
Laboratory experiments
Software and its engineering
Software development methods
Software development techniques
SubjectTermsDisplay Human-centered computing -- Laboratory experiments
Software and its engineering -- Software development methods
Software and its engineering -- Software development techniques
Title Rocks Coding, Not Development: A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks
URI https://dl.acm.org/doi/10.1145/3643758
Volume 1
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2994-970X
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0002991170
  issn: 2994-970X
  databaseCode: M~E
  dateStart: 20240101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV3Nb9MwFLfK4MCFjwFiMCYfEBcWkThOHe9WTdl2WKuJFdFb5fhDiijJtLTbTog_HTt2YndwgAOXqHKf3Mjv1-fn5_feD4D3qfbYFCYyyrjMIox5rv9ztIwUT8tSYBlLITqyCTKb5YsFvRiNfva1MDcrUtf53R29-q-q1mNa2aZ09h_UPUyqB_RnrXT91GrXz79S_Gdt4QwRp3BsJbNmHaYG2VL0LnYfdaHdquM4L8JO_8XQAry7QjifRob806Tlau-0-Dhn7bc29Govhl2w7XMOJsdTcw_Rait_a5LLpO976GP49gW_ysrfjtihs021arzkENU-Zc2tlz6vNjasUDZh8AJhExVNUAA3lxDfGTxk2hRTEi-2rHMAwpPLIrC1Y8us5LZtYgutf98RsGmekZr7Sdsj_l57bffNA_AQkYyapMDpDx-l069kqHlstbWZ65OTN84M_x44M4FXMn8GnrjjBJxYGDwHI1nvgqc9VQd0lvsFYB0qoEXFIdSYgAEmjuAEbiHiEIZ4gB4PsFFwCw_wsoAdHl6CLyfF_PgscuwaEUsQyiMucaaUZCpPWcIYFWOEiaJMEMZUzMdMn-QpyXmMhVJE8RipnLKSJyqRAtM4fQV26qaWrwEssZYvsSEPEjhLaa4wy8zBNiYMjxnaA7t6sZZXtn_K0i3hHvjQL96Su4b0hhdltbTF8pkXhINgP8c9kTd__IW34LEH3T7YWV9v5DvwiN-sq_b6oNP0L3kIdn8
linkProvider ISSN International Centre
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Rocks+Coding%2C+Not+Development%3A+A+Human-Centric%2C+Experimental+Evaluation+of+LLM-Supported+SE+Tasks&rft.jtitle=Proceedings+of+the+ACM+on+software+engineering&rft.au=Wang%2C+Wei&rft.au=Ning%2C+Huilong&rft.au=Zhang%2C+Gaowei&rft.au=Liu%2C+Libo&rft.date=2024-07-12&rft.pub=ACM&rft.eissn=2994-970X&rft.volume=1&rft.issue=FSE&rft.spage=699&rft.epage=721&rft_id=info:doi/10.1145%2F3643758&rft.externalDocID=3643758
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=2994-970X&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=2994-970X&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=2994-970X&client=summon