Applying Large Language Models to Enhance the Assessment of Parallel Functional Programming Assignments

Bibliographic details
Published in:	2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), pp. 102-110
Main authors:	Grandel, Skyler; Schmidt, Douglas C.; Leach, Kevin
Format:	Conference paper
Language:	English
Published:	ACM, 20 April 2024
Subjects:	Accuracy; Automated Grading; Chatbots; ChatGPT; Codes; Computational modeling; Conferences; Education; Feature extraction; Functional programming; Generative AI; Large language models; Prompt engineering; Software maintenance

Description
Summary:	Courses in computer science (CS) often assess student programming assignments manually, with the intent of providing in-depth feedback to each student regarding correctness, style, efficiency, and other quality attributes. As class sizes increase, however, it is hard to provide detailed feedback consistently, especially when multiple assessors are required to handle a larger number of assignment submissions. Large language models (LLMs), such as ChatGPT, offer a promising alternative to help automate this process in a consistent, scalable, and minimally biased manner. This paper explores ChatGPT-4's scalability and accuracy in assessing programming assignments based on predefined rubrics in the context of a case study we conducted in an upper-level undergraduate and graduate CS course at Vanderbilt University. In this case study, we employed a method that compared assessments generated by ChatGPT-4 against those of human graders to measure the accuracy, precision, and recall associated with identifying programming mistakes. Our results show that when ChatGPT-4 is used properly (e.g., with appropriate prompt engineering and feature selection), it can improve objectivity and grading efficiency, thereby acting as a complementary tool to human graders for advanced computer science graduate and undergraduate students.
CCS concepts:	Software and its engineering → Software maintenance tools; Applied computing → Computer-assisted instruction.
DOI:	10.1145/3643795.3648375