The future of grading programming assignments in education: The role of ChatGPT in automating the assessment and feedback process

Detailed bibliography
Published in: Thinking Skills and Creativity, Vol. 52, article 101522
Main author: Jukiewicz, Marcin
Medium: Journal Article
Language: English
Published: Elsevier Ltd, 1 June 2024
ISSN: 1871-1871, 1878-0423
Description
Summary:
• ChatGPT, like the teacher, can assess works as correct, almost correct, or incorrect.
• ChatGPT’s grades correlate with the teacher’s evaluations.
• ChatGPT’s grades are usually lower than the teacher’s evaluations.
• The evaluation generated by ChatGPT is repeatable in successive iterations.
• Teachers spend hours grading assessments, while ChatGPT does it within minutes.

This research evaluated ChatGPT’s potential as a tool for grading programming tasks, exploring its capability to understand and assess code quality. The study took place over a 15-week Python programming course with 67 students in the Cognitive Science program. Nine different assignments were assessed by both a teacher and ChatGPT, and the grading differences were recorded. The teacher’s grades were higher than those generated by ChatGPT. Despite this, there was a strong positive correlation between the two sets of grades, suggesting consistency in grading. Moreover, the repeatability of ChatGPT’s evaluations was excellent: the differences observed between successive grading iterations were negligible. The study concludes that ChatGPT could be a beneficial tool for grading programming assignments, offering advantages such as time efficiency, quality assessment, unbiased grading, enforcement of coding standards, and the ability to generate feedback. However, the system has limitations, such as cost, potential hallucinations, a lack of absolute agreement and of fully reproducible results, and the occasional need for teacher intervention. The study suggests that the artificial intelligence model could complement or even replace human grading, but it requires careful use and, potentially, verification by a human teacher.
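The summary reports two findings that can seem contradictory: ChatGPT’s grades correlate strongly with the teacher’s, yet they are systematically lower and lack absolute agreement. The sketch below illustrates why both can hold at once. It is not the study’s analysis: the summary does not say which statistics were used, and the scores are hypothetical, invented only for illustration. It computes the Pearson correlation and the two-way single-rater intraclass correlation coefficients (ICC) in the McGraw and Wong formulation, one for consistency and one for absolute agreement. The helper names `pearson_r` and `icc_single` are illustrative, not from any library.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_single(ratings):
    """Two-way, single-rater ICCs; returns (consistency, absolute_agreement).

    ratings: one row per graded work, one column per grader.
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: works (rows), graders (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Consistency ignores a systematic offset between graders...
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    # ...absolute agreement penalizes it through the grader (column) variance.
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, agreement


# Hypothetical scores: the model grades every work exactly one point lower.
teacher = [10, 8, 9, 6, 7, 10, 5, 9]
model = [t - 1 for t in teacher]

r = pearson_r(teacher, model)
mean_diff = fmean(t - m for t, m in zip(teacher, model))
consistency, agreement = icc_single([list(p) for p in zip(teacher, model)])
print(f"Pearson r = {r:.3f}, mean teacher - model = {mean_diff:.2f}")
print(f"ICC consistency = {consistency:.3f}, ICC agreement = {agreement:.3f}")
```

With a constant one-point offset, the correlation and the consistency ICC are both 1, while the absolute-agreement ICC drops below 1. A strong correlation therefore indicates that the two graders rank the works the same way, not that they assign the same grades. The same ICC functions could, under the same assumptions, be applied to successive ChatGPT grading runs of the same works to quantify repeatability.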
DOI: 10.1016/j.tsc.2024.101522