The future of grading programming assignments in education: The role of ChatGPT in automating the assessment and feedback process

Bibliographic Details
Published in: Thinking Skills and Creativity, Vol. 52, p. 101522
Main Author: Jukiewicz, Marcin
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.06.2024
ISSN: 1871-1871, 1878-0423
Description
Summary:
• ChatGPT, like the teacher, can assess works as correct, almost correct, or incorrect.
• ChatGPT’s grades correlate with the teacher’s evaluations.
• ChatGPT’s grades are usually lower than the teacher’s evaluations.
• The evaluation generated by ChatGPT is repeatable in successive iterations.
• Teachers spend hours grading assessments, while ChatGPT does it within minutes.

This research evaluated ChatGPT’s potential as a tool for grading programming tasks, exploring its capability to understand and assess code quality. The study took place over a 15-week Python programming course with 67 students in the Cognitive Science program. Nine different assignments were assessed by both a teacher and the ChatGPT system, and the grading differences were recorded. The teacher’s grades were higher than those generated by ChatGPT. Despite this, there was a strong positive correlation between the two sets of grades, suggesting consistency in grading. In addition, the repeatability of ChatGPT’s evaluations was excellent, and the differences observed between successive grading iterations were negligible. The study concludes that ChatGPT could be a beneficial tool for grading programming assignments, providing several advantages such as time efficiency, quality assessment, unbiased grading, enforcement of coding standards, and the ability to generate feedback. However, the system has limitations such as cost, potential hallucinations, a lack of absolute agreement between repeated results, and the occasional need for teacher intervention. The study suggests that the artificial intelligence model could complement or even substitute for human grading but requires careful usage and potential verification by a human teacher.
DOI: 10.1016/j.tsc.2024.101522
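
Illustrative note: the summary compares teacher and ChatGPT grades in three ways (which grader scores higher, how strongly the grades correlate, and how stable repeated ChatGPT evaluations are). The sketch below shows one way such comparisons could be computed. It is not the author's analysis: the numeric coding of grades (correct = 1.0, almost correct = 0.5, incorrect = 0.0), the function names, and every data value are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length grade lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant grades")
    return cov / sqrt(var_x * var_y)


def mean_difference(teacher, model):
    """Average of (teacher grade - model grade); positive means the teacher grades higher."""
    return mean(t - m for t, m in zip(teacher, model))


def max_spread(runs):
    """Largest gap between the highest and lowest grade any submission got across runs."""
    return max(max(scores) - min(scores) for scores in zip(*runs))


# Hypothetical grades for six submissions (assumed coding: 1.0 / 0.5 / 0.0).
teacher = [1.0, 1.0, 0.5, 1.0, 0.0, 0.5]
model_run_1 = [1.0, 0.5, 0.5, 1.0, 0.0, 0.0]
model_run_2 = [1.0, 0.5, 0.5, 0.5, 0.0, 0.0]

if __name__ == "__main__":
    print("correlation:", pearson(teacher, model_run_1))
    print("mean teacher - model difference:", mean_difference(teacher, model_run_1))
    print("largest disagreement between runs:", max_spread([model_run_1, model_run_2]))
```

With real data, a positive mean difference would correspond to the teacher grading higher than the model, and a small spread between runs would correspond to good repeatability.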