Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program

Bibliographic Details
Title: Comparing the Quality of AI-generated and Instructor Feedback in a University Writing Program
Authors: Nassar, Hana Mohamed
Source: Theses and Dissertations
Publisher Information: AUC Knowledge Fountain
Publication Year: 2025
Subject Terms: AI-generated feedback, teacher feedback, language instruction, writing, AWE, scores, proficiency, Applied Linguistics
Description: Feedback is an undeniably important aspect of the language learning process. It helps students recognize their strengths and weaknesses and identify ways they can improve. Over the years, feedback has been provided by teachers, peers, and Automated Writing Evaluation (AWE) tools. In recent years, however, artificial intelligence applications have proliferated significantly. With the ability to analyze and generate virtually any kind of content, these models are being used to generate scores and feedback on written assignments to help lighten teachers’ load. ChatGPT has been called “the world’s most advanced chatbot” and “a potential chance to improve second language learning and instruction” (Shabara et al., 2024). The present study investigates the quality of AI-generated scores and feedback on writing in comparison to teacher scores and feedback. Using a mixed methods design, the study compared ChatGPT’s generated and regenerated scores and qualitative comments to those assigned by experienced university instructors. A total of 89 argumentative essays were collected from the archives of a private university in Egypt. ChatGPT-4o and two human raters scored them using a rubric that evaluates writing on four criteria: content and development, organization and connection of ideas, linguistic range and control, and communicative effect. All scores were statistically analyzed to examine the consistency and accuracy of ChatGPT’s scoring. Similarly, the written feedback was thematically analyzed and compared to teacher feedback. Themes identified in the data included tone of feedback, adherence to the rubric, prioritization of certain writing features, and judgmental versus improvement-oriented feedback. The quantitative data revealed a moderate correlation between AI-generated and teacher scores, with the only strong relationship appearing in the linguistic precision criterion. The results also showed weak consistency between ChatGPT’s generated and regenerated scores. In terms of qualitative feedback, it was found to ...
Document Type: text
File Description: application/pdf
Language: English
Relation: https://fount.aucegypt.edu/etds/2468; https://fount.aucegypt.edu/context/etds/article/3514/viewcontent/hana_mohamed_nassar_thesis.pdf
Availability: https://fount.aucegypt.edu/etds/2468; https://fount.aucegypt.edu/context/etds/article/3514/viewcontent/hana_mohamed_nassar_thesis.pdf
Accession Number: edsbas.4F35F59F
Database: BASE
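
As a rough illustration of the score comparison the abstract describes (Pearson correlation between AI-generated and teacher scores on each rubric criterion, plus consistency between ChatGPT’s first-pass and regenerated scores), here is a minimal Python sketch. The scores, the 1–5 scale, and the noise model below are invented for illustration only; just the sample size (89 essays) and the four criterion names come from the record, and the thesis itself may have used different statistics.

```python
# Minimal sketch (not from the thesis) of the analysis the abstract describes:
# per-criterion AI-teacher correlation and AI generated-vs-regenerated consistency.
import numpy as np

rng = np.random.default_rng(0)  # hypothetical data for illustration only

criteria = [
    "content and development",
    "organization and connection of ideas",
    "linguistic range and control",
    "communicative effect",
]

n_essays = 89  # sample size reported in the abstract

# Invented rubric scores on an assumed 1-5 scale; real scores would come from raters.
teacher = {c: rng.integers(1, 6, n_essays).astype(float) for c in criteria}
ai_first = {c: np.clip(teacher[c] + rng.normal(0, 1, n_essays), 1, 5) for c in criteria}
ai_regen = {c: np.clip(ai_first[c] + rng.normal(0, 1, n_essays), 1, 5) for c in criteria}

def pearson(x, y):
    """Pearson correlation coefficient between two score vectors."""
    return np.corrcoef(x, y)[0, 1]

for c in criteria:
    r_teacher = pearson(ai_first[c], teacher[c])  # AI-teacher agreement
    r_regen = pearson(ai_first[c], ai_regen[c])   # AI self-consistency across regenerations
    print(f"{c}: AI-teacher r={r_teacher:.2f}, generated-regenerated r={r_regen:.2f}")
```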