Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm

Detailed Bibliography
Title: Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm
Language: English
Authors: Myers, Matthew C. (ORCID 0000-0001-7414-9148), Wilson, Joshua (ORCID 0000-0002-7192-3510)
Source: International Journal of Artificial Intelligence in Education. Sep 2023 33(3):609-634.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 26
Publication Date: 2023
Document Type: Journal Articles
Reports - Research
Education Level: Junior High Schools
Middle Schools
Secondary Education
Elementary Education
Grade 7
Grade 8
Descriptors: Construct Validity, Automation, Writing Evaluation, Algorithms, Scoring, Persuasive Discourse, Essays, Middle School Students, Grade 7, Grade 8, Programming Languages, Scores, Sentences, Concept Formation, Text Structure, Formative Evaluation, Feedback (Response), Computer Assisted Testing
DOI: 10.1007/s40593-022-00301-6
ISSN: 1560-4292 (print)
EISSN: 1560-4306 (online)
Abstract: This study evaluated the construct validity of six scoring traits of an automated writing evaluation (AWE) system called "MI Write." Persuasive essays (N = 100) written by students in grades 7 and 8 were randomized at the sentence level using a script written with Python's NLTK library. Each persuasive essay was randomized 30 times (n = 3000 total randomizations), and the mean trait scores for each set of randomized iterations were compared to those of the control text across all traits. We were specifically interested in the effects of randomization on the high-level traits of "idea development" and "organization." Given the rubrics and qualitative feedback provided by MI Write, we hypothesized that these high-level traits ought to be sensitive to sentence-level randomization (i.e., scores should decrease). Overall, complete randomization did not consistently have a significant impact on scores for these high-level writing traits. In fact, more than a third of the essays saw significant increases in one or both high-level traits despite randomization, indicating a disconnect between MI Write's formative feedback and its underlying constructs. Findings have implications for consumers and developers of AWE.
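The sentence-level randomization described in the abstract lends itself to a compact illustration. The following is a minimal sketch, not the authors' published script: it assumes NLTK's sent_tokenize for sentence splitting and a seeded shuffle to generate the 30 randomized iterations per essay; the function name randomize_essay is hypothetical.

    import random

    import nltk
    from nltk.tokenize import sent_tokenize

    # sent_tokenize relies on the "punkt" sentence tokenizer models.
    nltk.download("punkt", quiet=True)


    def randomize_essay(text, n_iterations=30, seed=None):
        """Return n_iterations sentence-level shuffles of an essay.

        Each iteration copies the tokenized sentences, shuffles them,
        and rejoins them into a single text for rescoring.
        """
        rng = random.Random(seed)
        sentences = sent_tokenize(text)
        versions = []
        for _ in range(n_iterations):
            shuffled = sentences[:]  # copy so the original order is preserved
            rng.shuffle(shuffled)
            versions.append(" ".join(shuffled))
        return versions


    essay = "Dogs make great pets. They are loyal. They also need exercise."
    for version in randomize_essay(essay, n_iterations=3, seed=1):
        print(version)

In the study, the mean MI Write trait scores across each essay's 30 shuffled versions were compared to the score of the intact control essay to test whether the "idea development" and "organization" traits are sensitive to sentence order.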
Abstractor: As Provided
Entry Date: 2023
Accession Number: EJ1388568
Database: ERIC