Algorithmic Fairness in Automatic Short Answer Scoring.

Saved in:
Bibliographic details
Title: Algorithmic Fairness in Automatic Short Answer Scoring.
Authors: Andersen, Nico, Mang, Julia, Goldhammer, Frank, Zehner, Fabian
Source: International Journal of Artificial Intelligence in Education (Springer Science & Business Media B.V.); Dec2025, Vol. 35 Issue 5, p3128-3165, 38p
Keywords: Support vector machines, Equality, Sexism, Linguistic usage, Artificial intelligence, Assessment of education
Abstract: Equal treatment of groups and individuals is crucial for fair assessment and demands unbiased scoring decisions. We examined algorithmic fairness in the automatic scoring of text responses, focusing on demographic disparities between groups differing in gender and language use. We tested various combinations of semantic representations and classification methods on responses to reading comprehension items from the 2015 German PISA assessment. Classifications from the most accurate method, namely a Support Vector Machine trained with RoBERTa embeddings, exhibited no discernible gender differences, but a minor significant bias in the automatic scoring of students based on their language background. Specifically, students speaking mainly a foreign language at home received significantly higher automatic scores than their actual performance warranted, thereby gaining a relative advantage from the machine scoring system. Lower-performing groups with more incorrect responses tend to receive more correct scores because incorrect responses are generally less likely to be recognized. Differences are particularly evident at the item level, where we identified several factors that promote algorithmic unfairness, such as scoring accuracy, student performance, linguistic diversity of text responses, and the psychometrically determined item difficulty. [ABSTRACT FROM AUTHOR]
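The abstract describes scoring short text responses with a Support Vector Machine trained on RoBERTa embeddings and then comparing automatic scores across demographic groups. The Python sketch below illustrates one plausible version of such a pipeline; the encoder name (xlm-roberta-base), the mean pooling, the SVM hyperparameters, and the variables responses, labels, and group are illustrative assumptions, not the authors' implementation.

# Hedged sketch: SVM over RoBERTa embeddings for short-answer scoring,
# plus a simple per-group check of score inflation. All data and names
# below are hypothetical placeholders.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

MODEL_NAME = "xlm-roberta-base"  # assumption: a RoBERTa-family encoder suitable for German text
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(texts, batch_size=16):
    """Mean-pooled token embeddings as fixed-length response representations."""
    vectors = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, max_length=128, return_tensors="pt")
            hidden = encoder(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
            pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean over non-padding tokens
            vectors.append(pooled.cpu().numpy())
    return np.vstack(vectors)

# responses: list of student answers; labels: human scores (0 = incorrect, 1 = correct);
# group: demographic indicator, e.g. language spoken at home (hypothetical data).
X = embed(responses)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, labels, group, test_size=0.3, stratify=labels, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("overall accuracy:", accuracy_score(y_te, pred))

# Disparity check: difference between mean automatic and mean human score per group;
# a positive value indicates the group is scored more generously by the machine.
for grp in np.unique(g_te):
    idx = np.asarray(g_te) == grp
    print(grp, "score inflation:", pred[idx].mean() - np.asarray(y_te)[idx].mean())

A per-group gap computed this way is only one of several possible fairness diagnostics; the study itself reports group-level differences between automatic and human scores at the item level.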
Copyright of International Journal of Artificial Intelligence in Education (Springer Science & Business Media B.V.) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
ISSN: 1560-4292
DOI: 10.1007/s40593-025-00495-5