GenAI Reliability in Content Analysis: Assessing Agreement Between LLMs in Measuring Discursive Violence
| Published in: | International Conference on Control Systems and Computer Science (Online), pp. 604-611 |
|---|---|
| Main authors: | , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27 May 2025 |
| Subjects: | |
| ISSN: | 2379-0482 |
| Abstract: | This study investigates the reliability of three leading large language models (LLMs), ChatGPT 4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, in measuring discursive violence against women in Eminem's lyrics. Through a three-phase experimental design, we assessed both inter-coder reliability between different AI systems and measurement fidelity within each system across repeated evaluations. Our findings show good agreement among AI systems in their independent assessments (r = .75-.86), with stronger alignment between ChatGPT and Claude. When systems were exposed to each other's interpretations, agreement increased (reaching r = .89-.97), showing that cross-system exposure increases measurement consistency. All systems maintained stability in their evaluations across time (r = .93-.97), though Sexual Violence seems to be the most challenging dimension to evaluate reliably. These results have implications for AI-assisted content analysis, indicating that while these systems show promising reliability for evaluative tasks, some aspects of gender-based violence remain more subjective and challenging to operationalize consistently, even with structured frameworks. Our methodological approach offers new strategies for enhancing measurement consistency in AI-assisted qualitative research. |
| DOI: | 10.1109/CSCS66924.2025.00095 |
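
The abstract reports agreement between coders as pairwise correlation coefficients (r). As a rough illustration of what such a figure measures, the sketch below computes Pearson's r for every pair of coders. It assumes one numeric score per item per coder; the coder names, the scores, and the helper functions `pearson_r` and `pairwise_agreement` are hypothetical and are not taken from the paper, whose actual procedure is not described in this record.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(scores_by_coder):
    """Return {(coder_a, coder_b): r} for every pair of coders."""
    return {
        (a, b): pearson_r(scores_by_coder[a], scores_by_coder[b])
        for a, b in combinations(sorted(scores_by_coder), 2)
    }


# Hypothetical ratings (NOT data from the paper): one score per item, per coder.
scores = {
    "coder_a": [1, 2, 3, 4, 5],
    "coder_b": [2, 4, 6, 8, 10],  # perfectly linear in coder_a
    "coder_c": [5, 4, 3, 2, 1],   # coder_a's scores reversed
}

agreement = pairwise_agreement(scores)
for pair, r in agreement.items():
    print(pair, round(r, 2))
```

The same `pearson_r` helper could be applied to two rating rounds from a single coder to estimate stability over time, which is the second kind of reliability the abstract mentions.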