Qualitative Coding with GPT-4: Where it Works Better

Bibliographic Details
Published in: Journal of Learning Analytics, Vol. 12, No. 1, pp. 169-185
Main authors: Liu, Xiner; Zambrano, Andres Felipe; Baker, Ryan S.; Barany, Amanda; Ocumpaugh, Jaclyn; Zhang, Jiayi; Pankiewicz, Maciej; Nasiar, Nidhi; Wei, Zhanlan
Format: Journal Article
Language: English
Published: 27 March 2025
ISSN: 1929-7750
Description
Summary: This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, investigating which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviours in an introductory programming course. We evaluated the performance of each approach based on its inter-rater agreement with human coders and explored how different methods vary in effectiveness depending on a construct's degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the selection of a particular method should be tailored to the specific properties of the construct and context being analyzed. We also found that GPT-4 has the most difficulty with the same constructs that human coders find more difficult to reach inter-rater reliability on.
DOI: 10.18608/jla.2025.8575
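
Note on the evaluation described in the summary: the record says each approach was scored by its inter-rater agreement with human coders, but it does not name the statistic. As a minimal sketch, assuming Cohen's kappa (a common choice for two raters, not confirmed by this record), agreement between a human coder and a model on one construct could be computed as below. The function name `cohen_kappa` and the sample labels `human` and `model` are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    Assumption: kappa is used here for illustration only; the catalog
    record does not state which agreement statistic the authors used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = construct present, 0 = construct absent.
human = [1, 0, 1, 1, 0, 0, 1, 0]
model = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

Computing one such value per construct and per prompting strategy would allow the kind of comparison the summary describes, where different methods perform differently on different constructs.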