Qualitative Coding with GPT-4: Where It Works Better

Bibliographic details
Published in: Journal of Learning Analytics, Vol. 12, No. 1, pp. 169-185
Main authors: Liu, Xiner; Zambrano, Andres Felipe; Baker, Ryan S.; Barany, Amanda; Ocumpaugh, Jaclyn; Zhang, Jiayi; Pankiewicz, Maciej; Nasiar, Nidhi; Wei, Zhanlan
Format: Journal Article
Language: English
Published: 27 March 2025
ISSN: 1929-7750
Online access: Full text
Abstract: This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviours in an introductory programming course. We evaluated the performance of each approach based on its inter-rater agreement with human coders and explored how the effectiveness of each method varies with a construct's degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the choice of method should be tailored to the specific properties of the construct and context being analyzed. We also found that GPT-4 has the most difficulty with the same constructs on which human coders find it harder to reach inter-rater reliability.
Author ORCID iDs:
Liu, Xiner: 0009-0004-3796-2251
Zambrano, Andres Felipe: 0000-0003-0692-1209
Baker, Ryan S.: 0000-0002-3051-3232
Barany, Amanda: 0000-0003-2239-2271
Ocumpaugh, Jaclyn: 0000-0002-9667-8523
Zhang, Jiayi: 0000-0002-7334-4256
Pankiewicz, Maciej: 0000-0002-6945-0523
Nasiar, Nidhi: 0009-0006-7063-5433
Wei, Zhanlan: 0009-0002-3931-6398
DOI: 10.18608/jla.2025.8575
Open access: yes
Peer reviewed: yes
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0)