Qualitative Coding with GPT-4: Where it Works Better
This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, exploring which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies —...
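
The three prompt engineering strategies named in the abstract (Zero-shot, Few-shot, and Few-shot with contextual information) can be illustrated with a minimal sketch. This record does not reproduce the authors' prompts, so the `Construct` class, the `build_prompt` function, the prompt wording, and the binary 1/0 label scheme below are all hypothetical; the sketch only shows how the three strategies could differ in what they pass to the model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Construct:
    """A qualitative code: its name and codebook definition (hypothetical structure)."""
    name: str
    definition: str


def build_prompt(
    construct: Construct,
    text: str,
    examples: Optional[List[Tuple[str, int]]] = None,
    context: Optional[str] = None,
) -> str:
    """Assemble a coding prompt for one piece of text.

    Zero-shot: definition only (no examples, no context).
    Few-shot: definition plus human-labeled examples.
    Few-shot with context: also describes the dataset or setting.
    The wording is an assumption, not the prompt used in the article.
    """
    parts = [
        f"You are coding educational data for the construct '{construct.name}'.",
        f"Definition: {construct.definition}",
    ]
    if context:
        parts.append(f"Context: {context}")
    for example_text, label in examples or []:
        parts.append(f"Example: {example_text}\nLabel: {label}")
    parts.append(f"Text: {text}\nLabel (answer 1 if present, 0 if absent):")
    return "\n\n".join(parts)


if __name__ == "__main__":
    help_seeking = Construct("Help seeking", "The student asks for assistance.")
    shots = [("I am stuck on this step.", 1), ("The answer is 12.", 0)]
    # Few-shot with contextual information: examples plus a dataset description.
    print(build_prompt(help_seeking, "Can you show me how?", shots,
                       context="Algebra I tutoring session transcript"))
```

Keeping the prompt assembly separate from the model call makes it straightforward to run the same text through all three strategies and compare the resulting codes against human coders.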
| Published in: | Journal of Learning Analytics Vol. 12; No. 1; pp. 169 - 185 |
|---|---|
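| Main authors: | Liu, Xiner; Zambrano, Andres Felipe; Baker, Ryan S.; Barany, Amanda; Ocumpaugh, Jaclyn; Zhang, Jiayi; Pankiewicz, Maciej; Nasiar, Nidhi; Wei, Zhanlan |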
| Format: | Journal Article |
| Language: | English |
| Published: | 27 March 2025 |
| ISSN: | 1929-7750 |
| Online access: | Full text |
| Abstract | This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviours in an introductory programming course. We evaluated the performance of each approach based on its inter-rater agreement with human coders and explored how different methods vary in effectiveness depending on a construct’s degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the selection of a particular method should be tailored to the specific properties of the construct and context being analyzed. We also found that GPT-4 has the most difficulty with the same constructs that human coders find more difficult to reach inter-rater reliability on. |
|---|---|
| Author | Zhang, Jiayi; Liu, Xiner; Zambrano, Andres Felipe; Ocumpaugh, Jaclyn; Barany, Amanda; Baker, Ryan S.; Nasiar, Nidhi; Wei, Zhanlan; Pankiewicz, Maciej |
| Author_xml | 1. Liu, Xiner (ORCID 0009-0004-3796-2251); 2. Zambrano, Andres Felipe (ORCID 0000-0003-0692-1209); 3. Baker, Ryan S. (ORCID 0000-0002-3051-3232); 4. Barany, Amanda (ORCID 0000-0003-2239-2271); 5. Ocumpaugh, Jaclyn (ORCID 0000-0002-9667-8523); 6. Zhang, Jiayi (ORCID 0000-0002-7334-4256); 7. Pankiewicz, Maciej (ORCID 0000-0002-6945-0523); 8. Nasiar, Nidhi (ORCID 0009-0006-7063-5433); 9. Wei, Zhanlan (ORCID 0009-0002-3931-6398) |
| ContentType | Journal Article |
| DBID | AAYXX CITATION |
| DOI | 10.18608/jla.2025.8575 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| EISSN | 1929-7750 |
| EndPage | 185 |
| ExternalDocumentID | 10_18608_jla_2025_8575 |
| GroupedDBID | AAYXX ABOPQ ALMA_UNASSIGNED_HOLDINGS CITATION FRS M~E OK1 |
| ISICitedReferencesCount | 5 |
| ISSN | 1929-7750 |
| IngestDate | Sat Nov 29 08:06:30 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0 |
| LinkModel | OpenURL |
| ORCID | 0009-0004-3796-2251 0000-0002-3051-3232 0009-0006-7063-5433 0000-0002-9667-8523 0000-0002-7334-4256 0009-0002-3931-6398 0000-0003-2239-2271 0000-0002-6945-0523 0000-0003-0692-1209 |
| PageCount | 17 |
| ParticipantIDs | crossref_primary_10_18608_jla_2025_8575 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-03-27 |
| PublicationDateYYYYMMDD | 2025-03-27 |
| PublicationDate_xml | – month: 03 year: 2025 text: 2025-03-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of Learning Analytics |
| PublicationYear | 2025 |
| SSID | ssj0003213390 |
| Score | 2.3546994 |
| SourceID | crossref |
| SourceType | Index Database |
| StartPage | 169 |
| Subtitle | Where it Works Better |
| Title | Qualitative Coding with GPT-4 |
| Volume | 12 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 1929-7750 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0003213390 issn: 1929-7750 databaseCode: M~E dateStart: 20140101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
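
The abstract evaluates each approach by its inter-rater agreement with human coders but does not name the statistic in this record. As a minimal sketch, assuming Cohen's kappa (a common choice for two raters, and an assumption here), agreement between human codes and model codes could be computed as follows. The function name `cohen_kappa` and the sample label lists are illustrative only and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined, and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human_codes = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative labels, not study data
    model_codes = [1, 0, 1, 0, 0, 0, 1, 1]
    print(round(cohen_kappa(human_codes, model_codes), 3))
```

Computing the statistic per construct, rather than pooled across constructs, matches the abstract's point that performance depends on the properties of each construct.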