Qualitative Coding with GPT-4: Where It Works Better
This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, exploring which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies —...
| Published in: | Journal of Learning Analytics, Volume 12, Issue 1, pp. 169-185 |
|---|---|
| Main authors: | Liu, Xiner; Zambrano, Andres Felipe; Baker, Ryan S.; Barany, Amanda; Ocumpaugh, Jaclyn; Zhang, Jiayi; Pankiewicz, Maciej; Nasiar, Nidhi; Wei, Zhanlan |
| Format: | Journal Article |
| Language: | English |
| Published: | 27 March 2025 |
| ISSN: | 1929-7750, 1929-7750 |
| Online access: | Get full text |
| Abstract | This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviours in an introductory programming course. We evaluated the performance of each approach based on its inter-rater agreement with human coders and explored how the methods vary in effectiveness depending on a construct’s degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the choice of method should be tailored to the specific properties of the construct and context being analyzed. We also found that GPT-4 has the most difficulty with the same constructs that human coders find more difficult to reach inter-rater reliability on. |
|---|---|
| Author | Zhang, Jiayi; Liu, Xiner; Zambrano, Andres Felipe; Ocumpaugh, Jaclyn; Barany, Amanda; Baker, Ryan S.; Nasiar, Nidhi; Wei, Zhanlan; Pankiewicz, Maciej |
| Author_xml | – sequence: 1 givenname: Xiner orcidid: 0009-0004-3796-2251 surname: Liu fullname: Liu, Xiner – sequence: 2 givenname: Andres Felipe orcidid: 0000-0003-0692-1209 surname: Zambrano fullname: Zambrano, Andres Felipe – sequence: 3 givenname: Ryan S. orcidid: 0000-0002-3051-3232 surname: Baker fullname: Baker, Ryan S. – sequence: 4 givenname: Amanda orcidid: 0000-0003-2239-2271 surname: Barany fullname: Barany, Amanda – sequence: 5 givenname: Jaclyn orcidid: 0000-0002-9667-8523 surname: Ocumpaugh fullname: Ocumpaugh, Jaclyn – sequence: 6 givenname: Jiayi orcidid: 0000-0002-7334-4256 surname: Zhang fullname: Zhang, Jiayi – sequence: 7 givenname: Maciej orcidid: 0000-0002-6945-0523 surname: Pankiewicz fullname: Pankiewicz, Maciej – sequence: 8 givenname: Nidhi orcidid: 0009-0006-7063-5433 surname: Nasiar fullname: Nasiar, Nidhi – sequence: 9 givenname: Zhanlan orcidid: 0009-0002-3931-6398 surname: Wei fullname: Wei, Zhanlan |
| ContentType | Journal Article |
| DBID | AAYXX CITATION |
| DOI | 10.18608/jla.2025.8575 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| EISSN | 1929-7750 |
| EndPage | 185 |
| ExternalDocumentID | 10_18608_jla_2025_8575 |
| GroupedDBID | AAYXX ABOPQ ALMA_UNASSIGNED_HOLDINGS CITATION FRS M~E OK1 |
| ID | FETCH-LOGICAL-c196t-49a68e5da5fa5b7e6b35a15f44b5acab61ae6fb6630a531c40425d0cc73b67833 |
| ISICitedReferencesCount | 5 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001474975100010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1929-7750 |
| IngestDate | Sat Nov 29 08:06:30 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0 |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c196t-49a68e5da5fa5b7e6b35a15f44b5acab61ae6fb6630a531c40425d0cc73b67833 |
| ORCID | 0009-0004-3796-2251 0000-0002-3051-3232 0009-0006-7063-5433 0000-0002-9667-8523 0000-0002-7334-4256 0009-0002-3931-6398 0000-0003-2239-2271 0000-0002-6945-0523 0000-0003-0692-1209 |
| PageCount | 17 |
| ParticipantIDs | crossref_primary_10_18608_jla_2025_8575 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-03-27 |
| PublicationDateYYYYMMDD | 2025-03-27 |
| PublicationDate_xml | – month: 03 year: 2025 text: 2025-03-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of Learning Analytics |
| PublicationYear | 2025 |
| SSID | ssj0003213390 |
| Score | 2.354668 |
| Snippet | This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, exploring... |
| SourceID | crossref |
| SourceType | Index Database |
| StartPage | 169 |
| Subtitle | Where it Works Better |
| Title | Qualitative Coding with GPT-4 |
| Volume | 12 |
| WOSCitedRecordID | wos001474975100010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 1929-7750 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0003213390 issn: 1929-7750 databaseCode: M~E dateStart: 20140101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV07T8MwELagMLAgECBeRRmQGFBKEsd5jFVFYSgVQkXqFtmOKxWVtGpTVBZ-O3dxHoV2KAOLlVg5K_Znn8_nexByHQifM-ook0rmmm5oUTO0uG_yQFq2UHAminmWbMLvdoN-PyxudGdZOgE_SYLFIpz8K9RQB2Cj6-wf4C4bhQp4BtChBNih3Ah4HRVDx_NujeNS2frw3Ms9XVZl0U6pIMEYJemSBXxnOEcU-ugjWCqZ0c-LJ-PSIHJ221aj4aS6oOe5rcbLJ7KPRlUPZBmqzXde6AJylYPD0OZKe_DnXBJkKhDLdcTYhlpTV7BWZ2UKaT5p6_ws-ZZr66w9K9w88DIXhbcRBohyWAOziVb7VnFX_2s7K40M8XiDLURAHyF9hPTbZMfxWYjWf09flTqOOnBYzzRyZUfyEJ_YxN2PX1gSYZZkkd4B2c-BM5oa_EOypZIjUl8C3tDAGwi8kQF_TF7b973Wo5knvzAlMMUUlg33AsVizgacCV95gjJus4HrCsYlF57NlTcQIDBasNhs6SL3jS0pfSpAAKH0hNSScaJOicFA_gB6y4PChZ4KL2aOghclKfUD74zcFP2JJjrGSbR-8M43_vKC7FUz55LU0ulc1cmu_EiHs-lVNvbfn5tDGQ |
| linkProvider | ISSN International Centre |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Qualitative+Coding+with+GPT-4&rft.jtitle=Journal+of+Learning+Analytics&rft.au=Liu%2C+Xiner&rft.au=Zambrano%2C+Andres+Felipe&rft.au=Baker%2C+Ryan+S.&rft.au=Barany%2C+Amanda&rft.date=2025-03-27&rft.issn=1929-7750&rft.eissn=1929-7750&rft.volume=12&rft.issue=1&rft.spage=169&rft.epage=185&rft_id=info:doi/10.18608%2Fjla.2025.8575&rft.externalDBID=n%2Fa&rft.externalDocID=10_18608_jla_2025_8575 |
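
The abstract says each approach was evaluated by its inter-rater agreement with human coders, but this record does not say which agreement statistic was used. As a minimal sketch, assuming Cohen's kappa (a common choice for two coders, not confirmed by this record), agreement between human codes and model codes for one construct could be computed as below. The function name `cohen_kappa` and the sample labels are hypothetical and purely illustrative; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(human, model):
    """Cohen's kappa for two equal-length sequences of categorical codes.

    Assumption: kappa is used here only as an illustrative agreement
    statistic; the catalog record does not name the study's metric.
    """
    if len(human) != len(model) or not human:
        raise ValueError("both code lists must be non-empty and equal in length")

    n = len(human)
    # Observed agreement: share of items where both coders assigned the same code.
    observed = sum(h == m for h, m in zip(human, model)) / n

    # Chance agreement: for each code, the product of the two coders' marginal rates.
    human_counts = Counter(human)
    model_counts = Counter(model)
    categories = set(human) | set(model)
    expected = sum(human_counts[c] * model_counts[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both coders used one identical code throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary codes (1 = construct present, 0 = absent) for eight excerpts.
human_codes = [1, 0, 1, 1, 0, 0, 1, 0]
model_codes = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(human_codes, model_codes)
print(f"kappa = {kappa:.2f}")
```

Running the same function once per construct and per prompting strategy would give a table of agreement scores that can be compared against the agreement reached between human coders on the same construct.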