Coding with the machines: machine-assisted coding of rare event data
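| Published in: | PNAS nexus, Volume 3, Issue 5, p. pgae165 |
|---|---|
| Main authors: | Overos, Henry David; Hlatky, Roman; Pathak, Ojashwi; Goers, Harriet; Gouws-Dewar, Jordan; Smith, Katy; Chew, Keith Padraic; Birnir, Jóhanna K; Liu, Amy H |
| Format: | Journal Article |
| Language: | English |
| Published: | US: Oxford University Press, 1 May 2024 |
| Subjects: | |
| ISSN: | 2752-6542 |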
| Abstract | While machine coding of data has dramatically advanced in recent years, the literature raises significant concerns about validation of LLM classification showing, for example, that reliability varies greatly by prompt and temperature tuning, across subject areas and tasks—especially in “zero-shot” applications. This paper contributes to the discussion of validation in several different ways. To test the relative performance of supervised and semi-supervised algorithms when coding political data, we compare three models’ performances to each other over multiple iterations for each model and to trained expert coding of data. We also examine changes in performance resulting from prompt engineering and pre-processing of source data. To ameliorate concerns regarding LLM’s pre-training on test data, we assess performance by updating an existing dataset beyond what is publicly available. Overall, we find that only GPT-4 approaches trained expert coders when coding contexts familiar to human coders and codes more consistently across contexts. We conclude by discussing some benefits and drawbacks of machine coding moving forward. |
|---|---|
| Audience | Academic |
| Author | Goers, Harriet Liu, Amy H Birnir, Jóhanna K Pathak, Ojashwi Gouws-Dewar, Jordan Chew, Keith Padraic Smith, Katy Hlatky, Roman Overos, Henry David |
| Author_xml | – sequence: 1 givenname: Henry David orcidid: 0000-0001-9804-7752 surname: Overos fullname: Overos, Henry David – sequence: 2 givenname: Roman orcidid: 0000-0001-8378-2877 surname: Hlatky fullname: Hlatky, Roman – sequence: 3 givenname: Ojashwi orcidid: 0009-0007-7653-6214 surname: Pathak fullname: Pathak, Ojashwi – sequence: 4 givenname: Harriet orcidid: 0000-0001-6010-4707 surname: Goers fullname: Goers, Harriet – sequence: 5 givenname: Jordan orcidid: 0009-0006-3251-055X surname: Gouws-Dewar fullname: Gouws-Dewar, Jordan – sequence: 6 givenname: Katy surname: Smith fullname: Smith, Katy – sequence: 7 givenname: Keith Padraic orcidid: 0000-0003-1789-0268 surname: Chew fullname: Chew, Keith Padraic – sequence: 8 givenname: Jóhanna K orcidid: 0000-0003-1261-8812 surname: Birnir fullname: Birnir, Jóhanna K email: jkbirnir@umd.edu – sequence: 9 givenname: Amy H orcidid: 0000-0001-5380-2849 surname: Liu fullname: Liu, Amy H |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/38765715$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_20991_allazimuth_1590826 crossref_primary_10_1057_s41599_025_04503_w crossref_primary_10_1017_S1049096525101248 |
| Cites_doi | 10.1177/0022002717719974 10.48550/arXiv.2303.18223 10.1073/pnas.2305016120 10.1017/pan.2018.11 10.1080/21670811.2017.1293487 10.1093/pan/mps028 10.48550/arXiv.2306.00176 10.1093/pnasnexus/pgad355 10.48550/arXiv.1810.04805 10.48550/arXiv.2304.10145 10.1017/lap.2022.25 10.1007/978-1-4614-5311-6_2 10.1177/089443939401200408 10.48550/arXiv.2304.06588 10.48550/arXiv.2304.11085 |
| ContentType | Journal Article |
| Copyright | The Author(s) 2024. Published by Oxford University Press on behalf of National Academy of Sciences. COPYRIGHT 2024 Oxford University Press. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1093/pnasnexus/pgae165 |
| Discipline | Sciences (General) Engineering |
| EISSN | 2752-6542 |
| ExternalDocumentID | PMC11102067 A800097416 38765715 10_1093_pnasnexus_pgae165 10.1093/pnasnexus/pgae165 |
| Genre | Journal Article |
| GeographicLocations | United States |
| ISICitedReferencesCount | 4 |
| ISSN | 2752-6542 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Keywords | GPT BERT machine coding machine learning political event data |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0 The Author(s) 2024. Published by Oxford University Press on behalf of National Academy of Sciences. |
| Notes | Competing Interest: The authors declare no competing interest. |
| ORCID | 0000-0001-8378-2877 0009-0007-7653-6214 0000-0003-1789-0268 0000-0001-9804-7752 0000-0001-6010-4707 0000-0001-5380-2849 0009-0006-3251-055X 0000-0003-1261-8812 |
| OpenAccessLink | https://www.proquest.com/docview/3191897058?pq-origsite=%requestingapplication% |
| PMID | 38765715 |
| PQID | 3191897058 |
| PQPubID | 7215252 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-05-01 |
| PublicationDateYYYYMMDD | 2024-05-01 |
| PublicationDate_xml | – month: 05 year: 2024 text: 2024-05-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | US |
| PublicationPlace_xml | – name: US – name: England – name: Los Angeles |
| PublicationTitle | PNAS nexus |
| PublicationTitleAlternate | PNAS Nexus |
| PublicationYear | 2024 |
| Publisher | Oxford University Press |
| Publisher_xml | – name: Oxford University Press |
| StartPage | pgae165 |
| SubjectTerms | Accuracy Algorithms Coders Coding Datasets Dictionaries Electronic data processing Engineering Language Large language models Machine learning Methods Minority & ethnic groups Performance assessment Performance evaluation Politics Prompt engineering Social and Political Sciences |
| Title | Coding with the machines: machine-assisted coding of rare event data |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/38765715 https://www.proquest.com/docview/3191897058 https://www.proquest.com/docview/3057072101 https://pubmed.ncbi.nlm.nih.gov/PMC11102067 |
| Volume | 3 |