Coding with the machines: machine-assisted coding of rare event data


Published in: PNAS Nexus, Volume 3, Issue 5, p. pgae165
Main authors: Overos, Henry David; Hlatky, Roman; Pathak, Ojashwi; Goers, Harriet; Gouws-Dewar, Jordan; Smith, Katy; Chew, Keith Padraic; Birnir, Jóhanna K; Liu, Amy H
Format: Journal Article
Language: English
Published: US, Oxford University Press, 2024-05-01
ISSN: 2752-6542
Abstract While machine coding of data has advanced dramatically in recent years, the literature raises significant concerns about the validation of LLM classification, showing, for example, that reliability varies greatly by prompt and temperature tuning and across subject areas and tasks, especially in “zero-shot” applications. This paper contributes to the discussion of validation in several ways. To test the relative performance of supervised and semi-supervised algorithms when coding political data, we compare three models’ performances to each other, over multiple iterations for each model, and to trained expert coding of data. We also examine changes in performance resulting from prompt engineering and pre-processing of source data. To ameliorate concerns regarding LLMs’ pre-training on test data, we assess performance by updating an existing dataset beyond what is publicly available. Overall, we find that only GPT-4 approaches trained expert coders when coding contexts familiar to human coders and codes more consistently across contexts. We conclude by discussing some benefits and drawbacks of machine coding moving forward.
Audience Academic
Author_xml – sequence: 1
  givenname: Henry David
  orcidid: 0000-0001-9804-7752
  surname: Overos
  fullname: Overos, Henry David
– sequence: 2
  givenname: Roman
  orcidid: 0000-0001-8378-2877
  surname: Hlatky
  fullname: Hlatky, Roman
– sequence: 3
  givenname: Ojashwi
  orcidid: 0009-0007-7653-6214
  surname: Pathak
  fullname: Pathak, Ojashwi
– sequence: 4
  givenname: Harriet
  orcidid: 0000-0001-6010-4707
  surname: Goers
  fullname: Goers, Harriet
– sequence: 5
  givenname: Jordan
  orcidid: 0009-0006-3251-055X
  surname: Gouws-Dewar
  fullname: Gouws-Dewar, Jordan
– sequence: 6
  givenname: Katy
  surname: Smith
  fullname: Smith, Katy
– sequence: 7
  givenname: Keith Padraic
  orcidid: 0000-0003-1789-0268
  surname: Chew
  fullname: Chew, Keith Padraic
– sequence: 8
  givenname: Jóhanna K
  orcidid: 0000-0003-1261-8812
  surname: Birnir
  fullname: Birnir, Jóhanna K
  email: jkbirnir@umd.edu
– sequence: 9
  givenname: Amy H
  orcidid: 0000-0001-5380-2849
  surname: Liu
  fullname: Liu, Amy H
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38765715 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright The Author(s) 2024. Published by Oxford University Press on behalf of National Academy of Sciences. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1093/pnasnexus/pgae165
Discipline Sciences (General)
Engineering
EISSN 2752-6542
ExternalDocumentID PMC11102067
A800097416
38765715
10_1093_pnasnexus_pgae165
10.1093/pnasnexus/pgae165
Genre Journal Article
GeographicLocations United States
ISSN 2752-6542
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords GPT
BERT
machine coding
machine learning
political event data
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0
Notes Competing Interest: The authors declare no competing interest.
PMID 38765715
PublicationDate 2024-05-01
PublicationPlace US
PublicationTitle PNAS Nexus
PublicationYear 2024
Publisher Oxford University Press
StartPage pgae165
SubjectTerms Accuracy
Algorithms
Coders
Coding
Datasets
Dictionaries
Electronic data processing
Engineering
Language
Large language models
Machine learning
Methods
Minority & ethnic groups
Performance assessment
Performance evaluation
Politics
Prompt engineering
Social and Political Sciences
Title Coding with the machines: machine-assisted coding of rare event data
URI https://www.ncbi.nlm.nih.gov/pubmed/38765715
https://www.proquest.com/docview/3191897058
https://www.proquest.com/docview/3057072101
https://pubmed.ncbi.nlm.nih.gov/PMC11102067
Volume 3