Individual vs. Collaborative Methods of Crowdsourced Transcription

While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with different models of online crowdsourced transcription that h...

Bibliographic details
Published in: Journal of data mining and digital humanities, Vol. Special Issue on Collecting,...
Main authors: Blickhan, Samantha; Krawczyk, Coleman; Hanson, Daniel; Boyer, Amy; Simenstad, Andrea; van Hyning, Victoria
Format: Journal Article
Language: English
Published: INRIA, 3 December 2019
Nicolas Turenne
Subjects:
ISSN: 2416-5999
Online access: Full text
Abstract While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with different models of online crowdsourced transcription that have not yet been explored. The experiment discussed in this paper involves the evaluation of newly-built tools on the Zooniverse.org crowdsourcing platform, attempting to answer the research question: "Does the current Zooniverse methodology of multiple independent transcribers and aggregation of results render higher-quality outcomes than allowing volunteers to see previous transcriptions and/or markings by other users? How does each methodology impact the quality and depth of analysis and participation?" To answer these questions, the Zooniverse team ran an A/B experiment on the project Anti-Slavery Manuscripts at the Boston Public Library. This paper will share results of this study, and also describe the process of designing the experiment and the metrics used to evaluate each transcription method. These include the comparison of aggregate transcription results with ground truth data; evaluation of annotation methods; the time it took for volunteers to complete transcribing each dataset; and the level of engagement with other project elements such as posting on the message board or reading supporting documentation. Particular focus will be given to the (at times) competing goals of data quality, efficiency, volunteer engagement, and user retention, all of which are of high importance for projects that focus on data from galleries, libraries, archives and museums. Ultimately, this paper aims to provide a model for impactful, intentional design and study of online crowdsourcing transcription methods, as well as shed light on the associations between project design, methodology and outcomes.
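One of the metrics named in the abstract, comparing aggregate transcription results with ground truth data, can be illustrated with a minimal sketch. The record does not describe the paper's actual aggregation or scoring procedure, so everything below is an assumption for illustration: the function names are hypothetical, majority voting stands in for whatever aggregation the project used, and character error rate (edit distance divided by ground-truth length) stands in for its quality measure.

```python
from __future__ import annotations

from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def aggregate_by_majority(transcriptions: list[str]) -> str:
    """Hypothetical aggregation: most common whitespace-normalized reading.

    Ties are resolved in favour of the reading that was submitted first.
    """
    normalized = [" ".join(t.split()) for t in transcriptions]
    return Counter(normalized).most_common(1)[0][0]


def character_error_rate(candidate: str, ground_truth: str) -> float:
    """Edit distance to the ground truth, divided by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(candidate, ground_truth) / len(ground_truth)
```

Under these assumptions, the independent-transcriber arm would be scored by aggregating several volunteers' readings of a line and computing the error rate of the aggregate, while the collaborative arm would be scored on the single shared transcription; a lower rate indicates closer agreement with the ground truth.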
Author Krawczyk, Coleman
Boyer, Amy
van Hyning, Victoria
Hanson, Daniel
Simenstad, Andrea
Blickhan, Samantha
Author_xml – sequence: 1
  givenname: Samantha
  orcidid: 0000-0002-3775-5744
  surname: Blickhan
  fullname: Blickhan, Samantha
  organization: Adler Planetarium, Adler Planetarium [Chicago]
– sequence: 2
  givenname: Coleman
  surname: Krawczyk
  fullname: Krawczyk, Coleman
  organization: University of Portsmouth
– sequence: 3
  givenname: Daniel
  surname: Hanson
  fullname: Hanson, Daniel
  organization: University of Minnesota, University of Minnesota [Twin Cities]
– sequence: 4
  givenname: Amy
  surname: Boyer
  fullname: Boyer, Amy
  organization: Adler Planetarium
– sequence: 5
  givenname: Andrea
  surname: Simenstad
  fullname: Simenstad, Andrea
  organization: University of Minnesota, University of Minnesota [Twin Cities]
– sequence: 6
  givenname: Victoria
  surname: van Hyning
  fullname: van Hyning, Victoria
  organization: Library of Congress
BackLink https://hal.science/hal-02280013 (View record in HAL)
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.46298/jdmdh.5759
DatabaseName CrossRef
Hyper Article en Ligne (HAL)
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access)
Hyper Article en Ligne (HAL) (Open Access)
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
DatabaseTitleList CrossRef


Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2416-5999
ExternalDocumentID oai_doaj_org_article_0c502512e86c43bbb94ba10fafac682e
oai:HAL:hal-02280013v2
10_46298_jdmdh_5759
IEDL.DBID DOA
ISSN 2416-5999
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
ORCID 0000-0002-3775-5744
OpenAccessLink https://doaj.org/article/0c502512e86c43bbb94ba10fafac682e
ParticipantIDs doaj_primary_oai_doaj_org_article_0c502512e86c43bbb94ba10fafac682e
hal_primary_oai_HAL_hal_02280013v2
crossref_primary_10_46298_jdmdh_5759
PublicationCentury 2000
PublicationDate 2019-12-03
PublicationDateYYYYMMDD 2019-12-03
PublicationDate_xml – month: 12
  year: 2019
  text: 2019-12-03
  day: 03
PublicationDecade 2010
PublicationTitle Journal of data mining and digital humanities
PublicationYear 2019
Publisher INRIA
Nicolas Turenne
Publisher_xml – name: INRIA
– name: Nicolas Turenne
SourceID doaj
hal
crossref
SourceType Open Website
Open Access Repository
Index Database
SubjectTerms [shs.info]humanities and social sciences/library and information sciences
[shs.museo]humanities and social sciences/cultural heritage and museology
[shs.stat]humanities and social sciences/methods and statistics
Cultural heritage and museology
Humanities and Social Sciences
Library and information sciences
Methods and statistics
Title Individual vs. Collaborative Methods of Crowdsourced Transcription
URI https://hal.science/hal-02280013
https://doaj.org/article/0c502512e86c43bbb94ba10fafac682e
Volume Special Issue on Collecting,...
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: DOA
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: M~E
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre