Individual vs. Collaborative Methods of Crowdsourced Transcription
| Published in: | Journal of data mining and digital humanities, Vol. Special Issue on Collecting,... |
|---|---|
| Main authors: | Blickhan, Samantha; Krawczyk, Coleman; Hanson, Daniel; Boyer, Amy; Simenstad, Andrea; van Hyning, Victoria |
| Format: | Journal Article |
| Language: | English |
| Published: | 2019-12-03, INRIA; Nicolas Turenne |
| ISSN: | 2416-5999 |
| Online access: | Full text |
| Abstract | While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with different models of online crowdsourced transcription that have not yet been explored. The experiment discussed in this paper involves the evaluation of newly-built tools on the Zooniverse.org crowdsourcing platform, attempting to answer the research question: "Does the current Zooniverse methodology of multiple independent transcribers and aggregation of results render higher-quality outcomes than allowing volunteers to see previous transcriptions and/or markings by other users? How does each methodology impact the quality and depth of analysis and participation?" To answer these questions, the Zooniverse team ran an A/B experiment on the project Anti-Slavery Manuscripts at the Boston Public Library. This paper will share results of this study, and also describe the process of designing the experiment and the metrics used to evaluate each transcription method. These include the comparison of aggregate transcription results with ground truth data; evaluation of annotation methods; the time it took for volunteers to complete transcribing each dataset; and the level of engagement with other project elements such as posting on the message board or reading supporting documentation. Particular focus will be given to the (at times) competing goals of data quality, efficiency, volunteer engagement, and user retention, all of which are of high importance for projects that focus on data from galleries, libraries, archives and museums. Ultimately, this paper aims to provide a model for impactful, intentional design and study of online crowdsourcing transcription methods, as well as shed light on the associations between project design, methodology and outcomes. |
|---|---|
| Author | Blickhan, Samantha; Krawczyk, Coleman; Hanson, Daniel; Boyer, Amy; Simenstad, Andrea; van Hyning, Victoria |
| Author_xml | 1. Samantha Blickhan (ORCID 0000-0002-3775-5744), Adler Planetarium [Chicago]; 2. Coleman Krawczyk, University of Portsmouth; 3. Daniel Hanson, University of Minnesota [Twin Cities]; 4. Amy Boyer, Adler Planetarium; 5. Andrea Simenstad, University of Minnesota [Twin Cities]; 6. Victoria van Hyning, Library of Congress |
| BackLink | https://hal.science/hal-02280013 (View record in HAL) |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.46298/jdmdh.5759 |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ISSN | 2416-5999 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-3775-5744 |
| OpenAccessLink | https://doaj.org/article/0c502512e86c43bbb94ba10fafac682e |
| PublicationDate | 2019-12-03 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2019 |
| Publisher | INRIA; Nicolas Turenne |
| SubjectTerms | [shs.info] Humanities and Social Sciences / Library and information sciences; [shs.museo] Humanities and Social Sciences / Cultural heritage and museology; [shs.stat] Humanities and Social Sciences / Methods and statistics |
| Title | Individual vs. Collaborative Methods of Crowdsourced Transcription |
| URI | https://hal.science/hal-02280013 https://doaj.org/article/0c502512e86c43bbb94ba10fafac682e |
| Volume | Special Issue on Collecting,... |