The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique
| Published in: | Journal of data mining and digital humanities Vol. Historical Documents and... |
|---|---|
| Main Authors: | Couture, Beatrice; Verret, Farah; Gohier, Maxime; Deslandres, Dominique |
| Format: | Journal Article |
| Language: | English |
| Published: | Nicolas Turenne, 6 December 2023 |
| Subjects: | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning |
| ISSN: | 2416-5999 |
| Online Access: | https://doaj.org/article/e3464e0e24674432b8f7004249296a8e |
| Abstract | The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models, which are designed to transcribe 17th-century French handwriting. This article therefore reports on the impact of creating transcription protocols, of using the language model at full scale, and of determining the best way to use base models in order to increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges related to the collaborative nature of HTR platforms such as Transkribus and to the ways researchers can share the data they generate while creating or training HTR models. |
|---|---|
| Author | Verret, Farah; Couture, Beatrice; Gohier, Maxime; Deslandres, Dominique |
| Author_xml | – sequence: 1 givenname: Beatrice surname: Couture fullname: Couture, Beatrice – sequence: 2 givenname: Farah surname: Verret fullname: Verret, Farah – sequence: 3 givenname: Maxime surname: Gohier fullname: Gohier, Maxime – sequence: 4 givenname: Dominique surname: Deslandres fullname: Deslandres, Dominique |
| ContentType | Journal Article |
| DBID | AAYXX CITATION DOA |
| DOI | 10.46298/jdmdh.10542 |
| DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ExternalDocumentID | oai_doaj_org_article_e3464e0e24674432b8f7004249296a8e 10_46298_jdmdh_10542 |
| GroupedDBID | 5VS AAFWJ AAYXX ADBBV ADQAK AFPKN ALMA_UNASSIGNED_HOLDINGS BCNDV CITATION FRP GROUPED_DOAJ KQ8 M~E OK1 |
| ID | FETCH-LOGICAL-c1412-c467721dd8a7abcda70859d30f10c8ceb434049d23ca9220fcc77ee569c16a0d3 |
| IEDL.DBID | DOA |
| ISSN | 2416-5999 |
| IngestDate | Fri Oct 03 12:52:51 EDT 2025 Sat Nov 29 04:10:29 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c1412-c467721dd8a7abcda70859d30f10c8ceb434049d23ca9220fcc77ee569c16a0d3 |
| OpenAccessLink | https://doaj.org/article/e3464e0e24674432b8f7004249296a8e |
| ParticipantIDs | doaj_primary_oai_doaj_org_article_e3464e0e24674432b8f7004249296a8e crossref_primary_10_46298_jdmdh_10542 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-12-06 |
| PublicationDateYYYYMMDD | 2023-12-06 |
| PublicationDate_xml | – month: 12 year: 2023 text: 2023-12-06 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2023 |
| Publisher | Nicolas Turenne |
| Publisher_xml | – name: Nicolas Turenne |
| SSID | ssj0001548555 |
| SourceID | doaj crossref |
| SourceType | Open Website Index Database |
| SubjectTerms | computer science - computer vision and pattern recognition computer science - machine learning |
| Title | The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique |
| URI | https://doaj.org/article/e3464e0e24674432b8f7004249296a8e |
| Volume | Historical Documents and... |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAON databaseName: DOAJ Directory of Open Access Journals customDbUrl: eissn: 2416-5999 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001548555 issn: 2416-5999 databaseCode: DOA dateStart: 20140101 isFulltext: true titleUrlDefault: https://www.doaj.org/ providerName: Directory of Open Access Journals – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2416-5999 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001548555 issn: 2416-5999 databaseCode: M~E dateStart: 20140101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | Directory of Open Access Journals |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+Challenges+of+HTR+Model+Training%3A+Feedback+from+the+Project+Donner+le+gout+de+l%27archive+a+l%27ere+numerique&rft.jtitle=Journal+of+data+mining+and+digital+humanities&rft.au=Beatrice+Couture&rft.au=Farah+Verret&rft.au=Maxime+Gohier&rft.au=Dominique+Deslandres&rft.date=2023-12-06&rft.pub=Nicolas+Turenne&rft.eissn=2416-5999&rft.volume=Historical+Documents+and...&rft_id=info:doi/10.46298%2Fjdmdh.10542&rft.externalDBID=DOA&rft.externalDocID=oai_doaj_org_article_e3464e0e24674432b8f7004249296a8e |
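The abstract reports model quality as a Character Error Rate (CER). As a hedged illustration only (this is not the authors' evaluation code, and Transkribus's own CER computation may apply different normalization, e.g. for whitespace), CER is commonly defined as the character-level Levenshtein edit distance between a model's output and the ground-truth transcription, divided by the number of characters in the ground truth. The helper names `levenshtein` and `cer` and the sample line are hypothetical:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn hyp into ref (dynamic programming, keeping two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and model output: 'v' misread as 'u'.
    ref = "Monsieur le gouverneur"
    hyp = "Monsieur le gouuerneur"
    print(f"CER = {cer(ref, hyp):.4f}")  # prints CER = 0.0455
```

With one substituted character out of 22, the sample line scores a CER of about 4.5%; a CER below 5% means, on average, fewer than one erroneous character in every twenty.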