The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique

Bibliographic Details
Published in: Journal of data mining and digital humanities Vol. Historical Documents and...
Main Authors: Couture, Beatrice, Verret, Farah, Gohier, Maxime, Deslandres, Dominique
Format: Journal Article
Language: English
Published: Nicolas Turenne, 2023-12-06
Subjects:
ISSN: 2416-5999
Abstract: The arrival of handwriting recognition technologies opens new possibilities for research in heritage studies, but it also calls for reflection on the experiences and practices that research teams have developed. Our use of the Transkribus platform since 2018 has led us to look for the most effective ways to improve the performance of our handwritten text recognition (HTR) models, which are trained to transcribe French handwriting from the 17th century. This article reports on the impact of three measures on HTR model performance: creating transcription protocols, using the language model at full scale, and determining the best way to use base models. Combining these measures can increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). The article also discusses challenges arising from the collaborative nature of HTR platforms such as Transkribus, and how researchers can share the data generated while creating or training HTR models.
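Note on the metric: Character Error Rate (CER) is commonly defined as the character-level edit distance between a model's output and a reference transcription, divided by the length of the reference. The Python sketch below illustrates that common definition only. It is not taken from the article, the function names are illustrative, and platforms such as Transkribus may normalize text differently before scoring.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: one wrong character in a 16-character line.
reference = "le roy de france"
hypothesis = "le roi de france"
print(f"CER = {character_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a CER below 5% means fewer than five character edits per hundred reference characters.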
DOI 10.46298/jdmdh.10542
Discipline Computer Science
EISSN 2416-5999
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0
OpenAccessLink https://doaj.org/article/e3464e0e24674432b8f7004249296a8e
PublicationDate 2023-12-06
SubjectTerms computer science - computer vision and pattern recognition
computer science - machine learning