The Challenges of HTR Model Training: Feedback from the Project Donner le gout de l'archive a l'ere numerique
| Published in: | Journal of data mining and digital humanities, Vol. Historical Documents and... |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Nicolas Turenne, 06.12.2023 |
| Subject: | |
| ISSN: | 2416-5999 |
| Summary: | The arrival of handwriting recognition technologies offers new possibilities
for research in heritage studies. However, it is now necessary to reflect on
the experiences and the practices developed by research teams. Our use of the
Transkribus platform since 2018 has led us to search for the most significant
ways to improve the performance of our handwritten text recognition (HTR)
models which are made to transcribe French handwriting dating from the 17th
century. This article therefore reports on the impacts of creating transcribing
protocols, using the language model at full scale and determining the best way
to use base models in order to help increase the performance of HTR models.
Combining all of these elements can indeed increase the performance of a single
model by more than 20% (reaching a Character Error Rate below 5%). This article
also discusses some challenges regarding the collaborative nature of HTR
platforms such as Transkribus and the way researchers can share their data
generated in the process of creating or training handwritten text recognition
models. |
|---|---|
| ISSN: | 2416-5999 |
| DOI: | 10.46298/jdmdh.10542 |