Generic HTR Models for Medieval Manuscripts. The CREMMALab Project

Published in: Journal of data mining and digital humanities, Volume Historical Documents and...
Main author: Pinche, Ariane
Format: Journal Article
Language: English
Publication details: Nicolas Turenne, 16.10.2023
ISSN: 2416-5999
Abstract: In the humanities, the emergence of digital methods has opened up research questions to quantitative analysis. This is why handwritten text recognition (HTR) technology is increasingly used in humanities research projects, following precursors such as the Himanis project. However, many research teams have limited resources, either financially or in terms of their expertise in artificial intelligence. It may therefore be difficult for them to integrate HTR into their project pipeline if they need to train a model or create data from scratch. The goal here is not to explain how to build or improve an HTR engine, nor to find a way to automatically align a preexisting corpus with images to quickly create ground truth for training. Instead, this paper aims to help humanists develop an HTR model for medieval manuscripts, and create and gather training data, with an understanding of the issues underlying their choices. A further objective is to show that consistent data is a prerequisite both for pooling datasets and for training efficient HTR models. We present an overview of our work and experiments in the CREMMALab project (2021-2022), showing first how we ensure the consistency of the data and then how we developed a generic model for medieval French manuscripts from the 13th to the 15th century, ready to be shared (more than 94% accuracy) and/or fine-tuned by other projects.
DOI: 10.46298/jdmdh.10252
Discipline: Computer Science
Open access: yes
Peer reviewed: yes
License: https://creativecommons.org/licenses/by/4.0
Author ORCID: 0000-0002-7843-5050
Open access link: https://doaj.org/article/d8b5d0bf413f4f90a9d62b3c2ff02c2f
Subject terms: computer science / artificial intelligence; computer science / machine learning; computer science / modeling and simulation; humanities and social sciences / history; dataset; HTR; medieval; text; transcription