Generic HTR Models for Medieval Manuscripts. The CREMMALab Project
| Published in: | Journal of data mining and digital humanities, Volume Historical Documents and... |
|---|---|
| Main author: | Pinche, Ariane |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Nicolas Turenne, 2023-10-16 |
| Subject: | |
| ISSN: | 2416-5999 |
| Online access: | Get full text |
| Abstract | In the Humanities, the emergence of digital methods has opened up research questions to quantitative analysis. This is why HTR technology is increasingly involved in humanities research projects following precursors such as the Himanis project. However, many research teams have limited resources, either financially or in terms of their expertise in artificial intelligence. It may therefore be difficult to integrate handwritten text recognition into their project pipeline if they need to train a model or to create data from scratch. The goal here is not to explain how to build or improve a new HTR engine, nor to find a way to automatically align a preexisting corpus with an image to quickly create ground truths for training. This paper aims to help humanists easily develop an HTR model for medieval manuscripts, create and gather training data by knowing the issues underlying their choices. The objective is also to show the importance of the constitution of consistent data as a prerequisite to allow their gathering and to train efficient HTR models. We will present an overview of our work and experiment in the CREMMALab project (2021-2022), showing first how we ensure the consistency of the data and then how we have developed a generic model for medieval French manuscripts from the 13th to the 15th century, ready to be shared (more than 94% accuracy) and/or fine-tuned by other projects. |
|---|---|
| Author | Pinche, Ariane |
| Author_xml | sequence: 1; givenname: Ariane; orcidid: 0000-0002-7843-5050; surname: Pinche; fullname: Pinche, Ariane |
| ContentType | Journal Article |
| DOI | 10.46298/jdmdh.10252 |
| DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ISSN | 2416-5999 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-7843-5050 |
| OpenAccessLink | https://doaj.org/article/d8b5d0bf413f4f90a9d62b3c2ff02c2f |
| PublicationDate | 2023-10-16 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2023 |
| Publisher | Nicolas Turenne |
| SubjectTerms | [info.info-ai] computer science [cs] / artificial intelligence [cs.ai]; [info.info-lg] computer science [cs] / machine learning [cs.lg]; [info.info-mo] computer science [cs] / modeling and simulation; [shs.hist] humanities and social sciences / history; keywords: dataset htr medieval text transcription |
| Title | Generic HTR Models for Medieval Manuscripts. The CREMMALab Project |
| URI | https://doaj.org/article/d8b5d0bf413f4f90a9d62b3c2ff02c2f |
| Volume | Historical Documents and... |