Handwritten Text Recognition for Documentary Medieval Manuscripts
| Published in: | Journal of data mining and digital humanities Vol. Historical Documents and... |
|---|---|
| Main Authors: | Torres Aguilar, Sergio; Jolivet, Vincent |
| Format: | Journal Article |
| Language: | English |
| Published: | INRIA; Nicolas Turenne, 22.12.2023 |
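| Subjects: | digital diplomatics; medieval digital studies; HTR for historical documents; HTR for medieval Latin manuscripts; medieval charters; HTR for medieval French manuscripts |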
| ISSN: | 2416-5999 |
| Abstract | Handwritten Text Recognition (HTR) techniques aim to accurately recognize sequences of characters in input manuscript images by training artificial intelligence models to capture historical writing features. Efficient HTR models can transform digitized manuscript collections into indexed and quotable corpora, providing valuable research insight for various historical inquiries. However, several challenges must be addressed, including the scarcity of relevant training corpora, the consequential variability introduced by different scribal hands and writing scripts, and the complexity of page layouts. This paper presents two models and one cross-model approach for automatic transcription of Latin and French medieval documentary manuscripts, particularly charters and registers, written between the 12th and 15th centuries and classified into two major writing scripts: Textualis (from the late 11th to the 13th century) and Cursiva (from the 13th to the 15th century). The architecture of the models is based on a Convolutional Recurrent Neural Network (CRNN) coupled with a Connectionist Temporal Classification (CTC) loss. The training and evaluation of the models, involving 120k lines of text and almost 1M tokens, were conducted using three available ground-truth corpora: the e-NDP corpus, the Alcar-HOME database and the Himanis project. This paper describes the training architecture and corpora used, and discusses the main training challenges, results, and potential applications of HTR techniques on medieval documentary manuscripts. |
|---|---|
| Author | Torres Aguilar, Sergio; Jolivet, Vincent |
| Author_xml | 1. Torres Aguilar, Sergio (ORCID: 0000-0002-1801-3147); 2. Jolivet, Vincent (ORCID: 0000-0003-0600-0362) |
| BackLink | https://hal.science/hal-03892163 (View record in HAL) |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.46298/jdmdh.10484 |
| DatabaseName | CrossRef; Hyper Article en Ligne (HAL); HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société; HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access); Hyper Article en Ligne (HAL) (Open Access); DOAJ Directory of Open Access Journals |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ISSN | 2416-5999 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | digital diplomatics; medieval digital studies; HTR for historical documents; HTR for medieval Latin manuscripts; medieval charters; HTR for medieval French manuscripts |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0003-0600-0362 0000-0002-1801-3147 |
| OpenAccessLink | https://doaj.org/article/77a735f572114a008766674e3b8e207e |
| PublicationDate | 2023-12-22 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2023 |
| Publisher | INRIA Nicolas Turenne |
| SubjectTerms | [info.info-ai] computer science [cs]/artificial intelligence [cs.ai]; [shs.hist] humanities and social sciences/history; Artificial Intelligence; Computer Science; digital diplomatics; History; htr for historical documents; htr for medieval french manuscripts; htr for medieval latin manuscripts; Humanities and Social Sciences; medieval charters; medieval digital studies |
| Title | Handwritten Text Recognition for Documentary Medieval Manuscripts |
| URI | https://hal.science/hal-03892163 https://doaj.org/article/77a735f572114a008766674e3b8e207e |
| Volume | Historical Documents and... |
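
The abstract describes models built on a CRNN trained with a CTC loss. As a rough illustration of what CTC implies at inference time, the sketch below shows greedy CTC decoding: take the best label per frame, collapse consecutive repeats, then drop the blank label. This is a generic, minimal sketch and is not taken from the paper; the function name `ctc_greedy_decode`, the blank index `0`, the toy alphabet, and the frame scores are all assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (illustrative convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding over per-frame label scores.

    frame_scores: list of lists, one score per label for each time frame.
    alphabet: dict mapping non-blank label indices to characters.
    """
    # 1. Pick the highest-scoring label in each frame (the "best path").
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining labels to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical toy input: 5 frames, 4 labels (blank, "r", "e", "x").
alphabet = {1: "r", 2: "e", 3: "x"}
frames = [
    [0.10, 0.80, 0.05, 0.05],  # "r"
    [0.10, 0.70, 0.10, 0.10],  # "r" again, merged with the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # "e"
    [0.20, 0.10, 0.10, 0.60],  # "x"
]
print(ctc_greedy_decode(frames, alphabet))
```

A blank between two identical labels keeps them distinct, which is how CTC can represent doubled letters; in a real HTR system the frame scores would come from the recurrent layers of the network rather than from a hand-written list.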