Handwritten Text Recognition for Documentary Medieval Manuscripts

Bibliographic Details
Published in: Journal of data mining and digital humanities, Vol. Historical Documents and...
Main Authors: Torres Aguilar, Sergio; Jolivet, Vincent
Format: Journal Article
Language: English
Published: INRIA; Nicolas Turenne, 22.12.2023
ISSN: 2416-5999
Abstract: Handwritten Text Recognition (HTR) techniques aim to accurately recognize sequences of characters in input manuscript images by training artificial intelligence models to capture historical writing features. Efficient HTR models can transform digitized manuscript collections into indexed and quotable corpora, providing valuable research insight for various historical inquiries. However, several challenges must be addressed, including the scarcity of relevant training corpora, the consequential variability introduced by different scribal hands and writing scripts, and the complexity of page layouts. This paper presents two models and one cross-model approach for automatic transcription of Latin and French medieval documentary manuscripts, particularly charters and registers, written between the 12th and 15th centuries and classified into two major writing scripts: Textualis (from the late 11th to the 13th century) and Cursiva (from the 13th to the 15th century). The architecture of the models is based on a Convolutional Recurrent Neural Network (CRNN) coupled with a Connectionist Temporal Classification (CTC) loss. The training and evaluation of the models, involving 120k lines of text and almost 1M tokens, were conducted using three available ground-truth corpora: the e-NDP corpus, the Alcar-HOME database, and the Himanis project. This paper describes the training architecture and corpora used, while discussing the main training challenges, results, and potential applications of HTR techniques on medieval documentary manuscripts.
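Illustrative note: the abstract names a CRNN trained with a CTC loss. As a minimal sketch of how a CTC-trained network's per-frame output is usually turned into a transcription, the Python snippet below applies best-path (greedy) decoding: take the top class per frame, merge repeated classes, then drop the blank symbol. This is not the authors' code; the function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 for the blank are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding of per-frame class scores into a string.

    frame_scores: list of per-time-step score lists, one score per class.
    alphabet: list mapping class index to character (index BLANK is unused).
    """
    # 1. Pick the highest-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge consecutive repeats of the same class index.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(alphabet[idx] for idx in collapsed if idx != BLANK)


# Toy usage: hypothetical 4-class output (blank, a, n, o) over 8 frames.
toy_alphabet = ["", "a", "n", "o"]
toy_path = [1, 1, 0, 2, 2, 0, 2, 3]
toy_scores = [[1.0 if j == i else 0.0 for j in range(4)] for i in toy_path]
print(ctc_greedy_decode(toy_scores, toy_alphabet))  # anno
```

The blank between the two runs of `n` is what lets the decoder keep a genuinely doubled letter, which matters for Latin forms such as "anno".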
Author ORCID: Torres Aguilar, Sergio (0000-0002-1801-3147); Jolivet, Vincent (0000-0003-0600-0362)
HAL record: https://hal.science/hal-03892163
License: Distributed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0)
DOI: 10.46298/jdmdh.10484
Open access: yes
Peer reviewed: yes
Keywords: digital diplomatics; medieval digital studies; HTR for historical documents; HTR for medieval Latin manuscripts; medieval charters; HTR for medieval French manuscripts
Subjects: Computer Science / Artificial Intelligence; Humanities and Social Sciences / History