Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach

Published in: Journal of data mining and digital humanities, Volume 2022
Main authors: Torres Aguilar, Sergio; Chastang, Pierre; Tannier, Xavier
Format: Journal Article
Language: English
Published: INRIA; Nicolas Turenne, 30 October 2022
ISSN: 2416-5999
Abstract: This paper presents a model that automatically detects sections in medieval Latin charters. These legal documents are among the most important sources for medieval studies, as they reflect economic and social dynamics as well as legal and institutional writing practices. Automatic linear segmentation can greatly facilitate charter indexing and speed up the recovery of evidence to support historical hypotheses by means of fine-grained queries on these raw, rarely structured sources. Our model is based on a Bi-LSTM approach with a final CRF layer and was trained on a large annotated collection of medieval charters (4,700 documents) from Lombard monasteries: the CDLM corpus (11th-12th centuries). The evaluation shows high performance for most sections on the test set and on an external evaluation corpus consisting of the Montecassino abbey charters (10th-12th centuries). We describe the architecture of the model and the main problems related to the treatment of medieval Latin and formulaic discourse, and we discuss some implications of the results for record-keeping practices in the High Middle Ages.
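To make the role of the final CRF layer more concrete, the following is a minimal sketch of the decoding step only, written in plain Python. It is not the authors' implementation. The label names (`protocol`, `text`, `eschatocol`), the emission scores, and the transition penalties are all assumptions invented for illustration; in a model of the kind the abstract describes, the per-token emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from itertools import groupby

# Hypothetical section labels; the record does not list the paper's label set.
LABELS = ["protocol", "text", "eschatocol"]

def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a token sequence.

    emissions:   list of dicts, one per token, mapping label -> score
    transitions: dict mapping (previous_label, label) -> score
    """
    # best[i][y] is the best score of any path ending in label y at token i.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + em[y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def to_segments(path):
    """Merge a label sequence into (label, start, end) spans, end exclusive."""
    segments, start = [], 0
    for label, run in groupby(path):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments

# Assumed transition scores: staying in a section is free, moving forward
# costs a little, moving backward is heavily penalised (linear order).
transitions = {}
for i, p in enumerate(LABELS):
    for j, y in enumerate(LABELS):
        transitions[(p, y)] = 0.0 if i == j else (-0.5 if j > i else -5.0)

# Invented emission scores for six tokens; token 3 is deliberately noisy.
emissions = [
    {"protocol": 2.0, "text": 0.0, "eschatocol": 0.0},
    {"protocol": 2.0, "text": 0.5, "eschatocol": 0.0},
    {"protocol": 0.0, "text": 2.0, "eschatocol": 0.0},
    {"protocol": 1.0, "text": 0.8, "eschatocol": 0.0},
    {"protocol": 0.0, "text": 0.0, "eschatocol": 2.0},
    {"protocol": 0.0, "text": 0.0, "eschatocol": 2.0},
]

greedy = [max(LABELS, key=lambda y: em[y]) for em in emissions]
path = viterbi(emissions, transitions, LABELS)
segments = to_segments(path)
print(greedy)
print(segments)
```

In this toy input, a per-token argmax labels the fourth token `protocol`, which would split the middle section. Decoding over the whole sequence with transition scores keeps the three sections contiguous, which is the behaviour one would want from a linear segmentation.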
Author_xml – sequence: 1
  givenname: Sergio
  orcidid: 0000-0002-1801-3147
  surname: Torres Aguilar
  fullname: Torres Aguilar, Sergio
  organization: École nationale des chartes
– sequence: 2
  givenname: Pierre
  orcidid: 0000-0001-7181-4835
  surname: Chastang
  fullname: Chastang, Pierre
  organization: Versailles Saint-Quentin-en-Yvelines University, Université de Versailles Saint-Quentin-en-Yvelines
– sequence: 3
  givenname: Xavier
  orcidid: 0000-0002-2452-8868
  surname: Tannier
  fullname: Tannier, Xavier
  organization: Sorbonne University, Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé
BackLink https://hal.science/hal-03410057 (View record in HAL)
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.46298/jdmdh.8646
Discipline Computer Science
EISSN 2416-5999
ISSN 2416-5999
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords digital diplomatics
medieval digital studies
linear text segmentation
Latin NLP
medieval charters
automatic structure detection
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0002-1801-3147
0000-0002-2452-8868
0000-0001-7181-4835
OpenAccessLink https://doaj.org/article/e132e0261a234ff29e83caff5263f580
PublicationDate 2022-10-30
PublicationTitle Journal of data mining and digital humanities
PublicationYear 2022
Publisher INRIA
Nicolas Turenne
SubjectTerms [info.info-lg]computer science [cs]/machine learning [cs.lg]
[info.info-tt]computer science [cs]/document and text processing
[shs.hist]humanities and social sciences/history
automatic structure detection
Computer Science
digital diplomatics
Document and Text Processing
History
Humanities and Social Sciences
latin nlp
linear text segmentation
Machine Learning
medieval charters
medieval digital studies
Title Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach
URI https://hal.science/hal-03410057
https://doaj.org/article/e132e0261a234ff29e83caff5263f580
Volume 2022