Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach
| Published in: | Journal of data mining and digital humanities, Volume 2022 |
|---|---|
| Main authors: | Torres Aguilar, Sergio; Chastang, Pierre; Tannier, Xavier |
| Format: | Journal Article |
| Language: | English |
| Publication details: | INRIA, 30.10.2022, Nicolas Turenne |
| ISSN: | 2416-5999 |
| Abstract | This paper presents a model for automatically detecting sections in medieval Latin charters. These legal documents are among the most important sources for medieval studies, as they reflect economic and social dynamics as well as legal and institutional writing practices. Automatic linear segmentation can greatly facilitate charter indexing and speed up the recovery of evidence to support historical hypotheses by means of fine-grained queries on these raw, rarely structured sources. Our model is based on a Bi-LSTM architecture with a final CRF layer and was trained on a large annotated collection of medieval charters (4,700 documents) from Lombard monasteries: the CDLM corpus (11th-12th centuries). The evaluation shows high performance for most sections on the test set and on an external evaluation corpus consisting of the Montecassino abbey charters (10th-12th centuries). We describe the architecture of the model and the main problems related to the treatment of medieval Latin and formulaic discourse, and we discuss some implications of the results for record-keeping practices in the High Middle Ages. |
|---|---|
| Author | Tannier, Xavier; Torres Aguilar, Sergio; Chastang, Pierre |
| Author_xml | 1. Torres Aguilar, Sergio (ORCID: 0000-0002-1801-3147; organization: École Nationale des Chartes, École nationale des chartes); 2. Chastang, Pierre (ORCID: 0000-0001-7181-4835; organization: Versailles Saint-Quentin-en-Yvelines University, Université de Versailles Saint-Quentin-en-Yvelines); 3. Tannier, Xavier (ORCID: 0000-0002-2452-8868; organization: Sorbonne University, Laboratoire d'Informatique Médicale et Ingénierie des Connaissances en e-Santé) |
| BackLink | https://hal.science/hal-03410057$$DView record in HAL |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DBID | AAYXX CITATION 1XC BXJBU IHQJB VOOES DOA |
| DOI | 10.46298/jdmdh.8646 |
| DatabaseName | CrossRef; Hyper Article en Ligne (HAL); HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société; HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access); Hyper Article en Ligne (HAL) (Open Access); DOAJ Directory of Open Access Journals |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| ExternalDocumentID | oai_doaj_org_article_e132e0261a234ff29e83caff5263f580 oai:HAL:hal-03410057v2 10_46298_jdmdh_8646 |
| ISSN | 2416-5999 |
| IngestDate | Fri Oct 03 12:40:44 EDT 2025 Tue Oct 14 20:46:44 EDT 2025 Sat Nov 29 04:10:29 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | digital diplomatics; medieval digital studies; linear text segmentation; Latin NLP; medieval charters; automatic structure detection |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-1801-3147 0000-0002-2452-8868 0000-0001-7181-4835 |
| OpenAccessLink | https://doaj.org/article/e132e0261a234ff29e83caff5263f580 |
| ParticipantIDs | doaj_primary_oai_doaj_org_article_e132e0261a234ff29e83caff5263f580 hal_primary_oai_HAL_hal_03410057v2 crossref_primary_10_46298_jdmdh_8646 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-10-30 |
| PublicationDateYYYYMMDD | 2022-10-30 |
| PublicationDate_xml | – month: 10 year: 2022 text: 2022-10-30 day: 30 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2022 |
| Publisher | INRIA; Nicolas Turenne |
| Publisher_xml | – name: INRIA – name: Nicolas Turenne |
| SourceID | doaj; hal; crossref |
| SourceType | Open Website; Open Access Repository; Index Database |
| SubjectTerms | [info.info-lg]computer science [cs]/machine learning [cs.lg]; [info.info-tt]computer science [cs]/document and text processing; [shs.hist]humanities and social sciences/history; automatic structure detection; Computer Science; digital diplomatics; Document and Text Processing; History; Humanities and Social Sciences; latin nlp; linear text segmentation; Machine Learning; medieval charters; medieval digital studies |
| Title | Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach |
| URI | https://hal.science/hal-03410057 https://doaj.org/article/e132e0261a234ff29e83caff5263f580 |
| Volume | 2022 |
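
The abstract describes a Bi-LSTM with a final CRF layer used for linear segmentation. As a rough illustration of what the CRF decoding step contributes, the following minimal sketch runs Viterbi decoding over per-token label scores and collapses the result into contiguous sections. Everything in it is an assumption made for illustration and is not taken from the paper: the three section labels, the transition and emission scores (which in a real model would be learned, with emissions produced by the Bi-LSTM), and the sample Latin tokens.

```python
from itertools import groupby


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at token t (stand-in for Bi-LSTM output).
    transitions[i][j]: score of moving from label i to label j (CRF parameters).
    """
    n = len(labels)
    score = list(emissions[0])
    back = []  # back[t][j] = best previous label for label j at token t + 1
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + transitions[i][j])
            prev_best.append(i_best)
            new_score.append(score[i_best] + transitions[i_best][j] + emit[j])
        back.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final label.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for prev_best in reversed(back):
        j = prev_best[j]
        path.append(j)
    path.reverse()
    return [labels[k] for k in path]


def to_segments(tokens, tags):
    """Collapse per-token tags into contiguous (label, text) segments."""
    segments, pos = [], 0
    for label, run in groupby(tags):
        length = len(list(run))
        segments.append((label, " ".join(tokens[pos:pos + length])))
        pos += length
    return segments


# Hypothetical section labels and hand-picked scores, for illustration only.
LABELS = ["protocol", "text", "eschatocol"]
TRANSITIONS = [
    [0.5, 0.0, -2.0],   # from protocol
    [-5.0, 0.5, 0.0],   # from text: going back to protocol is penalised
    [-5.0, -5.0, 0.5],  # from eschatocol: going back is penalised
]
tokens = ["In", "nomine", "Domini", "dono", "terram", "Actum", "feliciter"]
emissions = [
    [2.0, 0.1, 0.0],
    [1.5, 0.2, 0.0],
    [1.0, 0.5, 0.1],
    [0.2, 1.8, 0.1],
    [1.2, 1.0, 0.2],  # a per-token argmax would pick "protocol" here
    [0.1, 0.3, 2.0],
    [0.0, 0.4, 1.5],
]

tags = viterbi(emissions, TRANSITIONS, LABELS)
segments = to_segments(tokens, tags)
for label, text in segments:
    print(f"{label}: {text}")
```

In this toy input the fifth token scores slightly higher for "protocol" than for "text" when considered alone, but the transition penalties keep it inside the "text" section, which is the kind of sequence-level consistency a CRF layer is meant to provide on top of per-token predictions.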