HistText: An Application for leveraging large-scale historical textbases

Bibliographic Details
Published in: Journal of Data Mining and Digital Humanities Vol. 2023; no. Project presentations; pp. 1-30
Main Authors: Blouin, Baptiste; Armand, Cécile; Henriot, Christian
Format: Journal Article
Language: English
Published: INRIA; Nicolas Turenne, 10.11.2023
ISSN: 2416-5999
Abstract This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently extract and visualize valuable insights from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, HistText offers a robust platform for digital humanities research. This paper explores the rationale behind HistText, underscores its key features, and provides a comprehensive guide for its effective utilization, thus highlighting its potential to substantially enhance the realm of computational humanities.
Author Blouin, Baptiste
Armand, Cécile
Henriot, Christian
Author_xml – sequence: 1
  givenname: Baptiste
  orcidid: 0000-0002-7171-3628
  surname: Blouin
  fullname: Blouin, Baptiste
  organization: Laboratoire d'Informatique et Systèmes
– sequence: 2
  givenname: Cécile
  orcidid: 0000-0002-9107-6443
  surname: Armand
  fullname: Armand, Cécile
  organization: Institut de recherches Asiatiques
– sequence: 3
  givenname: Christian
  orcidid: 0000-0002-1488-4367
  surname: Henriot
  fullname: Henriot, Christian
  organization: Institut de recherches Asiatiques
BackLink https://shs.hal.science/halshs-04178820 (View record in HAL)
CitedBy_id crossref_primary_10_3390_informatics11020026
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID AAYXX
CITATION
1XC
BXJBU
IHQJB
VOOES
DOA
DOI 10.46298/jdmdh.11756
DatabaseName CrossRef
Hyper Article en Ligne (HAL)
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access)
Hyper Article en Ligne (HAL) (Open Access)
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
DatabaseTitleList
CrossRef

Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2416-5999
EndPage 30
ExternalDocumentID oai_doaj_org_article_acb21c3482e04e6ca6bb49458f32ca7f
oai:HAL:halshs-04178820v4
10_46298_jdmdh_11756
GroupedDBID 5VS
AAFWJ
AAYXX
ADBBV
ADQAK
AFPKN
ALMA_UNASSIGNED_HOLDINGS
BCNDV
CITATION
FRP
GROUPED_DOAJ
KQ8
M~E
OK1
1XC
BXJBU
IHQJB
VOOES
ID FETCH-LOGICAL-c2216-f5d36b75f49c3677d5514d6d33d1164d38e67c73b682a665b6650acf4749b8823
IEDL.DBID DOA
ISSN 2416-5999
IngestDate Fri Oct 03 12:44:35 EDT 2025
Tue Oct 14 20:28:22 EDT 2025
Sat Nov 29 04:10:29 EST 2025
Tue Nov 18 21:44:20 EST 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Project presentations
Keywords Chinese
Text analysis
history
natural language processing
data mining
document
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c2216-f5d36b75f49c3677d5514d6d33d1164d38e67c73b682a665b6650acf4749b8823
ORCID 0000-0002-1488-4367
0000-0002-9107-6443
0000-0002-7171-3628
OpenAccessLink https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f
PageCount 30
ParticipantIDs doaj_primary_oai_doaj_org_article_acb21c3482e04e6ca6bb49458f32ca7f
hal_primary_oai_HAL_halshs_04178820v4
crossref_citationtrail_10_46298_jdmdh_11756
crossref_primary_10_46298_jdmdh_11756
PublicationCentury 2000
PublicationDate 2023-11-10
PublicationDateYYYYMMDD 2023-11-10
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-11-10
  day: 10
PublicationDecade 2020
PublicationTitle Journal of data mining and digital humanities
PublicationYear 2023
Publisher INRIA
Nicolas Turenne
Publisher_xml – name: INRIA
– name: Nicolas Turenne
SSID ssj0001548555
SourceID doaj
hal
crossref
SourceType Open Website
Open Access Repository
Enrichment Source
Index Database
StartPage 1
SubjectTerms [info]computer science [cs]
[shs]humanities and social sciences
chinese
Computer Science
data mining
document
history
Humanities and Social Sciences
natural language processing
text analysis
Title HistText: An Application for leveraging large-scale historical textbases
URI https://shs.hal.science/halshs-04178820
https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f
Volume 2023
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: DOA
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: M~E
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
linkProvider Directory of Open Access Journals
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=HistText%3A+An+Application+for+leveraging+large-scale+historical+textbases&rft.jtitle=Journal+of+data+mining+and+digital+humanities&rft.au=Blouin+Baptiste&rft.au=C%C3%A9cile+Armand&rft.au=Christian+Henriot&rft.date=2023-11-10&rft.pub=Nicolas+Turenne&rft.eissn=2416-5999&rft.volume=2023&rft.issue=Project+presentations&rft_id=info:doi/10.46298%2Fjdmdh.11756&rft.externalDBID=DOA&rft.externalDocID=oai_doaj_org_article_acb21c3482e04e6ca6bb49458f32ca7f