HistText: An Application for leveraging large-scale historical textbases

This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently e...

Bibliographic details
Published in: Journal of data mining and digital humanities, Volume 2023, issue "Project presentations", pp. 1-30
Main authors: Blouin, Baptiste; Armand, Cécile; Henriot, Christian
Format: Journal Article
Language: English
Publication details: INRIA; Nicolas Turenne, 10 November 2023
ISSN: 2416-5999
Abstract This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently extract and visualize valuable insights from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, HistText offers a robust platform for digital humanities research. This paper explores the rationale behind HistText, underscores its key features, and provides a comprehensive guide for its effective utilization, thus highlighting its potential to substantially enhance the realm of computational humanities.
Author Blouin, Baptiste
Armand, Cécile
Henriot, Christian
Author_xml – sequence: 1
  givenname: Baptiste
  orcidid: 0000-0002-7171-3628
  surname: Blouin
  fullname: Blouin, Baptiste
  organization: Laboratoire d'Informatique et Systèmes
– sequence: 2
  givenname: Cécile
  orcidid: 0000-0002-9107-6443
  surname: Armand
  fullname: Armand, Cécile
  organization: Institut de recherches Asiatiques
– sequence: 3
  givenname: Christian
  orcidid: 0000-0002-1488-4367
  surname: Henriot
  fullname: Henriot, Christian
  organization: Institut de recherches Asiatiques
BackLink https://shs.hal.science/halshs-04178820$$DView record in HAL
CitedBy_id crossref_primary_10_3390_informatics11020026
ContentType Journal Article
Copyright Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.46298/jdmdh.11756
DatabaseName CrossRef
Hyper Article en Ligne (HAL)
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société
HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access)
Hyper Article en Ligne (HAL) (Open Access)
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
DatabaseTitleList
CrossRef

Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2416-5999
EndPage 30
ExternalDocumentID oai_doaj_org_article_acb21c3482e04e6ca6bb49458f32ca7f
oai:HAL:halshs-04178820v4
10_46298_jdmdh_11756
ID FETCH-LOGICAL-c2216-f5d36b75f49c3677d5514d6d33d1164d38e67c73b682a665b6650acf4749b8823
IEDL.DBID DOA
ISSN 2416-5999
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Project presentations
Keywords Chinese
Text analysis
history
natural language processing
data mining
document
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c2216-f5d36b75f49c3677d5514d6d33d1164d38e67c73b682a665b6650acf4749b8823
ORCID 0000-0002-1488-4367
0000-0002-9107-6443
0000-0002-7171-3628
OpenAccessLink https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f
PageCount 30
ParticipantIDs doaj_primary_oai_doaj_org_article_acb21c3482e04e6ca6bb49458f32ca7f
hal_primary_oai_HAL_halshs_04178820v4
crossref_citationtrail_10_46298_jdmdh_11756
crossref_primary_10_46298_jdmdh_11756
PublicationCentury 2000
PublicationDate 2023-11-10
PublicationDateYYYYMMDD 2023-11-10
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-11-10
  day: 10
PublicationDecade 2020
PublicationTitle Journal of data mining and digital humanities
PublicationYear 2023
Publisher INRIA
Nicolas Turenne
Publisher_xml – name: INRIA
– name: Nicolas Turenne
SSID ssj0001548555
Score 2.2371724
SourceID doaj
hal
crossref
SourceType Open Website
Open Access Repository
Enrichment Source
Index Database
StartPage 1
SubjectTerms [info]computer science [cs]
[shs]humanities and social sciences
chinese
Computer Science
data mining
document
history
Humanities and Social Sciences
natural language processing
text analysis
Title HistText: An Application for leveraging large-scale historical textbases
URI https://shs.hal.science/halshs-04178820
https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f
Volume 2023
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: DOA
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2416-5999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0001548555
  issn: 2416-5999
  databaseCode: M~E
  dateStart: 20140101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre