HistText: An Application for leveraging large-scale historical textbases
| Published in: | Journal of data mining and digital humanities, Volume 2023, Issue "Project presentations", pp. 1-30 |
|---|---|
| Main authors: | Blouin, Baptiste; Armand, Cécile; Henriot, Christian |
| Format: | Journal Article |
| Language: | English |
| Published: | INRIA; Nicolas Turenne, 10 November 2023 |
| ISSN: | 2416-5999 |

| Abstract | This paper introduces HistText, a pioneering tool devised to facilitate large-scale data mining in historical documents, specifically targeting Chinese sources. Developed in response to the challenges posed by the massive Modern China Textual Database, HistText emerges as a solution to efficiently extract and visualize valuable insights from billions of words spread across millions of documents. With a user-friendly interface, advanced text analysis techniques, and powerful data visualization capabilities, HistText offers a robust platform for digital humanities research. This paper explores the rationale behind HistText, underscores its key features, and provides a comprehensive guide for its effective utilization, thus highlighting its potential to substantially enhance the realm of computational humanities. |
|---|---|
| Author | Blouin, Baptiste; Armand, Cécile; Henriot, Christian |
| Author_xml | 1. Blouin, Baptiste (ORCID 0000-0002-7171-3628), Laboratoire d'Informatique et Systèmes; 2. Armand, Cécile (ORCID 0000-0002-9107-6443), Institut de recherches Asiatiques; 3. Henriot, Christian (ORCID 0000-0002-1488-4367), Institut de recherches Asiatiques |
| BackLink | https://shs.hal.science/halshs-04178820 (View record in HAL) |
| CitedBy_id | crossref_primary_10_3390_informatics11020026 |
| ContentType | Journal Article |
| Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.46298/jdmdh.11756 |
| DatabaseName | CrossRef; Hyper Article en Ligne (HAL); HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société; HAL-SHS: Archive ouverte en Sciences de l'Homme et de la Société (Open Access); Hyper Article en Ligne (HAL) (Open Access); DOAJ Directory of Open Access Journals |
| Discipline | Computer Science |
| EISSN | 2416-5999 |
| EndPage | 30 |
| ISSN | 2416-5999 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | Project presentations |
| Keywords | Chinese; text analysis; history; natural language processing; data mining; document |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0002-1488-4367 0000-0002-9107-6443 0000-0002-7171-3628 |
| OpenAccessLink | https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f |
| PageCount | 30 |
| PublicationDate | 2023-11-10 |
| PublicationTitle | Journal of data mining and digital humanities |
| PublicationYear | 2023 |
| Publisher | INRIA; Nicolas Turenne |
| StartPage | 1 |
| SubjectTerms | Computer Science [cs]; Humanities and Social Sciences [shs]; Chinese; data mining; document; history; natural language processing; text analysis |
| Title | HistText: An Application for leveraging large-scale historical textbases |
| URI | https://shs.hal.science/halshs-04178820; https://doaj.org/article/acb21c3482e04e6ca6bb49458f32ca7f |
| Volume | 2023 |