CLS INFRA: Leveraging Computational Literary Methods

Bibliographic details
Title: CLS INFRA: Leveraging Computational Literary Methods
Authors: Hoover, Sarah
Publisher information: Zenodo, 2025.
Publication year: 2025
Subjects: Literature studies, Computational Literary Studies, Digital humanities
Description: The EU Horizon 2020-funded Computational Literary Studies Infrastructure (CLS INFRA) is nearing the conclusion of a transformative four-year initiative to develop shared, sustainable infrastructures for studying Europe’s multilingual literary heritage. Aligned with FAIR and CARE principles, CLS INFRA advances tools, data, and methodologies to revolutionize digital literary analysis, fostering accessibility for both academic and non-academic audiences. Building on high-quality corpora like DraCor and ELTeC, and tools such as TXM and multilingual NLP pipelines, the project bridges disciplinary and user divides.

**Key Outputs:**

- **Transformation Toolbox and CLSCor Metadata Catalogue:** CLSCor integrates tools and workflows for FAIR digital literary practices, enabling seamless data ingestion and processing within CLS hosting nodes. Metadata is sourced from prior CLS outputs, including NLP and annotation reports.
- **User Needs Beyond Academia:** A two-year exploration of CLS applications outside academia resulted in the report *User Needs Beyond Academic Research*, identifying potential uses in art, journalism, policymaking, and more. An accompanying infographic campaign highlights these findings.
- **Prototypes and Pipelines:** Demonstrations of Named Entity Recognition, Relational Extraction, and Aspect-Based Sentiment Analysis employ tools like Jupyter notebooks, showcasing machine learning applications for multilingual text analysis, relational data extraction, and fine-grained sentiment evaluation.
- **DraCor and Programmable Corpora:** The Dramatic Corpora Project (DraCor) supports reproducible and programmable digital research across 20+ corpora in over 15 languages, addressing integration, reuse, and non-consumptive data access.
- **Training and Internationalisation:** CLS INFRA delivered three training schools, supported 52 Transnational Access Fellowships, and provided insights into skill gaps and mentorship needs through comprehensive surveys and reports.

This poster showcases CLS INFRA’s interdisciplinary tools, workflows, and strategies, demonstrating their potential for future digital literary research and diverse applications beyond academia.

BIBLIOGRAPHY:

· Birkholz, Julie M., Silvie Cinková, Matthieu Decorde, Serge Heiden, Maarten Janssen, Michal Křen, Alvaro Perez Pozo, Victor Diego Fresno Fernandez, and Salvador Ros. “CLS INFRA D8.2 Report and Prototypes for Annotation as Enrichment,” February 28, 2024. https://doi.org/10.5281/zenodo.11093999.
· Carroll, Stephanie Russo, Ibrahim Garba, Oscar L. Figueroa-Rodríguez, Jarita Holbrook, Raymond Lovett, Simeon Materechera, Mark Parsons, et al. 2020. “The CARE Principles for Indigenous Data Governance.” Data Science Journal 19 (November): 43. https://doi.org/10.5334/dsj-2020-043.
· Cinková, Silvie, Julie M. Birkholz, Ingo Börner, Tess Dejaeghere, Serge Heiden, Maarten Janssen, Michal Křen, Salvador Ros, and Alvaro Perez Pozo. “CLS INFRA D8.1 Report of the Tools for the Basic Natural Language Processing (NLP) Tasks in the CLS Context,” March 9, 2023. https://doi.org/10.5281/ZENODO.7951059.
· Eder, Maciej, Jan Rybicki, and Mike Kestemont. 2016. “Stylometry with R: A Package for Computational Text Analysis.” The R Journal 8 (1): 107. https://doi.org/10.32614/RJ-2016-007.
· Edmond, Jennifer, and Vera Yakupova. “CLS INFRA D3.5 User Needs beyond Academic Research for Computational Literary Analysis,” August 30, 2024. https://doi.org/10.5281/ZENODO.13605872.
· Fischer, Frank, Ingo Börner, Mathias Göbel, Angelika Hechtl, Christopher Kittel, Carsten Milling, and Peer Trilcke. 2019. “Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama.” In DH2019: »Complexities«. 9–12 July 2019. Book of Abstracts. Utrecht: Utrecht University. https://doi.org/10.5281/ZENODO.4284002.
· Heiden, Serge. 2010. “The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme.” In 24th Pacific Asia Conference on Language, Information and Computation, edited by Ryo Otoguro, Kei Yoshimoto, Kiyoshi Ishikawa, Hiroshi Umemoto, and Yasunari Harada, 389–98. Sendai, Japan: Institute for Digital Enhancement of Cognitive Development, Waseda University. https://halshs.archives-ouvertes.fr/halshs-00549764.
· Odebrecht, Carolin, Lou Burnard, and Christof Schöch. 2020. “European Literary Text Collection (ELTeC).” COST Action Distant Reading for European Literary History (CA16204) 10.
· van Rossum, Lisanne M., and Artjoms Šeļa. “CLS INFRA D4.1 Skills Gap Analysis,” February 28, 2022. https://doi.org/10.5281/ZENODO.6401857.
· Trilcke, Peer, and Ingo Börner. “CLS INFRA D7.3 On Versioning Living and Programmable Corpora (Executable) Report and Prototypes for Reproducible Research,” February 27, 2024.
· Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.
· Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2019. “Addendum: The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 6 (1): 6. https://doi.org/10.1038/s41597-019-0009-6.
Document type: Conference object
Language: English
DOI: 10.5281/zenodo.15311787
DOI: 10.5281/zenodo.15311786
Rights: CC BY
Accession number: edsair.doi.dedup.....0d4c2f993c4ec9dfebb4b0d217cb4707
Database: OpenAIRE