FINK NLP: A Natural Language Processing Toolkit for Structured Analysis of Multilingual Interview Data
| Title: | FINK NLP: A Natural Language Processing Toolkit for Structured Analysis of Multilingual Interview Data |
|---|---|
| Authors: | Spitale, Giovanni, orcid:0000-0002-6812- |
| Other authors: | Germani, Federico |
| Publisher information: | Zenodo |
| Publication year: | 2025 |
| Collection: | Zenodo |
| Keywords: | nlp, Natural Language Processing |
| Description: | FINK NLP is a modular Jupyter-based pipeline designed for the structured extraction, organization, and analysis of multilingual interview transcripts stored as .docx files. It performs metadata parsing from filenames, text ingestion using textract, and corpus structuring into a DataFrame. The notebook supports selective subsetting by language, module, category, or expression. It integrates spaCy for lemmatization, gensim for topic modeling (LDA), and multiple Python visualization libraries (matplotlib, seaborn, wordcloud, pyLDAvis) to facilitate qualitative and quantitative content analysis. This repository includes the output tabular data (redacted for data protection) and the visualization outputs. |
| Publication type: | other/unknown material |
| Language: | unknown |
| Relation: | https://zenodo.org/records/15394889; oai:zenodo.org:15394889; https://doi.org/10.5281/zenodo.15394889 |
| DOI: | 10.5281/zenodo.15394889 |
| Availability: | https://doi.org/10.5281/zenodo.15394889 https://zenodo.org/records/15394889 |
| Rights: | Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode |
| Document code: | edsbas.BC14446F |
| Database: | BASE |
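
The description mentions metadata parsing from filenames, corpus structuring, and subsetting by language, module, category, or expression. The record does not document the actual filename scheme or the notebook's function names, so the following is only a minimal sketch under stated assumptions: the pattern `<language>_<module>_<category>_<id>.docx`, the helper names `parse_filename`, `build_corpus`, and `subset`, and the sample texts are all hypothetical. To stay self-contained it uses plain lists of dictionaries and takes already-extracted text as input; in the toolkit itself, text ingestion is done with textract and the records are held in a DataFrame.

```python
import re
from typing import Optional

# Hypothetical filename scheme, assumed for illustration only:
#   <language>_<module>_<category>_<interview id>.docx
FILENAME_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2})_(?P<module>[A-Za-z0-9]+)"
    r"_(?P<category>[A-Za-z0-9]+)_(?P<interview_id>\d+)\.docx$"
)


def parse_filename(filename: str) -> Optional[dict]:
    """Return the metadata encoded in a filename, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def build_corpus(files: dict) -> list:
    """Combine filename metadata with text; `files` maps filename -> extracted text."""
    records = []
    for filename, text in files.items():
        metadata = parse_filename(filename)
        if metadata is None:
            continue  # skip files that do not follow the assumed scheme
        records.append({**metadata, "filename": filename, "text": text})
    return records


def subset(records, language=None, module=None, category=None, expression=None):
    """Keep records matching every filter that is not None.

    `expression` is treated as a case-insensitive regular expression
    searched in the transcript text.
    """
    selected = []
    for record in records:
        if language is not None and record["language"] != language:
            continue
        if module is not None and record["module"] != module:
            continue
        if category is not None and record["category"] != category:
            continue
        if expression is not None and not re.search(
            expression, record["text"], re.IGNORECASE
        ):
            continue
        selected.append(record)
    return selected


# Invented placeholder data, not taken from the deposited dataset.
example_files = {
    "de_M1_patient_001.docx": "Die Impfung war wichtig.",
    "it_M1_expert_002.docx": "La fiducia è importante.",
    "de_M2_expert_003.docx": "Vertrauen in die Wissenschaft.",
    "notes.txt": "unrelated file",
}

corpus = build_corpus(example_files)
german = subset(corpus, language="de")
trust = subset(corpus, expression="vertrauen")
```

A list of dictionaries in this shape can be passed directly to `pandas.DataFrame(corpus)` if a tabular view is wanted; combining filters (for example `subset(corpus, language="de", category="expert")`) narrows the selection further.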