Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique

Bibliographic details
Title: Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique
Authors: Guillén-Pacho, Ibai; Badenes-Olmedo, Carlos; Corcho, Oscar
Contributors: Universidad Politécnica de Madrid
Source: International Journal of Data Science and Analytics ; volume 20, issue 3, page 2551-2581 ; ISSN 2364-415X 2364-4168
Publisher information: Springer Science and Business Media LLC
Year of publication: 2024
Description: The work presented in this article focusses on improving the interpretability of probabilistic topic models created from a large collection of scientific documents that evolve over time. Several time-dependent approaches based on topic models were compared to analyse the annual evolution of latent concepts in the CORD-19 corpus: Dynamic Topic Model, Dynamic Embedded Topic Model, and BERTopic. The COVID-19 period (December 2019–present) was then analysed in greater depth, month by month, to explore the evolution of what is written about the disease. The evaluations suggest that the Dynamic Topic Model is the best choice to analyse the CORD-19 corpus. A novel topic labelling strategy is proposed for dynamic topic models to analyse the evolution of latent concepts. It incorporates content changes in both the annual evolution of the corpus and the monthly evolution of the COVID-19 disease. The generated labels are manually validated using two approaches: through the most relevant documents on the topic and through the documents that share the most semantically similar topic labels. The labelling enables the interpretation of topics. The novel method for dynamic topic labelling fits the content of each topic and supports the semantics of the topics.
Document type: article in journal/newspaper
Language: English
DOI: 10.1007/s41060-024-00610-0
Availability: https://doi.org/10.1007/s41060-024-00610-0
https://link.springer.com/content/pdf/10.1007/s41060-024-00610-0.pdf
https://link.springer.com/article/10.1007/s41060-024-00610-0/fulltext.html
Rights: https://creativecommons.org/licenses/by/4.0
Accession number: edsbas.2D9F0C24
Database: BASE