Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Bibliographic details
Published in:	Journal of biomedical semantics, volume 14, issue 1, pp. 18-19
Main authors:	Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören
Format:	Journal Article
Language:	English
Published:	London: BioMed Central (BioMed Central Ltd; Springer Nature B.V; BMC), 28 November 2023
Subjects:	Accuracy; Algorithms; Analysis; Bioinformatics; Classification; Combinatorial Libraries; Computational Biology/Bioinformatics; Computational linguistics; Computer Appl. in Life Sciences; COVID-19; Data mining; Data Mining and Knowledge Discovery; Datasets; Documents; Domain-independent knowledge graph; Embedding; Experiments; Humans; Influential scholarly document prediction; Knowledge; Knowledge representation; Language; Language processing; Learning algorithms; Machine Learning; Machine learning algorithms; Mathematics; Mathematics and Statistics; Medical research; Medicine, Experimental; Methods; Natural language interfaces; Pattern Recognition, Automated; Predictions; Scholarly periodicals; Semantics; State-of-the-art reviews; Text mining; World health organization; Germany
ISSN:	2041-1480

Description
Summary:	Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we go beyond bibliometric metadata to predict influential scholarly documents, and we also examine the task over categorized scholarly documents. We introduce a new approach that enhances the document representation with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on COVID-19. The study examines several document representation methods for machine learning: TF-IDF, bag-of-words (BOW), and embedding-based language models (BERT). Of these, TF-IDF performed best. Among the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for categorized scholarly content. For this enhancement, our study combines state-of-the-art machine learning methods with the BOW document representation, enriched with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, the enhanced document representation had no observable impact on scholarly document category classification, but it did have an effect on influential scholarly document prediction with categorical data.
DOI:	10.1186/s13326-023-00298-4
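
The summary names three ideas that are easier to see in code: a bag-of-words (BOW) representation, a TF-IDF weighting, and the enrichment of BOW with type labels from a knowledge graph. The following is a minimal sketch only, not the authors' implementation. It assumes whitespace tokenization, one common TF-IDF variant (term frequency normalized by document length, idf = ln(N / df)), and a hypothetical hard-coded lookup table `TOY_TYPES` standing in for DBpedia RDF type queries. The type labels are illustrative and were not fetched from DBpedia; the "unqualified relation" enrichment is not modeled.

```python
import math
from collections import Counter

# Hypothetical stand-in for a knowledge-graph lookup (term -> type labels).
# A real pipeline would query DBpedia; these entries are illustrative only.
TOY_TYPES = {
    "vaccine": ["type:Drug"],
    "virus": ["type:Species"],
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def bow(tokens):
    """Bag-of-words: raw term counts for one document."""
    return Counter(tokens)


def enrich(tokens, types=TOY_TYPES):
    """Return the tokens plus the type labels of every token found in the lookup."""
    extra = [label for tok in tokens for label in types.get(tok, [])]
    return tokens + extra


def tfidf(docs_tokens):
    """TF-IDF per document: (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for toks in docs_tokens for term in set(toks))
    vectors = []
    for toks in docs_tokens:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Two toy documents (invented for illustration, not taken from the WHO corpus).
docs = ["virus vaccine trial", "virus spread model"]
tokens = [tokenize(d) for d in docs]

enriched_bow = [bow(enrich(t)) for t in tokens]  # BOW with added type features
weights = tfidf(tokens)                          # plain TF-IDF, no enrichment
```

In this sketch the enriched BOW vector of the first document gains one count each for `type:Drug` and `type:Species`, so two documents that mention different drugs could still share a feature. The resulting sparse vectors would then be passed to a classifier such as logistic regression or a random forest, which is outside the scope of this example.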