NMF-based approach to automatic term extraction

Bibliographic details
Published in:	Expert systems with applications, Volume 199, p. 117179
Main authors:	Nugumanova, Aliya, Akhmed-Zaki, Darkhan, Mansurova, Madina, Baiburin, Yerzhan, Maulit, Almasbek
Format:	Journal Article
Language:	English
Published:	New York, Elsevier Ltd, 1 August 2022; Elsevier BV
Subjects:	ACTER dataset; Algorithms; Annotations; Automatic term extraction; Documents; Domains; NMF; Probabilistic topic modeling; TermEval shared task; Unsupervised term extraction
ISSN:	0957-4174, 1873-6793

Description
Summary:	• Five NMF algorithms with different parameters are compared.
• Kullback-Leibler NMF that requires stationarity of the objective value performs best.
• The performance of NMF algorithms depends on corpus imbalance.
• NMF outperforms 4 of the 6 baseline methods and is second only to deep learning methods.
This work describes an automatic term extraction approach based on the combination of probabilistic topic modeling (PTM) and non-negative matrix factorization (NMF). Topic modeling algorithms, including NMF-based ones, do not require expensive and time-consuming manual annotation of domain terms, only a corpus of domain documents. The topics emerge from the corpus documents without any supervision as sets of most probable words. This work aims to investigate how fully and precisely these most probable words can reflect domain terminology. We run a series of experiments on ACTER, a novel, qualitatively annotated dataset that was first used in the TermEval 2020 Shared Task. We compare five NMF algorithms and four NMF initializations while varying the number of topics extracted from documents and the number of most probable words extracted from topics, in order to determine the combinations that give the best term extraction performance. Finally, we compare the resulting optimal combinations of NMF with the competing methods in TermEval 2020 and show that our approach is second only to two much more sophisticated, domain-dependent supervised methods.
DOI:	10.1016/j.eswa.2022.117179
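
Illustration (not part of the record): the summary describes taking the most probable words of NMF topics as candidate terms. The sketch below shows one way this could look, assuming the standard multiplicative updates for the Kullback-Leibler objective and a stopping rule based on the relative change of the objective value. The function names, the random initialization, the tolerance, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-10):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def kl_nmf(V, n_topics, max_iter=500, tol=1e-4, seed=0, eps=1e-10):
    """Factorize V (documents x words) as W @ H with multiplicative KL updates.

    Stops once the relative change of the objective drops below `tol`.
    This is only one reading of "stationarity of the objective value"
    (an assumption, not the paper's exact criterion).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = V.shape
    W = rng.random((n_docs, n_topics)) + eps   # random init (assumption)
    H = rng.random((n_topics, n_words)) + eps
    prev = kl_divergence(V, W @ H, eps)
    for _ in range(max_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
        obj = kl_divergence(V, W @ H, eps)
        if abs(prev - obj) <= tol * max(prev, eps):
            break
        prev = obj
    return W, H


def candidate_terms(H, vocabulary, n_top_words):
    """Union of the n_top_words highest-weighted words of every topic."""
    terms = set()
    for topic in H:
        top = np.argsort(topic)[::-1][:n_top_words]
        terms.update(vocabulary[i] for i in top)
    return terms


# Toy corpus (hypothetical); a real run would use domain documents.
docs = [
    "wind turbine blade rotor",
    "rotor blade wind farm",
    "heart failure patient treatment",
    "patient heart treatment drug",
]
vocabulary = sorted({w for d in docs for w in d.split()})
V = np.array([[d.split().count(w) for w in vocabulary] for d in docs],
             dtype=float)

W, H = kl_nmf(V, n_topics=2)
terms = candidate_terms(H, vocabulary, n_top_words=3)
print(sorted(terms))
```

The two quantities varied in the summary correspond here to `n_topics` and `n_top_words`. Evaluating such candidates against gold-standard annotations, as done on ACTER, is outside this sketch.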