NMF-based approach to automatic term extraction

Bibliographic details
Published in:	Expert systems with applications Vol. 199; p. 117179
Main authors:	Nugumanova, Aliya; Akhmed-Zaki, Darkhan; Mansurova, Madina; Baiburin, Yerzhan; Maulit, Almasbek
Format:	Journal Article
Language:	English
Published:	New York: Elsevier Ltd, 1 August 2022; Elsevier BV
Subjects:	ACTER dataset; Algorithms; Annotations; Automatic term extraction; Documents; Domains; NMF; Probabilistic topic modeling; TermEval shared task; Unsupervised term extraction
ISSN:	0957-4174, 1873-6793

Description
Abstract:	• Five NMF algorithms with different parameters are compared.
• Kullback-Leibler NMF that requires stationarity of the objective value performs best.
• The performance of NMF algorithms depends on corpus imbalance.
• NMF outperforms 4 of 6 baseline methods and is second only to deep learning methods.
This work describes an automatic term extraction approach based on a combination of probabilistic topic modeling (PTM) and non-negative matrix factorization (NMF). Topic modeling algorithms, including NMF-based ones, do not require expensive and time-consuming manual annotation of domain terms; they need only a corpus of domain documents. The topics emerge from the corpus documents without any supervision as sets of most probable words. This work aims to investigate how fully and precisely these most probable words can reflect domain terminology. We run a series of experiments on the novel, qualitatively annotated ACTER dataset, which was first used in the TermEval 2020 Shared Task. We compare five NMF algorithms and four NMF initializations while varying the number of topics extracted from the documents and the number of most probable words extracted from each topic, in order to determine the combinations that give the best term extraction performance. Finally, we compare the best NMF combinations with the competing methods in TermEval 2020 and show that our approach is second only to two much more sophisticated, domain-dependent supervised methods.
DOI:	10.1016/j.eswa.2022.117179
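
The pipeline summarized in the abstract (factorize a document-term matrix, then read candidate terms off the highest-weighted words of each topic) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes the standard multiplicative updates for the Kullback-Leibler objective, a random initialization, and a stopping rule that halts when the relative change of the objective falls below a tolerance (one possible reading of "stationarity of the objective value"). The toy corpus and all function and parameter names (`build_matrix`, `kl_nmf`, `top_words`, `tol`, `n_top`) are hypothetical.

```python
import math
import random
from collections import Counter

EPS = 1e-12  # guards against division by zero and log(0)

def build_matrix(docs):
    """Build a document-term count matrix V (documents x vocabulary)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w, c in Counter(d.split()).items():
            V[i][index[w]] = float(c)
    return V, vocab

def matmul(W, H):
    k, n = len(H), len(H[0])
    return [[sum(row[a] * H[a][j] for a in range(k)) for j in range(n)] for row in W]

def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH), with 0 * log(0) taken as 0."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / (wh + EPS))
            total += wh - v
    return total

def kl_nmf(V, k, max_iter=500, tol=1e-6, seed=0):
    """Factorize V ~ W H with multiplicative KL updates (assumed variant)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    history = [kl_divergence(V, matmul(W, H))]
    for _ in range(max_iter):
        # Update H (topics x words) using the ratio matrix R = V / (W H).
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + EPS) for j in range(n)] for i in range(m)]
        for a in range(k):
            col_sum = sum(W[i][a] for i in range(m)) + EPS
            for j in range(n):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(m)) / col_sum
        # Update W (documents x topics) with a refreshed ratio matrix.
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + EPS) for j in range(n)] for i in range(m)]
        for a in range(k):
            row_sum = sum(H[a][j] for j in range(n)) + EPS
            for i in range(m):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(n)) / row_sum
        history.append(kl_divergence(V, matmul(W, H)))
        # Assumed stopping rule: the objective value has become stationary.
        if abs(history[-2] - history[-1]) <= tol * max(history[-2], EPS):
            break
    return W, H, history

def top_words(H, vocab, n_top):
    """Return the n_top highest-weighted words of every topic (row of H)."""
    topics = []
    for row in H:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        topics.append([vocab[j] for j in ranked[:n_top]])
    return topics

# Made-up toy corpus; a real experiment would use a domain corpus instead.
docs = [
    "wind turbine blade rotor wind turbine",
    "rotor blade pitch control wind",
    "heart failure ejection fraction heart",
    "ejection fraction patient heart failure",
]
V, vocab = build_matrix(docs)
W, H, history = kl_nmf(V, k=2)
topics = top_words(H, vocab, n_top=3)
candidate_terms = sorted({w for topic in topics for w in topic})
print(topics)
print(candidate_terms)
```

The two quantities varied in the abstract correspond here to `k` (number of topics) and `n_top` (number of words taken from each topic); the union of the per-topic word lists serves as the candidate term list that would be scored against gold-standard annotations.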