NMF-based approach to automatic term extraction
| Published in: | Expert Systems with Applications, vol. 199, p. 117179 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York; Elsevier Ltd; 1 August 2022; Elsevier BV |
| ISSN: | 0957-4174, 1873-6793 |
| Online access: | Full text |
| Summary: | Highlights: • Five different NMF algorithms with different parameters are compared. • Kullback-Leibler NMF requiring the stationarity of the objective value performs best. • The performance of NMF algorithms depends on corpus imbalance. • NMF outperforms 4 of 6 baseline methods and is second only to deep learning methods. Abstract: This work describes an automatic term extraction approach based on the combination of probabilistic topic modeling (PTM) and non-negative matrix factorization (NMF). Topic modeling algorithms, including NMF-based ones, do not require expensive and time-consuming manual annotation of domain terms, only a corpus of domain documents. The topics emerge from the corpus documents without any supervision as sets of most probable words. This work aims to investigate how fully and precisely these most probable words from topics reflect domain terminology. We run a series of experiments on the novel, qualitatively annotated dataset ACTER, which was first used in the TermEval 2020 Shared Task. We compare five different NMF algorithms and four different NMF initializations while varying the number of topics extracted from documents and the number of most probable words extracted from topics, in order to determine the combinations that give the best term extraction performance. Finally, we compare the obtained optimal combinations of NMF with the competing methods in TermEval 2020 and show that our approach is second only to two much more sophisticated, domain-dependent supervised methods. |
|---|---|
| DOI: | 10.1016/j.eswa.2022.117179 |
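
The summary describes the general idea (factorize a document-term matrix with NMF, then read the most probable words of each topic as term candidates) without giving implementation details. The following is a minimal sketch of that idea, not the authors' code. It assumes the standard multiplicative-update rules for the generalized Kullback-Leibler objective and stops when the objective value becomes stationary. The toy vocabulary, the count matrix, the tolerance, and the function names `kl_nmf` and `top_words` are all hypothetical and chosen only for illustration.

```python
import math
import random


def kl_divergence(V, W, H, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = sum(W[i][k] * H[k][j] for k in range(len(H))) + eps
            total += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return total


def kl_nmf(V, n_topics, max_iter=500, tol=1e-6, seed=0, eps=1e-12):
    """Factorize V (docs x words) into W (docs x topics) and H (topics x words).

    Uses multiplicative updates for the KL objective and stops once the
    relative change of the objective falls below `tol` (assumed criterion).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    H = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]

    def ratio():
        # Element-wise V / (WH), recomputed before each factor update.
        return [[V[i][j] / (sum(W[i][k] * H[k][j] for k in range(n_topics)) + eps)
                 for j in range(n_words)] for i in range(n_docs)]

    history = [kl_divergence(V, W, H)]
    for _ in range(max_iter):
        R = ratio()
        for k in range(n_topics):
            w_sum = sum(W[i][k] for i in range(n_docs)) + eps
            for j in range(n_words):
                H[k][j] *= sum(W[i][k] * R[i][j] for i in range(n_docs)) / w_sum
        R = ratio()
        for k in range(n_topics):
            h_sum = sum(H[k][j] for j in range(n_words)) + eps
            for i in range(n_docs):
                W[i][k] *= sum(R[i][j] * H[k][j] for j in range(n_words)) / h_sum
        history.append(kl_divergence(V, W, H))
        # Stationarity of the objective value: stop when it barely changes.
        if abs(history[-2] - history[-1]) <= tol * max(history[-2], 1.0):
            break
    return W, H, history


def top_words(H, vocab, top_n):
    """Union of the `top_n` highest-weighted words of every topic."""
    candidates = set()
    for topic in H:
        ranked = sorted(range(len(vocab)), key=lambda j: topic[j], reverse=True)
        candidates.update(vocab[j] for j in ranked[:top_n])
    return sorted(candidates)


# Hypothetical toy corpus: rows are documents, columns are word counts.
vocab = ["turbine", "rotor", "blade", "patient", "cardiac", "therapy"]
V = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 1],
    [0, 0, 0, 5, 3, 2],
    [0, 1, 0, 4, 4, 3],
]

W, H, history = kl_nmf(V, n_topics=2)
terms = top_words(H, vocab, top_n=2)
print(terms)
```

The number of topics and the number of top words per topic are the two parameters the summary says were varied; here they appear as `n_topics` and `top_n`. A real experiment would use a sparse matrix library and a proper tokenized corpus rather than nested lists.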