Lazy fine-tuning algorithms for naïve Bayesian text classification
Saved in:
| Published in: | Applied Soft Computing, Vol. 96, p. 106652 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.11.2020 |
| Subjects: | |
| ISSN: | 1568-4946, 1872-9681 |
| Online access: | Get full text |
| Summary: | The naïve Bayes (NB) learning algorithm is widely applied in many fields, particularly in text classification. However, its performance decreases when it is used in domains where its naïve assumption is violated or when the training set is too small to find accurate estimations of the probabilities. In this study, we propose a lazy fine-tuning naïve Bayes (LFTNB) method to address both problems. We propose a local fine-tuning algorithm that uses the nearest neighbors of a query instance to fine-tune the probability terms used by NB. Using only the nearest neighbors makes the independence assumption more likely to hold, whereas the fine-tuning algorithm is used to find more accurate estimations of the probability terms. The performance of the LFTNB approach was evaluated using 47 UCI datasets. The results show that the LFTNB method outperforms the classical NB, eager FTNB, and k-nearest neighbor algorithms. We also propose eager and lazy fine-tuning versions of powerful NB-based text classification algorithms, namely, multinomial NB, complement NB, and one-versus-all NB. The empirical results using 18 UCI text classification datasets show that the proposed methods outperform the untuned versions of these algorithms. |
| Highlights: | • We propose lazy fine-tuning algorithms for naïve Bayes and compare them. • This addresses the violation of the conditional independence assumption. • It also addresses the scarcity of training data (by fine-tuning the NB classifier). • We also propose eager and lazy fine-tuning algorithms for NB-based text classifiers. • The NB-based text classifiers are Multinomial NB, Complement NB, and One-Versus-All NB. • Results reveal that the proposed lazy algorithms outperform their eager counterparts. |
|---|---|
| ISSN: | 1568-4946, 1872-9681 |
| DOI: | 10.1016/j.asoc.2020.106652 |
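The summary above describes the core idea only at a high level: restrict NB training to the nearest neighbors of each query instance and then fine-tune the local probability terms. Below is a minimal Python sketch of that idea, assuming a k-NN neighborhood, Laplace-smoothed count-based estimates, and a simple error-driven update rule; the function name, hyperparameters, and update rule are illustrative assumptions, not the authors' exact LFTNB algorithm.

```python
# Illustrative sketch only: a lazy, locally fine-tuned naive Bayes classifier in the
# spirit of LFTNB. Neighborhood size, learning rate, and the update rule are assumptions.
# Assumes non-negative (e.g., bag-of-words count) features.
import numpy as np

def lazy_finetuned_nb_predict(X_train, y_train, x_query, k=30, alpha=1.0, eta=0.05, epochs=5):
    classes = np.unique(y_train)

    # Lazy step: keep only the k nearest neighbors of the query instance.
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dist)[:k]
    Xn, yn = X_train[idx], y_train[idx]

    # Local NB estimates (Laplace-smoothed, multinomial-style counts).
    prior = np.array([(yn == c).sum() + 1.0 for c in classes])
    prior /= prior.sum()
    cond = np.array([Xn[yn == c].sum(axis=0) + alpha for c in classes])
    cond /= cond.sum(axis=1, keepdims=True)          # P(feature | class)

    def log_scores(x):
        return np.log(prior) + x @ np.log(cond).T

    # Fine-tuning step: for neighbors the local model misclassifies, move probability
    # mass of their active features from the wrongly predicted class to the true class.
    for _ in range(epochs):
        for xi, yi in zip(Xn, yn):
            pred = classes[np.argmax(log_scores(xi))]
            if pred == yi:
                continue
            t = int(np.where(classes == yi)[0][0])
            p = int(np.where(classes == pred)[0][0])
            cond[t] += eta * xi
            cond[p] = np.maximum(cond[p] - eta * xi, 1e-9)
            cond[t] /= cond[t].sum()
            cond[p] /= cond[p].sum()

    # Classify the query with the fine-tuned local model.
    return classes[np.argmax(log_scores(x_query))]
```

In a bag-of-words setting, X_train would hold non-negative term counts and the routine would be called once per test document; this per-query rebuilding of a locally fine-tuned model is what makes the approach lazy, in contrast to the eager FTNB variant that fine-tunes a single global model in advance.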