Application of improved distributed naive Bayesian algorithms in text classification

Bibliographic details
Published in: The Journal of Supercomputing, Vol. 75, No. 9, pp. 5831-5847
Main authors: Gao, Hongyi; Zeng, Xi; Yao, Chunhua
Format: Journal Article
Language: English
Publication details: New York: Springer US, 01.09.2019
Springer Nature B.V
ISSN: 0920-8542, 1573-0484
Description
Summary: The naive Bayes classifier is a widely used text classification method grounded in statistical theory. Because of the particular nature of text, related feature items may together carry new semantic information, which can be lost when text is represented with the traditional vector space model. This paper studies the construction and improvement of a distributed naive Bayes automatic classification system; the application of Hadoop cloud computing to web page classification is one of its focuses. First, the text classification system and the Bayesian classification model are analyzed and discussed, including the representation and extraction of text information, text classification methods, and Bayesian text classification methods. Then, to address the shortcomings of the naive Bayesian text classification method, the mutual information method is used during training to check the correlation between the features produced by feature selection, and features with a higher degree of correlation are combined appropriately. Experimental data from a series of tests show that the improved text classification system achieves better classification results.
DOI: 10.1007/s11227-019-02862-1
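
The summary mentions checking the correlation between selected features with mutual information and then combining the highly correlated ones. The record does not give the paper's formulas, threshold, or Hadoop implementation, so the following is only a minimal single-machine sketch of that general idea. It assumes documents are sets of terms and features are binary (term present or absent); the function names, the `a+b` naming of combined features, and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(docs, a, b):
    """Mutual information (in nats) between the presence of terms a and b.

    docs is a list of sets of terms; each term is treated as a binary feature.
    """
    n = len(docs)
    # Count the four joint outcomes (a present?, b present?) over all documents.
    joint = Counter((a in d, b in d) for d in docs)
    mi = 0.0
    for (xa, xb), count in joint.items():
        p_ab = count / n
        # Marginals are obtained by summing the joint counts.
        p_a = sum(c for (ya, _), c in joint.items() if ya == xa) / n
        p_b = sum(c for (_, yb), c in joint.items() if yb == xb) / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def correlated_pairs(docs, vocab, threshold):
    """Return term pairs whose mutual information reaches the threshold.

    Note: mutual information is also high for terms that systematically
    avoid each other; this sketch does not filter those out.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(vocab), 2)
        if mutual_information(docs, a, b) >= threshold
    ]


def add_combined_features(doc, pairs):
    """Add a hypothetical combined feature 'a+b' when both terms occur in doc."""
    combined = {f"{a}+{b}" for a, b in pairs if a in doc and b in doc}
    return set(doc) | combined


# Toy corpus (illustrative data only, not from the paper).
docs = [{"new", "york"}, {"new", "york", "city"}, {"city"}, {"bank"}]
vocab = set().union(*docs)
pairs = correlated_pairs(docs, vocab, threshold=0.5)  # hypothetical threshold
augmented = [add_combined_features(d, pairs) for d in docs]
```

The augmented term sets could then be fed to any naive Bayes trainer in place of the original ones; how the authors actually merge features and distribute the computation is not stated in this record.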