Text feature selection algorithm based on Chi-square rank correlation factorization


Bibliographic details
Published in: Journal of interdisciplinary mathematics, Vol. 20, No. 1, pp. 153-160
Main author: Li, Yan-Hong
Format: Journal Article
Language: English
Published: Taylor & Francis, 2 January 2017
ISSN:0972-0502, 2169-012X
Description
Summary: Feature selection is an effective method for reducing the dimensionality of text features. Existing feature selection methods usually rely on empirical estimation to determine the scale of feature selection. These methods have achieved good results on some specific corpora. However, their insufficient theoretical basis makes it difficult to generalize automatic text categorization further. Therefore, a text feature selection algorithm based on Chi-square rank correlation factorization is proposed, which takes into account both the global and local distributions of text features. The algorithm requires no prior knowledge: it characterizes the feature weights and completes the feature selection in a way that fully reflects the characteristics of the probability distribution.
DOI:10.1080/09720502.2016.1259769