A parametric methodology for text classification
| Published in: | Journal of Information Science, Volume 36, Issue 4, pp. 421-442 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | London, England; SAGE Publications; 1 August 2010; Sage Publications Bowker-Saur Ltd |
| Subjects: | |
| ISSN: | 0165-5515, 1741-6485 |
| Online access: | Get full text |
| Summary: | Finding the correct category (class) a new unclassified document belongs to is an interesting and difficult problem, with a wide range of applications. Our methodology for narrative text classification is based on two techniques: we calculate the distance (similarity) between the new unclassified document and all the pre-classified documents of each class and also calculate the similarity of the new document to the ‘average class document’ of each class. In both cases we use key phrases (text phrases or key terms) as the distinctive features of our text classification methodology and eventually the proposed text classification method is based on the automatic extraction of an authority list of key phrases that is appropriate for discriminating between different classes. In this paper, we apply this methodology in handling Greek text and we present the key concepts, the algorithms, and some critical decisions. A number of parameters of the mining algorithm are also fine-tuned. The actual text classification system, the adopted (embedded) ideas and the alternative values of parameters are evaluated using two training sets (test collections). |
|---|---|
| DOI: | 10.1177/0165551510368620 |
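
The two techniques named in the summary (comparing a new document with every pre-classified document of a class, and comparing it with the "average class document") can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not state the distance measure, the phrase weighting, or how the authority list of key phrases is extracted. The sketch assumes documents are already reduced to key-phrase counts, uses cosine similarity as a stand-in measure, and averages per-document similarities within a class. All function names and the toy English phrases are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two key-phrase count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(docs):
    """Build an 'average class document': mean count of each key phrase."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {phrase: count / len(docs) for phrase, count in total.items()}


def classify_by_documents(new_doc, classes):
    """Technique 1: mean similarity to every pre-classified document of a class."""
    scores = {
        label: sum(cosine(new_doc, doc) for doc in docs) / len(docs)
        for label, docs in classes.items()
    }
    return max(scores, key=scores.get)


def classify_by_centroid(new_doc, classes):
    """Technique 2: similarity to the average class document of each class."""
    scores = {
        label: cosine(new_doc, centroid(docs)) for label, docs in classes.items()
    }
    return max(scores, key=scores.get)


# Toy training data (hypothetical): each document is a bag of key phrases.
classes = {
    "sports": [
        Counter({"football match": 2, "league table": 1}),
        Counter({"football match": 1, "transfer fee": 1}),
    ],
    "economy": [
        Counter({"interest rate": 2, "central bank": 1}),
        Counter({"central bank": 2, "transfer fee": 1}),
    ],
}
new_doc = Counter({"football match": 1, "transfer fee": 1})

label_by_documents = classify_by_documents(new_doc, classes)
label_by_centroid = classify_by_centroid(new_doc, classes)
```

On this toy data both techniques assign the new document to the same class; on real collections they can disagree, which is one reason a system might combine or tune them.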