Improved support vector machine algorithm for heterogeneous data

Bibliographic details
Published in: Pattern Recognition, Vol. 48, No. 6, pp. 2072-2083
Main authors: Peng, Shili; Hu, Qinghua; Chen, Yinli; Dang, Jianwu
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.06.2015
ISSN: 0031-3203, 1873-5142
Description
Summary: A support vector machine (SVM) is a popular algorithm for classification learning. The classical SVM effectively handles classification tasks defined by numerical attributes. However, practical tasks use both numerical and nominal attributes, and the classical SVM does not fully account for the difference between them. Nominal attributes are usually treated as numerical after coding, which may degrade the performance of learning algorithms. In this study, we propose a novel SVM algorithm for learning with heterogeneous data, called heterogeneous SVM (HSVM). Instead of coding nominal attributes directly, the proposed algorithm learns a mapping that embeds them into a real space by minimizing an estimated generalization error. Extensive experiments show that HSVM improves classification performance for both nominal and heterogeneous data.
• We propose an algorithm to map nominal features to a numerical space by minimizing estimated generalization errors.
• We integrate the mapping algorithm with support vector machines, which yields an improved algorithm for learning from heterogeneous data.
• Experiments show that the proposed technique is effective for learning with heterogeneous data and also helps deal with imbalanced tasks.
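The contrast the summary draws between direct coding and a data-driven mapping of nominal values can be illustrated with a small Python sketch. This is not the HSVM algorithm: the record does not give its details, so the sketch swaps in a much simpler label-frequency mapping (each nominal value is mapped to the fraction of positive training labels seen with it) purely as a stand-in for the idea of placing nominal values in a real space using the data rather than an arbitrary code. The function names `integer_code` and `label_driven_map` and the toy data are hypothetical.

```python
from collections import defaultdict


def integer_code(values):
    """Direct coding: give each nominal value an arbitrary integer
    (here, its position in alphabetical order)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def label_driven_map(values, labels):
    """Stand-in for a learned mapping (NOT the paper's method):
    map each nominal value to the fraction of positive labels
    observed together with it in the training data."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for v, y in zip(values, labels):
        totals[v] += 1
        positives[v] += int(y == 1)
    return {v: positives[v] / totals[v] for v in totals}


# Toy data (hypothetical): one nominal attribute and a binary label.
colors = ["red", "green", "blue", "red", "green", "blue", "red", "blue"]
labels = [1, 0, 1, 1, 0, 1, 1, 0]

direct = integer_code(colors)
learned = label_driven_map(colors, labels)

print("direct coding: ", direct)
print("label-driven:  ", learned)
```

With direct coding, "green" ends up between "blue" and "red" only because of alphabetical order. In the label-driven mapping, "red" and "blue" lie close together and "green" lies far from both, which reflects how the values relate to the class label in this toy data.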
DOI:10.1016/j.patcog.2014.12.015