Improved support vector machine algorithm for heterogeneous data

Bibliographic Details
Published in: Pattern Recognition, Vol. 48, no. 6, pp. 2072-2083
Main Authors: Peng, Shili, Hu, Qinghua, Chen, Yinli, Dang, Jianwu
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.06.2015
ISSN: 0031-3203, 1873-5142
Description
Summary: A support vector machine (SVM) is a popular algorithm for classification learning. The classical SVM effectively handles classification tasks defined by numerical attributes. However, practical tasks use both numerical and nominal attributes, and the classical SVM does not fully consider the difference between them. Nominal attributes are usually treated as numerical after coding, which may degrade the performance of learning algorithms. In this study, we propose a novel SVM algorithm for learning with heterogeneous data, called the heterogeneous SVM (HSVM). The proposed algorithm learns a mapping that embeds nominal attributes into a real space by minimizing an estimated generalization error, instead of by direct coding. Extensive experiments show that HSVM improves classification performance for both nominal and heterogeneous data.
• We propose an algorithm that maps nominal features to a numerical space by minimizing an estimated generalization error.
• We integrate the mapping algorithm with support vector machines, resulting in an improved algorithm for learning from heterogeneous data.
• Experiments show that the proposed technique is effective for learning with heterogeneous data and also helps deal with imbalanced tasks.
DOI: 10.1016/j.patcog.2014.12.015
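
The summary contrasts direct coding of nominal attributes with a mapping learned from the data. The sketch below illustrates only that contrast and is not the HSVM algorithm: as a stand-in for the paper's mapping (which minimizes an estimated generalization error), it maps each nominal value to the fraction of positive-class samples carrying that value. The function names `integer_code` and `class_rate_code` and the toy data are assumptions made for this illustration.

```python
from collections import defaultdict


def integer_code(values):
    """Direct coding: assign 0, 1, 2, ... in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def class_rate_code(values, labels, positive=1):
    """Stand-in for a learned mapping (NOT the HSVM objective):
    map each nominal value to the fraction of positive labels seen with it."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, y in zip(values, labels):
        counts[v][0] += int(y == positive)
        counts[v][1] += 1
    return {v: pos / total for v, (pos, total) in counts.items()}


# Hypothetical toy data: one nominal attribute and a binary label.
colors = ["red", "green", "blue", "red", "green", "blue", "red", "blue"]
labels = [1, 0, 1, 1, 0, 1, 1, 0]

direct = integer_code(colors)
learned = class_rate_code(colors, labels)

for value in direct:
    print(f"{value:>5}  direct={direct[value]}  learned={learned[value]:.3f}")
```

Under direct coding, "red" and "blue" end up farther apart than "red" and "green" purely because of the order in which the values were numbered. Under the data-driven mapping, "red" and "blue" sit close together because both mostly occur with the positive class. A distance-based or kernel-based classifier such as an SVM is sensitive to exactly this kind of geometry, which is the motivation the summary gives for learning the embedding.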