An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood

Bibliographic details
Published in:	Knowledge-Based Systems, Vol. 133, pp. 294-313
Main authors:	Ding, Shifei; Du, Mingjing; Sun, Tongfeng; Xu, Xiao; Xue, Yu
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V., 1 October 2017; Elsevier Science Ltd
Subjects:	Algorithms; Attributes; Clustering; Data; Datasets; Density; Density peaks clustering; Entropy; Experiments; Fuzzy neighborhood; Genetic engineering; Mixed type data; Neighborhoods; Numbers; Prototypes; Residential density; Robustness; Robustness (mathematics); Similarity; Transformation
ISSN:	0950-7051, 1872-7409
Online access:	Full text

Description
Summary:	Most clustering algorithms assume that data contain only numerical values. In practice, however, data sets containing both numerical and categorical attributes are ubiquitous in real-world tasks, and grouping such data effectively is an important yet challenging problem. Most current algorithms are sensitive to initialization and are generally unsuitable for data with non-spherical distributions. To address this, we propose an entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood (DP-MD-FN). First, we propose a new similarity measure that applies a uniform criterion to both categorical and numerical attributes, which avoids feature transformation and parameter adjustment between categorical and numerical values. We integrate this entropy-based strategy with the density peaks clustering method. In addition, to improve the robustness of the original algorithm, we use a fuzzy neighborhood relation to redefine the local density. Furthermore, to select the cluster centers automatically, we develop a simple determination strategy based on the γ-graph. The method can handle three types of data: numerical, categorical, and mixed type. We compare the performance of our algorithm with traditional clustering algorithms such as K-Modes, K-Prototypes, KL-FCM-GM, EKP and OCIL. Experiments on different benchmark data sets demonstrate the effectiveness and robustness of the proposed algorithm.
DOI:	10.1016/j.knosys.2017.07.027
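
Illustrative note:	The summary builds on the density peaks clustering method, in which each point receives a local density ρ and a distance δ to its nearest point of higher density, and points with a large product γ = ρ·δ are taken as cluster centers. The sketch below shows only that baseline idea on a toy mixed type data set; it is not the DP-MD-FN algorithm of the article. Several things are assumptions made for illustration: the function names `mixed_distance` and `density_peaks`, the parameters `numeric_ranges` and `d_c`, the simple range-normalized/mismatch distance (a stand-in for the entropy-based similarity measure), the Gaussian kernel density (a stand-in for the fuzzy neighborhood density), and the fixed number of centers (the article selects centers automatically from the γ-graph).

```python
import math

def mixed_distance(a, b, numeric_ranges):
    """Stand-in distance (an assumption, not the article's measure):
    range-normalized |difference| for numeric attributes, 0/1 mismatch
    for categorical ones, averaged over all attributes."""
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if k in numeric_ranges:
            span = numeric_ranges[k]
            total += abs(x - y) / span if span > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)

def density_peaks(points, n_clusters, d_c, dist):
    """Baseline density peaks clustering with a fixed number of centers."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density rho: Gaussian kernel with cutoff scale d_c.
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * n
    parent = [-1] * n
    # The densest point has no denser neighbor; use its largest distance.
    delta[order[0]] = max(d[order[0]])
    for pos in range(1, n):
        i = order[pos]
        # Nearest point among those ranked denser than i.
        j = min(order[:pos], key=lambda h: d[i][h])
        delta[i], parent[i] = d[i][j], j
    gamma = [rho[i] * delta[i] for i in range(n)]
    # Centers: the n_clusters points with the largest gamma.
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, gamma

# Toy mixed type data: one numeric and one categorical attribute.
points = [(1.0, "a"), (1.1, "a"), (0.9, "a"),
          (5.0, "b"), (5.2, "b"), (4.9, "b")]
numeric_ranges = {0: 5.2 - 0.9}  # attribute 0 is numeric; value is its range
labels, gamma = density_peaks(
    points, n_clusters=2, d_c=0.1,
    dist=lambda a, b: mixed_distance(a, b, numeric_ranges))
print(labels)
```

On this toy input the two groups are well separated, so the two largest γ values belong to one point in each group and the remaining points follow their nearest denser neighbor. The cutoff `d_c` is chosen by hand here; in practice the result is sensitive to it.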