Local search genetic algorithm-based possibilistic weighted fuzzy c-means for clustering mixed numerical and categorical data

Bibliographic details
Published in:	Neural computing & applications Vol. 34; No. 20; pp. 18059 - 18074
Main authors:	Nguyen, Thi Phuong Quyen, Kuo, R. J., Le, Minh Duc, Nguyen, Thi Cuc, Le, Thi Huynh Anh
Format:	Journal Article
Language:	English
Published:	London Springer London 01.10.2022 Springer Nature B.V
Subjects:	Artificial Intelligence; Centroids; Clustering; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Datasets; Genetic algorithms; Image Processing and Computer Vision; Machine learning; Original Article; Probability and Statistics in Computer Science; Searching; Local search genetic algorithm; Possibilistic fuzzy means; Variable neighborhood search; Mixed data
ISSN:	0941-0643, 1433-3058
Online access:	Full text

Description
Abstract:	Clustering of data with mixed numerical and categorical attributes has attracted many researchers because it is needed in many real-world applications. One crucial issue in clustering mixed data is selecting an appropriate distance metric for each attribute type. In addition, some current clustering methods are sensitive to the initial solution and are easily trapped in a local optimum. This study therefore proposes a local search genetic algorithm-based possibilistic weighted fuzzy c-means (LSGA-PWFCM) for clustering mixed numerical and categorical data. First, the possibilistic weighted fuzzy c-means (PWFCM) is proposed, in which an object-cluster similarity measure is employed to calculate the distance between two mixed-attribute objects. In addition, each attribute is assigned a different importance by calculating its corresponding weight in the PWFCM procedure. A GA is then used to find a set of optimal parameters and the initial clustering centroids for the PFCM algorithm. To avoid local optima, local searches based on variable neighborhoods are embedded in the GA procedure. To evaluate its performance, the proposed LSGA-PWFCM algorithm is compared with other benchmark algorithms on several public datasets from the UCI Machine Learning Repository. Two clustering validation indices are used: clustering accuracy and the Rand index. The experimental results show that the proposed LSGA-PWFCM outperforms the other algorithms on most of the tested datasets.
DOI:	10.1007/s00521-022-07411-1
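
The abstract names the Rand index as one of its two validation indices. As a minimal sketch of the standard definition only (the share of object pairs on which two partitions agree), and not the authors' evaluation code, it could be computed as follows. The function name `rand_index` and the toy label lists are illustrative assumptions that do not come from the article.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the Rand index of two labelings of the same objects.

    Hypothetical helper for illustration: a pair of objects counts as an
    agreement when both labelings put the pair in the same cluster, or both
    put it in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_true = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        if same_in_true == same_in_pred:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Toy labels (made up): the cluster names differ but the partitions match.
identical_partitions = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy labels (made up): the two partitions cut across each other.
crossed_partitions = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on how objects are grouped, not on the label values, so relabeling the clusters leaves it unchanged. The pairwise loop is quadratic in the number of objects, which is acceptable for a small illustration but not for large datasets.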