ClusterSwarm: cluster-specific feature selection using binary particle swarm optimisation

Bibliographic Details
Title: ClusterSwarm: cluster-specific feature selection using binary particle swarm optimisation
Authors: Ezenkwu, Chinedu Pascal, Starkey, Andrew, Aziz, Azwa Abdul
Source: Computing, 107
Publisher: Springer Science and Business Media LLC, 2025.
Publication year: 2025
Keywords: Feature selection, Interpretability, Particle swarm optimisation, Unsupervised learning, K-means, Clustering
Description: Feature selection has become an important step in machine learning pipelines, contributing to model interpretability and accuracy. Emphasis has largely been on global feature selection techniques, but these methods do not support feature attributions to the distinct groups within a dataset, since they assume that a single feature set is adequate for the whole classification task. Moreover, feature selection techniques, whether global or local, are well developed for supervised learning but far less so for unsupervised learning. For these reasons, this paper presents ClusterSwarm, a new approach to cluster-specific feature selection that uses Binary Particle Swarm Optimisation (BPSO) together with the K-means algorithm to identify cluster-specific feature sets. Evaluated on four publicly available datasets from the UCI repository, ClusterSwarm outperforms the standard K-means algorithm and agglomerative hierarchical clustering, and performs similarly to Sparse K-means, a global feature selection technique. However, ClusterSwarm performs better than Sparse K-means in high-dimensional, multi-class and noisy settings, while providing interpretability through feature attributions to each cluster. In comparison with CS Sparse K-means, a cluster-specific variant of Sparse K-means, ClusterSwarm produced better accuracies and more efficient feature selection, ignoring redundant features that CS Sparse K-means retained. In addition to the four public datasets, we experimented with two synthetic datasets carefully curated to represent cases of noisy features and overlapping clusters. These datasets demonstrate the superiority of ClusterSwarm over Sparse K-means, CS Sparse K-means and the standard clustering techniques.
Publication type: Article
Language: English
ISSN: 1436-5057, 0010-485X
DOI: 10.1007/s00607-025-01534-8
Access URL: https://rgu-repository.worktribe.com/output/2934849
Rights: CC BY
Record ID: edsair.doi.dedup.....3e44455d3f12cba236afc0f19a4313cb
Database: OpenAIRE
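Illustrative note: the record above does not include the authors' code, and the exact ClusterSwarm objective is not reproduced here. As a rough sketch of the general idea described in the abstract, the Python snippet below uses binary PSO to search a per-cluster feature mask that is scored against a K-means partition; the fitness function (one-vs-rest silhouette per cluster minus a sparsity penalty) and all names and parameters are hypothetical, not the published algorithm.

```python
# Sketch only: binary PSO over per-cluster feature masks, scored with K-means labels.
# The fitness below is an assumed stand-in, NOT the objective used in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fitness(masks, X, labels, penalty=0.01):
    """Score a (k, d) binary mask matrix: mean one-vs-rest separability of each
    cluster in its own feature subspace, minus a penalty on selected features."""
    k, _ = masks.shape
    total = 0.0
    for c in range(k):
        feats = np.flatnonzero(masks[c])
        if feats.size == 0:
            return -np.inf                       # an empty mask is invalid
        in_c = (labels == c).astype(int)         # cluster c vs. the rest
        total += silhouette_score(X[:, feats], in_c)
    return total / k - penalty * masks.sum()

def cluster_swarm_sketch(X, k, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5):
    # Fix a reference partition with standard K-means, then let BPSO search masks.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, k, d))    # binary feature masks
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, k, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, labels) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    gbest_fit = pbest_fit.max()
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = (rng.random(vel.shape) < sigmoid(vel)).astype(int)   # BPSO position update
        fit = np.array([fitness(p, X, labels) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > gbest_fit:
            gbest = pbest[np.argmax(pbest_fit)].copy()
            gbest_fit = pbest_fit.max()
    return labels, gbest   # cluster assignments and one binary feature mask per cluster
```

Usage would be along the lines of `labels, masks = cluster_swarm_sketch(X, k=3)`, after which `np.flatnonzero(masks[c])` lists the features attributed to cluster c, which is the kind of per-cluster interpretability the abstract describes.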