ClusterSwarm: cluster-specific feature selection using binary particle swarm optimisation
Saved in:
| Title: | ClusterSwarm: cluster-specific feature selection using binary particle swarm optimisation |
|---|---|
| Authors: | Ezenkwu, Chinedu Pascal; Starkey, Andrew; Aziz, Azwa Abdul |
| Source: | Computing, 107 |
| Publisher information: | Springer Science and Business Media LLC, 2025 |
| Publication year: | 2025 |
| Keywords: | Feature selection, Interpretability, Particle swarm optimisation, Unsupervised learning, K-means, Clustering |
| Description: | Feature selection has become an important step in machine learning pipelines, contributing to model interpretability and accuracy. While the emphasis has largely been on global feature selection techniques, these methods do not support feature attributions to the distinct groups within a dataset, since they assume that a single feature set is adequate for the entire classification task. Moreover, feature selection techniques, whether global or local, are well developed for supervised learning but far less so for unsupervised learning. For these reasons, this paper presents ClusterSwarm, a new approach to cluster-based feature selection that combines Binary Particle Swarm Optimisation (BPSO) with the K-means algorithm to identify cluster-specific feature sets. Evaluated on four publicly available datasets from the UCI repository, ClusterSwarm outperforms the standard K-means algorithm and agglomerative hierarchical clustering and performs similarly to Sparse K-means, a global feature selection technique. However, ClusterSwarm outperforms Sparse K-means in high-dimensional, multi-class and noisy settings, while providing interpretability through feature attributions for each cluster. Compared with CS Sparse K-means, a cluster-specific variant of Sparse K-means, ClusterSwarm produced better accuracies and more efficient feature selection, ignoring redundant features that CS Sparse K-means retained. In addition to the four public datasets, we experimented with two synthetic datasets carefully curated to represent cases of noisy features and overlapping clusters; these datasets demonstrate the superiority of ClusterSwarm over Sparse K-means, CS Sparse K-means and the standard clustering techniques. |
| Publication type: | Article |
| Language: | English |
| ISSN: | 1436-5057; 0010-485X |
| DOI: | 10.1007/s00607-025-01534-8 |
| Access URL: | https://rgu-repository.worktribe.com/output/2934849 |
| Rights: | CC BY |
| Document code: | edsair.doi.dedup.....3e44455d3f12cba236afc0f19a4313cb |
| Database: | OpenAIRE |
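
The record above does not include the paper's algorithmic details, so the following is only a minimal, generic sketch of the two ingredients the description names: a binary particle swarm whose particles are feature masks, scored by running K-means on the masked data. The function name `bpso_feature_selection`, the use of a global silhouette score as the fitness function, and the parameters `w`, `c1` and `c2` are illustrative assumptions; the paper's own objective is cluster-specific and is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def bpso_feature_selection(X, n_clusters=3, n_particles=20, n_iters=50,
                           w=0.7, c1=1.5, c2=1.5, seed=None):
    """Illustrative binary PSO over feature masks, scored by K-means clustering
    quality (silhouette). A stand-in sketch, not the ClusterSwarm algorithm."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:                      # empty mask: worst score
            return -1.0
        Xs = X[:, mask.astype(bool)]
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(Xs)
        return silhouette_score(Xs, labels)

    # particles are binary feature masks; velocities are real-valued
    pos = rng.integers(0, 2, size=(n_particles, n_features))
    vel = rng.uniform(-1, 1, size=(n_particles, n_features))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, n_features))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))        # sigmoid transfer function
        pos = (rng.random((n_particles, n_features)) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if fit.max() > gbest_fit:
            gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()

    return gbest.astype(bool), gbest_fit
```

Calling `bpso_feature_selection(X, n_clusters=3)` returns a boolean mask over the columns of `X` together with the best fitness found; a cluster-specific approach, as described in the abstract, would instead associate a separate feature set with each cluster.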