A survey of genetic algorithms for clustering: Taxonomy and empirical analysis

Bibliographic details
Published in: Swarm and evolutionary computation, Vol. 91, p. 101720
Main authors: Robles-Berumen, Hermes; Zafra, Amelia; Ventura, Sebastián
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 December 2024
ISSN: 2210-6502
Description
Summary: Clustering, an unsupervised learning technique, aims to group patterns into clusters where similar patterns are grouped together, while dissimilar ones are placed in different clusters. This task can present itself as a complex optimization problem due to the extensive search space generated by all potential data partitions. Genetic Algorithms (GAs) have emerged as efficient tools for addressing this task. Consequently, significant advancements and numerous proposals have been developed in this field. This work offers a comprehensive and critical review of state-of-the-art mono-objective GAs for partitional clustering. From a more theoretical standpoint, it examines 22 well-known proposals in detail, covering their encoding strategies, objective functions, genetic operators, local search methods, and parent selection strategies. Based on this information, a specific taxonomy is proposed. In addition, from a more practical standpoint, a detailed experimental study is conducted to discern the advantages and disadvantages of the approaches. Specifically, 22 different cluster validation indices are considered to compare the performance of clustering techniques. This evaluation is performed across 94 datasets encompassing diverse configurations, including the number of classes, separation between classes, and pattern dimensionality. Results reveal interesting findings, such as the key role of local search in optimizing results and reducing search space. Additionally, representations based on centroids and labels demonstrate greater efficiency, and crossover and mutation operators do not prove to be as relevant. Ultimately, while the results are satisfactory, real-world clustering problems introduce additional complexity, especially for algorithms aiming to determine the number of clusters, resulting in diminished performance and the need for new approaches to be explored.
Code, datasets and instructions to run algorithms in the LEAL library are available in an associated repository, in order to facilitate future experiments in this environment.
DOI: 10.1016/j.swevo.2024.101720