Ensemble method for cluster number determination and algorithm selection in unsupervised learning [version 1; peer review: 2 approved with reservations, 1 not approved]

Bibliographic details
Published in: F1000 research, Volume 11; p. 573
Main author: Zambelli, Antoine
Format: Journal Article
Language: English
Published: 2022
ISSN: 2046-1402
Description
Summary: Unsupervised learning, and more specifically clustering, suffers from the need for expertise in the field to be of use. Researchers must make careful and informed decisions on which algorithm to use with which set of hyperparameters for a given dataset. Additionally, researchers may need to determine the number of clusters in the dataset, which is unfortunately itself an input to most clustering algorithms; all of this before embarking on their actual subject matter work. After quantifying the impact of algorithm and hyperparameter selection, we propose an ensemble clustering framework which can be leveraged with minimal input. It can be used to determine both the number of clusters in the dataset and a suitable choice of algorithm to use for a given dataset. A code library is included in the Conclusions for ease of integration.
DOI: 10.12688/f1000research.121486.1
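
Illustration: this record does not describe how the paper's ensemble framework works, so the following is only a toy sketch of the general idea named in the summary, not the author's method or code library. It assumes a simple scheme: run several clustering algorithms over a range of candidate cluster counts, score each partition with the mean silhouette, let each algorithm vote for its best count, and then pick the best-scoring algorithm among those that agree with the majority. All function names (`kmeans`, `single_linkage`, `silhouette`, `ensemble_choice`) and the sample data are hypothetical, and only the Python standard library is used.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point seeding (toy version)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels


def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = [0] * len(points)
    for idx, members in enumerate(clusters):
        for i in members:
            labels[i] = idx
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            continue
        a = sum(own) / len(own)                                  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())    # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


def ensemble_choice(points, algorithms, k_range):
    """Each algorithm votes for its best k; return majority k and best algorithm."""
    best = {}
    for name, fn in algorithms.items():
        best[name] = max((silhouette(points, fn(points, k)), k) for k in k_range)
    k_star = Counter(k for _, k in best.values()).most_common(1)[0][0]
    agreeing = [name for name, (_, k) in best.items() if k == k_star]
    algo_star = max(agreeing, key=lambda name: best[name][0])
    return k_star, algo_star, best


# Hypothetical sample data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]
algorithms = {"kmeans": kmeans, "single_linkage": single_linkage}
k_star, algo_star, per_algo = ensemble_choice(points, algorithms, range(2, 6))
print(k_star, algo_star)
```

On data this clean the two algorithms produce the same partition, so the algorithm choice is a tie; the vote becomes informative only when algorithms disagree, which is the situation the summary describes.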