Parallel batch k-means for Big data clustering


Bibliographic details
Published in: Computers & Industrial Engineering, Volume 152, p. 107023
Main authors: Alguliyev, Rasim M., Aliguliyev, Ramiz M., Sukhostat, Lyudmila V.
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 February 2021
ISSN:0360-8352, 1879-0550
Description
Summary:
• This paper proposes a new parallel batch clustering algorithm based on k-means.
• The algorithm increases the clustering speed.
• Real large datasets are used to evaluate the proposed approach.
• The analysis shows the practical applicability of the algorithm to Big Data.

The application of clustering algorithms is expanding due to the rapid growth of data volumes. Nevertheless, existing algorithms are not always effective because of their high computational complexity. A new parallel batch clustering algorithm based on the k-means algorithm is proposed. The proposed algorithm splits a dataset into equal partitions and reduces the exponential growth of computations. The goal is to preserve the characteristics of the dataset while increasing the clustering speed. Cluster centers are calculated for each partition; these centers are then merged and clustered in turn. An approach for determining the optimal batch size is also considered. The statistical significance of the proposed approach is also assessed. Six experimental datasets are used to evaluate the effectiveness of the proposed parallel batch clustering, and the results are compared with those of the standard k-means algorithm. The analysis shows the practical applicability of the proposed algorithm to Big Data.
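The partition, cluster, merge, and re-cluster idea described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no algorithmic details beyond the summary, so the function names `kmeans` and `batch_kmeans`, the parameters `n_batches` and `workers`, the strided partitioning, and the fixed iteration count are all assumptions made for illustration. Batch-size selection, which the paper also addresses, is not covered here.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumes len(points) >= k
    for _ in range(iters):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def batch_kmeans(points, k, n_batches, workers=4):
    """Hypothetical sketch: cluster each partition, then cluster the merged centers."""
    # Strided slicing gives partitions whose sizes differ by at most one point.
    batches = [points[i::n_batches] for i in range(n_batches)]
    # Each partition is clustered independently, so the calls can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_batch = list(pool.map(lambda b: kmeans(b, k), batches))
    # Merge all partition centers and cluster them once more to get k final centers.
    merged = [c for centers in per_batch for c in centers]
    return kmeans(merged, k)
```

A call such as `batch_kmeans(points, k=2, n_batches=4)` returns two final centers for a list of coordinate tuples. Threads are used only to keep the sketch short; in pure Python they give little real speed-up, so a process pool or a distributed framework would likely be the more realistic choice for large datasets.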
DOI: 10.1016/j.cie.2020.107023