RETRACTED ARTICLE: Innovative study on clustering center and distance measurement of K-means algorithm: mapreduce efficient parallel algorithm based on user data of JD mall

Bibliographic details
Published in: Electronic commerce research, Vol. 23, no. 1, pp. 43-73
Main authors: Liu, Yang; Du, Xinxin; Ma, Shuaifeng
Format: Journal Article
Language: English
Publication details: New York: Springer US, 01.03.2023
Springer
Springer Nature B.V
Subject:
ISSN: 1389-5753, 1572-9362
Description
Summary: The traditional K-means algorithm is very sensitive to the selection of clustering centers and the calculation of distances, so the algorithm easily converges to a locally optimal solution. In addition, the traditional algorithm has slow convergence speed and low clustering accuracy, as well as memory bottleneck problems when processing massive data. Therefore, an improved K-means algorithm is proposed in this paper. In this algorithm, the selection of the initial points in the traditional clustering algorithm is improved first, and then a new global measure, the effective distance measure, is proposed. Its main idea is to calculate the effective distance between two data samples by sparse reconstruction. Finally, on the basis of the MapReduce framework, the efficiency of the algorithm is further improved by adjusting the Hadoop cluster. Based on the real customer data from the JD Mall dataset, this paper introduces the DBI, Rand and other indicators to evaluate the clustering effects of various algorithms. The results show that the proposed algorithm not only has good convergence and accuracy but also achieves better performances than those of other compared algorithms.
Bibliography: ObjectType-Correction/Retraction-1
SourceType-Scholarly Journals-1
DOI: 10.1007/s10660-021-09458-z
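
The summary refers to running K-means on the MapReduce framework. As a rough illustration only, one iteration of standard K-means can be written as a map step (assign each point to its nearest center) followed by a reduce step (average the points of each cluster). The sketch below assumes plain Euclidean distance and fixed starting centers; it does not reproduce the article's effective distance measure or its initial-point selection, which this record does not describe. The function names `map_point`, `reduce_cluster`, and `kmeans_iteration` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def map_point(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average the points assigned to one center."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centers):
    """One K-means iteration: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centers)
        groups[key].append(value)
    # Assumption: a center with no assigned points keeps its old position.
    return [
        reduce_cluster(groups[i]) if groups[i] else centers[i]
        for i in range(len(centers))
    ]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    centers = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(points, centers))
```

This runs in a single process. On a Hadoop cluster the map and reduce steps would be distributed across nodes, with the framework performing the grouping by key between them.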