K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

Bibliographic details
Published in: arXiv.org
Main authors: Mohammadi, Seyed Omid; Kalhor, Ahmad; Bodaghi, Hossein
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24 May 2022
Subjects:
ISSN: 2331-8422
Online access: Get full text
Abstract This paper introduces k-splits, an improved hierarchical algorithm based on k-means that clusters data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are the two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets, MNIST and Fashion-MNIST, to show that our algorithm finds the correct number of clusters with excellent accuracy under different conditions. We also show that k-splits is faster than similar methods and can even be faster than standard k-means in lower dimensions. Finally, we suggest using k-splits to locate the centroids and then passing them as initial points to the k-means algorithm to fine-tune the results.
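The procedure the abstract describes (split a cluster along its most significant axis, then hand the resulting centroids to k-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the splitting or stopping criterion, so the variance threshold `max_var`, the choice to cut at the mean projection, starting from a single cluster, and all function names here are hypothetical placeholders.

```python
import math
import random


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def principal_axis(points, iters=100):
    """Dominant covariance eigenvector via power iteration.

    Assumes the fixed start vector is not orthogonal to that eigenvector.
    """
    c = mean(points)
    dim = len(c)
    centered = [[p[d] - c[d] for d in range(dim)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / len(points)
            for j in range(dim)] for i in range(dim)]
    v = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: no meaningful axis
            break
        v = [x / norm for x in w]
    return v


def projections(points):
    """Signed position of each point along the cluster's principal axis."""
    c = mean(points)
    axis = principal_axis(points)
    return [sum((p[d] - c[d]) * axis[d] for d in range(len(c))) for p in points]


def k_splits_sketch(points, max_var):
    """Split clusters along their principal axis until each is 'tight enough'.

    ASSUMPTION: the stopping rule (variance along the axis <= max_var) and
    the cut at the mean projection are placeholders, not the paper's rule.
    """
    pending, done = [list(points)], []
    while pending:
        cluster = pending.pop()
        proj = projections(cluster)
        var = sum(t * t for t in proj) / len(cluster)
        left = [p for p, t in zip(cluster, proj) if t < 0]
        right = [p for p, t in zip(cluster, proj) if t >= 0]
        if var <= max_var or not left or not right:
            done.append(cluster)
        else:
            pending += [left, right]
    return [mean(cluster) for cluster in done]


def lloyd(points, centroids, iters=20):
    """Plain k-means (Lloyd) iterations seeded with the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda k: math.dist(p, centroids[k]))
            groups[j].append(p)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Toy data: three well-separated 2-D Gaussian blobs (illustrative only).
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (30.0, 5.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in true_centers for _ in range(100)]

seeds = k_splits_sketch(data, max_var=1.0)  # estimated centroids
refined = lloyd(data, seeds)                # k-means fine-tuning step
print(len(seeds), sorted(refined))
```

The sketch returns centroids rather than labels because the abstract's last suggestion is to use the discovered centroids as k-means initial points. Cutting at the mean projection is the simplest possible choice and can slice through a cluster that happens to sit near the mean, which is one reason a real criterion would need to be more careful than this placeholder.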
Author Bodaghi, Hossein
Kalhor, Ahmad
Mohammadi, Seyed Omid
Author_xml – sequence: 1
  givenname: Seyed
  surname: Mohammadi
  middlename: Omid
  fullname: Mohammadi, Seyed Omid
– sequence: 2
  givenname: Ahmad
  surname: Kalhor
  fullname: Kalhor, Ahmad
– sequence: 3
  givenname: Hossein
  surname: Bodaghi
  fullname: Bodaghi, Hossein
ContentType Paper
Copyright 2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.48550/arxiv.2110.04660
DeliveryMethod fulltext_linktorsrc
Discipline Physics
EISSN 2331-8422
Genre Working Paper/Pre-Print
ID FETCH-LOGICAL-a528-5d1e5600da6d08cf3d77446b9a6e2d15433ee5b91cd42bc7a62263a2a71503563
IEDL.DBID M7S
IngestDate Mon Jun 30 09:29:51 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a528-5d1e5600da6d08cf3d77446b9a6e2d15433ee5b91cd42bc7a62263a2a71503563
Notes SourceType-Working Papers-1
ObjectType-Working Paper/Pre-Print-1
content type line 50
OpenAccessLink https://www.proquest.com/docview/2581108988?pq-origsite=%requestingapplication%
PQID 2581108988
PQPubID 2050157
ParticipantIDs proquest_journals_2581108988
PublicationCentury 2000
PublicationDate 20220524
PublicationDateYYYYMMDD 2022-05-24
PublicationDate_xml – month: 05
  year: 2022
  text: 20220524
  day: 24
PublicationDecade 2020
PublicationPlace Ithaca
PublicationPlace_xml – name: Ithaca
PublicationTitle arXiv.org
PublicationYear 2022
Publisher Cornell University Library, arXiv.org
Publisher_xml – name: Cornell University Library, arXiv.org
SSID ssj0002672553
Score 1.7955089
SecondaryResourceType preprint
SourceID proquest
SourceType Aggregation Database
SubjectTerms Algorithms
Centroids
Cluster analysis
Clustering
Datasets
Vector quantization
Title K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters
URI https://www.proquest.com/docview/2581108988
hasFullText 1
inHoldings 1
isFullTextHit
isPrint