A Fast Partitional Clustering Algorithm based on Nearest Neighbours Heuristics

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 112, pp. 198–204
Main Author: Ganguly, Debasis
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V. / Elsevier Science Ltd, 01.09.2018
Subjects: Scalable K-means; Inverted index; Tweet clustering; Algorithms; Centroids; Clustering; Computation; Data points; Data structures; Datasets; Iterative methods; Partitions; Problem solving
ISSN: 0167-8655; EISSN: 1872-7344
DOI: 10.1016/j.patrec.2018.07.017
Abstract

• Investigates K-means clustering for large collections of sparse, high-dimensional vectors.
• Proposes utilizing the inverted-list data structure to improve the run-time of K-means.
• Proposes heuristics for initial centroid selection and centroid updates.
• The proposed approach outperforms the run-time of K-means by up to 35x on a collection of 14M tweets.

K-means, along with its several variants, is the most widely used family of partitional clustering algorithms. Generally speaking, an algorithm in this family starts by initializing a number of data points as cluster centres, and then iteratively refines these centres based on the current partition of the dataset. Given a set of cluster centres, inducing the partition over the dataset involves finding the nearest (or most similar) cluster centre for each data point, which is an O(NK) operation, N and K being the number of data points and the number of clusters, respectively. Our proposed approach avoids the explicit computation of these distances for the case of sparse vectors, e.g. documents, by utilizing a fundamental operation, TOP(x), which returns a list of the vectors most similar to the vector x. A standard way to store sparse vectors, and to retrieve the ones most similar to a given query vector, is the inverted list data structure. Our method uses the TOP(x) function, first, to select cluster centres that are likely to be dissimilar to each other; second, to obtain the partition during each iteration of K-means without explicitly computing the pairwise similarities between centroid and non-centroid vectors; and third, to avoid recomputation of the cluster centroids by adopting a centrality-based heuristic. We demonstrate the effectiveness of the proposed algorithm on the TREC-2011 Microblog dataset, a large collection of about 14M tweets. Our experiments show that the proposed method is about 35x faster than the standard K-means algorithm and produces more effective clusters.
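The abstract's central primitive is TOP(x): given a vector x, return the vectors most similar to it. To make the idea concrete, here is a minimal Python sketch of an inverted index over sparse vectors supporting such a query, assuming dot-product similarity; all function and variable names are illustrative, not taken from the paper.

```python
from collections import defaultdict
import heapq

def build_inverted_index(docs):
    """docs: list of sparse vectors, each a dict mapping term -> weight.
    Returns an inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def top(index, x, k):
    """TOP(x) sketch: accumulate dot-product scores only over documents
    sharing at least one term with x, then return the k most similar
    as (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term, wx in x.items():
        for doc_id, wd in index.get(term, ()):
            scores[doc_id] += wx * wd
    return heapq.nlargest(k, scores.items(), key=lambda p: p[1])
```

Because each posting list is visited only for terms that actually occur in x, a query touches a small fraction of the collection when the vectors are sparse, which is what makes TOP(x) cheap relative to a linear scan over all N documents.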
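The first heuristic selects initial cluster centres that are likely to be mutually dissimilar using TOP(x). This record does not spell out the exact selection rule, so the sketch below shows one plausible greedy realisation, reusing top() from the previous sketch: once a seed is accepted, its top-m neighbours are blocked from becoming later seeds.

```python
def select_seeds(docs, index, K, m=10):
    """Greedy seed-selection sketch (illustrative, not the paper's exact
    rule): scan the collection, skip documents already retrieved as a
    near neighbour of an earlier seed, and block each new seed's own
    top-m neighbours so later seeds stay dissimilar."""
    blocked, seeds = set(), []
    for doc_id, vec in enumerate(docs):
        if doc_id in blocked:
            continue
        seeds.append(doc_id)
        if len(seeds) == K:
            break
        blocked.update(nbr for nbr, _ in top(index, vec, m))
    return seeds
```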
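The second and third heuristics replace the O(NK) assignment step with ranked retrieval and avoid recomputing centroids via a centrality criterion. The sketch below, again reusing top() from above, gives one hedged reading of both ideas; the fallback for documents retrieved by no centre, and the paper's precise centrality measure, are not specified in this record.

```python
from collections import defaultdict

def assign_via_top(docs, index, centres, budget):
    """Assignment sketch: each centre retrieves its `budget` most similar
    documents via the inverted index; a document joins the centre that
    scored it highest. This replaces the explicit O(NK) distance
    computation with K index queries. Documents retrieved by no centre
    would need a fallback (omitted here)."""
    best = {}  # doc_id -> (score, centre)
    for c in centres:
        for doc_id, s in top(index, docs[c], budget):
            if doc_id not in best or s > best[doc_id][0]:
                best[doc_id] = (s, c)
    clusters = defaultdict(list)
    for doc_id, (_, c) in best.items():
        clusters[c].append(doc_id)
    return clusters

def central_member(docs, index, members, m=10):
    """Centrality-update sketch: instead of averaging members into a
    dense mean vector, keep as the new centre the member whose top-m
    neighbour list contains the most fellow members (one reading of a
    'centrality based heuristic'; the paper's criterion may differ)."""
    member_set = set(members)
    return max(members,
               key=lambda d: sum(1 for nbr, _ in top(index, docs[d], m)
                                 if nbr in member_set))
```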
Author Details:
  Name: Ganguly, Debasis
  ORCID: 0000-0003-0050-7138
  Email: debasis.ganguly1@ie.ibm.com
  Organization: IBM Research Lab, Dublin, Ireland
Copyright: 2018 Elsevier B.V.