A fast and effective partitional clustering algorithm for large categorical datasets using a k-means based approach

Published in: Computers & Electrical Engineering, Volume 68, pp. 463-483
Main authors: Ben Salem, Semeh; Naouali, Sami; Chtourou, Zied
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Ltd, 1 May 2018
Elsevier BV
ISSN:0045-7906, 1879-0755
Abstract: Partitional clustering algorithms are of interest in pattern recognition because of their high scalability and efficiency. The k-means algorithm, proposed in 1965, has proven highly efficient for numeric clustering but is inadequate for categorical clustering. In 1998, k-modes was proposed as an extension of k-means for clustering categorical datasets. This paper details a new partition-based categorical method called Manhattan Frequency k-Means (MFk-M). It converts the initial categorical data into numeric values using the relative frequency of each modality in the attributes. The L1 norm (Manhattan distance) is used to compute the distance between the observations and the centroids. Finally, an approximation is defined to evaluate each resulting partition during the execution of the algorithm, so as to avoid trivial clusterings such as cluster death. Experimental analysis on real-life datasets highlights the reduced complexity cost and high efficiency of the proposal compared with the standard k-means and k-modes algorithms.
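The two steps named in the abstract (relative-frequency encoding of each modality, then L1 distances to centroids) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the sample data, the mean-based centroid update, and the handling of empty clusters are all assumptions made for illustration. The partition-evaluation approximation mentioned in the abstract is not reproduced, because this record gives no details about it.

```python
from collections import Counter


def frequency_encode(rows):
    """Replace each categorical value by its relative frequency in its column."""
    n = len(rows)
    n_cols = len(rows[0])
    # One Counter per attribute (column), counting how often each modality occurs.
    counts = [Counter(row[j] for row in rows) for j in range(n_cols)]
    return [[counts[j][row[j]] / n for j in range(n_cols)] for row in rows]


def manhattan(a, b):
    """L1 (Manhattan) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid under the L1 distance, for each point."""
    return [
        min(range(len(centroids)), key=lambda c: manhattan(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Per-cluster mean (assumption); an empty cluster keeps its old centroid."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans_l1(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two categorical attributes, four observations.
rows = [
    ["red", "small"],
    ["red", "small"],
    ["red", "small"],
    ["blue", "large"],
]
encoded = frequency_encode(rows)
labels, centroids = kmeans_l1(encoded, [encoded[0], encoded[3]])
```

In this toy example the initial centroids are simply the first and last encoded observations. One caveat of frequency encoding that the sketch makes visible: two different modalities of the same attribute that occur equally often receive the same numeric value.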
Author affiliations:
– Ben Salem, Semeh (semehbensalem0@gmail.com): Virtual Reality and Information Technology (VRIT), Military Academy of Fandouk Jedid, Tunisia
– Naouali, Sami (snaouali@gmail.com): Virtual Reality and Information Technology (VRIT), Military Academy of Fandouk Jedid, Tunisia
– Chtourou, Zied (ziedchtourou@gmail.com): Digital Research Center of Sfax, B.P. 275, Sakiet Ezzit, Sfax 3021, Tunisia
Copyright: 2018 Elsevier Ltd
Copyright: Elsevier BV, May 2018
DOI 10.1016/j.compeleceng.2018.04.023
Discipline Engineering
Keywords: k-means; Crime Mining; Pattern recognition; Categorical clustering; k-modes; Unsupervised learning
OpenAccessLink https://hdl.handle.net/11323/5136
SubjectTerms Algorithms
Artificial intelligence
Categorical clustering
Centroids
Clustering
Complexity theory
Cost analysis
Crime Mining
Datasets
Distance measurement
Efficiency
k-means
k-modes
Partitions
Pattern recognition
Unsupervised learning