Distributed Strategies for Mining Outliers in Large Data Sets

We introduce a distributed method for detecting distance-based outliers in very large data sets. Our approach is based on the concept of an outlier detection solving set [2], which is a small subset of the data set that can also be employed for predicting novel outliers. The method exploits parallel co...

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 7, pp. 1520-1532
Main authors: Angiulli, F., Basta, S., Lodi, S., Sartori, C.
Format: Journal Article
Language: English
Published: IEEE, 1 July 2013
ISSN: 1041-4347
Abstract We introduce a distributed method for detecting distance-based outliers in very large data sets. Our approach is based on the concept of an outlier detection solving set [2], which is a small subset of the data set that can also be employed for predicting novel outliers. The method exploits parallel computation in order to obtain vast time savings. Indeed, beyond preserving the correctness of the result, the proposed scheme exhibits excellent performance. From the theoretical point of view, for common settings, our algorithm is expected to be at least three orders of magnitude faster than the classical nested-loop-like approach to detecting outliers. Experimental results show that the algorithm is efficient and that its running time scales quite well for an increasing number of nodes. We also discuss a variant of the basic strategy which reduces the amount of data to be transferred in order to improve both the communication cost and the overall runtime. Importantly, the solving set computed by our approach in a distributed environment has the same quality as that produced by the corresponding centralized method.
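Illustrative note (not part of the catalog record or the paper): the "classical nested-loop like approach" that the abstract uses as a baseline can be sketched roughly as follows. The abstract does not define the outlier score, so this sketch assumes one common distance-based definition, in which a point's score is the sum of the distances to its k nearest neighbors and the top n scores are reported. The function name and the parameters `n` and `k` are hypothetical.

```python
import heapq
import math


def top_n_outliers(data, n, k):
    """Return indices of the n points with the largest k-NN distance sums.

    Naive nested loop: every point is compared with every other point,
    so the cost grows quadratically with the size of the data set.
    The score definition (sum of k nearest-neighbor distances) is an
    assumption made for illustration.
    """
    scores = []
    for i, p in enumerate(data):
        # Distances from point i to all other points (inner loop).
        dists = [math.dist(p, q) for j, q in enumerate(data) if j != i]
        # Score: sum of the k smallest distances.
        scores.append((sum(heapq.nsmallest(k, dists)), i))
    # Highest scores first; keep only the indices.
    return [i for _, i in heapq.nlargest(n, scores)]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(points, n=1, k=2))
```

The quadratic number of distance computations in this baseline is what makes a small solving set and parallel computation attractive for very large data sets.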
Author Angiulli, F.
Basta, S.
Lodi, S.
Sartori, C.
Author_xml – sequence: 1
  givenname: F.
  surname: Angiulli
  fullname: Angiulli, F.
  email: f.angiulli@dimes.unical.it
  organization: DIMES Dept., Univ. of Calabria, Rende, Italy
– sequence: 2
  givenname: S.
  surname: Basta
  fullname: Basta, S.
  email: basta@icar.cnr.it
  organization: Inst. of High-Performance Comput. & Networking, Rende, Italy
– sequence: 3
  givenname: S.
  surname: Lodi
  fullname: Lodi, S.
  email: stefano.lodi@unibo.it
  organization: Dept. of Comput. Sci. & Eng., Univ. of Bologna, Bologna, Italy
– sequence: 4
  givenname: C.
  surname: Sartori
  fullname: Sartori, C.
  email: claudio.sartori@unibo.it
  organization: Dept. of Comput. Sci. & Eng., Univ. of Bologna, Bologna, Italy
CODEN ITKEEH
CitedBy_id crossref_primary_10_1016_j_ins_2019_07_045
crossref_primary_10_1016_j_neucom_2015_05_135
crossref_primary_10_1016_j_eswa_2016_10_026
crossref_primary_10_1080_0305215X_2024_2315501
crossref_primary_10_1109_TPDS_2016_2528984
crossref_primary_10_1155_2017_2649535
crossref_primary_10_1016_j_is_2020_101569
crossref_primary_10_1016_j_ins_2015_11_005
crossref_primary_10_1016_j_procs_2020_01_006
crossref_primary_10_1109_TKDE_2016_2555804
crossref_primary_10_1007_s10586_018_1767_1
crossref_primary_10_1109_TCSS_2019_2918193
crossref_primary_10_1080_01431161_2024_2377837
crossref_primary_10_1016_j_knosys_2021_107256
crossref_primary_10_1109_TSMC_2017_2718592
crossref_primary_10_1007_s11390_015_1596_0
crossref_primary_10_1080_08839514_2021_1975393
crossref_primary_10_1016_j_eswa_2020_113215
crossref_primary_10_1007_s11042_018_5905_9
crossref_primary_10_1186_s40537_020_00320_x
crossref_primary_10_1016_j_eswa_2019_02_020
crossref_primary_10_3390_f16060915
crossref_primary_10_1145_3381028
crossref_primary_10_1080_00051144_2019_1576966
crossref_primary_10_1016_j_trc_2014_07_005
Cites_doi 10.1145/956750.956758
10.1109/TKDE.2005.31
10.1007/s10618-008-0093-2
10.1007/s10618-005-0014-6
10.1007/978-3-642-15277-1_32
10.1145/1541880.1541882
10.1109/60.749142
10.1145/1497577.1497581
10.1145/1150402.1150447
10.1109/TKDE.2006.29
10.1049/cp:19950597
10.1109/ICDM.2005.116
10.1109/ACC.2002.1024528
10.1145/342009.335437
10.1137/1.9781611972771.47
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TKDE.2012.71
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library (IEL) (UW System Shared)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EndPage 1532
ExternalDocumentID 10_1109_TKDE_2012_71
6175896
Genre orig-research
IEDL.DBID RIE
ISICitedReferencesCount 42
ISSN 1041-4347
IngestDate Tue Nov 18 22:35:36 EST 2025
Sat Nov 29 08:05:25 EST 2025
Wed Aug 27 02:52:16 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
LinkModel DirectLink
PageCount 13
ParticipantIDs crossref_primary_10_1109_TKDE_2012_71
crossref_citationtrail_10_1109_TKDE_2012_71
ieee_primary_6175896
PublicationCentury 2000
PublicationDate 2013-07-01
PublicationDateYYYYMMDD 2013-07-01
PublicationDate_xml – month: 07
  year: 2013
  text: 2013-07-01
  day: 01
PublicationDecade 2010
PublicationTitle IEEE transactions on knowledge and data engineering
PublicationTitleAbbrev TKDE
PublicationYear 2013
Publisher IEEE
Publisher_xml – name: IEEE
References Han (ref11)
ref13
Asuncion (ref5)
ref20
(ref14) 2000
Hung (ref12); 12
ref22
ref10
ref21
ref2
ref1
ref17
ref19
ref18
ref8
ref7
Knorr (ref15)
ref9
ref4
ref3
ref6
Koufakou (ref16); 20
References_xml – volume-title: Advances in Distributed and Parallel Knowledge Discovery
  year: 2000
  ident: ref14
– start-page: 392
  volume-title: Proc. 24th Int’l Conf. Very Large Data Bases (VLDB)
  ident: ref15
  article-title: Algorithms for Mining Distance-Based Outliers in Large Datasets
– ident: ref6
  doi: 10.1145/956750.956758
– ident: ref4
  doi: 10.1109/TKDE.2005.31
– ident: ref9
  doi: 10.1007/s10618-008-0093-2
– ident: ref18
  doi: 10.1007/s10618-005-0014-6
– ident: ref1
  doi: 10.1007/978-3-642-15277-1_32
– volume: 20
  start-page: 259
  volume-title: Data Mining and Knowledge Discovery
  ident: ref16
  article-title: A Fast Outlier Detection Strategy for Distributed High-Dimensional Data Sets with Mixed Attributes
– ident: ref7
  doi: 10.1145/1541880.1541882
– ident: ref10
  doi: 10.1109/60.749142
– ident: ref3
  doi: 10.1145/1497577.1497581
– ident: ref20
  doi: 10.1145/1150402.1150447
– ident: ref2
  doi: 10.1109/TKDE.2006.29
– ident: ref21
  doi: 10.1049/cp:19950597
– volume: 12
  start-page: 5
  issue: 1
  volume-title: Distributed and Parallel Databases
  ident: ref12
  article-title: Parallel Mining of Outliers in Large Database
– ident: ref17
  doi: 10.1109/ICDM.2005.116
– ident: ref13
  doi: 10.1109/ACC.2002.1024528
– ident: ref19
  doi: 10.1145/342009.335437
– ident: ref8
  doi: 10.1137/1.9781611972771.47
– volume-title: Data Mining: Concepts and Techniques
  ident: ref11
– volume-title: Large-Scale Parallel Data Mining
  ident: ref22
– volume-title: UCI Machine Learning Repository
  ident: ref5
SSID ssj0008781
Score 2.298693
Snippet We introduce a distributed method for detecting distance-based outliers in very large data sets. Our approach is based on the concept of outlier detection...
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 1520
SubjectTerms Arrays
Data mining
Decision support systems
Distance-based outliers
Distributed databases
Nickel
outlier detection
parallel and distributed algorithms
Upper bound
Title Distributed Strategies for Mining Outliers in Large Data Sets
URI https://ieeexplore.ieee.org/document/6175896
Volume 25
WOSCitedRecordID wos000319461800007&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE/IET Electronic Library (IEL) (UW System Shared)
  issn: 1041-4347
  databaseCode: RIE
  dateStart: 19890101
  customDbUrl:
  isFulltext: true
  dateEnd: 99991231
  titleUrlDefault: https://ieeexplore.ieee.org/
  omitProxy: false
  ssIdentifier: ssj0008781
  providerName: IEEE