Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 4, pp. 475-488
Main authors: Das, K., Bhaduri, K., Kun Liu, Kargupta, H.
Format: Journal Article
Language: English
Published: New York, NY, IEEE, 1 April 2008
IEEE Computer Society
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1041-4347, 1558-2191
Abstract: The inner product measures how closely two feature vectors are related. It is an important primitive for many popular data mining tasks, for example, clustering, classification, correlation computation, and decision tree construction. If the entire data set is available at a single site, then computing the inner product matrix and identifying the top (in terms of magnitude) entries is trivial. However, in many real-world scenarios, data is distributed across many locations, and transmitting the data to a central server would be quite communication intensive and not scalable. This paper presents an approximate local algorithm for identifying the top-l inner products among pairs of feature vectors in a large asynchronous distributed environment such as a peer-to-peer (P2P) network. We develop a probabilistic algorithm for this purpose using order statistics and the Hoeffding bound. We present experimental results to show the effectiveness and scalability of the algorithm. Finally, we demonstrate an application of this technique for interest-based community formation in a P2P environment.
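To make the abstract's starting point concrete, the following is a minimal sketch of the centralized baseline it calls trivial: with all feature vectors at one site, compute every pairwise inner product and keep the l largest in magnitude. It is not the paper's distributed algorithm. The function names `top_l_inner_products` and `hoeffding_sample_size`, and the parameters `epsilon`, `delta`, and `value_range`, are illustrative assumptions; the second function only evaluates the standard two-sided Hoeffding bound for a sample mean and does not reproduce how the paper combines it with order statistics.

```python
import heapq
import math
from itertools import combinations


def top_l_inner_products(vectors, l):
    """Centralized baseline: return the l pairs (i, j, inner_product)
    with the largest |inner_product| among all pairs of vectors."""
    scored = []
    for (i, u), (j, v) in combinations(enumerate(vectors), 2):
        ip = sum(a * b for a, b in zip(u, v))
        # Rank by magnitude, but keep the signed value for reporting.
        scored.append((abs(ip), i, j, ip))
    best = heapq.nlargest(l, scored)
    return [(i, j, ip) for _, i, j, ip in best]


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Number of independent samples, each bounded in an interval of width
    value_range, so that the sample mean is within epsilon of the true mean
    with probability at least 1 - delta (standard two-sided Hoeffding bound).
    Hypothetical helper, not taken from the paper."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Four illustrative feature vectors (made-up data).
features = [[1, 0, 2], [2, 1, 0], [0, 3, 1], [-3, 0, -2]]
top_pairs = top_l_inner_products(features, 2)
samples_needed = hoeffding_sample_size(0.1, 0.05)
```

The baseline touches every pair of vectors, which is the cost the abstract says becomes impractical once the data is spread across many peers; the sample-size helper hints at why a sampling-based estimate can avoid collecting all of it.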
Author affiliations (in author order):
Das, K., Univ. of Maryland Baltimore County, Baltimore
Bhaduri, K., Univ. of Maryland Baltimore County, Baltimore
Kun Liu, Univ. of Maryland Baltimore County, Baltimore
Kargupta, H., Univ. of Maryland Baltimore County, Baltimore
CODEN ITKEEH
Copyright 2008 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008
DOI 10.1109/TKDE.2007.190714
Discipline Engineering
Computer Science
Applied Sciences
Statistics
Genre orig-research
ISICitedReferencesCount 14
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords Probabilistic algorithms
Knowledge management applications
Algorithms for data and knowledge management
Data mining
Mining methods and algorithms
Cluster analysis
Distributed data mining
Data analysis
Correlation
Peer to peer
Statistical analysis
Probabilistic approach
Scalability
Matrix product
peer-to-peer network
Information extraction
Distributed system
Decision tree
Modeling
Classification
inner product
Localization
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
CC BY 4.0
SubjectTerms Algorithms
Algorithms for data and knowledge management
Applied sciences
Classification tree analysis
Computation
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Data mining
Data processing. List processing. Character string processing
Decision trees
Exact sciences and technology
Knowledge management applications
Large-scale systems
Mathematical analysis
Memory organisation. Data processing
Mining methods and algorithms
Network servers
Networks
Partitioning algorithms
Peer to peer computing
Probabilistic algorithms
Scalability
Software
Statistical distributions
Statistics
Studies
Vectors (mathematics)
URI https://ieeexplore.ieee.org/document/4378380
https://www.proquest.com/docview/912300172
https://www.proquest.com/docview/34575238
https://www.proquest.com/docview/875058556