Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network
The inner product measures how closely two feature vectors are related. It is an important primitive for many popular data mining tasks, for example, clustering, classification, correlation computation, and decision tree construction. If the entire data set is available at a single site, then comput...
| Published in: | IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 4, pp. 475-488 |
|---|---|
| Main authors: | Das, K.; Bhaduri, K.; Kun Liu; Kargupta, H. |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, NY: IEEE, 1 April 2008 (IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)) |
| Subjects: | |
| ISSN: | 1041-4347, 1558-2191 |
| Abstract | The inner product measures how closely two feature vectors are related. It is an important primitive for many popular data mining tasks, for example, clustering, classification, correlation computation, and decision tree construction. If the entire data set is available at a single site, then computing the inner product matrix and identifying the top (in terms of magnitude) entries is trivial. However, in many real-world scenarios, data is distributed across many locations and transmitting the data to a central server would be quite communication intensive and not scalable. This paper presents an approximate local algorithm for identifying top-l inner products among pairs of feature vectors in a large asynchronous distributed environment such as a peer-to-peer (P2P) network. We develop a probabilistic algorithm for this purpose using order statistics and the Hoeffding bound. We present experimental results to show the effectiveness and scalability of the algorithm. Finally, we demonstrate an application of this technique for interest-based community formation in a P2P environment. |
|---|---|
| Author | Das, K.; Kun Liu; Kargupta, H.; Bhaduri, K. |
| Author_xml | – sequence: 1 givenname: K. surname: Das fullname: Das, K. organization: Univ. of Maryland Baltimore County, Baltimore – sequence: 2 givenname: K. surname: Bhaduri fullname: Bhaduri, K. organization: Univ. of Maryland Baltimore County, Baltimore – sequence: 3 surname: Kun Liu fullname: Kun Liu organization: Univ. of Maryland Baltimore County, Baltimore – sequence: 4 givenname: H. surname: Kargupta fullname: Kargupta, H. organization: Univ. of Maryland Baltimore County, Baltimore |
| BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20194842$$DView record in Pascal Francis |
| CODEN | ITKEEH |
| CitedBy_id | crossref_primary_10_1002_sam_10033 crossref_primary_10_1007_s10115_011_0474_5 crossref_primary_10_1109_TCSI_2012_2220471 crossref_primary_10_1109_TKDE_2008_169 crossref_primary_10_1002_sam_10009 crossref_primary_10_1002_sam_10006 crossref_primary_10_1016_j_asoc_2015_06_060 crossref_primary_10_4018_jeei_2012040103 crossref_primary_10_1016_j_datak_2009_04_006 |
| Cites_doi | 10.1137/1.9781611972764.38 10.1057/palgrave.jors.2600906 10.1109/TSMCB.2004.836888 10.1109/WI.2004.10170 10.1109/TKDE.2006.14 10.1093/biomet/57.1.97 10.1080/01621459.1963.10500830 10.1137/1.9781611972764.14 10.1109/MIC.2006.74 10.1109/ICDM.2004.10114 10.1109/MASCOT.2001.948886 10.1145/1041410.1041421 10.1007/978-3-642-04898-2_436 10.1109/HPDC.2003.1210033 10.1109/ICDCS.2007.6238553 10.1109/ICDE.2005.115 10.1109/HICSS.2006.126 10.1145/1233321.1233323 10.1145/872757.872764 10.1109/SFFCS.1999.814637 10.1063/1.1699114 10.1007/s100440200017 10.1145/1055558.1055597 |
| ContentType | Journal Article |
| Copyright | 2008 INIST-CNRS; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008 |
| Copyright_xml | – notice: 2008 INIST-CNRS – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008 |
| DBID | 97E RIA RIE AAYXX CITATION IQODW 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| DOI | 10.1109/TKDE.2007.190714 |
| DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Pascal-Francis; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; ANTE: Abstracts in New Technology & Engineering; Engineering Research Database |
| DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional; Engineering Research Database; ANTE: Abstracts in New Technology & Engineering |
| DatabaseTitleList | Technology Research Database Technology Research Database Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science; Applied Sciences; Statistics |
| EISSN | 1558-2191 |
| EndPage | 488 |
| ExternalDocumentID | 2545287341 20194842 10_1109_TKDE_2007_190714 4378380 |
| Genre | orig-research |
| GroupedDBID | -~X .DC 0R~ 1OL 29I 4.4 5GY 5VS 6IK 97E 9M8 AAJGR AARMG AASAJ AAWTH ABAZT ABFSI ABQJQ ABVLG ACGFO ACIWK AENEX AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 E.L EBS EJD F5P HZ~ H~9 ICLAB IEDLZ IFIPE IFJZH IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNI RNS RXW RZB TAE TAF TN5 UHB VH1 AAYXX CITATION IQODW RIG 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| ID | FETCH-LOGICAL-c383t-9acf194f68feb844bd22588990063a95d4cabc4b7c4ffe1faef5107fa351291c3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 14 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000254045000004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1041-4347 |
| IngestDate | Sun Sep 28 01:08:25 EDT 2025 Sat Sep 27 19:37:01 EDT 2025 Sun Nov 30 04:02:48 EST 2025 Mon Jul 21 09:13:34 EDT 2025 Sat Nov 29 08:08:15 EST 2025 Tue Nov 18 22:25:23 EST 2025 Wed Aug 27 02:52:17 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Keywords | Probabilistic algorithms; Knowledge management applications; Algorithms for data and knowledge management; Data mining; Mining methods and algorithms; Cluster analysis; Distributed data mining; Data analysis; Correlation; Peer to peer; Statistical analysis; Probabilistic approach; Scalability; Matrix product; peer-to-peer network; Information extraction; Distributed system; Decision tree; Modeling; Classification; inner product; Localization |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html CC BY 4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c383t-9acf194f68feb844bd22588990063a95d4cabc4b7c4ffe1faef5107fa351291c3 |
| PQID | 912300172 |
| PQPubID | 23500 |
| PageCount | 14 |
| ParticipantIDs | proquest_miscellaneous_34575238 proquest_journals_912300172 crossref_citationtrail_10_1109_TKDE_2007_190714 ieee_primary_4378380 pascalfrancis_primary_20194842 proquest_miscellaneous_875058556 crossref_primary_10_1109_TKDE_2007_190714 |
| PublicationCentury | 2000 |
| PublicationDate | 2008-04-01 |
| PublicationDateYYYYMMDD | 2008-04-01 |
| PublicationDate_xml | – month: 04 year: 2008 text: 2008-04-01 day: 01 |
| PublicationDecade | 2000 |
| PublicationPlace | New York, NY |
| PublicationPlace_xml | – name: New York, NY – name: New York |
| PublicationTitle | IEEE transactions on knowledge and data engineering |
| PublicationTitleAbbrev | TKDE |
| PublicationYear | 2008 |
| Publisher | IEEE; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: IEEE Computer Society – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0008781 |
| Score | 1.9819145 |
| SourceID | proquest pascalfrancis crossref ieee |
| SourceType | Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 475 |
| SubjectTerms | Algorithms; Algorithms for data and knowledge management; Applied sciences; Classification tree analysis; Computation; Computer science; control theory; systems; Computer systems and distributed systems. User interface; Data mining; Data processing. List processing. Character string processing; Decision trees; Exact sciences and technology; Knowledge management applications; Large-scale systems; Mathematical analysis; Memory organisation. Data processing; Mining methods and algorithms; Network servers; Networks; Partitioning algorithms; Peer to peer computing; Probabilistic algorithms; Scalability; Software; Statistical distributions; Statistics; Studies; Vectors (mathematics) |
| Title | Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network |
| URI | https://ieeexplore.ieee.org/document/4378380 https://www.proquest.com/docview/912300172 https://www.proquest.com/docview/34575238 https://www.proquest.com/docview/875058556 |
| Volume | 20 |
| WOSCitedRecordID | wos000254045000004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-2191 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0008781 issn: 1041-4347 databaseCode: RIE dateStart: 19890101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
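
Illustrative note on the abstract: the record does not reproduce the paper's algorithm, so the following is only a minimal Python sketch of the ingredients the abstract names, not the authors' method. It shows (a) the centralized baseline the abstract calls trivial, (b) a sample size from the two-sided Hoeffding bound, and (c) a standard order-statistics sample size. All function names and the parameters `epsilon`, `delta`, `value_range`, `p`, and `q` are assumptions introduced here for illustration.

```python
import heapq
import math
from itertools import combinations


def inner_product(u, v):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_l_centralized(features, l):
    """Centralized baseline: score every pair of feature vectors and
    keep the l pairs whose inner product is largest in magnitude."""
    scored = (
        (abs(inner_product(features[i], features[j])), (i, j))
        for i, j in combinations(range(len(features)), 2)
    )
    return heapq.nlargest(l, scored)


def hoeffding_sample_size(epsilon, delta, value_range):
    """Number of i.i.d. samples n such that the sample mean of values
    confined to an interval of width value_range is within epsilon of
    the true mean with probability at least 1 - delta.

    Derived from the two-sided Hoeffding bound
    P(|mean_n - mu| >= epsilon) <= 2 * exp(-2 n epsilon^2 / R^2).
    """
    return math.ceil(
        value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    )


def order_statistic_sample_size(p, q):
    """Number of uniform random samples n such that, with probability at
    least q, at least one sample lies in the top p-fraction of the
    population: 1 - (1 - p)^n >= q."""
    return math.ceil(math.log(1.0 - q) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical toy data: three 2-dimensional feature vectors.
    features = [[1, 2], [2, 4], [-3, 0]]
    print(top_l_centralized(features, 2))
    print(hoeffding_sample_size(0.1, 0.05, 1.0))
    print(order_statistic_sample_size(0.05, 0.95))
```

In a distributed setting one would presumably replace the exhaustive pair enumeration with sampled estimates sized by the two helper functions; how the paper actually combines them is not stated in this record.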