Similarity Joins: Their implementation and interactions with other database operators
Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed,...
| Published in: | Information systems (Oxford), Vol. 52, pp. 149-162 |
|---|---|
| Main authors: | Silva, Yasin N.; Pearson, Spencer S.; Chon, Jaime; Roberts, Ryan |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 2015-08-01 |
| Keywords: | Similarity queries; Query processing and optimization; Database operator; PostgreSQL; Similarity Join |
| ISSN: | 0306-4379, 1873-6076 |
| Online access: | Full text |
| Abstract | Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, implementation, and optimization of a Similarity Join database operator for metric spaces. We present DBSimJoin, a physical database operator that integrates techniques to: enable a non-blocking behavior, prioritize the early generation of results, and fully support the database iterator interface. The proposed operator can be used with multiple distance functions and data types. We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. We also study ways in which DBSimJoin can be combined with other similarity and non-similarity operators to answer more complex queries, and how DBSimJoin can be used in query transformation rules to improve query performance. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches and scales very well when important parameters like ε, data size, and number of dimensions increase. |
|---|---|
| Author | Silva, Yasin N. (ysilva@asu.edu); Pearson, Spencer S. (sspearso@asu.edu); Chon, Jaime (jchon@asu.edu, ORCID 0000-0001-7595-4181); Roberts, Ryan (rwrobert@asu.edu) |
| ContentType | Journal Article |
| Copyright | 2015 Elsevier Ltd |
| DOI | 10.1016/j.is.2015.01.008 |
| Discipline | Engineering; Computer Science |
| EISSN | 1873-6076 |
| EndPage | 162 |
| ISICitedReferencesCount | 11 |
| ISSN | 0306-4379 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Similarity queries; Query processing and optimization; Database operator; PostgreSQL; Similarity Join |
| Language | English |
| ORCID | 0000-0001-7595-4181 |
| PageCount | 14 |
| PublicationDate | 2015-08-01 |
| PublicationTitle | Information systems (Oxford) |
| PublicationYear | 2015 |
| Publisher | Elsevier Ltd |
| StartPage | 149 |
| SubjectTerms | Data processing; Database operator; Information systems; Operators; Optimization; PostgreSQL; Query processing; Query processing and optimization; Similarity; Similarity Join; Similarity queries; Transformations |
| Title | Similarity Joins: Their implementation and interactions with other database operators |
| URI | https://dx.doi.org/10.1016/j.is.2015.01.008; https://www.proquest.com/docview/1770376415 |
| Volume | 52 |
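The abstract characterizes a Similarity Join as the operation that retrieves all data pairs whose distance is smaller than a threshold ε, and it stresses non-blocking behavior and early generation of results. The following is a minimal sketch of that definition only, written as a naive nested loop; it is not the DBSimJoin algorithm, whose internals this record does not describe. The names `similarity_join`, `points_a`, and `points_b` are hypothetical and chosen for illustration. Using a Python generator is an assumption made here to mimic an iterator-style operator that hands back each qualifying pair as soon as it is found.

```python
import math
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def similarity_join(
    left: Iterable[T],
    right: Iterable[T],
    distance: Callable[[T, T], float],
    eps: float,
) -> Iterator[Tuple[T, T]]:
    """Yield every pair (l, r) with distance(l, r) < eps.

    Hypothetical helper: a naive nested-loop baseline, not the
    operator proposed in the paper.
    """
    # Materialize the inner input so it can be rescanned per outer item.
    right_items = list(right)
    for l in left:
        for r in right_items:
            # Strict comparison: "smaller than a predefined threshold".
            if distance(l, r) < eps:
                # Emit the pair immediately instead of buffering results.
                yield (l, r)


# Illustrative data (assumed, not from the paper): 2-D points
# compared with the Euclidean distance.
points_a = [(0.0, 0.0), (5.0, 5.0)]
points_b = [(0.5, 0.0), (5.0, 5.9), (9.0, 9.0)]

pairs = list(similarity_join(points_a, points_b, math.dist, eps=1.0))
print(pairs)
```

Because the distance function is passed in as a parameter, the same sketch works for other data types, for example strings compared with a Hamming or edit distance, which mirrors the abstract's statement that the operator supports multiple distance functions and data types. The quadratic cost of this baseline is the kind of overhead a dedicated operator is meant to avoid.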