Similarity Joins: Their implementation and interactions with other database operators

Bibliographic details
Published in: Information systems (Oxford), vol. 52, pp. 149–162
Main authors: Silva, Yasin N.; Pearson, Spencer S.; Chon, Jaime; Roberts, Ryan
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 August 2015
ISSN: 0306-4379, 1873-6076
Online access: Full text
Abstract:
Similarity Joins are extensively used in multiple application domains and are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, implementation, and optimization of a Similarity Join database operator for metric spaces. We present DBSimJoin, a physical database operator that integrates techniques to: enable a non-blocking behavior, prioritize the early generation of results, and fully support the database iterator interface. The proposed operator can be used with multiple distance functions and data types. We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. We also study ways in which DBSimJoin can be combined with other similarity and non-similarity operators to answer more complex queries, and how DBSimJoin can be used in query transformation rules to improve query performance. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches and scales very well when important parameters like ε, data size, and number of dimensions increase.
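The abstract defines a Similarity Join as the set of all pairs whose distance is smaller than a threshold ε. The Python sketch below illustrates that definition only. It is a simple nested-loop join with a single-pivot triangle-inequality filter, not the DBSimJoin algorithm described in the paper. The function names (`similarity_join`, `euclidean`, `levenshtein`) and the choice of pivot are hypothetical.

The sketch makes three design choices:
- It follows the abstract's wording, so only distances strictly smaller than ε qualify.
- It takes the distance function as a parameter, mirroring the claim that the operator works with multiple distance functions and data types.
- It is written as a generator, so pairs are produced incrementally instead of only after the whole join has run.

```python
import math
from typing import Callable, Iterable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.dist(a, b)


def levenshtein(s: str, t: str) -> float:
    """Edit distance: a metric on strings, used to show the join is type-agnostic."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute (or match)
        prev = cur
    return float(prev[-1])


def similarity_join(
    left: Iterable[T],
    right: Sequence[T],
    eps: float,
    dist: Callable[[T, T], float],
) -> Iterator[Tuple[T, T]]:
    """Yield every pair (a, b), a from left and b from right, with dist(a, b) < eps.

    Illustrative nested-loop join with a single-pivot filter; this is NOT the
    DBSimJoin algorithm. As a generator it emits each pair as soon as it is
    found, in the spirit of a pipelined operator driven by next() calls.
    """
    if not right:
        return
    pivot = right[0]  # hypothetical pivot choice: first inner tuple
    # Materialize the inner input once, caching each tuple's distance to the pivot.
    inner = [(b, dist(b, pivot)) for b in right]
    for a in left:  # the outer input is consumed lazily, one tuple at a time
        da = dist(a, pivot)
        for b, db in inner:
            # Triangle inequality: dist(a, b) >= |dist(a, p) - dist(b, p)|,
            # so once this lower bound reaches eps the pair cannot qualify.
            if abs(da - db) >= eps:
                continue
            if dist(a, b) < eps:
                yield (a, b)


if __name__ == "__main__":
    points_r = [(0.0, 0.0), (5.0, 5.0)]
    points_s = [(0.5, 0.0), (4.0, 4.0), (9.0, 9.0)]
    print(list(similarity_join(points_r, points_s, 1.5, euclidean)))
    # [((0.0, 0.0), (0.5, 0.0)), ((5.0, 5.0), (4.0, 4.0))]
    words = similarity_join(["color", "flavor"], ["colour", "flavour", "kitten"], 2, levenshtein)
    print(next(words))  # first pair is available before the join completes
    # ('color', 'colour')
```

Replacing `<` with `<=` gives the inclusive variant (distance at most ε). The pivot filter is sound only because both distance functions are metrics: each satisfies the triangle inequality, and metric spaces are the setting the paper targets.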
Authors:
Silva, Yasin N. (ysilva@asu.edu)
Pearson, Spencer S. (sspearso@asu.edu)
Chon, Jaime (jchon@asu.edu; ORCID 0000-0001-7595-4181)
Roberts, Ryan (rwrobert@asu.edu)
Copyright: 2015 Elsevier Ltd
DOI: 10.1016/j.is.2015.01.008
Keywords: Similarity queries; Query processing and optimization; Database operator; PostgreSQL; Similarity Join