A Scalable Similarity Join Algorithm Based on MapReduce and LSH

Bibliographic Details
Published in: International Journal of Parallel Programming, Vol. 50, no. 3-4, pp. 360-380
Main Authors: Rivault, Sébastien; Bamha, Mostafa; Limet, Sébastien; Robert, Sophie
Format: Journal Article
Language: English
Published: New York: Springer US, 2022-08-01 (also listed under Springer Nature B.V. and Springer Verlag)
ISSN: 0885-7458 (print), 1573-7640 (online)
Abstract
Similarity joins are recognized as among the most useful data processing and analysis operations. A similarity join retrieves all pairs of data items whose distance is smaller than a predefined threshold λ. In this paper, we introduce the MRS-join algorithm to perform similarity joins on large trajectory datasets. MRS-join relies on the MapReduce model and on a randomized redistribution of locality-sensitive hashing (LSH) keys to balance the load among processing nodes, while distributed histograms restrict communication and computation almost exclusively to relevant data. A cost analysis of MRS-join shows that the approach is insensitive to data skew and guarantees perfect load balancing in large-scale systems during all stages of the similarity join computation. These properties are confirmed by a series of experiments using the Fréchet distance on large real-world and synthetic trajectory benchmark datasets.
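The abstract outlines an LSH-keyed MapReduce similarity join. The following is a minimal single-machine Python sketch of that general pattern under stated assumptions; it is not the MRS-join algorithm. All names (`discrete_frechet`, `grid_key`, `lsh_similarity_join`, `delta`, `num_hashes`) are hypothetical. Assumptions: the discrete Fréchet distance stands in for the Fréchet distance, the LSH key is a randomly shifted grid snapping of each trajectory, the map and reduce phases are simulated with dictionaries, and the "histograms" only drop keys that occur in a single relation. The paper's skew-handling redistribution of heavy keys is not modelled. Because LSH acts as a filter, pairs that never share a bucket are missed, so the result is a subset of the exact join.

```python
import math
import random
from collections import Counter, defaultdict


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences (O(len(p)*len(q)) DP)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def grid_key(traj, delta, shift):
    """Hypothetical LSH key: snap points to a shifted grid, drop consecutive repeats."""
    cells = []
    for x, y in traj:
        cell = (math.floor((x + shift[0]) / delta), math.floor((y + shift[1]) / delta))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return tuple(cells)


def lsh_similarity_join(R, S, lam, delta, num_hashes=4, seed=0):
    """Return id pairs (r, s) sharing an LSH key whose discrete Fréchet distance < lam."""
    rng = random.Random(seed)
    shifts = [(rng.uniform(0, delta), rng.uniform(0, delta)) for _ in range(num_hashes)]

    def keys(traj):
        # One key per independent hash function; the index h keeps them distinct.
        return [(h, grid_key(traj, delta, s)) for h, s in enumerate(shifts)]

    # Step 1: per-relation key histograms (a stand-in for distributed histograms).
    hist_r = Counter(k for t in R.values() for k in keys(t))
    hist_s = Counter(k for t in S.values() for k in keys(t))
    relevant = hist_r.keys() & hist_s.keys()  # keys present in both relations

    # Step 2 ("map"): route each trajectory only to buckets that can yield a pair.
    buckets = defaultdict(lambda: ([], []))
    for side, dataset in ((0, R), (1, S)):
        for tid, traj in dataset.items():
            for k in keys(traj):
                if k in relevant:
                    buckets[k][side].append((tid, traj))

    # Step 3 ("reduce"): verify candidates per bucket; the set removes duplicates
    # that arise when a pair collides under several hash functions.
    result = set()
    for r_list, s_list in buckets.values():
        for rid, r in r_list:
            for sid, s in s_list:
                if (rid, sid) not in result and discrete_frechet(r, s) < lam:
                    result.add((rid, sid))
    return result
```

For example, joining `R = {"r1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]}` with an `S` that contains an identical trajectory `"s1"` and a distant one `"s2"` at `lam=0.5, delta=4.0` returns `{("r1", "s1")}`: identical trajectories always share every grid key, and the verification step rejects the distant pair. The comparison uses a strict `<` to match "smaller than λ" in the abstract.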
Affiliation (all authors): Université Orléans, INSA Centre Val de Loire, LIFO, EA
Contact: Bamha, Mostafa (Mostafa.Bamha@univ-orleans.fr)
Copyright: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
DOI: 10.1007/s10766-022-00733-6

Discipline: Computer Science
Peer reviewed: yes (scholarly journal)
Keywords: Data skew; Similarity join operations; Locality-sensitive hashing (LSH); Hadoop framework; MapReduce model
License: Distributed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0)
Journal abbreviation: Int J Parallel Prog
Subjects: Algorithms; Cognitive science; Computer Science; Cost analysis; Data processing; Datasets; Histograms; Processor Architectures; Similarity; Software Engineering/Programming and Operating Systems; Theory of Computation; Time series
Special issue: High-Level Parallel Programming and Applications 2021
Online access:
Springer: https://link.springer.com/article/10.1007/s10766-022-00733-6
ProQuest: https://www.proquest.com/docview/2685822839
HAL: https://hal.science/hal-03677361