Fast and scalable inequality joins

Bibliographic details
Published in: The VLDB Journal, Volume 26, Issue 1, pp. 125-150
Main authors: Khayyat, Zuhair; Lucia, William; Singh, Meghna; Ouzzani, Mourad; Papotti, Paolo; Quiané-Ruiz, Jorge-Arnulfo; Tang, Nan; Kalnis, Panos
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 1 February 2017
Springer Nature B.V.
ISSN: 1066-8888, 0949-877X
Description
Summary: Inequality joins, which join relations on inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research, ranging from efficient join algorithms such as sort-merge join to the use of efficient indices such as the B+-tree, R*-tree, and Bitmap. However, inequality joins have received little attention, and queries containing such joins are notably slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins, which is then used to optimize multiple-predicate queries and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show that our solution is more scalable and several orders of magnitude faster.
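To make the notion of an inequality join concrete, the following minimal Python sketch joins two lists of numbers on the single condition `a > b`. It is only an illustration: the function names are hypothetical, and this is not the algorithm from the article, which the summary describes as using sorted arrays together with bit-arrays. The sketch merely contrasts a pairwise comparison with a version that sorts one input and finds each cut-off by binary search.

```python
from bisect import bisect_left


def naive_inequality_join(r, s):
    """Baseline: test every pair (a, b), which costs O(len(r) * len(s))."""
    return [(a, b) for a in r for b in s if a > b]


def sorted_inequality_join(r, s):
    """Sort s once, then binary-search the cut-off position for each a in r.

    Assumption: a single predicate a > b over plain numbers.
    """
    s_sorted = sorted(s)
    result = []
    for a in r:
        # bisect_left returns how many values in s_sorted are strictly below a
        cutoff = bisect_left(s_sorted, a)
        result.extend((a, b) for b in s_sorted[:cutoff])
    return result


if __name__ == "__main__":
    left = [3, 1, 5]
    right = [2, 4, 1]
    print(sorted(sorted_inequality_join(left, right)))
```

The sorted variant still has to emit every matching pair, so its output size can be quadratic; what it avoids is comparing pairs that cannot match.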
DOI: 10.1007/s00778-016-0441-6