An Efficient Filter Strategy for Theta-Join Query in Distributed Environment

Detailed bibliography
Published in:	Proceedings - International Workshops on Parallel Processing, pp. 77-84
Main authors:	Wenjie Liu, Zhanhuai Li, Yuntao Zhou
Format:	Conference paper
Language:	English
Published:	IEEE 01.08.2017
Subjects:	Big Data; big data query; distributed computing; Distributed databases; Electronic mail; filter strategy; Filtering algorithms; Sparks; theta-join; Transforms
ISSN:	1530-2016

Description
Summary:	Theta-join queries are widely used in traditional databases, but because of the high computation and communication costs in a distributed environment, they are not processed efficiently for big data. Current research focuses on processing theta-joins with the MapReduce framework and mainly considers the overhead of load balancing in the network; as data sets grow, massive intermediate results lead to high communication cost. In this work, we propose a filter method for theta-joins that reduces the computation and communication cost in a distributed environment and thereby improves theta-join query performance. We consider both the load balance in the cluster and the memory cost in the parallel framework. We have implemented our method in Spark, a popular general-purpose data processing framework. The experimental results demonstrate that our method significantly improves the performance of theta-joins compared with state-of-the-art solutions.
DOI:	10.1109/ICPPW.2017.24
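
The summary describes filtering tuples before a theta-join so that fewer intermediate results are produced. The record does not say how the paper's filter works, so the following is only a minimal single-machine sketch of the general idea, assuming the join predicate is `a < b` and using a simple min/max bound as the filter. The function names and the bound-based pruning are illustrative assumptions, not the authors' method and not Spark code.

```python
from typing import List, Tuple


def naive_theta_join(r: List[int], s: List[int]) -> List[Tuple[int, int]]:
    """Evaluate the predicate a < b on every pair: |R| * |S| comparisons."""
    return [(a, b) for a in r for b in s if a < b]


def filtered_theta_join(r: List[int], s: List[int]) -> List[Tuple[int, int]]:
    """Drop tuples that cannot match any partner, then join the survivors.

    Hypothetical filter: for the predicate a < b, a value a from R can only
    match if a < max(S), and a value b from S can only match if b > min(R).
    """
    if not r or not s:
        return []
    max_s = max(s)
    min_r = min(r)
    r_kept = [a for a in r if a < max_s]  # tuples of R that may still match
    s_kept = [b for b in s if b > min_r]  # tuples of S that may still match
    return [(a, b) for a in r_kept for b in s_kept if a < b]


r = [1, 5, 9, 12]
s = [0, 3, 6]
result = filtered_theta_join(r, s)
```

In this example the filter discards 9 and 12 from `r` and 0 from `s`, so only 4 of the original 12 pairs are compared, while the join result is the same as the naive version. In a distributed setting the same kind of pruning would be applied before data is shuffled between nodes, which is where the communication saving mentioned in the summary would come from.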