SEJ: An Even Approach to Multiway Theta-Joins Using MapReduce

Bibliographic details
Published in: 2012 International Conference on Cloud and Green Computing, pp. 73-80
Main authors: Changchun Zhang, Jing Li, Lei Wu, Meiyan Lin, Weiqing Liu
Format: Conference paper
Language: English
Published: IEEE, 01.11.2012
ISBN:1467330272, 9781467330275
Description
Summary: Data analysis and processing are important tasks in cloud computing. The MapReduce framework is increasingly used to analyze large-scale data on large clusters. Compared with parallel relational databases, it offers excellent scalability and good fault tolerance; however, its performance is not as good. How to implement the join operation efficiently in MapReduce has therefore attracted researchers' attention. Multiway equi-joins and two-way theta-joins using MapReduce have been addressed recently. In this paper, we introduce, for the first time, a communication cost model for evaluating multiway theta-joins, and we propose a randomized algorithm, Strict-Even-Join, to solve the problem. Our algorithm requires only the cardinalities of the input datasets and guarantees that the data is distributed evenly across reducers even when the input datasets are skewed. The results of three experiments we conducted show that our approach is feasible.
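Note: the summary does not describe how Strict-Even-Join works, so the following is not the paper's algorithm. It is a minimal, single-process sketch of a generic technique that matches the summary's description in spirit: randomized partitioning of a multiway theta-join over a grid of reducers, using only a grid shape that could be chosen from the input cardinalities. The names `partition`, `multiway_theta_join`, `replication_cost`, and the `dims` parameter are assumptions made for illustration, and the cost function simply counts replicated tuples rather than reproducing the paper's communication cost model.

```python
import itertools
import random
from collections import defaultdict


def partition(relations, dims, seed=0):
    """Simulate the map phase (illustrative, not the paper's method).

    Reducers form a grid with one axis per relation; dims[i] is the
    number of slots on axis i. Each tuple of relation i draws a random
    slot on axis i and is replicated to every reducer sharing that slot.
    """
    rng = random.Random(seed)
    reducers = defaultdict(lambda: [[] for _ in relations])
    for i, rel in enumerate(relations):
        for t in rel:
            slot = rng.randrange(dims[i])
            axes = [[slot] if j == i else range(d) for j, d in enumerate(dims)]
            for cell in itertools.product(*axes):
                reducers[cell][i].append(t)
    return reducers


def multiway_theta_join(relations, dims, theta, seed=0):
    """Simulate the reduce phase: each reducer tests theta on its local
    cross product. Any combination of tuples meets at exactly one
    reducer (the cell formed by their slots), so no result is duplicated.
    """
    out = []
    for parts in partition(relations, dims, seed).values():
        for combo in itertools.product(*parts):
            if theta(*combo):
                out.append(combo)
    return out


def replication_cost(cardinalities, dims):
    """Hypothetical cost measure: total tuples sent to reducers.

    A tuple of relation i is copied to (product of dims) / dims[i] cells.
    """
    total_cells = 1
    for d in dims:
        total_cells *= d
    return sum(n * (total_cells // d) for n, d in zip(cardinalities, dims))


if __name__ == "__main__":
    R, S, T = [1, 5, 9, 3], [2, 6, 4], [4, 7, 2, 8]
    # Example theta condition with inequalities across three relations.
    result = multiway_theta_join([R, S, T], (2, 2, 2), lambda r, s, t: r < s <= t)
    print(len(result), "matching combinations")
    print(replication_cost([len(R), len(S), len(T)], (2, 2, 2)), "tuples shipped")
```

Because slots are drawn at random rather than derived from join-attribute values, a heavily repeated value does not concentrate on a single reducer, which is the intuition behind skew resistance in this family of approaches.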
DOI:10.1109/CGC.2012.9