SEJ: An Even Approach to Multiway Theta-Joins Using MapReduce
| Published in: | 2012 International Conference on Cloud and Green Computing, pp. 73-80 |
|---|---|
| Main authors: | , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 November 2012 |
| Subjects: | |
| ISBN: | 1467330272, 9781467330275 |
| Summary: | Data analysis and processing are important tasks in cloud computing. The MapReduce framework has been increasingly used to analyze large-scale data over large clusters. Compared with parallel relational databases, it offers excellent scalability and good fault tolerance. However, its performance is not as good as that of parallel relational databases. Implementing join operations efficiently with MapReduce has therefore attracted considerable research attention. Multiway equi-joins and two-way theta-joins using MapReduce have been solved recently. In this paper, we introduce, for the first time, a communication cost model for evaluating multiway theta-joins, and we propose a randomized algorithm, Strict-Even-Join, to solve the problem. Our algorithm requires only the cardinalities of the input datasets and guarantees that the data is distributed evenly across reducers even when the input datasets are skewed. The results of three experiments we have conducted show that our approach is feasible. |
| DOI: | 10.1109/CGC.2012.9 |
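
The summary describes a randomized assignment of tuples to reducers that needs only input cardinalities and stays balanced under skew. The record gives no algorithmic detail, so the following is only a minimal single-process sketch of one well-known way to get that property, a randomized reducer grid; it is not the paper's Strict-Even-Join. The names `assign`, `theta_join`, and `shares` are hypothetical and chosen here for illustration.

```python
import itertools
import random
from collections import defaultdict


def assign(relations, shares, seed=0):
    """Map step (simulated): place each row of relation i on a reducer grid.

    relations: list of row lists; shares[i]: grid size along dimension i.
    Returns {grid_cell: [rows of relation 0, rows of relation 1, ...]}.
    """
    rng = random.Random(seed)
    reducers = defaultdict(lambda: [[] for _ in relations])
    for i, rel in enumerate(relations):
        for row in rel:
            # The coordinate is drawn without inspecting the row's values,
            # so skew in the join attributes cannot unbalance the cells.
            coord = rng.randrange(shares[i])
            # Replicate the row to every cell whose i-th coordinate is coord.
            axes = [[coord] if j == i else range(s) for j, s in enumerate(shares)]
            for cell in itertools.product(*axes):
                reducers[cell][i].append(row)
    return reducers


def theta_join(relations, shares, predicate, seed=0):
    """Reduce step (simulated): test the predicate on each cell's combinations."""
    out = []
    for parts in assign(relations, shares, seed).values():
        for combo in itertools.product(*parts):
            if predicate(*combo):
                out.append(combo)
    return out
```

In this sketch every combination of rows meets in exactly one grid cell, so any join predicate can be evaluated without duplicate results. Each row of relation `i` is copied to the product of the other relations' shares, which is the communication cost a model like the one mentioned in the summary would weigh when picking `shares` from the input cardinalities; here `shares` is simply supplied by hand.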

