Scalable Big Spatial Data Processing with SQL Query Compilation and Distributed Morsel-driven Parallelism
| Published in: | IEEE International Conference on Big Data, pp. 302-311 |
|---|---|
| Main authors: | , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 15 December 2024 |
| Subjects: | |
| ISSN: | 2573-2978 |
| Online access: | Get full text |
| Summary: | The rapid rise in spatial data volumes from diverse sources necessitates efficient spatial data processing capabilities. Although most relational databases support spatial extensions of SQL query features, they offer limited scalability. Traditional relational database query processing follows a pull-based (or tuple-at-a-time) model, which is not efficient for processing large volumes of data. A number of specialized spatial data processing systems have been developed that extend cluster computing frameworks such as Spark and Hadoop. However, these systems offer limited or no support for spatial SQL query execution, and the few systems that do support SQL querying suffer from the overheads of the pull-based model. We present a compilation-based distributed SQL query processing system. It follows a data-centric query compilation approach that takes a SQL query and generates distributed C++ (UPC++) based physical query plans. The generated code is compiled and executed on a distributed in-memory high-performance framework based on the Partitioned Global Address Space (PGAS) paradigm. We also introduce morsel-driven parallelism for scalable spatial query execution in a distributed runtime. We conduct an experimental evaluation of our system with two real-world datasets on a number of spatial query workloads. Experimental results demonstrate that our system performs significantly better than Apache Sedona, a leading spatial big data system, and Citus, a distributed parallel relational database. |
|---|---|
| DOI: | 10.1109/BigData62323.2024.10825523 |
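
The morsel-driven parallelism mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation, which generates UPC++ code and runs on a distributed PGAS runtime; it is a single-process Python illustration under stated assumptions. The function name `count_points_in_box`, the `MORSEL_SIZE` value, and the bounding-box filter are all hypothetical choices made for the example. The idea shown is that workers repeatedly claim small fixed-size chunks of the input (morsels) from a shared cursor, so faster workers naturally pick up more work.

```python
import threading

# Hypothetical morsel size for illustration; real systems use far larger morsels.
MORSEL_SIZE = 4


def count_points_in_box(points, box, num_workers=4, morsel_size=MORSEL_SIZE):
    """Count points inside box = (xmin, ymin, xmax, ymax).

    Workers claim morsels (small index ranges) from a shared cursor
    until the input is exhausted, then their partial counts are summed.
    """
    xmin, ymin, xmax, ymax = box
    next_start = 0                      # shared cursor into the input
    cursor_lock = threading.Lock()      # protects the cursor
    partial_counts = [0] * num_workers  # one slot per worker, no sharing

    def claim_morsel():
        """Atomically hand out the next [start, end) range, or None when done."""
        nonlocal next_start
        with cursor_lock:
            start = next_start
            if start >= len(points):
                return None
            next_start = min(start + morsel_size, len(points))
            return start, next_start

    def worker(worker_id):
        local = 0
        while True:
            morsel = claim_morsel()
            if morsel is None:
                break
            start, end = morsel
            # Apply the spatial filter to every point in the claimed morsel.
            for x, y in points[start:end]:
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    local += 1
        partial_counts[worker_id] = local

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial_counts)
```

Calling `count_points_in_box([(i, i) for i in range(10)], (2, 2, 5, 5))` counts the diagonal points that fall inside the box. A distributed variant would additionally have to decide how morsels are assigned across nodes, which this sketch does not attempt.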