Scalable Big Spatial Data Processing with SQL Query Compilation and Distributed Morsel-driven Parallelism

Detailed bibliography
Published in:	IEEE International Conference on Big Data, pp. 302-311
Main authors:	Sahni, Rahul; Zhang, Xiaozheng; Chatterjee, Sudip; Ray, Suprio
Format:	Conference paper
Language:	English
Published:	IEEE, 15 December 2024
Subjects:	Big Data; Computational modeling; Data processing; Parallel processing; Query processing; Relational databases; Runtime; Scalability; Sparks; Spatial databases
ISSN:	2573-2978

Description
Abstract:	The rapid rise in spatial data volumes from diverse sources necessitates efficient spatial data processing capabilities. Although most relational databases support spatial extensions to SQL, they offer limited scalability. Traditional relational database query processing follows a pull-based (or tuple-at-a-time) model, which is not efficient for processing large volumes of data. A number of specialized spatial data processing systems have been developed that extend cluster computing frameworks such as Spark and Hadoop. However, these systems offer limited or no support for spatial SQL query execution, and the few systems that do support SQL querying suffer from the overheads of the pull-based model. We present a compilation-based distributed SQL query processing system. It follows a data-centric query compilation approach that takes a SQL query and generates distributed C++ (UPC++) physical query plans. The generated code is compiled and executed on a distributed, in-memory, high-performance framework based on the Partitioned Global Address Space (PGAS) paradigm. We also introduce morsel-driven parallelism for scalable spatial query execution in a distributed runtime. We evaluate our system experimentally with two real-world datasets on a number of spatial query workloads. Experimental results demonstrate that our system performs significantly better than the leading spatial big data system Apache Sedona and the distributed parallel relational database Citus.
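The abstract contrasts the pull-based (tuple-at-a-time) model with data-centric compiled plans and morsel-driven parallelism. The following Python sketch illustrates those three ideas on a toy spatial range-count query. It is not the authors' UPC++ code generator or runtime: `BBox`, `pull_count`, `fused_pipeline`, `morsel_count`, and the `morsel_size` parameter are hypothetical names, and a single process with threads stands in for the distributed PGAS runtime. Under CPython's GIL the threads do not run the loop truly in parallel, so the sketch shows only how work is scheduled, not the speedup.

```python
import threading
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BBox:
    """Axis-aligned query window (hypothetical stand-in for a spatial predicate)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


# --- Pull-based (tuple-at-a-time) plan: each operator calls next() on its child.
class Scan:
    def __init__(self, rows: List[Point]):
        self._it: Iterator[Point] = iter(rows)

    def next(self) -> Optional[Point]:
        return next(self._it, None)


class Filter:
    def __init__(self, child, bbox: BBox):
        self.child, self.bbox = child, bbox

    def next(self) -> Optional[Point]:
        # One call per tuple per operator: the per-tuple overhead a compiled plan avoids.
        while (row := self.child.next()) is not None:
            if self.bbox.contains(row):
                return row
        return None


def pull_count(rows: List[Point], bbox: BBox) -> int:
    op, n = Filter(Scan(rows), bbox), 0
    while op.next() is not None:
        n += 1
    return n


# --- Data-centric (push-based) style: scan, filter and count fused into one loop,
# --- roughly the shape of code a query compiler emits for this pipeline.
def fused_pipeline(morsel: List[Point], bbox: BBox) -> int:
    count = 0
    for x, y in morsel:
        if bbox.xmin <= x <= bbox.xmax and bbox.ymin <= y <= bbox.ymax:
            count += 1
    return count


# --- Morsel-driven scheduling: workers repeatedly claim the next small chunk
# --- ("morsel") from a shared dispatcher, so faster workers simply take more morsels.
def morsel_count(rows: List[Point], bbox: BBox,
                 workers: int = 4, morsel_size: int = 1024) -> int:
    next_start = 0
    lock = threading.Lock()
    partials = [0] * workers  # one partial aggregate per worker, no sharing

    def worker(wid: int) -> None:
        nonlocal next_start
        while True:
            with lock:  # dispatcher: claim the next morsel atomically
                start = next_start
                next_start += morsel_size
            if start >= len(rows):
                return
            partials[wid] += fused_pipeline(rows[start:start + morsel_size], bbox)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # merge the per-worker partial counts
```

For example, with `pts = [(i % 100, i // 100) for i in range(10_000)]` (a 100 × 100 grid of points) and `BBox(10, 10, 19, 19)`, both `pull_count` and `morsel_count` return 100. A lock-protected counter is the simplest possible dispatcher; how the paper assigns morsels across nodes in its distributed runtime is not described in this record.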
DOI:	10.1109/BigData62323.2024.10825523