SpatialSSJP: QoS-Aware Adaptive Approximate Stream-Static Spatial Join Processor
The widespread adoption of Internet of Things (IoT) motivated the emergence of mixed workloads in smart cities, where fast arriving geo-referenced big data streams are joined with archive tables, aiming at enriching streams with descriptive attributes that enable insightful analytics. Applications a...
Saved in:
| Published in: | IEEE transactions on parallel and distributed systems Vol. 35; no. 1; pp. 73 - 88 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
New York
IEEE
01.01.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1045-9219, 1558-2183 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The widespread adoption of Internet of Things (IoT) motivated the emergence of mixed workloads in smart cities, where fast arriving geo-referenced big data streams are joined with archive tables, aiming at enriching streams with descriptive attributes that enable insightful analytics. Applications are now relying on finding, in real-time, to which geographical regions data streaming tuples belong. This problem requires a computationally intensive stream-static join for joining a dynamic stream with a disk-resident static table. In addition, the time-varying nature of fluctuation in geospatial data arriving online calls for an approximate solution that can trade-off QoS constraints while ensuring that the system survives sudden spikes in data loads. In this paper, we present SpatialSSJP, an adaptive spatial-aware approximate query processing system that specifically focuses on stream-static joins in a way that guarantees achieving an agreed set of Quality-of-Service goals and maintains geo-statistics of stateful online aggregations over stream-static join results. SpatialSSJP employs a state-of-art stratified-like sampling design to select well-balanced representative geospatial data stream samples and serve them to a stream-static geospatial join operator downstream. We implemented a prototype atop Spark Structured Streaming. Our extensive evaluations on big real datasets show that our system can survive and mitigate harsh join workloads and outperform state-of-art baselines by significant magnitudes, without risking rigorous error bounds in terms of the accuracy of the output results. SpatialSSJP achieves a relative accuracy gain against plain Spark joins of approximately 10% in worst cases but reaching up to 50% in best case scenarios. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 1045-9219 1558-2183 |
| DOI: | 10.1109/TPDS.2023.3330669 |