LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

Bibliographic Details
Published in:	Frontiers in Big Data, Vol. 3, p. 30
Main Authors:	Tang, Mingjie; Yu, Yongyang; Mahmood, Ahmed R.; Malluhi, Qutaibah M.; Ouzzani, Mourad; Aref, Walid G.
Format:	Journal Article
Language:	English
Published:	Switzerland: Frontiers Media S.A., 16.10.2020
Subjects:	Big Data; in-memory computation; parallel computing; query optimization; query processing; spatial data
ISSN:	2624-909X

Description
Summary:	Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling the query skew that commonly occurs in practice and for minimizing communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler utilizes new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.
Bibliography:	Reviewed by: Suprio Ray, University of New Brunswick Fredericton, Canada; Amr Magdy, University of California, Riverside, United States; Yun Li, George Mason University, United States. This article was submitted to Data Mining and Management, a section of the journal Frontiers in Big Data. Edited by: Andreas Zuefle, George Mason University, United States.
DOI:	10.3389/fdata.2020.00030