LRP: learned robust data partitioning for efficient processing of large dynamic queries


Bibliographic details
Published in: Frontiers of Computer Science Vol. 19; no. 9; p. 199607
Main authors: LIU, Pengju, CAI, Pan, ZHONG, Kai, LI, Cuiping, CHEN, Hong
Format: Journal Article
Language: English
Published: Beijing Higher Education Press 01.09.2025
Springer Nature B.V
ISSN:2095-2228, 2095-2236
Description
Summary: The interconnection between query processing and data partitioning is pivotal for accelerating massive data processing during query execution, primarily by minimizing the number of scanned block files. Existing partitioning techniques predominantly focus on query accesses on numeric columns for constructing partitions, often overlooking non-numeric columns and thus limiting optimization potential. Additionally, although these techniques create fine-grained partitions from representative queries to enhance system performance, they suffer notable performance declines due to unpredictable fluctuations in future queries. To tackle these issues, we introduce LRP, a learned robust partitioning system for dynamic query processing. LRP first proposes a method for data and query encoding that captures comprehensive column access patterns from historical queries. It then employs Multi-Layer Perceptron and Long Short-Term Memory networks to predict shifts in the distribution of historical queries. To create high-quality, robust partitions based on these predictions, LRP adopts a greedy beam search algorithm for optimal partition division and implements a data redundancy mechanism to share frequently accessed data across partitions. Experimental evaluations reveal that LRP yields partitions with more stable performance under incoming queries and significantly surpasses state-of-the-art partitioning methods.
Bibliography: data partitioning
data encoding
data redundancy
query prediction
beam search
Document received on: 2024-05-21
Document accepted on: 2024-09-08
DOI:10.1007/s11704-024-40509-4