CARP: Range Query-Optimized Indexing for Streaming Data



Bibliographic details
Published in: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-19
Main authors: Jain, Ankush; Cranor, Charles D.; Zheng, Qing; Settlemyer, Bradley W.; Amvrosiadis, George; Grider, Gary A.
Medium: Conference paper
Language: English
Published: IEEE, 17 November 2024
Description
Abstract: Ingestion of data generated by high-performance scientific applications continues to stress available storage resources. Efficient range-based analyses on this data can be enabled by reordering it on attributes of interest, but they require expensive post-processing sorts to realize the query benefits of reordering. In-situ indexing techniques, while write-efficient, are orders of magnitude slower at range queries than sorted indices. Range queries are necessary for analyzing continuous physical attributes and tracking phenomena such as energy bands and wave fronts. We present CARP, a scalable data partitioner for range queries that reorders data in situ as it is streamed to storage during application I/O. Motivated by our findings that real application distributions tend to be highly skewed and dynamic, CARP dynamically discovers and adapts its data partitions to track these characteristics. As a result, CARP can approximate the query performance of a sort without any ingestion overhead, making it 5× faster than prior work.
DOI:10.1109/SC41406.2024.00093
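The abstract does not spell out how CARP discovers and adapts its partitions. As a rough illustration only, the sketch below shows the general idea of range partitioning a stream of keys with pivots drawn from sample quantiles, and re-deriving those pivots from recent keys when the partitions become imbalanced (e.g. because the distribution has drifted). This is not CARP's algorithm: the names `compute_pivots`, `partition_of`, `AdaptivePartitioner`, `window` and `max_skew`, and the imbalance-triggered repartitioning rule, are all hypothetical assumptions chosen for clarity.

```python
import bisect


def compute_pivots(sample, num_partitions):
    """Pick num_partitions - 1 pivots at equally spaced quantiles of a sample."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_partitions] for i in range(1, num_partitions)]


def partition_of(key, pivots):
    """Return the index of the range partition that owns key (pivots ascending)."""
    return bisect.bisect_right(pivots, key)


class AdaptivePartitioner:
    """Hypothetical sketch (not CARP): route streamed keys to range partitions
    and recompute pivots from recent keys when one partition is overloaded."""

    def __init__(self, initial_sample, num_partitions, window=1000, max_skew=2.0):
        self.n = num_partitions
        self.pivots = compute_pivots(initial_sample, num_partitions)
        self.window = window        # keys observed between imbalance checks (assumed)
        self.max_skew = max_skew    # tolerated max/mean load ratio (assumed)
        self.recent = []
        self.counts = [0] * num_partitions
        self.repartitions = 0

    def route(self, key):
        """Assign key to a partition using the current pivots."""
        p = partition_of(key, self.pivots)
        self.counts[p] += 1
        self.recent.append(key)
        if len(self.recent) >= self.window:
            self._maybe_repartition()
        return p

    def _maybe_repartition(self):
        """If the busiest partition exceeds max_skew times the mean load,
        re-derive pivots from the keys seen in the last window."""
        mean = sum(self.counts) / self.n
        if max(self.counts) > self.max_skew * mean:
            self.pivots = compute_pivots(self.recent, self.n)
            self.repartitions += 1
        self.recent = []
        self.counts = [0] * self.n
```

For example, a partitioner initialized from keys 0 to 99 sends a burst of keys 1000 to 1099 entirely to its last partition; after one window it detects the skew and moves its pivots to 1025, 1050 and 1075, so later keys in that range are spread across all partitions again. Sampling quantiles keeps pivot selection cheap, at the cost of lagging behind the stream by up to one window.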