Robust external hash aggregation in the solid state age

Bibliographic Details
Title: Robust external hash aggregation in the solid state age
Authors: Kuiper, L.N. (Laurens), Boncz, P.A. (Peter), Mühleisen, H.F. (Hannes)
Publication Year: 2024
Collection: CWI's Institutional Repository (Centrum voor Wiskunde en Informatica)
Subject Terms: Relational databases, Database query processing, Aggregation
Description: Analytical database systems offer high-performance in-memory aggregation. If there are many unique groups, temporary query intermediates may not fit RAM, requiring the use of external storage. However, switching from an in-memory to an external algorithm can degrade performance sharply. We revisit external hash aggregation on modern hardware, aiming instead for robust performance that avoids a 'performance cliff' when memory runs out. To achieve this, we introduce two techniques for handling temporary query intermediates. First, we propose unifying the memory management of temporary and persistent data. Second, we propose using a page layout that can be spilled to disk despite being optimized for main memory performance. These two techniques allow operator implementations to process larger-than-memory query intermediates with only minor modifications. We integrate these into DuckDB's parallel hash aggregation. Experimental results show that our implementation gracefully degrades performance as query intermediates exceed the available memory limit, while main memory performance is competitive with other analytical database systems.
Document Type: conference object
Language: English
Relation: https://ir.cwi.nl/pub/34360
DOI: 10.1109/ICDE60146.2024.00288
Availability: https://ir.cwi.nl/pub/34360
https://doi.org/10.1109/ICDE60146.2024.00288
Accession Number: edsbas.AFB0FD60
Database: BASE
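Illustrative sketch: the abstract refers to external hash aggregation, where groups that do not fit in memory are moved to external storage. The Python below is a minimal sketch of the textbook partition-and-spill variant, not the paper's technique and not DuckDB's implementation. It does not model unified memory management or a spillable page layout. The function names, the `max_groups_in_memory` limit (counted in groups rather than bytes), and the use of `pickle` with temporary files are all assumptions made for illustration.

```python
import pickle
import tempfile


def _read_spilled(f):
    """Yield the (key, value) pairs previously written to a spill file."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_hash_sum(rows, max_groups_in_memory, num_partitions=4, _depth=0):
    """Sum values per key while holding at most max_groups_in_memory groups in RAM.

    Hypothetical helper: rows whose group does not fit in the in-memory table
    are appended to one of num_partitions temporary files, chosen by hash, and
    each file is aggregated recursively afterwards.
    """
    if max_groups_in_memory < 1:
        raise ValueError("max_groups_in_memory must be at least 1")

    table = {}
    spill_files = []
    for key, value in rows:
        if key in table:
            # Group already resident: aggregate in memory.
            table[key] += value
        elif len(table) < max_groups_in_memory:
            # Room for a new group.
            table[key] = value
        else:
            # Table is full: spill the row to its hash partition on disk.
            if not spill_files:
                spill_files = [tempfile.TemporaryFile() for _ in range(num_partitions)]
            # Salt the hash with the recursion depth so a partition that is
            # still too large gets split differently on the next level.
            part = hash((key, _depth)) % num_partitions
            pickle.dump((key, value), spill_files[part])

    # Resident groups never overlap with spilled groups, and partitions are
    # disjoint by key, so the partial results can simply be merged.
    result = dict(table)
    for f in spill_files:
        result.update(
            external_hash_sum(_read_spilled(f), max_groups_in_memory,
                              num_partitions, _depth + 1)
        )
        f.close()
    return result
```

In this sketch every row that arrives after the table fills up and belongs to a new group pays for serialization and file I/O, which is the kind of abrupt in-memory to external switch the abstract describes as a "performance cliff".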