Robust external hash aggregation in the solid state age

Bibliographic details
Title: Robust external hash aggregation in the solid state age
Authors: Kuiper, L.N. (Laurens), Boncz, P.A. (Peter), Mühleisen, H.F. (Hannes)
Publication year: 2024
Collection: CWI's Institutional Repository (Centrum voor Wiskunde en Informatica)
Subjects: Relational databases, Database query processing, Aggregation
Description: Analytical database systems offer high-performance in-memory aggregation. If there are many unique groups, temporary query intermediates may not fit RAM, requiring the use of external storage. However, switching from an in-memory to an external algorithm can degrade performance sharply. We revisit external hash aggregation on modern hardware, aiming instead for robust performance that avoids a 'performance cliff' when memory runs out. To achieve this, we introduce two techniques for handling temporary query intermediates. First, we propose unifying the memory management of temporary and persistent data. Second, we propose using a page layout that can be spilled to disk despite being optimized for main memory performance. These two techniques allow operator implementations to process larger-than-memory query intermediates with only minor modifications. We integrate these into DuckDB's parallel hash aggregation. Experimental results show that our implementation gracefully degrades performance as query intermediates exceed the available memory limit, while main memory performance is competitive with other analytical database systems.
Document type: conference object
Language: English
Relation: https://ir.cwi.nl/pub/34360
DOI: 10.1109/ICDE60146.2024.00288
Availability: https://ir.cwi.nl/pub/34360
https://doi.org/10.1109/ICDE60146.2024.00288
Accession number: edsbas.AFB0FD60
Database: BASE