Resource-efficient processing of large data volumes

Bibliographic Details
Title: Resource-efficient processing of large data volumes
Authors: Noll, Stefan
Contributors: Teubner, Jens, Giceva, Jana
Publisher Information: Technische Universität Dortmund, 2021.
Publication Year: 2021
Subject Terms: Bulk loading, Cache memory, 12. Responsible consumption, Resource efficiency, Memory tracing, 8. Economic growth, CPU cache partitioning, Tracing, Database system, Main-memory database systems
Description: The complex system environment of data processing applications makes it very challenging to achieve high resource efficiency. In this thesis, we develop solutions that improve resource efficiency at multiple system levels by focusing on three scenarios that are relevant to, but not limited to, database management systems. First, we address the challenge of understanding complex systems by analyzing memory access characteristics via efficient memory tracing. Second, we leverage information about memory access characteristics to optimize the cache usage of algorithms and to avoid cache pollution by applying hardware-based cache partitioning. Third, after optimizing resource usage within a multicore processor, we optimize resource usage across multiple computer systems by addressing the problem of resource contention during bulk loading, i.e., ingesting large volumes of data into the system. We develop a distributed bulk loading mechanism that uses network bandwidth and compute power more efficiently and improves both bulk loading throughput and query processing performance.
Document Type: Doctoral thesis
File Description: application/pdf
Language: English
DOI: 10.17877/de290r-21938
Access URL: http://hdl.handle.net/2003/40058
Accession Number: edsair.doi.dedup.....79f5f77658b90721f91d743b917af4d8
Database: OpenAIRE