Self-Adaptive Micro-Batching for Low-Latency GPU-Accelerated Stream Processing
| Published in: | International Journal of Parallel Programming, Vol. 53, No. 2, p. 14 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.04.2025 (Springer Nature B.V.) |
| Subjects: | |
| ISSN: | 0885-7458, 1573-7640 |
| Summary: | Stream processing is a computing paradigm enabling the continuous processing of unbounded data streams. Some classes of stream processing applications can greatly benefit from the parallel processing power and affordability offered by GPUs. However, efficient GPU utilization with stream processing applications often requires micro-batching techniques, i.e., the continuous processing of data batches to expose data-parallelism opportunities and amortize host-device data transfer overheads. Micro-batching further introduces the challenge of finding suitable micro-batch sizes to maintain low-latency processing under highly dynamic workloads. The research field of self-adaptive software provides different techniques to address this challenge. Our goal is to assess the performance of six self-adaptive algorithms in meeting latency requirements through micro-batch size adaptation. The algorithms are applied to a GPU-accelerated stream processing benchmark with a highly dynamic workload. Four of the six algorithms have already been evaluated using a smaller workload with the same application. We propose two new algorithms to address the shortcomings detected in the former four. The results demonstrate that a highly dynamic workload is challenging for the evaluated algorithms, as they could not meet the strictest latency requirements for more than 38.5% of the stream data items. Overall, all algorithms performed similarly in meeting the latency requirements. However, one of our proposed algorithms met the requirements for 4% more data items than the best of the previously studied algorithms, demonstrating greater effectiveness in highly variable workloads. This effectiveness is particularly evident in segments of the workload with abrupt transitions between low- and high-latency regions, where our proposed algorithms met the requirements for 79% of the data items in those segments, compared to 33% for the best of the earlier algorithms. |
|---|---|
| DOI: | 10.1007/s10766-025-00793-4 |
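
To make the micro-batch size adaptation described in the summary concrete, here is a minimal sketch of a latency-driven batch-size controller. Everything in it is an illustrative assumption: the `BatchSizeController` class, its shrink-fast/grow-slow policy, and the simulated `process_batch` stand-in for a GPU offload are not the six self-adaptive algorithms evaluated in the article.

```python
import time


class BatchSizeController:
    """Latency-driven micro-batch size controller.

    Illustrative only: this simple shrink-fast/grow-slow policy is an
    assumption for the sketch, not one of the six self-adaptive
    algorithms evaluated in the article.
    """

    def __init__(self, target_latency_s, min_size=1, max_size=4096):
        self.target = target_latency_s
        self.min_size = min_size
        self.max_size = max_size
        self.size = min_size

    def update(self, observed_latency_s):
        if observed_latency_s > self.target:
            # Requirement violated: halve the batch to recover latency fast.
            self.size = max(self.min_size, self.size // 2)
        elif observed_latency_s < 0.8 * self.target:
            # Comfortable headroom: grow gently to amortize transfer overheads.
            self.size = min(self.max_size, self.size + max(1, self.size // 10))
        return self.size


def process_batch(batch):
    """Stand-in for a GPU offload: a fixed 'transfer' overhead plus a
    per-item 'compute' cost, both purely simulated with sleep."""
    time.sleep(0.001 + 0.00005 * len(batch))


if __name__ == "__main__":
    controller = BatchSizeController(target_latency_s=0.010)  # 10 ms target
    batch = []
    for item in range(10_000):  # stands in for an unbounded input stream
        batch.append(item)
        if len(batch) >= controller.size:
            start = time.monotonic()
            process_batch(batch)
            controller.update(time.monotonic() - start)
            batch.clear()
```

The design choice in this sketch mirrors the trade-off named in the abstract: larger batches amortize host-device transfer overheads but raise per-item latency, so the controller shrinks aggressively when a latency requirement is violated and grows cautiously only while there is headroom.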