Self-Adaptive Micro-Batching for Low-Latency GPU-Accelerated Stream Processing
| Published in: | International Journal of Parallel Programming, Vol. 53, No. 2, p. 14 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.04.2025 (Springer Nature B.V.) |
| Subjects: | |
| ISSN: | 0885-7458, 1573-7640 |
| Summary: | Stream processing is a computing paradigm enabling the continuous processing of unbounded data streams. Some classes of stream processing applications can greatly benefit from the parallel processing power and affordability offered by GPUs. However, efficient GPU utilization with stream processing applications often requires micro-batching techniques, i.e., the continuous processing of data batches to expose data-parallelism opportunities and amortize host-device data transfer overheads. Micro-batching further introduces the challenge of finding suitable micro-batch sizes to maintain low-latency processing under highly dynamic workloads. The research field of self-adaptive software provides different techniques to address this challenge. Our goal is to assess the performance of six self-adaptive algorithms in meeting latency requirements through micro-batch size adaptation. The algorithms are applied to a GPU-accelerated stream processing benchmark with a highly dynamic workload. Four of the six algorithms have already been evaluated using a smaller workload with the same application. We propose two new algorithms to address the shortcomings detected in the former four. The results demonstrate that a highly dynamic workload is challenging for the evaluated algorithms, as they could not meet the strictest latency requirements for more than 38.5% of the stream data items. Overall, all algorithms performed similarly in meeting the latency requirements. However, one of our proposed algorithms met the requirements for 4% more data items than the best of the previously studied algorithms, demonstrating greater effectiveness in highly variable workloads. This effectiveness is particularly evident in segments of the workload with abrupt transitions between low- and high-latency regions, where our proposed algorithms met the requirements for 79% of the data items in those segments, compared to 33% for the best of the earlier algorithms. |
|---|---|
| DOI: | 10.1007/s10766-025-00793-4 |
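
To make the micro-batch size adaptation described in the summary concrete, here is a minimal sketch of a latency-driven batch-size controller. Everything in it is an illustrative assumption: the `BatchSizeController` class, its shrink-fast/grow-slow policy, and the simulated `process_batch` stand-in for a GPU offload are not the six self-adaptive algorithms evaluated in the article.

```python
import time


class BatchSizeController:
    """Latency-driven micro-batch size controller.

    Illustrative only: this simple shrink-fast/grow-slow policy is an
    assumption for the sketch, not one of the six self-adaptive
    algorithms evaluated in the article.
    """

    def __init__(self, target_latency_s, min_size=1, max_size=4096):
        self.target = target_latency_s
        self.min_size = min_size
        self.max_size = max_size
        self.size = min_size

    def update(self, observed_latency_s):
        if observed_latency_s > self.target:
            # Requirement violated: halve the batch to recover latency fast.
            self.size = max(self.min_size, self.size // 2)
        elif observed_latency_s < 0.8 * self.target:
            # Comfortable headroom: grow gently to amortize transfer overheads.
            self.size = min(self.max_size, self.size + max(1, self.size // 10))
        return self.size


def process_batch(batch):
    """Stand-in for a GPU offload: a fixed 'transfer' overhead plus a
    per-item 'compute' cost, both purely simulated with sleep."""
    time.sleep(0.001 + 0.00005 * len(batch))


if __name__ == "__main__":
    controller = BatchSizeController(target_latency_s=0.010)  # 10 ms target
    batch = []
    for item in range(10_000):  # stands in for an unbounded input stream
        batch.append(item)
        if len(batch) >= controller.size:
            start = time.monotonic()
            process_batch(batch)
            controller.update(time.monotonic() - start)
            batch.clear()
```

The design choice in this sketch mirrors the trade-off named in the abstract: larger batches amortize host-device transfer overheads but raise per-item latency, so the controller shrinks aggressively when a latency requirement is violated and grows cautiously only while there is headroom.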