CHAUS: Scalable VM-Based Channels for Unbounded Streaming

Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelism. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In shared-memory environment...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Journal of computer science and technology Ročník 32; číslo 6; s. 1288 - 1304
Hlavní autoři: Zhang, Yu, Yu, Yu-Fen, Cao, Hui-Fang, Chen, Jian-Kang, Zhang, Qi-Liang
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York Springer US 01.11.2017
Springer Nature B.V
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
Témata:
ISSN:1000-9000, 1860-4749
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelism. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In shared-memory environment, an FIFO channel is classically a com- mon, fixed-size synchronized buffer shared between the producer and the consumer. As the number of concurrent stage workers increases, the synchronization overheads, such as contention and waiting times, rise sharply and severely impair application performance. In this paper, we present a novel multithreaded model which isolates memory between threads by default and provides a higher level abstraction for scalable unicast or multicast communication between threads -- CHAUS (Channel for Unbounded Streaming). The CHAUS model hides the underlying synchronization details, but requires the user to declare producer-consumer relationship of a channel in advance. It is the duty of the runtime system to ensure reliable data transmission at data item granularity as declared. To achieve unbounded buffer for streaming and reduce the synchronization overheads, we propose a virtual memory based solution to implement a scalable CHAUS channel. We check the programmability of CHAUS by successfully porting dedup and ferret from PARSEC as well as implementing MapReduce library with Phoenix-like API. The experimental results show that workloads built with CHAUS run faster than those with Pthreads, and CHAUS has the best scalability compared with two Pthread versions. There are three workloads whose CHAUS versions only spend no more than 0.17x runtime of Pthreads on both 16 and 32 cores.
Bibliografie:streaming, thread model, pipeline parallelism, unbounded channel, virtual memory
11-2296/TP
Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelism. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In shared-memory environment, an FIFO channel is classically a com- mon, fixed-size synchronized buffer shared between the producer and the consumer. As the number of concurrent stage workers increases, the synchronization overheads, such as contention and waiting times, rise sharply and severely impair application performance. In this paper, we present a novel multithreaded model which isolates memory between threads by default and provides a higher level abstraction for scalable unicast or multicast communication between threads -- CHAUS (Channel for Unbounded Streaming). The CHAUS model hides the underlying synchronization details, but requires the user to declare producer-consumer relationship of a channel in advance. It is the duty of the runtime system to ensure reliable data transmission at data item granularity as declared. To achieve unbounded buffer for streaming and reduce the synchronization overheads, we propose a virtual memory based solution to implement a scalable CHAUS channel. We check the programmability of CHAUS by successfully porting dedup and ferret from PARSEC as well as implementing MapReduce library with Phoenix-like API. The experimental results show that workloads built with CHAUS run faster than those with Pthreads, and CHAUS has the best scalability compared with two Pthread versions. There are three workloads whose CHAUS versions only spend no more than 0.17x runtime of Pthreads on both 16 and 32 cores.
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1000-9000
1860-4749
DOI:10.1007/s11390-017-1801-4