S4: Distributed Stream Computing Platform

Bibliographic details
Published in: 2010 IEEE International Conference on Data Mining Workshops, pp. 170-177
Main authors: Neumeyer, L; Robbins, B; Nair, A; Kesari, A
Format: Conference paper
Language: English
Published: IEEE, 1 December 2010
ISBN: 9781424492442, 1424492440
ISSN: 2375-9232
Description
Summary: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. In this paper, we outline the S4 architecture in detail and describe various applications, including real-life deployments. Our design is primarily driven by large scale applications for data mining and machine learning in a production environment. We show that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware.
DOI: 10.1109/ICDMW.2010.172
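
The routing model described in the summary (keyed events delivered with affinity to PEs, which then emit further events or publish results) might be sketched roughly as follows. This is a single-process illustration only, not the S4 API: the names `Event`, `CountPE`, `Router`, and `run_demo` are hypothetical, and the distribution across cluster nodes that the paper describes is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical keyed event; field names are assumptions."""
    stream: str  # name of the stream the event belongs to
    key: str     # keyed attribute used for routing
    value: int


class CountPE:
    """Hypothetical PE: one instance per key, keeps a running total."""

    def __init__(self, key):
        self.key = key
        self.count = 0

    def process(self, event):
        self.count += event.value
        # (1) emit an event that other PEs could consume
        return [Event("counts", self.key, self.count)]


class Router:
    """Delivers every event with a given key to the same PE instance."""

    def __init__(self, pe_factory):
        self.pe_factory = pe_factory
        self.instances = {}   # key -> PE instance, created on first use
        self.published = []   # stand-in for (2) publishing results

    def dispatch(self, event):
        pe = self.instances.get(event.key)
        if pe is None:
            pe = self.instances[event.key] = self.pe_factory(event.key)
        for out in pe.process(event):
            self.published.append(out)


def run_demo():
    router = Router(CountPE)
    for word in ["a", "b", "a"]:
        router.dispatch(Event("words", word, 1))
    return {key: pe.count for key, pe in router.instances.items()}
```

Calling `run_demo()` sends three word events through the router; both events keyed `"a"` reach the same `CountPE` instance, which is the key-affinity property the summary refers to.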