BlobSeer: Next-generation data management for large scale infrastructures

Bibliographic details
Published in:	Journal of Parallel and Distributed Computing, Volume 71, Issue 2, pp. 169-184
Main authors:	Nicolae, Bogdan; Antoniu, Gabriel; Bougé, Luc; Moise, Diana; Carpen-Amarie, Alexandra
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, 1 February 2011
Subjects:	Algorithms; BlobSeer; Computation; Computer Science; Concurrency; Data intensive applications; Data management; Decentralized metadata management; Distributed, Parallel, and Cluster Computing; Gain; High speed; High throughput; Infrastructure; MapReduce; Microorganisms; Versioning
ISSN:	0743-7315, 1096-0848

Description
Summary:	As data volumes increase at a high speed in more and more application fields of science, engineering, information services, etc., the challenges posed by data-intensive computing gain increasing importance. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues for which scalable data management becomes an immediate need. This paper makes several contributions. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, we highlight the potentially large benefits of using versioning in this context. Second, based on these principles, we propose a set of versioning algorithms, both for data and metadata, that enable a high throughput under concurrency. Finally, we implement and evaluate these algorithms in the BlobSeer prototype, which we integrate as a storage backend in the Hadoop MapReduce framework. We perform extensive microbenchmarks as well as experiments with real MapReduce applications: they demonstrate that applying the principles defended in our approach brings substantial benefits to data-intensive applications.
DOI:	10.1016/j.jpdc.2010.08.004