Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution

Detailed bibliography
Title: Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution
Authors: Ru Utaa, Andreea S, Thilo Kielmann
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://www.cs.vu.nl/%7Ekielmann/papers/fgcs2015.pdf.
Collection: CiteSeerX
Subjects: many-task computing, in-memory file system, distributed hashing, scalability
Description: In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown and, worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage.
Document type: text
File description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.696.3603; http://www.cs.vu.nl/%7Ekielmann/papers/fgcs2015.pdf
Availability: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.696.3603
http://www.cs.vu.nl/%7Ekielmann/papers/fgcs2015.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession number: edsbas.A5DBC549
Database: BASE
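Illustrative note: the abstract describes striping files across all compute nodes by means of a distributed hash function, but gives no implementation details. The Python sketch below shows one way such symmetrical placement could work. The stripe size, the SHA-1 hash, the `path#index` key format, the modulo placement rule, and all function names are assumptions made for illustration; none of them is taken from MemFS.

```python
import hashlib
from collections import Counter

STRIPE_SIZE = 512 * 1024  # hypothetical stripe size in bytes (assumption)


def stripe_key(path, index):
    """Build a per-stripe key from the file path and stripe index (assumed format)."""
    return f"{path}#{index}"


def node_for(key, num_nodes):
    """Map a stripe key to a node id by hashing it (SHA-1 chosen for illustration)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place_file(path, size, num_nodes, stripe_size=STRIPE_SIZE):
    """Return the node id holding each stripe of a file of `size` bytes."""
    num_stripes = max(1, -(-size // stripe_size))  # ceiling division, at least one stripe
    return [node_for(stripe_key(path, i), num_nodes) for i in range(num_stripes)]


def stripes_per_node(files, num_nodes, stripe_size=STRIPE_SIZE):
    """Count how many stripes each node stores for a {path: size} mapping."""
    counts = Counter()
    for path, size in files.items():
        counts.update(place_file(path, size, num_nodes, stripe_size))
    return [counts[n] for n in range(num_nodes)]
```

In this sketch, placement depends only on the file name and stripe index, never on which node wrote or will read the data, so any node can locate any stripe without a central directory. That independence from task placement is the property the abstract contrasts with locality-based designs.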