Lessons learned from a year’s worth of benchmarks of large data clouds
| Title: | Lessons learned from a year’s worth of benchmarks of large data clouds |
|---|---|
| Authors: | Yunhong Gu, Robert L Grossman |
| Contributors: | The Pennsylvania State University CiteSeerX Archives |
| Source: | http://pubs.rgrossman.com/dl/proc-117.pdf |
| Year of publication: | 2009 |
| Collection: | CiteSeerX |
| Subjects: | Performance, Experimentation; Keywords: Cloud Computing, Data Intensive Computing, High Performance Computing, Grid Computing, MapReduce, Multi-Task Computing |
| Description: | In this paper, we discuss some of the lessons that we have learned working with the Hadoop and Sector/Sphere systems. Both of these systems are cloud-based systems designed to support data-intensive computing. Both include distributed file systems and closely coupled systems for processing data in parallel. Hadoop uses MapReduce, while Sphere supports the ability to execute an arbitrary user-defined function over the data managed by Sector. We compare and contrast these systems and discuss some of the design trade-offs necessary in data-intensive computing. In our experimental studies over the past year, Sector/Sphere has consistently performed about 2 to 4 times faster than Hadoop. We discuss some of the reasons that might be responsible for this difference in performance. |
| Document type: | text |
| File format: | application/pdf |
| Language: | English |
| Relation: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.617.9116 |
| Availability: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.617.9116 http://pubs.rgrossman.com/dl/proc-117.pdf |
| Rights: | Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
| Accession number: | edsbas.5EEB6BA4 |
| Database: | BASE |