Using Performance Measurements to Improve MapReduce Algorithms

Detailed Bibliography
Published in: Procedia Computer Science, Vol. 9, pp. 1920–1929
Main Authors: Plantenga, Todd D.; Choe, Yung Ryn; Yoshimura, Ann
Format: Journal Article
Language: English
Published: Elsevier B.V., 2012
ISSN: 1877-0509
Description
Summary: The Hadoop MapReduce software environment is used for parallel processing of distributively stored data. Data mining algorithms of increasing sophistication are being implemented in MapReduce, bringing new challenges for performance measurement and tuning. We focus on analyzing a job after completion, utilizing information collected from Hadoop logs and machine metrics. Our analysis, inspired by [1][2], goes beyond conventional Hadoop JobTracker analysis by integrating more data and providing web browser visualization tools. This paper describes examples where measurements helped diagnose subtle issues and improve algorithm performance. The examples demonstrate the value of correlating detailed information that is not usually examined in standard Hadoop performance displays.
DOI: 10.1016/j.procs.2012.04.210