The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications


Bibliographic details
Published in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 154-165
Main authors: Agelastos, Anthony; Allan, Benjamin; Brandt, Jim; Cassella, Paul; Enos, Jeremy; Fullop, Joshi; Gentile, Ann; Monk, Steve; Naksinehaboon, Nichamon; Ogden, Jeff; Rajan, Mahesh; Showerman, Michael; Stevenson, Joel; Taerat, Narate; Tucker, Tom
Format: Conference paper
Language: English
Published: Piscataway, NJ, USA: IEEE Press, 16 November 2014
IEEE
Series: ACM Conferences
ISBN: 1479955000, 9781479955008
ISSN: 2167-4329
Description
Summary: Understanding how resources of High Performance Compute platforms are utilized by applications both individually and as a composite is key to application and platform performance. Typical system monitoring tools do not provide sufficient fidelity while application profiling tools do not capture the complex interplay between applications competing for shared resources. To gain new insights, monitoring tools must run continuously, system wide, at frequencies appropriate to the metrics of interest while having minimal impact on application performance. We introduce the Lightweight Distributed Metric Service for scalable, lightweight monitoring of large scale computing systems and applications. We describe issues and constraints guiding deployment in Sandia National Laboratories' capacity computing environment and on the National Center for Supercomputing Applications' Blue Waters platform including motivations, metrics of choice, and requirements relating to the scale and specialized nature of Blue Waters. We address monitoring overhead and impact on application performance and provide illustrative profiling results.
DOI: 10.1109/SC.2014.18