Multiclass classification of distributed memory parallel computations

Detailed bibliography
Published in:	Pattern Recognition Letters, vol. 34, no. 3, pp. 322-329
Main authors:	Whalen, Sean; Peisert, Sean; Bishop, Matt
Medium:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 February 2013
Subjects:	Bayesian networks; Communication patterns; High performance computing; Multiclass classification; Random forests; Self-organizing maps
ISSN:	0167-8655, 1872-7344

Description
Summary:	► We classify unknown distributed memory computations using communication patterns.
► We apply self-organizing maps to aid model class selection.
► We use sampling to equalize class distributions over 100GB of data.
► Classifiers achieved F1 scores of 90% over 29 classes.
► Our work improves upon previous approaches and has a variety of applications.
High Performance Computing (HPC) is a field concerned with solving large-scale problems in science and engineering. However, the computational infrastructure of HPC systems can also be misused, as demonstrated by the recent commoditization of cloud computing resources on the black market. As a first step towards addressing this, we introduce a machine learning approach for classifying distributed parallel computations based on the communication patterns between compute nodes. We first provide relevant background on message passing and on computational equivalence classes called dwarfs, and describe our exploratory data analysis using self-organizing maps. We then present our classification results across 29 scientific codes using Bayesian networks and compare their performance against random forest classifiers. These models, trained on hundreds of gigabytes of communication logs collected at Lawrence Berkeley National Laboratory, perform well without any a priori information and address several shortcomings of previous approaches.
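The highlights mention equalizing class distributions by sampling and reporting F1 scores over 29 classes, but the record does not describe either procedure. The following is a minimal Python sketch under two stated assumptions: classes are equalized by random undersampling to the size of the smallest class, and per-class F1 scores are macro-averaged. The function names `balance_by_undersampling` and `macro_f1` and the toy labels are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter, defaultdict


def balance_by_undersampling(samples, labels, seed=0):
    """Assumed scheme: randomly downsample every class to the smallest class size."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n = min(len(v) for v in by_class.values())  # size of the rarest class
    rng = random.Random(seed)  # seeded for reproducibility
    out_x, out_y = [], []
    for y in sorted(by_class):
        for x in rng.sample(by_class[y], n):  # draw n items without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2PR / (P + R); macro averaging is an assumption."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy data; the class names are hypothetical stand-ins for scientific codes.
    xs = list(range(10))
    ys = ["code_a"] * 7 + ["code_b"] * 3
    bx, by = balance_by_undersampling(xs, ys)
    print(Counter(by))  # each class now has 3 samples
    print(macro_f1(["code_a", "code_b"], ["code_a", "code_b"]))  # perfect predictions
```

Undersampling discards data from the majority classes, which is affordable when, as here, the logs run to hundreds of gigabytes; macro averaging weights all classes equally, so a rare code counts as much as a common one.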
DOI:	10.1016/j.patrec.2012.10.007