Why do users need to take care of their HPC applications efficiency?

Bibliographic details
Title: Why do users need to take care of their HPC applications efficiency?
Authors: Nikitenko, Dmitry; Shvets, Pavel; Voevodin, Vadim
Other contributors: Russian Science Foundation (agreement No. 17-71-20114), RFBR (grant No. 20-07-00864)
Source: Lobachevskii Journal of Mathematics; Vol. 41, No. 8 (2020): Special issue “Supercomputing Applications, Algorithms and Software Tools”; 1818-9962; 1995-0802
Publisher information: Pleiades Publishing, LTD
Publication year: 2020
Collection: Kazan Federal University Science Tatarstan / Каза́нский федера́льный университе́т Science Tatarstan (E-Journal)
Keywords: high-performance computing, supercomputer, application efficiency, performance analysis, performance statistics, system software, parallel program, 68M20, 68M99
Description: High-performance computing plays a very important role in the modern scientific research process. Since all scientists want to solve their problems faster, it is very important to speed up these computations. To this end, new algorithms are being developed, new HPC systems appear, etc. However, relatively little attention is paid to the efficiency of high-performance computations, which often leads to a vast amount of supercomputer resources being idle. It is vital to change this situation; in particular, it is necessary to show users the importance of optimizing their applications. One of the main steps in this direction is to help users detect performance issues in their programs, analyze their criticality and root causes, and eliminate them in order to improve application performance. In this article we describe the research being performed at Lomonosov Moscow State University aimed at solving this problem. In particular, we analyze the results of a survey of supercomputer center users, showing their opinions on efficiency analysis. We also share our vision of the HPC center workflow requirements for supporting system and application efficiency analysis. After that, we describe a software tool under development that allows any supercomputer user to obtain and analyze versatile statistics on the performance of their HPC jobs, helping them detect possible root causes of performance degradation.
Publication type: article in journal/newspaper
Language: English
Relation: http://ojs.kpfu.ru/index.php/ljm/article/downloadSuppFile/1396/992; http://ojs.kpfu.ru/index.php/ljm/article/downloadSuppFile/1396/993; http://ojs.kpfu.ru/index.php/ljm/article/view/1396
Availability: http://ojs.kpfu.ru/index.php/ljm/article/view/1396
Rights: (c) 2020 Lobachevskii Journal of Mathematics
Document code: edsbas.87B52814
Database: BASE