NSYS2PRV: Detailed and Quantitative Analysis of Large-Scale GPU Execution Traces with Paraver
| Published in: | Proceedings / IEEE International Conference on Cluster Computing, pp. 1-12 |
|---|---|
| Main authors: | , , |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 2 September 2025 |
| Subjects: | |
| ISSN: | 2168-9253 |
| Online access: | Full text |
| Abstract: | This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC applications using GPUs. NSYS2PRV is a tool that converts NVIDIA Nsight Systems reports into traces compatible with Paraver, enabling significantly enhanced insight compared to current performance analysis practices. By leveraging the capabilities of a well-established HPC performance analysis tool, we enable the comparison of execution traces and the quantification of microscopic-level differences to explain behaviors across hundreds or more computing devices. We argue that large-scale GPU applications and AI workloads can greatly benefit from the type of large-scale performance analysis introduced here, an approach that is not yet widely adopted in this domain. Translating nsys-generated traces to Paraver allows analysts to combine the fine-grained, highly accurate execution data obtainable from proprietary tools with the flexibility and scalability of an open-source, parallel performance analysis environment. Paraver also enables easy, customizable computation of efficiency metrics. This work demonstrates a more effective and insightful analysis experience than that offered by the native visualization tools in Nsight Systems. Additionally, we introduce a set of Paraver-compatible metrics that guide the analysis process, and we showcase examples where these metrics were successfully applied to real-world AI and HPC workloads. |
|---|---|
| DOI: | 10.1109/CLUSTER59342.2025.11186477 |