NSYS2PRV: Detailed and Quantitative Analysis of Large-Scale GPU Execution Traces with Paraver
This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC applications using GPUs. NSYS2PRV is a tool that converts NVIDIA Nsight Systems reports into traces compatible with Paraver, enabling significantly...
| Published in: | Proceedings / IEEE International Conference on Cluster Computing, pp. 1-12 |
|---|---|
| Main authors: | Clasca, Marc; Labarta, Jesus; Garcia-Gasulla, Marta |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2 September 2025 |
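| Subjects: | Artificial intelligence; Computational efficiency; Data visualization; Graphics processing units; High performance computing; Measurement; Microscopy; Performance analysis; performance tools; Scalability; Statistical analysis |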
| ISSN: | 2168-9253 |
| Online access: | Get full text |
| Abstract | This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC applications using GPUs. NSYS2PRV is a tool that converts NVIDIA Nsight Systems reports into traces compatible with Paraver, enabling significantly enhanced insight compared to current performance analysis practices. By leveraging the capabilities of a well-established HPC performance analysis tool, we enable the comparison of execution traces and the quantification of microscopic-level differences to explain behaviors across hundreds or more computing devices. We argue that large-scale GPU applications and AI workloads can greatly benefit from the type of large-scale performance analysis introduced here, an approach that is not yet widely adopted in this domain. Translating nsys-generated traces to Paraver allows analysts to combine the fine-grained, highly accurate execution data obtainable from proprietary tools with the flexibility and scalability of an open-source, parallel performance analysis environment. Paraver also enables easy, customizable computation of efficiency metrics. This work demonstrates a more effective and insightful analysis experience than that offered by the native visualization tools in Nsight Systems. Additionally, we introduce a set of Paraver-compatible metrics that guide the analysis process, and we showcase examples where these metrics were successfully applied to real-world AI and HPC workloads. |
|---|---|
| Author | Labarta, Jesus; Clasca, Marc; Garcia-Gasulla, Marta |
| Author_xml | 1. Clasca, Marc (marc.clasca@bsc.es), Barcelona Supercomputing Center, Barcelona, Spain; 2. Labarta, Jesus (jesus.labarta@bsc.es), Barcelona Supercomputing Center, Barcelona, Spain; 3. Garcia-Gasulla, Marta (marta.garcia@bsc.es), Barcelona Supercomputing Center, Barcelona, Spain |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/CLUSTER59342.2025.11186477 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798331530198 |
| EISSN | 2168-9253 |
| EndPage | 12 |
| ExternalDocumentID | 11186477 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL RNS |
| ID | FETCH-LOGICAL-i93t-afab701a1134ba734de57430e071748f47e0e9ae1e46dab566af9f3167f7d31f3 |
| IEDL.DBID | RIE |
| IngestDate | Wed Oct 15 14:21:20 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i93t-afab701a1134ba734de57430e071748f47e0e9ae1e46dab566af9f3167f7d31f3 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_11186477 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-Sept.-2 |
| PublicationDateYYYYMMDD | 2025-09-02 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-Sept.-2 day: 02 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / IEEE International Conference on Cluster Computing |
| PublicationTitleAbbrev | CLUSTER |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0037306 |
| Score | 2.3023298 |
| Snippet | This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Artificial intelligence; Computational efficiency; Data visualization; Graphics processing units; High performance computing; Measurement; Microscopy; Performance analysis; performance tools; Scalability; Statistical analysis |
| Title | NSYS2PRV: Detailed and Quantitative Analysis of Large-Scale GPU Execution Traces with Paraver |
| URI | https://ieeexplore.ieee.org/document/11186477 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |