NSYS2PRV: Detailed and Quantitative Analysis of Large-Scale GPU Execution Traces with Paraver

This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC applications using GPUs. NSYS2PRV is a tool that converts NVIDIA Nsight Systems reports into traces compatible with Paraver, enabling significantly...

Bibliographic details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1-12
Main authors: Clasca, Marc; Labarta, Jesus; Garcia-Gasulla, Marta
Format: Conference paper
Language: English
Published: IEEE, 2 September 2025
Subjects:
ISSN: 2168-9253
Online access: Get full text
Abstract This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC applications using GPUs. NSYS2PRV is a tool that converts NVIDIA Nsight Systems reports into traces compatible with Paraver, enabling significantly enhanced insight compared to current performance analysis practices. By leveraging the capabilities of a well-established HPC performance analysis tool, we enable the comparison of execution traces and the quantification of microscopic-level differences to explain behaviors across hundreds or more computing devices. We argue that large-scale GPU applications and AI workloads can greatly benefit from the type of large-scale performance analysis introduced here, an approach that is not yet widely adopted in this domain. Translating nsys-generated traces to Paraver allows analysts to combine the fine-grained, highly accurate execution data obtainable from proprietary tools with the flexibility and scalability of an open-source, parallel performance analysis environment. Paraver also enables easy, customizable computation of efficiency metrics. This work demonstrates a more effective and insightful analysis experience than that offered by the native visualization tools in Nsight Systems. Additionally, we introduce a set of Paraver-compatible metrics that guide the analysis process, and we showcase examples where these metrics were successfully applied to real-world AI and HPC workloads.
Author Labarta, Jesus
Clasca, Marc
Garcia-Gasulla, Marta
Author_xml – sequence: 1
  givenname: Marc
  surname: Clasca
  fullname: Clasca, Marc
  email: marc.clasca@bsc.es
  organization: Barcelona Supercomputing Center,Barcelona,Spain
– sequence: 2
  givenname: Jesus
  surname: Labarta
  fullname: Labarta, Jesus
  email: jesus.labarta@bsc.es
  organization: Barcelona Supercomputing Center,Barcelona,Spain
– sequence: 3
  givenname: Marta
  surname: Garcia-Gasulla
  fullname: Garcia-Gasulla, Marta
  email: marta.garcia@bsc.es
  organization: Barcelona Supercomputing Center,Barcelona,Spain
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CLUSTER59342.2025.11186477
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331530198
EISSN 2168-9253
EndPage 12
ExternalDocumentID 11186477
Genre orig-research
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
RNS
ID FETCH-LOGICAL-i93t-afab701a1134ba734de57430e071748f47e0e9ae1e46dab566af9f3167f7d31f3
IEDL.DBID RIE
IngestDate Wed Oct 15 14:21:20 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i93t-afab701a1134ba734de57430e071748f47e0e9ae1e46dab566af9f3167f7d31f3
PageCount 12
ParticipantIDs ieee_primary_11186477
PublicationCentury 2000
PublicationDate 2025-Sept.-2
PublicationDateYYYYMMDD 2025-09-02
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-Sept.-2
  day: 02
PublicationDecade 2020
PublicationTitle Proceedings / IEEE International Conference on Cluster Computing
PublicationTitleAbbrev CLUSTER
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0037306
Score 2.3023298
Snippet This work presents a tool, a methodology, a set of metrics, and practical examples for evaluating the performance of large-scale AI and traditional HPC...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Artificial intelligence
Computational efficiency
Data visualization
Graphics processing units
High performance computing
Measurement
Microscopy
Performance analysis
performance tools
Scalability
Statistical analysis
Title NSYS2PRV: Detailed and Quantitative Analysis of Large-Scale GPU Execution Traces with Paraver
URI https://ieeexplore.ieee.org/document/11186477
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE