FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Bibliographic Details
Published in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 39-46
Authors: Li, Xinyi; Li, Ang; Fang, Bo; Swirydowicz, Katarzyna; Laguna, Ignacio; Gopalakrishnan, Ganesh
Format: Conference paper
Language: English
Published: IEEE, May 6, 2024
Description
Abstract: NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, some of their crucial numerical attributes, pertaining to departures from full IEEE floating-point compatibility, are not documented. This makes it impossible to reliably port codes across these differing accelerators. This paper contributes a collection of Feature-Targeted Tests for Numerical Properties (FTTN) that help determine these attributes across five floating-point formats and four rounding modes, plus additional tests that highlight rounding behaviors and the preservation of extra precision bits. To show the practical relevance of FTTN, we design a simple matrix-multiplication test informed by insights gathered from our feature tests. Executing this test on five platforms produced different answers: V100, A100, and MI250X produced 0; MI100 produced 255.875; and Hopper H100 produced 191.875. Our matrix-multiplication tests employ patterns found in iterative-refinement-based algorithms, highlighting the need to check for significant result variability when porting code across GPUs.
ISSN: 2993-2114
DOI: 10.1109/CCGrid59990.2024.00014
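
The cross-platform discrepancy described in the abstract can be probed with a short, self-contained kernel. Below is a minimal sketch in that spirit, assuming CUDA's WMMA API (nvcuda::wmma, compute capability 7.0 or newer). The cancellation pattern and all input values are illustrative assumptions, not the paper's actual FTTN inputs, and the code is not the authors' test suite.

#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes a single 16x16x16 tile product D = A*B on the
// matrix units via the WMMA API, accumulating in FP32.
__global__ void wmma_probe(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(acc, fa, fb, acc);
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}

int main() {
    half ha[256], hb[256];
    float hd[256] = {0.0f};

    for (int i = 0; i < 256; ++i) {
        ha[i] = __float2half(0.0f);
        hb[i] = __float2half(0.0f);
    }
    // Hypothetical cancellation pattern: row 0 of A holds a large value,
    // its negation, and a tiny residual; column 0 of B is ones over those
    // entries, so D[0][0] should equal the residual if the accumulator
    // carries enough precision through the cancellation.
    ha[0] = __float2half(2048.0f);
    ha[1] = __float2half(-2048.0f);
    ha[2] = __float2half(0.0009765625f);        // 2^-10, exact in FP16
    hb[0] = hb[1] = hb[2] = __float2half(1.0f); // B is col-major: B[k][0] = hb[k]

    half *da, *db;
    float *dd;
    cudaMalloc(&da, sizeof(ha));
    cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dd, sizeof(hd));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    wmma_probe<<<1, 32>>>(da, db, dd); // exactly one warp drives the MMA
    cudaMemcpy(hd, dd, sizeof(hd), cudaMemcpyDeviceToHost);

    // Deviation from 2^-10 would suggest lost accumulator bits or a
    // non-default rounding of intermediate sums.
    printf("D[0][0] = %.10f (expected 0.0009765625 with exact FP32 accumulation)\n", hd[0]);

    cudaFree(da); cudaFree(db); cudaFree(dd);
    return 0;
}

On hardware that accumulates products in full FP32 precision, D[0][0] comes out as exactly 2^-10; an accumulator that drops alignment bits during the large cancellation may instead return 0, which is the kind of cross-GPU divergence the abstract reports.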