FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Bibliographic Details
Published in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 39-46
Authors: Li, Xinyi; Li, Ang; Fang, Bo; Swirydowicz, Katarzyna; Laguna, Ignacio; Gopalakrishnan, Ganesh
Format: Conference paper
Language: English
Published: IEEE, May 6, 2024
Description
Abstract: NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, some of their crucial numerical attributes, pertaining to departures from full IEEE floating-point compatibility, are not documented. This makes it impossible to reliably port codes across these differing accelerators. This paper contributes a collection of Feature-Targeted Tests for Numerical Properties (FTTN) that help determine these attributes across five floating-point formats and four rounding modes, plus additional tests that highlight rounding behaviors and the preservation of extra precision bits. To show the practical relevance of FTTN, we design a simple matrix-multiplication test informed by insights gathered from our feature tests. Executing this test on five platforms produced different answers: V100, A100, and MI250X produced 0; MI100 produced 255.875; and Hopper H100 produced 191.875. Our matrix-multiplication tests employ patterns found in iterative-refinement-based algorithms, highlighting the need to check for significant result variability when porting code across GPUs.
ISSN: 2993-2114
DOI: 10.1109/CCGrid59990.2024.00014
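
The cross-platform discrepancy described in the abstract can be probed with a short, self-contained kernel. Below is a minimal sketch in that spirit, assuming CUDA's WMMA API (nvcuda::wmma, compute capability 7.0 or newer). The cancellation pattern and all input values are illustrative assumptions, not the paper's actual FTTN inputs, and the code is not the authors' test suite.

#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes a single 16x16x16 tile product D = A*B on the
// matrix units via the WMMA API, accumulating in FP32.
__global__ void wmma_probe(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(acc, fa, fb, acc);
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}

int main() {
    half ha[256], hb[256];
    float hd[256] = {0.0f};

    for (int i = 0; i < 256; ++i) {
        ha[i] = __float2half(0.0f);
        hb[i] = __float2half(0.0f);
    }
    // Hypothetical cancellation pattern: row 0 of A holds a large value,
    // its negation, and a tiny residual; column 0 of B is ones over those
    // entries, so D[0][0] should equal the residual if the accumulator
    // carries enough precision through the cancellation.
    ha[0] = __float2half(2048.0f);
    ha[1] = __float2half(-2048.0f);
    ha[2] = __float2half(0.0009765625f);        // 2^-10, exact in FP16
    hb[0] = hb[1] = hb[2] = __float2half(1.0f); // B is col-major: B[k][0] = hb[k]

    half *da, *db;
    float *dd;
    cudaMalloc(&da, sizeof(ha));
    cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dd, sizeof(hd));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    wmma_probe<<<1, 32>>>(da, db, dd); // exactly one warp drives the MMA
    cudaMemcpy(hd, dd, sizeof(hd), cudaMemcpyDeviceToHost);

    // Deviation from 2^-10 would suggest lost accumulator bits or a
    // non-default rounding of intermediate sums.
    printf("D[0][0] = %.10f (expected 0.0009765625 with exact FP32 accumulation)\n", hd[0]);

    cudaFree(da); cudaFree(db); cudaFree(dd);
    return 0;
}

On hardware that accumulates products in full FP32 precision, D[0][0] comes out as exactly 2^-10; an accumulator that drops alignment bits during the large cancellation may instead return 0, which is the kind of cross-GPU divergence the abstract reports.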