Onyx: A 12nm 756 GOPS/W Coarse-Grained Reconfigurable Array for Accelerating Dense and Sparse Applications
Onyx is the first fully programmable accelerator for arbitrary sparse tensor algebra kernels. Unlike prior work, it supports higher-order tensors, multiple inputs, and fusion. It achieves this with a coarse-grained reconfigurable array (CGRA) that has composable memory primitives for storing compres...
Gespeichert in:
| Veröffentlicht in: | Digest of technical papers - Symposium on VLSI Technology S. 1 - 2 |
|---|---|
| Hauptverfasser: | , , , , , , , , , , , , , , , , , , , , |
| Format: | Tagungsbericht |
| Sprache: | Englisch |
| Veröffentlicht: |
IEEE
16.06.2024
|
| Schlagworte: | |
| ISSN: | 2158-9682 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | Onyx is the first fully programmable accelerator for arbitrary sparse tensor algebra kernels. Unlike prior work, it supports higher-order tensors, multiple inputs, and fusion. It achieves this with a coarse-grained reconfigurable array (CGRA) that has composable memory primitives for storing compressed any-order tensors and compute primitives that eliminate ineffectual computations in sparse expressions. Further, Onyx improves dense image processing and machine learning (ML) with application-specialized compute tiles, memory tiles optimized for affine access patterns, and hybrid clock gating in the global buffer. We achieve up to 565x better energy-delay product (EDP) for sparse kernels vs. CPUs with sparse libraries, and up to 76% and 85% lower EDP for image processing and ML, respectively, vs. Amber [1]. |
|---|---|
| ISSN: | 2158-9682 |
| DOI: | 10.1109/VLSITechnologyandCir46783.2024.10631383 |