Onyx: A 12nm 756 GOPS/W Coarse-Grained Reconfigurable Array for Accelerating Dense and Sparse Applications

Bibliographic Details
Published in: Digest of technical papers - Symposium on VLSI Technology, pp. 1-2
Main Authors: Koul, Kalhan; Strange, Maxwell; Melchert, Jackson; Carsello, Alex; Mei, Yuchen; Hsu, Olivia; Kong, Taeyoung; Chen, Po-Han; Ke, Huifeng; Zhang, Keyi; Liu, Qiaoyi; Nyengele, Gedeon; Balasingam, Akhilesh; Adivarahan, Jayashree; Sharma, Ritvik; Xie, Zhouhua; Torng, Christopher; Emer, Joel; Kjolstad, Fredrik; Horowitz, Mark; Raina, Priyanka
Format: Conference Proceeding
Language: English
Published: IEEE 16.06.2024
ISSN: 2158-9682
Description
Summary: Onyx is the first fully programmable accelerator for arbitrary sparse tensor algebra kernels. Unlike prior work, it supports higher-order tensors, multiple inputs, and fusion. It achieves this with a coarse-grained reconfigurable array (CGRA) that has composable memory primitives for storing compressed any-order tensors and compute primitives that eliminate ineffectual computations in sparse expressions. Further, Onyx improves dense image processing and machine learning (ML) with application-specialized compute tiles, memory tiles optimized for affine access patterns, and hybrid clock gating in the global buffer. We achieve up to 565x better energy-delay product (EDP) for sparse kernels vs. CPUs with sparse libraries, and up to 76% and 85% lower EDP for image processing and ML, respectively, vs. Amber [1].
DOI: 10.1109/VLSITechnologyandCir46783.2024.10631383
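
The summary mentions compressed tensor storage and compute primitives that eliminate ineffectual computations. As a rough software illustration of that general idea (not Onyx's hardware design, which this record does not describe), the sketch below stores a sparse vector as sorted coordinates plus matching nonzero values and multiplies two such vectors elementwise by intersecting their coordinates, so no multiply is ever issued for a position where either operand is zero. The name `sparse_elementwise_multiply`, the `SparseVector` layout, and the sample data are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical compressed layout: (sorted coordinates, matching nonzero values).
SparseVector = Tuple[List[int], List[float]]


def sparse_elementwise_multiply(a: SparseVector, b: SparseVector) -> SparseVector:
    """Multiply two compressed sparse vectors by intersecting their coordinates.

    A product is nonzero only where both inputs are nonzero, so every
    coordinate present in just one input is skipped without a multiply.
    """
    a_coords, a_vals = a
    b_coords, b_vals = b
    out_coords: List[int] = []
    out_vals: List[float] = []
    i = j = 0
    while i < len(a_coords) and j < len(b_coords):
        if a_coords[i] == b_coords[j]:
            # Effectual work: both operands are nonzero at this coordinate.
            out_coords.append(a_coords[i])
            out_vals.append(a_vals[i] * b_vals[j])
            i += 1
            j += 1
        elif a_coords[i] < b_coords[j]:
            i += 1  # Coordinate only in a: the product would be zero, skip it.
        else:
            j += 1  # Coordinate only in b: the product would be zero, skip it.
    return out_coords, out_vals


# Illustrative data only: nonzeros at coordinates {0, 3, 5, 8} and {3, 4, 8}.
a = ([0, 3, 5, 8], [2.0, 4.0, 1.0, 3.0])
b = ([3, 4, 8], [10.0, 7.0, 0.5])
coords, vals = sparse_elementwise_multiply(a, b)
print(coords, vals)  # [3, 8] [40.0, 1.5]
```

Only two multiplies run here, one per shared coordinate, regardless of how long the logical (dense) vectors are.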