NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator


Detailed bibliography
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 946-960
Main authors: Shivdikar, Kaustubh, Agostini, Nicolas Bohm, Jayaweera, Malith, Jonatan, Gilbert, Abellan, Jose L., Joshi, Ajay, Kim, John, Kaeli, David
Format: Conference paper
Language: English
Published: IEEE, 29 June 2024
Description
Abstract: Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-Euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing. These workloads exhibit irregular sparsity patterns, resulting in unbalanced compute resource utilization. Prior accelerators investigating Gustavson's technique adopted look-ahead buffers for prefetching data, aiming to prevent compute stalls. However, these solutions use on-chip memory inefficiently, leaving redundant data resident in cache.

To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows the unique data dependencies of each to be exploited independently, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory and to address the prevalent issue of memory bloat in sparse graph computations. Furthermore, compute resource load balancing is achieved through a dynamic reseeding hash-based mapping, ensuring uniform utilization of computing resources regardless of sparsity patterns. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis.

Overall, NeuraChip presents a significant improvement, yielding an average speedup of 22.1× over Intel's MKL, 17.1× over NVIDIA's cuSPARSE, 16.7× over AMD's hipSPARSE, 1.5× over the prior state-of-the-art SpGEMM accelerator, and 1.3× over the prior GNN accelerator. The source code for our open-sourced simulator and performance visualizer is publicly accessible on GitHub.¹
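The abstract's core ideas, Gustavson's row-wise sparse matrix multiplication with the multiply and accumulate steps decoupled, plus a reseeding hash to spread rows across compute units, can be illustrated with a minimal Python sketch. This is not the NeuraChip implementation; all function names (`gustavson_spgemm`, `map_row_to_unit`) and the dict-of-dicts sparse format are illustrative assumptions.

```python
from collections import defaultdict

def gustavson_spgemm(A, B):
    """Row-wise (Gustavson) sparse multiply C = A @ B.

    A and B are sparse matrices as {row: {col: value}} dicts.
    The multiply and accumulate phases are kept as separate steps to
    mirror the decoupling described in the abstract (illustrative only).
    """
    C = {}
    for i, row_a in A.items():
        # Multiply phase: emit partial products for row i of C.
        partials = []  # (column, value) pairs; columns may repeat
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                partials.append((j, a_ik * b_kj))
        # Accumulate phase: merge partial products sharing a column.
        acc = defaultdict(float)
        for j, v in partials:
            acc[j] += v
        C[i] = dict(acc)
    return C

def map_row_to_unit(row, seed, num_units=8):
    """Hypothetical hash-based mapping of a row to a compute unit.

    Changing `seed` ("reseeding") redistributes rows across units,
    the intuition behind sparsity-agnostic load balancing.
    """
    return hash((row, seed)) % num_units
```

For example, multiplying `{0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}` by `{0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}` yields `{0: {0: 14.0, 1: 12.0}, 1: {0: 15.0, 1: 18.0}}`. Gustavson's formulation is relevant here because each output row is produced independently from one input row, which is what makes per-row mapping onto parallel compute units natural.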
CCS CONCEPTS: • Computer systems organization → Multicore architectures; Interconnection architectures; • Computing methodologies → Neural networks; • Theory of computation → Graph algorithms analysis; • Hardware → Hardware accelerators.

¹ https://github.com/NeuraChip/neurachip
DOI: 10.1109/ISCA59077.2024.00073