Efficiently Removing Sparsity for High-Throughput Stream Processing ; The International Conference on Field-Programmable Technology (FPT) 2023

Detailed bibliography
Title: Efficiently Removing Sparsity for High-Throughput Stream Processing ; The International Conference on Field-Programmable Technology (FPT) 2023
Authors: Papaphilippou, Philippos
Publication year: 2023
Collection: The University of Dublin, Trinity College: TARA (Trinity's Access to Research Archive)
Subjects: Prefix scan, Interconnects, FPGA, Stream compaction, Aggregation, High-throughput computation, Analytics, Computer Architecture, Computer Engineering, Computer Science, Parallel Computer Architecture, Parallel Programming, Parallel Systems
Description: Big data analytics and machine learning are increasingly targeted by FPGAs due to their significant amount of computing capabilities and internal parallelism. Different programming models are used to distribute the workload to the internals of the FPGAs at different granularities. While the memory bandwidth has been steadily increasing, there are some challenges in the way systems-on-chip use this bandwidth. One way system-on-chip architects exploit the increasing memory bandwidth is by widening the datapath width. This is reflected at various points in the system, including the widening of vector instructions. On FPGAs, many analytics accelerators are memory-bound and would benefit from making the most of the available bandwidth. In this paper, we present a scalable and highly efficient building block for building high-throughput streaming accelerators, which removes sparsity on the fly without backpressure.
Document type: conference object
File description: application/pdf
Language: English
Relation: Y; http://hdl.handle.net/2262/104146; http://people.tcd.ie/papaphip; 260091
Availability: http://hdl.handle.net/2262/104146
http://people.tcd.ie/papaphip
Rights: Y ; openAccess
Accession number: edsbas.BD543074
Database: BASE
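
Illustrative note: the record lists prefix scan and stream compaction among its subjects, and the abstract describes removing sparsity from a wide data stream. The sketch below is a generic software illustration of that idea, not the paper's hardware design. The function names `compact` and `compact_stream`, the per-lane valid flags, and the `width` parameter are all assumptions made for this example.

```python
from itertools import accumulate


def compact(lanes, valid):
    """Pack the valid elements of one wide word to the front, keeping their order.

    Hypothetical helper: `lanes` holds the values of one input word and
    `valid` holds one flag per lane (truthy means the lane carries data).
    """
    flags = [1 if v else 0 for v in valid]
    # Exclusive prefix sum: offsets[i] = number of valid lanes before lane i,
    # which is exactly the output position of lane i if it is valid.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [None] * sum(flags)
    for i, (value, flag) in enumerate(zip(lanes, flags)):
        if flag:
            out[offsets[i]] = value
    return out


def compact_stream(words, width):
    """Turn a stream of sparse words into dense words of `width` elements.

    `words` is an iterable of (lanes, valid) pairs. Leftover elements are
    buffered between input words; the final word may be shorter than `width`.
    """
    buffer = []
    for lanes, valid in words:
        buffer.extend(compact(lanes, valid))
        while len(buffer) >= width:
            yield buffer[:width]
            buffer = buffer[width:]
    if buffer:
        yield buffer


if __name__ == "__main__":
    stream = [
        ([1, 0, 2, 0], [True, False, True, False]),
        ([0, 3, 4, 5], [False, True, True, True]),
    ]
    for word in compact_stream(stream, width=4):
        print(word)
```

In this sketch the prefix sum gives every valid lane its destination index in one pass, which is the property that makes the technique attractive for parallel implementations. A hardware design would additionally have to bound the buffering, which this software model does not attempt.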