Efficiently Removing Sparsity for High-Throughput Stream Processing ; The International Conference on Field-Programmable Technology (FPT) 2023

Bibliographic Details
Title: Efficiently Removing Sparsity for High-Throughput Stream Processing ; The International Conference on Field-Programmable Technology (FPT) 2023
Author: Papaphilippou, Philippos
Publication year: 2023
Collection: The University of Dublin, Trinity College: TARA (Trinity's Access to Research Archive)
Keywords: Prefix scan, Interconnects, FPGA, Stream compaction, Aggregation, High-throughput computation, Analytics, Computer Architecture, Computer Engineering, Computer Science, Parallel Computer Architecture, Parallel Programming, Parallel Systems
Description: Big data analytics and machine learning are increasingly targeted by FPGAs due to their significant computing capabilities and internal parallelism. Different programming models are used to distribute the workload to the internals of the FPGAs at different granularities. While memory bandwidth has been steadily increasing, there are challenges in the way systems-on-chip use this bandwidth. One way system-on-chip architects exploit the increasing memory bandwidth is by widening the datapath. This is reflected at various points in the system, including the widening of vector instructions. On FPGAs, many analytics accelerators are memory-bound and would benefit from making the most of the available bandwidth. In this paper we present a scalable and highly efficient building block for constructing high-throughput streaming accelerators, which removes sparsity on-the-fly without backpressure.
Publication type: conference object
File description: application/pdf
Language: English
Relation: Y; http://hdl.handle.net/2262/104146; http://people.tcd.ie/papaphip; 260091
Availability: http://hdl.handle.net/2262/104146
http://people.tcd.ie/papaphip
Rights: Y ; openAccess
Document code: edsbas.BD543074
Database: BASE
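Note: the "Stream compaction" and "Prefix scan" keywords refer to packing the valid elements of a sparse input into contiguous positions. Below is a minimal, purely conceptual software sketch of that operation in C++, in which an exclusive prefix sum over the validity flags gives each valid element its packed destination. The function name, lane count, and data types are illustrative assumptions and do not represent the hardware building block described in the paper.

#include <cstdint>
#include <iostream>
#include <vector>

// Conceptual sketch only: compact one wide input "beat" of lanes, where only
// some lanes carry valid elements, into a dense output. Not the paper's
// hardware interconnect; it just illustrates prefix-scan-based compaction.
std::vector<uint32_t> compact_beat(const std::vector<uint32_t>& data,
                                   const std::vector<bool>& valid) {
    // Exclusive prefix sum of the valid flags: offsets[i] counts the valid
    // elements in lanes 0..i-1, i.e. the packed position of lane i.
    std::vector<size_t> offsets(valid.size(), 0);
    for (size_t i = 1; i < valid.size(); ++i)
        offsets[i] = offsets[i - 1] + (valid[i - 1] ? 1 : 0);

    size_t total = offsets.empty() ? 0
                                   : offsets.back() + (valid.back() ? 1 : 0);

    std::vector<uint32_t> packed(total);
    for (size_t i = 0; i < valid.size(); ++i)
        if (valid[i]) packed[offsets[i]] = data[i];  // scatter to packed slot
    return packed;
}

int main() {
    // Example 8-lane beat: invalid lanes are the "sparsity" being removed.
    std::vector<uint32_t> data  = {7, 0, 3, 0, 9, 0, 0, 4};
    std::vector<bool>     valid = {true, false, true, false,
                                   true, false, false, true};
    for (uint32_t v : compact_beat(data, valid)) std::cout << v << ' ';
    std::cout << '\n';  // prints: 7 3 9 4
}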