Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator using In-Situ Processing-In-SRAM

Vector accelerators can efficiently execute regular data-parallel workloads, but they require expensive multi-ported register files to feed large vector ALUs. Recent work on in-situ processing-in-SRAM shows promise in enabling area-efficient vector acceleration. This work explores two different appr...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE International Symposium on Circuits and Systems proceedings s. 1 - 5
Hlavní autoři: Al-Hawaj, Khalid, Afuye, Olalekan, Agwa, Shady, Apsel, Alyssa, Batten, Christopher
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 01.10.2020
Témata:
ISBN:9781728133201, 1728133203
ISSN:2158-1525
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Vector accelerators can efficiently execute regular data-parallel workloads, but they require expensive multi-ported register files to feed large vector ALUs. Recent work on in-situ processing-in-SRAM shows promise in enabling area-efficient vector acceleration. This work explores two different approaches to leveraging in-situ processing-in-SRAM: BS-VRAM, which uses bit-serial execution, and BP-VRAM, which uses bit-parallel execution. The two approaches have very different latency vs. throughput trade-offs. BS-VRAM requires more cycles per operation, but is able to execute thousands of operations in parallel, while BP-VRAM requires fewer cycles per operation, but can only execute hundreds of operations in parallel. This paper is the first work to perform a rigorous evaluation of bit-serial vs. bit-parallel in-situ processing-in-SRAM. Our results show that both approaches have similar area overheads. For 32-bit arithmetic operations, BS-VRAM improves throughput by 1.3-5.0× compared to BP-VRAM, while BP-VRAM improves latency by 3.0-23.0× compared to BS-VRAM.
ISBN:9781728133201
1728133203
ISSN:2158-1525
DOI:10.1109/ISCAS45731.2020.9181068