Stream-dataflow acceleration

Bibliographic details
Published in: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 416-429
Main authors: Nowatzki, Tony; Gangadhar, Vinay; Ardalani, Newsha; Sankaralingam, Karthikeyan
Format: Conference paper
Language: English
Published: ACM, 01.06.2017
Description
Summary: Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and "general purpose" solutions (e.g., SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application- and domain-specific accelerators in important areas like machine learning, computer vision, and big data. The stark tradeoffs between efficiency and generality at these two extremes pose a difficult question: how could domain-specific hardware efficiency be achieved without domain-specific hardware solutions? In this work, we rely on the insight that "acceleratable" algorithms have broad common properties: high computational intensity with long phases, simple control patterns and dependences, and simple streaming memory access and reuse patterns. We define a general architecture (a hardware-software interface), called stream-dataflow, which can express programs with these properties more efficiently. The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very low power and area overhead. This paper explores the hardware and software implications, describes a detailed microarchitecture, and evaluates an implementation. Compared to a state-of-the-art domain-specific accelerator (DianNao) and fixed-function accelerators for MachSuite, Softbrain can match their performance with only a 2× power overhead on average.
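To make the "acceleratable" properties named in the abstract concrete, the C sketch below shows the kind of kernel stream-dataflow targets: a long compute phase, simple counted control with no data-dependent branches, and fixed-stride memory accesses that a stream engine could fetch with one command per stream. The kernel and its names are illustrative assumptions, not the paper's actual interface or ISA.

/* Illustrative sketch only: a 1-D convolution whose accesses are
 * simple affine streams and whose inner computation is a pure
 * dataflow of multiply-accumulates. Not code from the paper. */
#include <stddef.h>

void conv1d(const float *in, const float *w, float *out,
            size_t n, size_t k)
{
    /* Long phase, simple control: a counted loop nest with no
     * data-dependent branches, easy to coordinate as streams. */
    for (size_t i = 0; i + k <= n; i++) {
        float acc = 0.0f;
        /* Input "streams": in[i..i+k-1] and w[0..k-1] are read at
         * fixed stride 1, so a stream engine could fetch each with
         * a single command instead of per-element loads. */
        for (size_t j = 0; j < k; j++)
            acc += in[i + j] * w[j];   /* dataflow: independent MACs */
        out[i] = acc;                  /* output stream, stride 1 */
    }
}

In a stream-dataflow machine, the two inner reads and the output write would be issued as coarse-grained stream commands, while the multiply-accumulate graph runs concurrently on the dataflow substrate; only the reuse pattern and stream lengths need to be communicated, which is where the low power and area overhead comes from.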
DOI: 10.1145/3079856.3080255