Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC

Bibliographic details
Published in:	International Conference on Field-programmable Logic and Applications, pp. 1-4
Main authors:	Nurvitadhi, Eriko; Sim, Jaewoong; Sheffield, David; Mishra, Asit; Krishnan, Srivatsan; Marr, Debbie
Format:	Conference paper
Language:	English
Publication details:	EPFL, 01.08.2016
Subjects:	Classification algorithms; Field programmable gate arrays; Graphics processing units; Logic gates; Random access memory; Recurrent neural networks; Runtime
ISSN:	1946-1488

Description
Summary:	Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on datasets with sequential structure (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that make up the majority of the computation in a GRU. We then study the opportunities to accelerate the remaining SGEMVs using FPGAs, in comparison to a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA provides superior performance/Watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small- and medium-sized matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAM, and higher frequency have the potential to narrow the FPGA-ASIC efficiency gap.
DOI:	10.1109/FPL.2016.7577314