MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations

Bibliographic Details
Published in:	2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 206-209
Main Authors:	Ye, Deheng; Kapre, Nachiket
Format:	Conference Paper
Language:	English
Published:	IEEE, 01.05.2014
Subjects:	Analytical models; Benchmark testing; Field programmable gate arrays; Mathematical model; Simulated annealing; Solid modeling

Description
Summary:	Mixed-precision implementation of computation can deliver area, throughput and power improvements for dataflow computations over homogeneous fixed-precision circuits without any loss in accuracy. When designing circuits for reconfigurable hardware, we can exercise independent control over bitwidth selection of each variable in the computation. However, selecting the best precision for each variable is an NP-hard problem. While traditional solutions use automated heuristics like simulated annealing or integer linear programming, they still rely on the manual formulation of resource models, which can be tedious and potentially inaccurate due to the unpredictable interactions between different stages of the FPGA CAD flow. We develop MixFX-SCORE, an automated tool-flow based on the FX-SCORE fixed-point compilation framework and simulated annealing, to address this challenge. We outsource error analysis (Gappa++) and resource model generation (Vivado HLS, Logic Synthesis, Xilinx Place-and-Route) to external tools that offer a more accurate representation of error behavior (backed by proofs) and resource usage (based on actual utilization). We demonstrate 1.1-3.5x LUT count savings, 1-1.8x DSP count reductions, and 1-3.9x dynamic power improvements while still satisfying the accuracy constraints when compared to homogeneous fixed-point implementations.
DOI:	10.1109/FCCM.2014.64
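
The per-variable bitwidth search described in the summary can be illustrated with a minimal simulated-annealing sketch. This is not the MixFX-SCORE implementation: `toy_error` and `toy_area` are hypothetical stand-ins for the Gappa++ error bounds and the Vivado-derived resource numbers, and the sensitivities, area weights, bitwidth limits, and cooling schedule are all assumed values chosen only for illustration.

```python
import math
import random

# Hypothetical per-variable error sensitivities and per-bit area weights.
SENSITIVITY = [1.0, 4.0, 0.25, 2.0]
AREA_PER_BIT = [3.0, 1.0, 5.0, 2.0]
MIN_BITS, MAX_BITS = 4, 32


def toy_error(bits):
    # Stand-in for an external error analysis: each variable contributes
    # a rounding error of 2^-bits scaled by its (assumed) sensitivity.
    return sum(s * 2.0 ** -b for s, b in zip(SENSITIVITY, bits))


def toy_area(bits):
    # Stand-in for a resource model: area grows linearly with bitwidth.
    return sum(a * b for a, b in zip(AREA_PER_BIT, bits))


def best_homogeneous(error_budget):
    # Smallest single bitwidth, shared by all variables, that meets the budget.
    for b in range(MIN_BITS, MAX_BITS + 1):
        bits = [b] * len(SENSITIVITY)
        if toy_error(bits) <= error_budget:
            return bits
    raise ValueError("no homogeneous bitwidth meets the error budget")


def anneal(error_budget, steps=5000, t_start=10.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current = best_homogeneous(error_budget)  # feasible starting point
    best = list(current)
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose changing one variable's bitwidth by a single bit.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = min(MAX_BITS, max(MIN_BITS, candidate[i] + rng.choice((-1, 1))))
        if toy_error(candidate) > error_budget:
            continue  # reject moves that violate the accuracy constraint
        delta = toy_area(candidate) - toy_area(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if toy_area(current) < toy_area(best):
                best = list(current)
    return best


if __name__ == "__main__":
    budget = 1e-3
    homogeneous = best_homogeneous(budget)
    mixed = anneal(budget)
    print("homogeneous:", homogeneous, "area", toy_area(homogeneous))
    print("mixed:      ", mixed, "area", toy_area(mixed))
```

Because the search starts from the smallest feasible homogeneous assignment and only records feasible improvements, the returned mixed-precision assignment never costs more than the homogeneous baseline under the toy area model. In the flow the summary describes, the two toy functions would be replaced by calls out to external error-analysis and synthesis tools.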