High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelinin...
Uložené v:
| Vydané v: | IEEE transactions on parallel and distributed systems Ročník 18; číslo 10; s. 1377 - 1392 |
|---|---|
| Hlavní autori: | , , |
| Médium: | Journal Article |
| Jazyk: | English |
| Vydavateľské údaje: |
New York
IEEE
01.10.2007
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Predmet: | |
| ISSN: | 1045-9219, 1558-2183 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-ll Pro FPGA as the target device, we implemented our designs and present performance and area results. |
|---|---|
| Bibliografia: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 1045-9219 1558-2183 |
| DOI: | 10.1109/TPDS.2007.1068 |