FPGA implementation of a Cholesky algorithm for a shared-memory multiprocessor architecture

Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather t...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Parallel algorithms and applications Jg. 19; H. 4; S. 211 - 226
Hauptverfasser:	Haridas, Satchidanand G., Ziavras, Sotirios G.
Format:	Journal Article
Sprache:	Englisch
Veröffentlicht:	Taylor & Francis GroupAbingdon, UK 01.12.2004
Schlagworte:	FPGA Matrix inversion Parallel Cholesky factorization Performance evaluation
ISSN:	1063-7192
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are on to develop moreaffordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K series Field-Programmable Gate Array (FPGA) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memoryMIMD architecture and was built using the configurable Nios™ processor core which was also developed by Altera. Our system was developed using Altera's System-On-a-Programmable-Chip (SOPC) Quartus II development environment. Our Cholesky implementation is based on an algorithm described by George et al. [6] . This algorithm is scalable and uses a "queue of tasks" approach to ensure dynamic load-balancing among the processing elements. Our implementation assumes dense matrices in the input. We present performance results for uniprocessor and multiprocessor implementations. Our results show that the implementation of multiprocessors inside FPGAs can benefit matrix operations, such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.
ISSN:	1063-7192
DOI:	10.1080/10637190412331279957