Modular matrix computations on multi-linear VLSI arrays
This paper describes the synthesis of modularly extensible matrix algorithms on multi-linear VLSI arrays with application to rectangular matrix multiplication. Multi-linear arrays differ from two-dimensional (2D) arrays in that one of the dimensions in the arrays is independent of the size of the pr...
Uložené v:
| Vydané v: | Parallel computing Ročník 21; číslo 8; s. 1269 - 1284 |
|---|---|
| Hlavný autor: | |
| Médium: | Journal Article |
| Jazyk: | English |
| Vydavateľské údaje: |
Amsterdam
Elsevier B.V
1995
Elsevier |
| Predmet: | |
| ISSN: | 0167-8191, 1872-7336 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | This paper describes the synthesis of modularly extensible matrix algorithms on multi-linear VLSI arrays with application to rectangular matrix multiplication. Multi-linear arrays differ from two-dimensional (2D) arrays in that one of the dimensions in the arrays is independent of the size of the problem and depends only on the I/O bandwidth of the host computer. They feature bounded clock skew (unlike 2D arrays) and higher throughput and communication bandwidth than linear arrays.
A significant feature of the algorithms discussed in this paper is that they exhibit an interdependence between the number of processing elements (PEs) per row in the multi-linear array, the storage inside each PE, and the size of the problem. This interrelationship permits changes in one of these measures to be accommodated by altering one or more of the other measures. One particular benefit of this feature is that it enables the design of modularly extensible algorithms which can accommodate changes in the problem size by only changing the number of PEs per row in the array. Therefore, for a given host I/O bandwidth, larger problem sizes can be handled by cascading smaller multi-linear arrays, all of which have the same number of rows and are comprised of PEs with a fixed amount of local storage. A second feature of our algorithms is that their performance is also a function of the number of rows of PEs in the array which is in turn a function of the I/O bandwidth of the host. A third feature of our design is that the algorithms are controlled by tag bits inserted with the data and no control memory is required inside the PEs. |
|---|---|
| ISSN: | 0167-8191 1872-7336 |
| DOI: | 10.1016/0167-8191(95)00020-O |