Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions

Detailed bibliography
Published in: 2015 International Conference on Applied Research in Computer Science and Engineering (ICAR), pp. 1-7
Main author: Zekri, Ahmed S.
Format: Conference paper
Language: English
Publisher: IEEE, 1 October 2015
Description
Abstract: Current general-purpose processors are augmented with vector instructions that can process many elements of matrices and vectors in parallel. In-place matrix transposition is a core kernel operation that many scientific and engineering applications require to rearrange data before, during, or after processing. This operation increases traffic on the memory bus, so techniques such as blocking are needed to improve performance. In this paper, we present an enhanced version of a previously published algorithm for transposing a matrix on two-dimensional processor arrays. We restructured this algorithm to fit the one-dimensional vector-register architecture of general-purpose CPUs. We implemented the new vector algorithm using the Intel SSE4 vector instruction set and compared its performance with the standard sequential algorithm and with an existing implementation of Eklundh's algorithm. We also studied automatic compiler optimizations and their effect on the vectorization of the algorithm. The best of our implementations achieved a maximum speedup of 1.6× over the sequential algorithm and nearly the same performance as the Eklundh's algorithm implementation.
DOI:10.1109/ARCSE.2015.7338144
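To make the contrast between the standard sequential algorithm and a blocked vector transpose concrete, here is a minimal sketch under stated assumptions. It is not the restructured algorithm from the paper, nor Eklundh's algorithm; it is a generic 4×4-tile blocking scheme. The function names `transpose_seq` and `transpose_sse`, the single-precision row-major square matrix, and the requirement that `n` be a multiple of 4 are all illustrative assumptions. It uses only baseline SSE intrinsics (`_mm_loadu_ps`, `_mm_storeu_ps`, `_MM_TRANSPOSE4_PS`), which SSE4-capable CPUs also support.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS */

/* Baseline (hypothetical name): scalar in-place transpose that swaps
 * a[i][j] with a[j][i] for every element above the diagonal. */
void transpose_seq(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j) {
            float t = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = t;
        }
}

/* Load the four rows of the 4x4 tile whose top-left corner is (r, c). */
static void load_tile(const float *a, size_t n, size_t r, size_t c, __m128 v[4])
{
    for (int k = 0; k < 4; ++k)
        v[k] = _mm_loadu_ps(&a[(r + k) * n + c]);
}

/* Store four registers as the rows of the 4x4 tile at (r, c). */
static void store_tile(float *a, size_t n, size_t r, size_t c, const __m128 v[4])
{
    for (int k = 0; k < 4; ++k)
        _mm_storeu_ps(&a[(r + k) * n + c], v[k]);
}

/* Blocked in-place transpose (hypothetical name); assumes n % 4 == 0.
 * Each diagonal tile is transposed in registers; each off-diagonal pair
 * of tiles (bi,bj) and (bj,bi) is transposed in registers and swapped. */
void transpose_sse(float *a, size_t n)
{
    for (size_t bi = 0; bi < n; bi += 4) {
        __m128 d[4];
        load_tile(a, n, bi, bi, d);
        _MM_TRANSPOSE4_PS(d[0], d[1], d[2], d[3]);
        store_tile(a, n, bi, bi, d);

        for (size_t bj = bi + 4; bj < n; bj += 4) {
            __m128 u[4], l[4];
            load_tile(a, n, bi, bj, u); /* upper tile */
            load_tile(a, n, bj, bi, l); /* mirrored lower tile */
            _MM_TRANSPOSE4_PS(u[0], u[1], u[2], u[3]);
            _MM_TRANSPOSE4_PS(l[0], l[1], l[2], l[3]);
            store_tile(a, n, bj, bi, u); /* transposed upper goes below */
            store_tile(a, n, bi, bj, l); /* transposed lower goes above */
        }
    }
}
```

The scalar version touches one element per swap, while the blocked version moves four floats per load and store and keeps each 4×4 tile in registers. This is one simple way to reduce memory-bus traffic through blocking. A production version would also handle `n` not divisible by 4 and use larger cache-level blocks.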