Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

Bibliographic details
Published in: Scientific Programming, volume 2015, issue 2015, pp. 1-14
Main authors: Panchenko, Nikolay; Masten, Matt; Kozhukhov, Sergey S.; Garcia, Eric N.; Preis, Serguei V.; Saito, Hideki; Tian, Xinmin; Cherkasov, Aleksei G.
Format: Journal Article
Language: English
Published: Cairo, Egypt: Hindawi Publishing Corporation, 01.01.2015
John Wiley & Sons, Inc
ISSN: 1058-9244, 1875-919X
Description
Summary: Efficiently exploiting SIMD vector units is one of the most important aspects of achieving high performance for application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel MIC-specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is used to study the performance of these techniques. The results show a performance gain of up to 12.5x on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x speedup from the seamless integration of SIMD vectorization and parallelization.
DOI: 10.1155/2015/269764