Performance Tuning of Tile Matrix Decomposition

Task parallel algorithms have attracted attention as algorithms for highly parallel architectures in recent years. The aim of such algorithms is to keep all computing resources running without stalling by executing a large number of fine-grained tasks asynchronously while observing data dependencies...

Bibliographic details
Published in: 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 25-31
Main author: Suzuki, Tomohiro
Format: Conference paper
Language: English
Published: IEEE, 1 October 2019
Description
Summary: Task parallel algorithms have attracted attention as algorithms for highly parallel architectures in recent years. The aim of such algorithms is to keep all computing resources running without stalling by executing a large number of fine-grained tasks asynchronously while observing data dependencies. The tile algorithm of matrix decomposition of dense matrices is implemented using a task parallel programming model following such an approach. In this article, we will consider how to select tile size, which is an important performance parameter.
DOI: 10.1109/MCSoC.2019.00011