Parallel Data Partitioning Algorithms for Optimization of Data-Parallel Applications on Modern Extreme-Scale Multicore Platforms for Performance and Energy

Bibliographic details
Published in: IEEE Access, Vol. 6, pp. 69075-69106
Main authors: Manumachu, Ravi Reddy; Lastovetsky, Alexey
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
Description
Summary: Data partitioning algorithms that aim to minimize the execution time and energy of computations in self-adaptable data-parallel applications on modern extreme-scale multicore platforms must address two critical challenges. First, they must take into account the new complexities inherent in these platforms, such as severe resource contention and non-uniform memory access. Second, they must have low practical runtime and memory costs. The sequential data partitioning algorithms addressing the first challenge have a theoretical time complexity of $O(m^2 p^2)$, where $m$ is the number of points in the discrete speed/energy function and $p$ is the number of available processors. However, they exhibit high practical runtime cost and an excessive memory footprint, which makes them impractical for self-adaptable applications executing on extreme-scale multicore platforms. In this paper, we present parallel data partitioning algorithms that address both challenges. They take as input functional models of performance and energy consumption against problem size and output workload distributions that are globally optimal solutions. They have a low time complexity of $O(m^2 p)$, thereby providing a linear speedup of $O(p)$, and a low memory complexity of $O(n)$, where $n$ is the workload size expressed as a multiple of granularity. They employ a dynamic programming approach, which also facilitates easier integration of performance and energy models of communications. We experimentally study the practical cost of applying our algorithms in two data-parallel applications, matrix multiplication and fast Fourier transform, on a cluster in the Grid'5000 platform. We demonstrate that their practical runtime and memory costs are low, making them ideal for employment in self-adaptable applications. We also show that the parallel algorithms exhibit tremendous speedups over the sequential algorithms. Finally, using theoretical analysis for a forecast exascale platform, we demonstrate that the parallel algorithms have negligible execution times compared to the matrix multiplication application executing on the platform.
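To illustrate the kind of problem the summary describes, the following is a minimal sketch of a generic dynamic-programming workload partitioner over discrete cost functions. It is not the authors' algorithm: it is a plain sequential textbook formulation that runs in $O(p n^2)$ time and stores an $O(p n)$ choice table, so it does not reproduce the complexities quoted above. The function name `partition_workload`, the `cost` table layout, and the `combine` parameter are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple


def partition_workload(
    n: int,
    cost: Sequence[Sequence[float]],
    combine: Callable[[float, float], float] = max,
) -> Tuple[float, List[int]]:
    """Split n work units among len(cost) processors (illustrative only).

    cost[i][x] is a hypothetical measured time or energy of processor i
    when it is given x units, for x = 0..n. The functions may have any
    shape. combine=max minimizes the parallel execution time; passing an
    addition function minimizes the total energy instead.
    """
    p = len(cost)
    inf = float("inf")
    # best[w]: optimal objective for w units on the processors seen so far.
    best = [cost[0][w] for w in range(n + 1)]
    # choice[i][w]: units given to processor i in that optimal solution.
    choice = [list(range(n + 1))]
    for i in range(1, p):
        new_best = [inf] * (n + 1)
        new_choice = [0] * (n + 1)
        for w in range(n + 1):
            for x in range(w + 1):
                # Give x units to processor i, the rest to earlier ones.
                value = combine(best[w - x], cost[i][x])
                if value < new_best[w]:
                    new_best[w] = value
                    new_choice[w] = x
        best = new_best
        choice.append(new_choice)
    # Walk the recorded choices backwards to recover the distribution.
    distribution = [0] * p
    remaining = n
    for i in range(p - 1, -1, -1):
        distribution[i] = choice[i][remaining]
        remaining -= distribution[i]
    return best[n], distribution
```

With the made-up tables `[[0, 1, 2, 3, 4], [0, 2.5, 2.8, 3.5, 8]]` and `n = 4`, the default `combine=max` yields the distribution `[2, 2]` with a parallel time of 2.8, while an additive `combine` assigns all four units to the first processor. Because the recurrence only needs `combine` to be monotone, the same loop serves both the time and the energy objective.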
DOI: 10.1109/ACCESS.2018.2879228