Parallel Data Partitioning Algorithms for Optimization of Data-Parallel Applications on Modern Extreme-Scale Multicore Platforms for Performance and Energy

Bibliographic Details
Published in: IEEE Access, Vol. 6, pp. 69075-69106
Main Authors: Manumachu, Ravi Reddy; Lastovetsky, Alexey
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2018
ISSN: 2169-3536
Description
Summary: Data partitioning algorithms that aim to minimize the execution time and energy of computations in self-adaptable data-parallel applications on modern extreme-scale multicore platforms must address two critical challenges. First, they must take into account the new complexities inherent in these platforms, such as severe resource contention and non-uniform memory access. Second, they must have low practical runtime and memory costs. The sequential data partitioning algorithms that address the first challenge have a theoretical time complexity of $O(m^2 \times p^2)$, where $m$ is the number of points in the discrete speed/energy function and $p$ is the number of available processors. However, they exhibit high practical runtime cost and an excessive memory footprint, which renders them impractical for use in self-adaptable applications executing on extreme-scale multicore platforms. In this paper, we present parallel data partitioning algorithms that address both challenges. They take as input the functional models of performance and energy consumption against problem size and output workload distributions that are globally optimal solutions. They have a low time complexity of $O(m^2 \times p)$, thereby providing a linear speedup of $O(p)$, and a low memory complexity of $O(n)$, where $n$ is the workload size expressed as a multiple of granularity.
They employ a dynamic programming approach, which also facilitates easier integration of performance and energy models of communications. We experimentally study the practical cost of applying our algorithms in two data-parallel applications, matrix multiplication and fast Fourier transform, on a cluster in the Grid'5000 platform. We demonstrate that their practical runtime and memory costs are low, making them ideal for use in self-adaptable applications. We also show that the parallel algorithms exhibit tremendous speedups over the sequential algorithms. Finally, using theoretical analysis for a forecast exascale platform, we demonstrate that the parallel algorithms have negligible execution times compared to the matrix multiplication application executing on the platform.
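
Illustrative note: the summary describes dynamic programming over discrete performance functions. The sketch below is not the authors' algorithm; it is a plain sequential textbook-style dynamic program, included only to show what "take discrete time functions as input and output a globally optimal workload distribution" can look like. The function name `partition_min_time` and the input layout `times[i][x]` (execution time of processor `i` on `x` granules) are assumptions made for this example. It runs in O(p * n^2) time and stores an O(p * n) backtracking table, so it does not reproduce the complexity bounds quoted in the summary.

```python
from typing import List, Sequence


def partition_min_time(times: Sequence[Sequence[float]], n: int) -> List[int]:
    """Distribute n granules over p processors, minimizing the maximum time.

    times[i][x] is the (hypothetical) measured execution time of processor i
    when it is given x granules, for x = 0..n. The functions may have any
    shape; no monotonicity or smoothness is assumed.
    """
    p = len(times)
    inf = float("inf")

    # best[w]: minimal achievable makespan when w granules are spread
    # over the processors considered so far (initially only processor 0).
    best = [times[0][w] for w in range(n + 1)]
    # choice[i][w]: granules assigned to processor i in that optimum.
    choice = [list(range(n + 1))]

    for i in range(1, p):
        new_best = [inf] * (n + 1)
        new_choice = [0] * (n + 1)
        for w in range(n + 1):
            for x in range(w + 1):
                # Processor i takes x granules; the rest is solved optimally.
                cost = max(times[i][x], best[w - x])
                if cost < new_best[w]:
                    new_best[w] = cost
                    new_choice[w] = x
        best = new_best
        choice.append(new_choice)

    # Walk the choice table backwards to recover the distribution.
    alloc = [0] * p
    w = n
    for i in range(p - 1, -1, -1):
        alloc[i] = choice[i][w]
        w -= alloc[i]
    return alloc
```

For example, with two processors whose times are `[0, 1, 2, 3]` and `[0, 2, 4, 6]` for 0 to 3 granules, the call `partition_min_time([[0, 1, 2, 3], [0, 2, 4, 6]], 3)` assigns two granules to the faster processor and one to the slower one.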
DOI: 10.1109/ACCESS.2018.2879228