A Parallelization of Non-Serial Polyadic Dynamic Programming on GPU

Bibliographic Details
Published in:	Journal of Computing and Information Technology, Vol. 27, no. 2, pp. 55-66
Main Authors:	Diwan, Tausif; Tembhurne, Jitendra
Format:	Journal Article Paper
Language:	English
Published:	Sveučilište u Zagrebu; Fakultet elektrotehnike i računarstva Sveučilišta u Zagrebu, 01.06.2019
Subjects:	Algorithms; Computer programming; CUDA; dynamic programming; GPU; Multiprocessing; NPDP; parallel computing; India
ISSN:	1330-1136, 1846-3908

Description
Summary:	Parallelizing Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e., a non-optimal mapping between the subproblems of NPDP and the processing elements of the GPU. NPDP is non-uniform across phases in both the number of subproblems and their computational complexity. In NPDP parallelization, phases are computed sequentially, whereas the subproblems within each phase are computed concurrently. It is therefore essential to map the subproblems of each phase effectively onto the processing elements when implementing thread-level parallelism. We propose an adaptive Generalized Mapping Method (GMM) for NPDP parallelization that utilizes the GPU for efficient mapping of subproblems onto processing threads in each phase. The input size and the target GPU determine the computing power and the best mapping for each phase. The performance of GMM is compared with that of several conventional parallelization approaches. For sufficiently large inputs, our technique outperforms the state-of-the-art conventional parallelization approach and achieves a speedup by a factor of 30. We also summarize general heuristics for achieving better gains in NPDP parallelization.
Bibliography:	228266
DOI:	10.20532/cit.2019.1004579
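
The phase structure described in the Summary can be illustrated with a small sketch. It is not the authors' GMM. It assumes matrix-chain multiplication as a representative NPDP problem (the record does not say which problems the paper evaluates), and it is written in plain Python rather than CUDA so the structure is easy to follow. The function name `matrix_chain_by_phase` and the per-phase `profile` it returns are hypothetical, introduced only for illustration.

```python
def matrix_chain_by_phase(dims):
    """Minimum scalar-multiplication cost for a matrix chain, computed phase by phase.

    Matrix i has shape dims[i] x dims[i + 1].
    Returns (min_cost, profile), where profile lists, for every phase,
    (phase, number_of_subproblems, split_points_per_subproblem).
    """
    n = len(dims) - 1                       # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]      # cost[i][j]: best cost for matrices i..j
    profile = []

    # Phases (diagonals of the table) must run one after another,
    # because phase d reads results produced by all earlier phases.
    for d in range(1, n):
        # Subproblems inside one phase are independent of each other.
        # On a GPU, this loop is the part that would be mapped onto threads.
        for i in range(n - d):
            j = i + d
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
        # Phase d has n - d subproblems, each scanning d split points.
        profile.append((d, n - d, d))

    return cost[0][n - 1], profile
```

For `dims = [10, 30, 5, 60]` the function returns a cost of 4500 and the profile `[(1, 2, 1), (2, 1, 2)]`: the first phase has two subproblems with one split point each, and the last phase has a single subproblem with two split points. As the phase number grows, the number of subproblems shrinks while the work per subproblem grows, which is the kind of non-uniformity that makes a fixed one-thread-per-subproblem mapping leave GPU resources idle in later phases.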