Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms

Bibliographic Details
Published in:	International Journal of Parallel Programming, Vol. 42, No. 2, pp. 239-264
Main Authors:	Monteiro, Eduarda; Vizzotto, Bruno; Diniz, Cláudio; Maule, Marilena; Zatt, Bruno; Bampi, Sergio
Format:	Journal Article
Language:	English
Published:	Boston: Springer US, 1 April 2014; Springer Nature B.V.
Subjects:	Algorithms; Coding standards; Computer architecture; Computer Science; Devices; Distributed processing; Libraries; Mathematical models; Motion simulation; Parallel processing; Platforms; Processor Architectures; Programming; Quality standards; Search algorithms; Searching; Software Engineering/Programming and Operating Systems; Studies; Theory of Computation; CUDA; Motion estimation; OpenMP; GPU; MPI; Block matching
ISSN:	0885-7458, 1573-7640

Description
Summary:	This work presents an efficient method to map the Full Search algorithm for Motion Estimation (ME) onto General Purpose Graphics Processing Unit (GPGPU) architectures using the Compute Unified Device Architecture (CUDA) programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelism potential of the Full Search algorithm. Our main goal is to evaluate the feasibility of implementing video codecs on GPGPUs, along with the advantages and drawbacks compared to other platforms. For comparison, three further solutions were developed using distinct programming paradigms for distinct underlying hardware architectures: (i) a sequential solution for a general-purpose processor (GPP); (ii) a parallel solution for a multi-core GPP using the OpenMP library; (iii) a distributed solution for cluster/grid machines using the Message Passing Interface (MPI) library. The CUDA-based solution for GPGPUs achieves a speed-up consistent with that predicted by the theoretical model for different search areas. Our GPGPU Full Search Motion Estimation provides 2×, 20× and 1664× speed-up when compared to the MPI, OpenMP and sequential implementations, respectively. Compared to state-of-the-art solutions, our solution reaches up to 17× speed-up.
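
As a rough illustration of the algorithm the summary refers to, the following is a minimal sequential sketch of Full Search block matching for a single block. It is not the authors' implementation: the record does not state the matching criterion, so the sum of absolute differences (SAD) is assumed here as a common choice, and the names `sad` and `full_search`, the list-of-lists frame layout, and the parameters `n` and `search_range` are all hypothetical. For a search range r, Full Search evaluates up to (2r+1)² candidate positions per block, and each candidate can be evaluated independently, which is the kind of parallelism a GPGPU mapping can exploit.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry).
    SAD is an assumed matching criterion, not taken from the record."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement (dx, dy) within
    +/- search_range and return (dx, dy, cost) of the best match.
    Candidates that fall outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy) of the best candidate seen so far
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block is not fully inside the frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            # Strict '<' keeps the first candidate found on ties.
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("no valid candidate inside the reference frame")
    return best[1], best[2], best[0]
```

Calling `full_search(cur, ref, bx, by, n, search_range)` once per block of the current frame would yield one motion vector per block. The two displacement loops have no dependencies between iterations, so in a parallel version they could be distributed across threads or processes, with a final reduction to pick the minimum cost.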
DOI:	10.1007/s10766-012-0216-7