Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform

Many applications, ranging from big data analytics to nanostructure designs, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-side...

Full description

Saved in:
Bibliographic Details
Published in:Procedia computer science Vol. 51; pp. 180 - 190
Main Authors: Kabir, Khairul, Haidar, Azzam, Tomov, Stanimire, Dongarra, Jack
Format: Journal Article
Language:English
Published: Elsevier B.V 2015
Subjects:
ISSN:1877-0509, 1877-0509
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many applications, ranging from big data analytics to nanostructure designs, require the solution of large dense singular value decomposition (SVD) or eigenvalue problems. A first step in the solution methodology for these problems is the reduction of the matrix at hand to condensed form by two-sided orthogonal transformations. This step is standardly used to significantly accelerate the solution process. We present a performance analysis of the main two-sided factorizations used in these reductions: the bidiagonalization, tridiagonalization, and the upper Hessenberg factorizations on heterogeneous systems of multicore CPUs and Xeon Phi coprocessors. We derive a performance model and use it to guide the analysis and to evaluate performance. We develop optimized implementations for these methods that get up to 80% of the optimal performance bounds. Finally, we describe the heterogeneous multicore and coprocessor development considerations and the techniques that enable us to achieve these high-performance results. The work here presents the first highly optimized implementation of these main factorizations for Xeon Phi coprocessors. Compared to the LAPACK versions optmized by Intel for Xeon Phi (in MKL), we achieve up to 50% speedup.
ISSN:1877-0509
1877-0509
DOI:10.1016/j.procs.2015.05.222