Compensated summation and dot product algorithms for floating-point vectors on parallel architectures: Error bounds, implementation and application in the Krylov subspace methods

Bibliographic Details
Published in: Journal of Computational and Applied Mathematics, Vol. 414, Article 114434
Authors: Evstigneev, N.M., Ryabkov, O.I., Bocharov, A.N., Petrovskiy, V.P., Teplyakov, I.O.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2022
ISSN: 0377-0427, 1879-1778
Online access: Full text
Description
Abstract: The aim of the paper is to improve parallel algorithms that obtain higher precision in floating-point reduction-type operations while working within the basic floating-point type. Compensated parallel variants of the summation and dot product operations for floating-point vectors (level 1 BLAS operations) are considered. The methods are based on the work of Rump, Ogita and Oishi. Parallel implementations in block and pairwise reduction variants are examined. Analytical error bounds are obtained for real- and complex-valued vectors represented by floating-point numbers according to the IEEE 754 (IEC 60559) standard for all variants of the parallel algorithms. The algorithms are written in C++ Compute Unified Device Architecture (CUDA) for Graphics Processing Units (GPUs), and their accuracy is tested for different vector sizes and condition numbers. The suggested compensated variant is compared to a multiple-precision library for GPUs in terms of efficiency. The designed algorithms are tested in Krylov-type matrix-based methods with preconditioners that originate from different challenging computational problems. It is shown that the compensated variants of the algorithms accelerate convergence and yield more accurate results even when the matrix operations are performed in base precision.
DOI:10.1016/j.cam.2022.114434
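
For context, below is a minimal serial C++ sketch of the error-free transformations (TwoSum, and TwoProd via a fused multiply-add) on which the Ogita-Rump-Oishi compensated summation and dot product are built. This is not the authors' parallel CUDA implementation; the function names (two_sum, two_prod, dot2) and the test vectors are illustrative assumptions, and the block/pairwise parallel reductions and error-bound analysis described in the abstract are not reproduced here.

```cpp
// Serial sketch (not the paper's parallel CUDA code) of the error-free
// transformations underlying compensated summation / dot products.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// TwoSum: computes s = fl(a + b) and e such that s + e == a + b exactly.
static void two_sum(double a, double b, double &s, double &e) {
    s = a + b;
    double z = s - a;
    e = (a - (s - z)) + (b - z);
}

// TwoProd with FMA: computes p = fl(a * b) and e such that p + e == a * b exactly.
static void two_prod(double a, double b, double &p, double &e) {
    p = a * b;
    e = std::fma(a, b, -p);  // exact rounding error of the product
}

// Compensated dot product in the spirit of Ogita-Rump-Oishi Dot2:
// rounding errors of every product and every partial sum are accumulated
// separately and added back once at the end.
static double dot2(const std::vector<double> &x, const std::vector<double> &y) {
    double s, c;
    two_prod(x[0], y[0], s, c);
    for (std::size_t i = 1; i < x.size(); ++i) {
        double p, e1, e2;
        two_prod(x[i], y[i], p, e1);  // exact product split
        two_sum(s, p, s, e2);         // exact accumulation split
        c += e1 + e2;                 // collect the error terms
    }
    return s + c;                     // single final compensation
}

int main() {
    // Heavy cancellation: the exact dot product is 1.0, while a naive
    // double-precision loop returns 0.0.
    std::vector<double> x = {1e16, 1.0, -1e16};
    std::vector<double> y = {1.0, 1.0, 1.0};
    std::printf("compensated dot = %.17g\n", dot2(x, y));
    return 0;
}
```

The compensated result behaves roughly as if the dot product had been accumulated in twice the working precision and then rounded, which is why the naive sum loses the answer in the cancellation example above while dot2 recovers it; the parallel block and pairwise reduction variants studied in the paper apply the same transformations inside each reduction step.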