Compensated summation and dot product algorithms for floating-point vectors on parallel architectures: Error bounds, implementation and application in the Krylov subspace methods

Bibliographic Details
Published in: Journal of Computational and Applied Mathematics, Vol. 414, p. 114434
Main Authors: Evstigneev, N.M., Ryabkov, O.I., Bocharov, A.N., Petrovskiy, V.P., Teplyakov, I.O.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2022
Subjects: Accurate dot product; Accurate summation; Compensated algorithms; General purpose GPU; Krylov subspace solvers
ISSN: 0377-0427, 1879-1778
Abstract: The aim of the paper is to improve parallel algorithms that obtain higher precision in floating-point reduction-type operations while working within the basic floating-point type. Compensated parallel variants of the summation and dot product operations for floating-point vectors (level 1 BLAS operations) are considered. The methods are based on the work of Rump, Ogita and Oishi. Parallel implementations are considered in block and pairwise reduction variants. Analytical error bounds are obtained, for all variants of the parallel algorithms, for real- and complex-valued vectors represented by floating-point numbers according to the IEEE 754 (IEC 60559) standard. The algorithms are written in C++ with the Compute Unified Device Architecture (CUDA) for Graphics Processing Units (GPUs), and their accuracy is tested for different vector sizes and different condition numbers. The suggested compensated variant is compared with a multiple-precision library for GPUs in terms of efficiency. The designed algorithms are tested in preconditioned Krylov-type matrix-based methods on matrices that originate from different challenging computational problems. It is shown that the compensated variants of the algorithms accelerate convergence and give more accurate results even when the matrix operations are performed in base precision.
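The compensated operations named in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' CUDA implementation: it is plain host-side C++ following the well-known Sum2 and Dot2 schemes of Ogita, Rump and Oishi, and every identifier in it (SumErr, two_sum, two_prod, sum2, dot2, pairwise_sum2) is a hypothetical name chosen for illustration. It assumes IEEE 754 binary64 arithmetic with round-to-nearest, a correctly rounded std::fma, and a build without value-changing optimizations such as -ffast-math. The pairwise routine only hints at how a parallel tree reduction might carry a (sum, error) pair per partial result; the paper's block and pairwise variants may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A floating-point result together with its rounding error (hypothetical helper type).
struct SumErr {
    double sum;
    double err;
};

// Error-free transformation of a sum (Knuth's TwoSum):
// sum + err equals a + b exactly, barring overflow.
inline SumErr two_sum(double a, double b) {
    double s = a + b;
    double bv = s - a;                      // the part of b that was absorbed into s
    double e = (a - (s - bv)) + (b - bv);   // what was lost from a and from b
    return {s, e};
}

// Error-free transformation of a product using a fused multiply-add:
// sum + err equals a * b exactly, barring underflow and overflow.
inline SumErr two_prod(double a, double b) {
    double p = a * b;
    double e = std::fma(a, b, -p);
    return {p, e};
}

// Compensated summation (Sum2): rounding errors are accumulated separately
// and added back once at the end.
double sum2(const std::vector<double>& x) {
    double s = 0.0, c = 0.0;
    for (double xi : x) {
        SumErr t = two_sum(s, xi);
        s = t.sum;
        c += t.err;
    }
    return s + c;
}

// Compensated dot product (Dot2): errors of both the products and the
// running sum are accumulated in c.
double dot2(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double s = 0.0, c = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        SumErr p = two_prod(x[i], y[i]);
        SumErr t = two_sum(s, p.sum);
        s = t.sum;
        c += p.err + t.err;
    }
    return s + c;
}

// Pairwise (tree) reduction in which each partial result is a (sum, err) pair.
// Merging two partial results applies two_sum to the sums and adds the errors.
SumErr pairwise_sum2(const double* x, std::size_t n) {
    if (n == 0) return {0.0, 0.0};
    if (n == 1) return {x[0], 0.0};
    std::size_t half = n / 2;
    SumErr left = pairwise_sum2(x, half);
    SumErr right = pairwise_sum2(x + half, n - half);
    SumErr t = two_sum(left.sum, right.sum);
    return {t.sum, t.err + (left.err + right.err)};
}
```

As a usage example, summing 1e16, 1.0 and -1e16 in that order with a plain double accumulator returns 0, because the 1.0 is rounded away, whereas sum2 recovers it. In a GPU setting the merge step of pairwise_sum2 is the piece that would presumably run per thread block or per warp, but that mapping is an assumption here, not something the record states.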
ArticleNumber 114434
Author Evstigneev, N.M.
Petrovskiy, V.P.
Teplyakov, I.O.
Bocharov, A.N.
Ryabkov, O.I.
Author_xml – sequence: 1
  givenname: N.M.
  orcidid: 0000-0002-8785-6762
  surname: Evstigneev
  fullname: Evstigneev, N.M.
  email: evstigneevnm@gmail.com
  organization: Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, pr. 60-letiya Oktyabrya, Bldg 9, Moscow 117312, Russia
– sequence: 2
  givenname: O.I.
  surname: Ryabkov
  fullname: Ryabkov, O.I.
  organization: Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, pr. 60-letiya Oktyabrya, Bldg 9, Moscow 117312, Russia
– sequence: 3
  givenname: A.N.
  surname: Bocharov
  fullname: Bocharov, A.N.
  organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia
– sequence: 4
  givenname: V.P.
  surname: Petrovskiy
  fullname: Petrovskiy, V.P.
  organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia
– sequence: 5
  givenname: I.O.
  surname: Teplyakov
  fullname: Teplyakov, I.O.
  organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia
CitedBy_id crossref_primary_10_1016_j_advwatres_2022_104340
crossref_primary_10_3390_math13020270
crossref_primary_10_3390_math11183875
Cites_doi 10.1137/0709008
10.1145/2693714.2693726
10.1145/567806.567808
10.1016/j.jpdc.2020.02.006
10.1007/BF01397083
10.1016/j.cam.2019.112697
10.1109/MM.2008.31
10.1137/07068816X
10.1137/S1064827596314200
10.1137/19M1257780
10.1007/978-3-030-63393-6_4
10.1137/0914050
10.1145/362854.362889
10.1016/j.jcp.2019.109189
10.1109/TC.2016.2532874
10.1137/030601818
10.1137/050645671
10.1016/j.parco.2015.09.001
10.1109/TC.2007.70819
ContentType Journal Article
Copyright 2022 Elsevier B.V.
Copyright_xml – notice: 2022 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.cam.2022.114434
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Mathematics
EISSN 1879-1778
ExternalDocumentID 10_1016_j_cam_2022_114434
S0377042722002047
GrantInformation_xml – fundername: RFBR, Russia
  grantid: 20-07-00066a
  funderid: http://dx.doi.org/10.13039/501100002261
ISICitedReferencesCount 4
ISSN 0377-0427
IngestDate Sat Nov 29 07:21:14 EST 2025
Tue Nov 18 22:43:07 EST 2025
Fri Feb 23 02:39:50 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Krylov subspace solvers
Compensated algorithms
Accurate dot product
General purpose GPU
Accurate summation
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c297t-fa035e542a83e12f43ca150f649ce01efc65aaf18c0c450913aab795a8b2d5b63
ORCID 0000-0002-8785-6762
ParticipantIDs crossref_primary_10_1016_j_cam_2022_114434
crossref_citationtrail_10_1016_j_cam_2022_114434
elsevier_sciencedirect_doi_10_1016_j_cam_2022_114434
PublicationCentury 2000
PublicationDate November 2022
2022-11-00
PublicationDateYYYYMMDD 2022-11-01
PublicationDate_xml – month: 11
  year: 2022
  text: November 2022
PublicationDecade 2020
PublicationTitle Journal of computational and applied mathematics
PublicationYear 2022
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References
[1] IEEE standard for floating-point arithmetic, 2008.
[2] Li, Demmel, Bailey, Henry, Hida, Iskandar, Kahan, Kang, Kapur, Martin, Thompson, Tung, Yoo, Design, implementation and testing of extended and mixed precision BLAS, ACM Trans. Math. Softw. 28 (2) (2002) 152-205. doi:10.1145/567806.567808
[3] Higham, Accuracy and Stability of Numerical Algorithms, 2002.
[4] Mohan, Residue Number Systems, 2016.
[5] Muller, Brunie, de Dinechin, Jeannerod, Joldes, Lefèvre, Melquiond, Revol, Torres, Handbook of Floating-Point Arithmetic, 2018.
[6] Dekker, A floating-point technique for extending the available precision, Numer. Math. 18 (3) (1971) 224-242. doi:10.1007/BF01397083
[7] Babuška, Numerical stability in problems of linear algebra, SIAM J. Numer. Anal. 9 (1) (1972) 53-77. doi:10.1137/0709008
[8] Malcolm, On accurate floating-point summation, Commun. ACM 14 (11) (1971) 731-736. doi:10.1145/362854.362889
[9] Kahan, Implementation of algorithms (lecture notes by W. S. Haugeland and D. Hough), 1973, pp. 1-184.
[10] Bohlender, Floating-point computation of functions with maximum accuracy, in: 1975 IEEE 3rd Symposium on Computer Arithmetic (ARITH), 1975, pp. 621-632.
[11] Ogita, Rump, Oishi, Accurate sum and dot product, SIAM J. Sci. Comput. 26 (6) (2005) 1955-1988. doi:10.1137/030601818
[12] Rump, Ogita, Oishi, Accurate floating-point summation part I: Faithful rounding, SIAM J. Sci. Comput. 31 (1) (2008) 189-224. doi:10.1137/050645671
[13] Anderson, A distillation algorithm for floating-point summation, SIAM J. Sci. Comput. 20 (5) (1999) 1797-1806. doi:10.1137/S1064827596314200
[14] Rump, Ogita, Oishi, Accurate floating-point summation part II: Sign, K-fold faithful and rounding to nearest, SIAM J. Sci. Comput. 31 (2) (2009) 1269-1302. doi:10.1137/07068816X
[15] Goodrich, Eldawy, Parallel algorithms for summing floating-point numbers, in: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016.
[16] Kadric, Gurniak, DeHon, Accurate parallel floating-point accumulation, IEEE Trans. Comput. 65 (11) (2016) 3224-3238. doi:10.1109/TC.2016.2532874
[17] Elrod, Févotte, Accurate and efficiently vectorized sums and dot products in Julia, version submitted to the correctness2019 workshop, 2019.
[18] Blanchard, Higham, Mary, A class of fast and accurate summation algorithms, SIAM J. Sci. Comput. 42 (3) (2020) A1541-A1557. doi:10.1137/19M1257780
[19] Thall, Extended-precision floating-point numbers for GPU computation, in: ACM SIGGRAPH 2006 Research Posters on - SIGGRAPH'06, 2006.
[20] Graça, Defour, Implementation of float-float operators on graphics hardware, in: 7th Conference on Real Numbers and Computers, RNC7, 2006, pp. 23-32.
[21] Mukunoki, Takahashi, Implementation and evaluation of quadruple precision BLAS functions on GPUs, in: Applied Parallel and Scientific Computing, 2012, pp. 249-259.
[22] Isupov, Kuvaev, Multiple-precision summation on hybrid CPU-GPU platforms using RNS-based floating-point representation, in: 2018 Engineering and Telecommunication (EnT-MIPT), 2018.
[23] Isupov, Knyazkov, Kuvaev, Design and implementation of multiple-precision BLAS Level 1 functions for graphics processing units, J. Parallel Distrib. Comput. 140 (2020) 25-36. doi:10.1016/j.jpdc.2020.02.006
[24] Nakayama, Takahashi, Implementation of multiple-precision floating-point arithmetic library for GPU computing, in: Parallel and Distributed Computing and Systems, 2011.
[25] Collange, Defour, Graillat, Iakymchuk, Numerical reproducibility for the parallel reduction on multi- and many-core architectures, Parallel Comput. 49 (2015) 83-97. doi:10.1016/j.parco.2015.09.001
[26] Chohra, Langlois, Parello, Reproducible, accurately rounded and efficient BLAS, in: Euro-Par 2016: Parallel Processing Workshops, 2017, pp. 609-620.
[27] Mukunoki, Ogita, Ozaki, Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures, in: Parallel Processing and Applied Mathematics, 2020, pp. 516-527.
[28] Zemke, Krylov Subspace Methods in Finite Precision: A Unified Approach, 2003.
[29] Anzt, Heuveline, Rocker, Mixed precision iterative refinement methods for linear systems: Convergence analysis based on Krylov subspace methods, in: Applied Parallel and Scientific Computing, 2012, pp. 237-247.
[30] Lindquist, Luszczek, Dongarra, Improving the performance of the GMRES method using mixed-precision techniques, in: Communications in Computer and Information Science, 2020, pp. 51-66. doi:10.1007/978-3-030-63393-6_4
[31] Mukunoki, Takahashi, Using quadruple precision arithmetic to accelerate Krylov subspace methods on GPUs, in: Parallel Processing and Applied Mathematics, 2014, pp. 632-642.
[32] Joldes, Popescu, Tucker, Searching for sinks for the Hénon map using a multiple-precision GPU arithmetic library, ACM SIGARCH Comput. Archit. News 42 (4) (2014) 63-68. doi:10.1145/2693714.2693726
[33] Du, Barrio, Jiang, Cheng, Accurate quotient-difference algorithm: Error analysis, improvements and applications, Appl. Math. Comput. 309 (2017) 245-271.
[34] Iakymchuk, Barreda, Wiesenberger, Aliaga, Quintana-Ortí, Reproducibility strategies for parallel preconditioned conjugate gradient, J. Comput. Appl. Math. 371 (2020). doi:10.1016/j.cam.2019.112697
[35] Mukunoki, Ozaki, Ogita, Iakymchuk, Conjugate gradient solvers with high accuracy and bit-wise reproducibility between CPU and GPU using Ozaki scheme, in: The International Conference on High Performance Computing in Asia-Pacific Region, 2021.
[36] Higham, The accuracy of floating point summation, SIAM J. Sci. Comput. 14 (4) (1993) 783-799. doi:10.1137/0914050
[37] Boldo, Melquiond, Emulation of a FMA and correctly rounded sums: Proved algorithms using rounding to odd, IEEE Trans. Comput. 57 (4) (2008) 462-471. doi:10.1109/TC.2007.70819
[38] Lindholm, Nickolls, Oberman, Montrym, NVIDIA Tesla: A unified graphics and computing architecture, IEEE Micro 28 (2) (2008) 39-55. doi:10.1109/MM.2008.31
[39] Bocharov, Evstigneev, Petrovskiy, Ryabkov, Teplyakov, Implicit method for the solution of supersonic and hypersonic 3D flow problems with Lower-Upper Symmetric-Gauss-Seidel preconditioner on multiple graphics processing units, J. Comput. Phys. 406 (2020). doi:10.1016/j.jcp.2019.109189
SSID ssj0006914
Score 2.3877025
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 114434
SubjectTerms Accurate dot product
Accurate summation
Compensated algorithms
General purpose GPU
Krylov subspace solvers
Title Compensated summation and dot product algorithms for floating-point vectors on parallel architectures: Error bounds, implementation and application in the Krylov subspace methods
URI https://dx.doi.org/10.1016/j.cam.2022.114434
Volume 414