Compensated summation and dot product algorithms for floating-point vectors on parallel architectures: Error bounds, implementation and application in the Krylov subspace methods
The aim of the paper is to improve parallel algorithms that obtain higher precision in floating point reduction-type operations while working within the basic floating point type. The compensated parallel variants of summation and dot product operations for floating point vectors are considered (level 1 BLAS operations).
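To illustrate what "compensated" means here, the following is a minimal serial sketch of the classic Sum2 and Dot2 schemes of Ogita, Rump and Oishi, on which the abstract says the methods are based. It is not the paper's CUDA code: the function names (`two_sum`, `split`, `two_product`, `sum2`, `dot2`) are chosen for illustration, Python floats stand in for the base IEEE 754 double type, and overflow and underflow are ignored.

```python
def two_sum(a, b):
    """Error-free transformation (Knuth): returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e


def split(a):
    """Veltkamp splitting of a double into high and low halves (a == hi + lo)."""
    c = 134217729.0 * a  # factor 2**27 + 1 for 53-bit significands
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_product(a, b):
    """Error-free transformation (Dekker): returns (p, e) with a * b == p + e exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def sum2(xs):
    """Compensated summation: accumulate rounding errors separately, add them at the end."""
    s, c = 0.0, 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s + c


def dot2(xs, ys):
    """Compensated dot product: tracks errors of both the products and the running sum."""
    p, s = two_product(xs[0], ys[0])
    for x, y in zip(xs[1:], ys[1:]):
        h, r = two_product(x, y)
        p, q = two_sum(p, h)
        s += q + r
    return p + s


# Ill-conditioned inputs where plain accumulation loses every significant digit.
values = [1.0, 1e100, 1.0, -1e100]
naive_sum = sum(values)          # plain left-to-right summation
accurate_sum = sum2(values)      # compensated summation

u = 2.0 ** -30
x_vec = [1.0 + u, -1.0]
y_vec = [1.0 - u, 1.0]
naive_dot = sum(a * b for a, b in zip(x_vec, y_vec))
accurate_dot = dot2(x_vec, y_vec)
```

Both routines use only base-precision arithmetic; the extra accuracy comes from recovering each rounding error exactly and summing those errors in a second accumulator.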
| Published in: | Journal of computational and applied mathematics Vol. 414; p. 114434 |
|---|---|
| Main Authors: | Evstigneev, N.M.; Ryabkov, O.I.; Bocharov, A.N.; Petrovskiy, V.P.; Teplyakov, I.O. |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 01.11.2022 |
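| Subjects: | Accurate dot product; Accurate summation; Compensated algorithms; General purpose GPU; Krylov subspace solvers |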
| ISSN: | 0377-0427, 1879-1778 |
| Abstract | The aim of the paper is to improve parallel algorithms that obtain higher precision in floating point reduction-type operations while working within the basic floating point type. The compensated parallel variants of summation and dot product operations for floating point vectors are considered (level 1 BLAS operations). The methods are based on the work of Rump, Ogita and Oishi. Parallel implementations in block and pairwise reduction variants are under consideration. Analytical error bounds are obtained for real- and complex-valued vectors that are represented by floating point numbers according to the IEEE 754 (IEC 60559) standard for all variants of parallel algorithms. The algorithms are written in C++ Compute Unified Device Architecture (CUDA) for Graphics Processing Units (GPUs) and their accuracy is tested for different vector sizes and different condition numbers. The suggested compensated variant is compared to the multiple-precision library for GPUs in terms of efficiency. The designed algorithms are tested in Krylov-type matrix-based methods with preconditioners that originate from different challenging computational problems. It is shown that the compensated variant of algorithms allows one to accelerate convergence and obtain more accurate results even when the matrix operations are in base precision. |
|---|---|
| ArticleNumber | 114434 |
| Author | Evstigneev, N.M.; Petrovskiy, V.P.; Teplyakov, I.O.; Bocharov, A.N.; Ryabkov, O.I. |
| Author_xml | – sequence: 1 givenname: N.M. orcidid: 0000-0002-8785-6762 surname: Evstigneev fullname: Evstigneev, N.M. email: evstigneevnm@gmail.com organization: Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, pr. 60-letiya Oktyabrya, Bldg 9, Moscow 117312, Russia – sequence: 2 givenname: O.I. surname: Ryabkov fullname: Ryabkov, O.I. organization: Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, pr. 60-letiya Oktyabrya, Bldg 9, Moscow 117312, Russia – sequence: 3 givenname: A.N. surname: Bocharov fullname: Bocharov, A.N. organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia – sequence: 4 givenname: V.P. surname: Petrovskiy fullname: Petrovskiy, V.P. organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia – sequence: 5 givenname: I.O. surname: Teplyakov fullname: Teplyakov, I.O. organization: Joint Institute for High Temperatures of the Russian Academy of Sciences, Izhorskaya 13 Bldg 2, Moscow 125412, Russia |
| CitedBy_id | crossref_primary_10_1016_j_advwatres_2022_104340 crossref_primary_10_3390_math13020270 crossref_primary_10_3390_math11183875 |
| Cites_doi | 10.1137/0709008 10.1145/2693714.2693726 10.1145/567806.567808 10.1016/j.jpdc.2020.02.006 10.1007/BF01397083 10.1016/j.cam.2019.112697 10.1109/MM.2008.31 10.1137/07068816X 10.1137/S1064827596314200 10.1137/19M1257780 10.1007/978-3-030-63393-6_4 10.1137/0914050 10.1145/362854.362889 10.1016/j.jcp.2019.109189 10.1109/TC.2016.2532874 10.1137/030601818 10.1137/050645671 10.1016/j.parco.2015.09.001 10.1109/TC.2007.70819 |
| ContentType | Journal Article |
| Copyright | 2022 Elsevier B.V. |
| Copyright_xml | – notice: 2022 Elsevier B.V. |
| DOI | 10.1016/j.cam.2022.114434 |
| Discipline | Mathematics |
| EISSN | 1879-1778 |
| GrantInformation_xml | – fundername: RFBR, Russia grantid: 20-07-00066a funderid: http://dx.doi.org/10.13039/501100002261 |
| ISICitedReferencesCount | 4 |
| ISSN | 0377-0427 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Krylov subspace solvers; Compensated algorithms; Accurate dot product; General purpose GPU; Accurate summation |
| Language | English |
| ORCID | 0000-0002-8785-6762 |
| PublicationCentury | 2000 |
| PublicationDate | November 2022 2022-11-00 |
| PublicationDateYYYYMMDD | 2022-11-01 |
| PublicationDate_xml | – month: 11 year: 2022 text: November 2022 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of computational and applied mathematics |
| PublicationYear | 2022 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| StartPage | 114434 |
| SubjectTerms | Accurate dot product; Accurate summation; Compensated algorithms; General purpose GPU; Krylov subspace solvers |
| Title | Compensated summation and dot product algorithms for floating-point vectors on parallel architectures: Error bounds, implementation and application in the Krylov subspace methods |
| URI | https://dx.doi.org/10.1016/j.cam.2022.114434 |
| Volume | 414 |
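The abstract mentions block and pairwise reduction variants of the parallel algorithms. As a rough illustration of how a compensation term can be carried through such a reduction, the sketch below emulates it serially: each block produces a (sum, compensation) pair, and pairs are then merged in a binary tree. This is an assumption-laden toy, not the paper's algorithm or its CUDA implementation; the names `block_partial`, `merge`, `pairwise_compensated_sum` and the `block_size` parameter are hypothetical.

```python
def two_sum(a, b):
    """Error-free transformation (Knuth): returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e


def block_partial(block):
    """Per-block work (one thread block in a GPU setting): running sum plus error term."""
    s, c = 0.0, 0.0
    for x in block:
        s, e = two_sum(s, x)
        c += e
    return s, c


def merge(left, right):
    """Combine two (sum, compensation) pairs, keeping the rounding error of the merge."""
    s, e = two_sum(left[0], right[0])
    return s, (left[1] + right[1]) + e


def pairwise_compensated_sum(xs, block_size=4):
    """Block partial sums followed by a pairwise (tree) reduction of the pairs."""
    if not xs:
        return 0.0
    parts = [block_partial(xs[i:i + block_size])
             for i in range(0, len(xs), block_size)]
    while len(parts) > 1:
        merged = [merge(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])  # odd element is carried to the next level
        parts = merged
    s, c = parts[0]
    return s + c


data = [1e100, 1.0, -1e100, 1.0, 1e100, 1.0, -1e100, 1.0]
naive = sum(data)                                   # plain left-to-right summation
tree_small_blocks = pairwise_compensated_sum(data, block_size=2)
tree_default_blocks = pairwise_compensated_sum(data)
```

In an actual GPU kernel the list comprehension levels would correspond to synchronized reduction steps, and the choice of block size would be tied to the hardware; here it only controls how the input is partitioned.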