Accelerating scientific computations with mixed precision algorithms
| Published in: | Computer physics communications Vol. 180; no. 12; pp. 2526 - 2533 |
|---|---|
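| Main Authors: | Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire |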
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.12.2009 |
| Subjects: | |
| ISSN: | 0010-4655, 1879-2944 |
| Abstract | On modern architectures, 32-bit operations are often at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can be applied not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphics Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, 32-bit operations are often at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix $A$ is factored into the product of a lower triangular matrix $L$ and an upper triangular matrix $U$. Partial row pivoting is generally used to improve numerical stability, resulting in a factorization $PA = LU$, where $P$ is a permutation matrix. The system is then solved by first solving $Ly = Pb$ (forward substitution) and then $Ux = y$ (backward substitution). Due to round-off errors, the computed solution $x$ carries a numerical error magnified by the condition number of the coefficient matrix $A$. To improve the computed solution, an iterative process can be applied that produces a correction to the computed solution at each iteration; this is the method commonly known as iterative refinement. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.
Running time: seconds/minutes |
| Author | Baboulin, Marc; Buttari, Alfredo; Langou, Julien; Kurzak, Jakub; Langou, Julie; Luszczek, Piotr; Dongarra, Jack; Tomov, Stanimire |
| Author_xml | 1. Baboulin, Marc: Department of Mathematics, University of Coimbra, Coimbra, Portugal; 2. Buttari, Alfredo: French National Institute for Research in Computer Science and Control, Lyon, France; 3. Dongarra, Jack: Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA; 4. Kurzak, Jakub (kurzak@eecs.utk.edu): Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA; 5. Langou, Julie: Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA; 6. Langou, Julien: Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, USA; 7. Luszczek, Piotr: MathWorks, Inc., Natick, MA, USA; 8. Tomov, Stanimire: Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA |
| BackLink | https://hal.science/hal-02420940 (View record in HAL) |
| CitedBy_id | crossref_primary_10_1002_wics_164 crossref_primary_10_3390_a12090197 crossref_primary_10_1177_1094342015580139 crossref_primary_10_3390_pr9101813 crossref_primary_10_1016_j_procs_2010_04_020 crossref_primary_10_1007_s10586_024_04767_y crossref_primary_10_1016_j_compfluid_2024_106247 crossref_primary_10_1016_j_jocs_2021_101447 crossref_primary_10_1016_j_cpc_2018_12_006 crossref_primary_10_1007_s12046_018_0892_0 crossref_primary_10_1017_S0962492916000015 crossref_primary_10_1029_2022MS003148 crossref_primary_10_1631_jzus_C1200043 crossref_primary_10_1109_TCSI_2024_3497724 crossref_primary_10_1016_j_jcp_2021_110574 crossref_primary_10_1016_j_camwa_2014_01_021 crossref_primary_10_1016_j_future_2023_10_006 crossref_primary_10_1177_10943420251338168 crossref_primary_10_1017_S0962492922000022 crossref_primary_10_1016_j_jcp_2023_112133 crossref_primary_10_1007_s12200_022_00025_4 crossref_primary_10_1137_20M1348571 crossref_primary_10_1109_TCAD_2023_3316994 crossref_primary_10_1007_s11075_017_0367_0 crossref_primary_10_1109_TPWRS_2022_3199181 crossref_primary_10_1002_2016MS000862 crossref_primary_10_1109_ACCESS_2025_3557505 crossref_primary_10_1007_s42514_023_00141_3 crossref_primary_10_1016_j_amc_2022_127611 crossref_primary_10_1109_ACCESS_2023_3262453 crossref_primary_10_1016_j_proeng_2013_08_022 crossref_primary_10_1007_s10569_022_10081_9 crossref_primary_10_1002_cpe_4055 crossref_primary_10_1109_TC_2019_2895031 crossref_primary_10_1109_TED_2022_3177391 crossref_primary_10_1007_s00450_010_0124_2 crossref_primary_10_1177_1094342020938424 crossref_primary_10_1007_s42514_024_00208_9 crossref_primary_10_1177_1094342019846547 crossref_primary_10_1016_j_compfluid_2024_106505 crossref_primary_10_1002_nla_2366 crossref_primary_10_3390_computers14050170 crossref_primary_10_1007_s10589_020_00190_2 crossref_primary_10_1016_j_jpdc_2024_104884 crossref_primary_10_1016_j_cpc_2011_11_026 crossref_primary_10_1108_EC_07_2019_0328 crossref_primary_10_1016_j_cpc_2022_108555 
crossref_primary_10_1016_j_advengsoft_2011_10_014 crossref_primary_10_1103_PhysRevApplied_18_024040 crossref_primary_10_1002_tal_1389 crossref_primary_10_1039_C5CP00320B crossref_primary_10_1177_10943420241261960 crossref_primary_10_1016_j_cpc_2013_09_013 crossref_primary_10_1145_3582493 crossref_primary_10_1016_j_cpc_2019_07_002 crossref_primary_10_1137_24M1638513 crossref_primary_10_1016_j_apm_2025_115984 crossref_primary_10_1016_j_cpc_2012_01_002 crossref_primary_10_1016_j_jocs_2019_07_004 crossref_primary_10_1002_cpe_6621 crossref_primary_10_1007_s00382_017_4034_x crossref_primary_10_1109_MCSE_2017_48 crossref_primary_10_3389_fmars_2025_1586015 crossref_primary_10_1007_s12530_022_09428_2 crossref_primary_10_1145_3651155 crossref_primary_10_1007_s13160_019_00360_8 crossref_primary_10_1109_ACCESS_2023_3338443 crossref_primary_10_1145_3264491 crossref_primary_10_1109_TCSII_2024_3359678 |
| Cites_doi | 10.1145/1141885.1141894 10.1145/77626.79170 10.1109/TPDS.2007.70813 10.1016/j.cam.2004.09.024 10.1177/1094342007084026 10.1002/nla.1680010404 10.1137/0720002 10.1145/1377596.1377597 10.1145/992200.992206 10.1137/0904049 10.1145/42288.42291 10.1016/S0045-7825(99)00242-X 10.1137/S0895479894246905 10.1145/355841.355847 10.1145/356044.356047 10.1137/S1064827597323415 10.1016/j.parco.2005.07.004 10.1145/860854.860886 10.1177/109434208700100403 10.1007/BF01386090 10.1145/77626.77627 10.1137/0612048 10.1090/S0025-5718-1980-0572859-4 10.1137/0913048 10.1137/S1064827599362314 10.1137/S0036142902401074 10.1145/305658.287640 10.1145/321386.321394 10.1109/SC.2006.30 10.1016/0377-0427(94)00067-B 10.1137/1037125 10.1137/S0895479899358194 10.1002/cpe.1164 10.1145/42288.42292 10.1137/0610013 |
| ContentType | Journal Article |
| Copyright | 2008 Distributed under a Creative Commons Attribution 4.0 International License |
| DOI | 10.1016/j.cpc.2008.11.005 |
| DatabaseName | CrossRef Computer and Information Systems Abstracts Solid State and Superconductivity Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Hyper Article en Ligne (HAL) |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest Computer Science Collection Computer and Information Systems Abstracts Solid State and Superconductivity Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Discipline | Physics Computer Science |
| EISSN | 1879-2944 |
| EndPage | 2533 |
| ExternalDocumentID | oai:HAL:hal-02420940v1 10_1016_j_cpc_2008_11_005 S0010465508003846 |
| ISICitedReferencesCount | 137 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000273011500011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0010-4655 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Keywords | 02.60.Dc Mixed precision Numerical linear algebra Iterative refinement |
| Language | English |
| License | https://www.elsevier.com/tdm/userlicense/1.0 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| ORCID | 0000-0003-3207-7021 0009-0006-0903-1307 |
| OpenAccessLink | http://hdl.handle.net/10316/10052 |
| PQID | 34960479 |
| PQPubID | 23500 |
| PageCount | 8 |
| PublicationCentury | 2000 |
| PublicationDate | 2009-12-01 |
| PublicationDecade | 2000 |
| PublicationTitle | Computer physics communications |
| PublicationYear | 2009 |
| Publisher | Elsevier B.V Elsevier |
| StartPage | 2526 |
| SubjectTerms | Computer Science; Iterative refinement; Mathematical Software; Mixed precision; Numerical Analysis; Numerical linear algebra |
| Title | Accelerating scientific computations with mixed precision algorithms |
| URI | https://dx.doi.org/10.1016/j.cpc.2008.11.005 https://www.proquest.com/docview/34960479 https://hal.science/hal-02420940 |
| Volume | 180 |
| Citation (from OpenURL) | Accelerating scientific computations with mixed precision algorithms. Computer physics communications, vol. 180, no. 12, pp. 2526-2533, 2009-12-01. Authors: Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub. ISSN 0010-4655. DOI 10.1016/j.cpc.2008.11.005 |