Accelerating scientific computations with mixed precision algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanc...

Bibliographic Details
Published in: Computer Physics Communications, Vol. 180, no. 12, pp. 2526-2533
Main Authors: Baboulin, Marc, Buttari, Alfredo, Dongarra, Jack, Kurzak, Jakub, Langou, Julie, Langou, Julien, Luszczek, Piotr, Tomov, Stanimire
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2009
ISSN: 0010-4655, 1879-2944
Abstract: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.

Program summary
Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability, resulting in a factorization PA = LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly = Pb (forward substitution) and then solving Ux = y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.
Running time: seconds/minutes
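The solution method described in the program summary can be illustrated with a small sketch. The code below is not the ITER-REF program (which is written in FORTRAN 77) and all function names in it are hypothetical. It assumes a small dense system stored as Python lists and emulates 32-bit arithmetic by rounding every intermediate result to binary32 with the standard `struct` module. The LU factorization and the triangular solves run in the emulated single precision, while the residual and the solution update are kept in double precision.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting, P A = L U, in emulated binary32.

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix,
    and perm, where perm[i] is the original index of row i of P A.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular in single precision")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve L y = P b, then U x = y, in emulated binary32."""
    n = len(lu)
    x = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):  # backward substitution
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def residual(a, x, b):
    """r = b - A x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def mixed_precision_solve(a, b, tol=1e-14, max_iter=30):
    """Iterative refinement: single-precision LU, double-precision residual.

    Returns (x, number_of_corrections_applied). The stopping test and the
    default tolerance are illustrative choices, not taken from the paper.
    """
    lu, perm = lu_factor_f32(a)          # factor once, in single precision
    x = lu_solve_f32(lu, perm, b)        # initial single-precision solution
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        r = residual(a, x, b)            # double precision
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        z = lu_solve_f32(lu, perm, r)    # correction, single precision
        x = [xi + zi for xi, zi in zip(x, z)]  # update, double precision
    return x, max_iter
```

A call such as `mixed_precision_solve([[4.1, 1.3], [1.3, 5.9]], [1.0, 2.0])` returns the refined solution together with the number of corrections applied. The expensive step, the factorization, is done once in the cheaper precision, and each refinement step reuses the stored factors, which is where the performance benefit described in the abstract would come from on hardware with faster 32-bit arithmetic.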
Author affiliations:
1. Baboulin, Marc. Department of Mathematics, University of Coimbra, Coimbra, Portugal
2. Buttari, Alfredo. French National Institute for Research in Computer Science and Control, Lyon, France
3. Dongarra, Jack. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
4. Kurzak, Jakub (kurzak@eecs.utk.edu). Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
5. Langou, Julie. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
6. Langou, Julien. Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, USA
7. Luszczek, Piotr. MathWorks, Inc., Natick, MA, USA
8. Tomov, Stanimire. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
HAL record: https://hal.science/hal-02420940
ContentType Journal Article
Copyright 2008
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1016/j.cpc.2008.11.005
Discipline Physics
Computer Science
EISSN 1879-2944
EndPage 2533
ExternalDocumentID oai:HAL:hal-02420940v1
10_1016_j_cpc_2008_11_005
S0010465508003846
ISICitedReferencesCount 137
ISSN 0010-4655
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Keywords: 02.60.Dc; Mixed precision; Numerical linear algebra; Iterative refinement
Language English
License https://www.elsevier.com/tdm/userlicense/1.0
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0003-3207-7021
0009-0006-0903-1307
OpenAccessLink http://hdl.handle.net/10316/10052
PQID 34960479
PQPubID 23500
PageCount 8
PublicationDate 2009-12-01
PublicationDateYYYYMMDD 2009-12-01
PublicationDecade 2000
PublicationTitle Computer physics communications
PublicationYear 2009
Publisher Elsevier B.V
Elsevier
References Notay (bib013) 2000; 22
Vuik (bib017) 1995; 61
J. Langou, et al., Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy, in: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, 2006
Amestoy, Guermouche, L'Excellent, Pralet (bib035) 2006; 32
Dongarra, Moler, Wilkinson (bib046) 1983; 20
J.W. Demmel, Y. Hida, X.S. Li, E.J. Riedy, Extra-precise iterative refinement for overdetermined least squares problems, Technical Report EECS-2007-77, UC Berkeley, 2007, Also LAPACK Working Note 188
Dongarra, Croz, Duff, Hammarling (bib021) 1990; 16
Demmel (bib002) 1997
Stewart (bib005) 1973
Dongarra, Croz, Hammarling, Hanson (bib023) 1988; 14
Moler (bib004) 1967; 14
Demmel (bib006) 2006; 32
Oettli, Prager (bib007) 1964; 6
Kurzak, Buttari, Dongarra (bib028) 2008; 19
Amestoy, Duff, L'Excellent (bib033) 2000; 184
Skeel (bib039) 1980; 35
Barrett (bib008) 1994
Turner, Walker (bib010) 1992; 13
van der Vorst, Vuik (bib018) 1994; 1
Davis (bib036) 1999; 25
Dongarra, Croz, Duff, Hammarling (bib020) 1990; 16
Amestoy, Duff, L'Excellent, Koster (bib034) 2001; 23
Björck (bib045) 1996
Higham (bib040) 2002
Golub, Ye (bib012) 2000; 21
Davis (bib037) 2004; 30
Davis, Duff (bib038) 1997; 18
Dongarra (bib047) 1983; 4
Saad (bib009) 2003
Simoncini, Szyld (bib015) 2003; 40
Axelsson, Vassilevski (bib011) 1991; 12
Buttari (bib026) 2007; 21
Ypma (bib001) 1995; 37
Datta (bib042) 1995
M. Arioli, I.S. Duff, Using FGMRES to obtain backward stability in mixed precision, in: Technical Report RAL-TR-2008-006, Rutherford Appleton Laboratory, 2008
Arioli, Demmel, Duff (bib019) 1989; 10
van den Eshof, Sleijpen, van Gijzen (bib016) 2005; 177
K.O. Geddes, W.W. Zheng, Exploiting fast hardware floating point in high precision computation, in: Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation. Philadelphia, PA, USA, 2003, pp. 111–118
Buttari, Dongarra, Kurzak, Luszczek, Tomov (bib032) 2008; 34
Y. Saad, A flexible inner–outer preconditioned GMRES algorithm, Technical Report 91-279, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, 1991
Dongarra, Croz, Hammarling, Hanson (bib022) 1988; 14
Lawson, Hanson, Kincaid, Krogh (bib024) 1979; 5
Wilkinson (bib003) 1963
Duff, Reid (bib030) 1983; 9
Ashcraft, Grimes, Lewis, Peyton, Simon (bib031) 1987; 1
Anderson (bib025) 1999
Kurzak, Dongarra (bib029) 2007; 19
Arioli (10.1016/j.cpc.2008.11.005_bib019) 1989; 10
Dongarra (10.1016/j.cpc.2008.11.005_bib020) 1990; 16
Dongarra (10.1016/j.cpc.2008.11.005_bib023) 1988; 14
Amestoy (10.1016/j.cpc.2008.11.005_bib035) 2006; 32
Demmel (10.1016/j.cpc.2008.11.005_bib006) 2006; 32
Ypma (10.1016/j.cpc.2008.11.005_bib001) 1995; 37
Wilkinson (10.1016/j.cpc.2008.11.005_bib003) 1963
Skeel (10.1016/j.cpc.2008.11.005_bib039) 1980; 35
Notay (10.1016/j.cpc.2008.11.005_bib013) 2000; 22
Amestoy (10.1016/j.cpc.2008.11.005_bib034) 2001; 23
Axelsson (10.1016/j.cpc.2008.11.005_bib011) 1991; 12
van den Eshof (10.1016/j.cpc.2008.11.005_bib016) 2005; 177
Lawson (10.1016/j.cpc.2008.11.005_bib024) 1979; 5
Vuik (10.1016/j.cpc.2008.11.005_bib017) 1995; 61
Anderson (10.1016/j.cpc.2008.11.005_bib025) 1999
10.1016/j.cpc.2008.11.005_bib027
Davis (10.1016/j.cpc.2008.11.005_bib037) 2004; 30
Moler (10.1016/j.cpc.2008.11.005_bib004) 1967; 14
Buttari (10.1016/j.cpc.2008.11.005_bib032) 2008; 34
Datta (10.1016/j.cpc.2008.11.005_bib042) 1995
Demmel (10.1016/j.cpc.2008.11.005_bib002) 1997
Barrett (10.1016/j.cpc.2008.11.005_bib008) 1994
Higham (10.1016/j.cpc.2008.11.005_bib040) 2002
Dongarra (10.1016/j.cpc.2008.11.005_bib021) 1990; 16
Davis (10.1016/j.cpc.2008.11.005_bib038) 1997; 18
Turner (10.1016/j.cpc.2008.11.005_bib010) 1992; 13
Kurzak (10.1016/j.cpc.2008.11.005_bib029) 2007; 19
Golub (10.1016/j.cpc.2008.11.005_bib012) 2000; 21
Buttari (10.1016/j.cpc.2008.11.005_bib026) 2007; 21
10.1016/j.cpc.2008.11.005_bib041
Ashcraft (10.1016/j.cpc.2008.11.005_bib031) 1987; 1
Duff (10.1016/j.cpc.2008.11.005_bib030) 1983; 9
Amestoy (10.1016/j.cpc.2008.11.005_bib033) 2000; 184
10.1016/j.cpc.2008.11.005_bib044
Stewart (10.1016/j.cpc.2008.11.005_bib005) 1973
10.1016/j.cpc.2008.11.005_bib043
Oettli (10.1016/j.cpc.2008.11.005_bib007) 1964; 6
Kurzak (10.1016/j.cpc.2008.11.005_bib028) 2008; 19
Saad (10.1016/j.cpc.2008.11.005_bib009) 2003
Björck (10.1016/j.cpc.2008.11.005_bib045) 1996
10.1016/j.cpc.2008.11.005_bib014
Dongarra (10.1016/j.cpc.2008.11.005_bib046) 1983; 20
Davis (10.1016/j.cpc.2008.11.005_bib036) 1999; 25
Simoncini (10.1016/j.cpc.2008.11.005_bib015) 2003; 40
Dongarra (10.1016/j.cpc.2008.11.005_bib022) 1988; 14
van der Vorst (10.1016/j.cpc.2008.11.005_bib018) 1994; 1
Dongarra (10.1016/j.cpc.2008.11.005_bib047) 1983; 4
References_xml – volume: 14
  start-page: 316
  year: 1967
  ident: bib004
  publication-title: J. ACM
– volume: 6
  start-page: 405
  year: 1964
  ident: bib007
  publication-title: Numer. Math.
– volume: 16
  start-page: 18
  year: 1990
  ident: bib020
  publication-title: ACM Trans. Math. Software
– volume: 32
  start-page: 136
  year: 2006
  ident: bib035
  publication-title: Parallel Comput.
– volume: 4
  start-page: 712
  year: 1983
  ident: bib047
  publication-title: SIAM J. Scientific Statist. Comput.
– year: 1963
  ident: bib003
  article-title: Rounding Errors in Algebraic Processes
– volume: 5
  start-page: 308
  year: 1979
  ident: bib024
  publication-title: ACM Trans. Math. Software
– volume: 1
  start-page: 10
  year: 1987
  ident: bib031
  publication-title: Intern. J. Supercomput. Appl.
– volume: 21
  start-page: 457
  year: 2007
  ident: bib026
  publication-title: Int. J. High Performance Comput. Appl.
– volume: 9
  start-page: 302
  year: 1983
  ident: bib030
  publication-title: ACM Trans. Math. Software
– volume: 1
  start-page: 369
  year: 1994
  ident: bib018
  publication-title: Numer. Linear Algebra Appl.
– volume: 184
  start-page: 501
  year: 2000
  ident: bib033
  publication-title: Comput. Methods Appl. Mech. Eng.
– volume: 22
  start-page: 1444
  year: 2000
  ident: bib013
  publication-title: SIAM J. Scientific Comput.
– reference: J. Langou, et al., Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy, in: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, 2006
– volume: 14
  start-page: 18
  year: 1988
  ident: bib022
  publication-title: ACM Trans. Math. Software
– volume: 21
  start-page: 1305
  year: 2000
  ident: bib012
  publication-title: SIAM J. Scientific Comput.
– volume: 14
  start-page: 1
  year: 1988
  ident: bib023
  publication-title: ACM Trans. Math. Software
– year: 1999
  ident: bib025
  article-title: LAPACK Users' Guide
– volume: 19
  start-page: 1
  year: 2008
  ident: bib028
  publication-title: IEEE Trans. Parallel Distrib. Systems
– year: 1994
  ident: bib008
  article-title: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
– volume: 34
  start-page: 17
  year: 2008
  ident: bib032
  publication-title: ACM Trans. Math. Software
– volume: 32
  start-page: 325
  year: 2006
  ident: bib006
  publication-title: ACM Trans. Math. Software
– volume: 12
  start-page: 625
  year: 1991
  ident: bib011
  publication-title: SIAM J. Matrix Anal. Appl.
– year: 1973
  ident: bib005
  article-title: Introduction to Matrix Computations
– volume: 37
  start-page: 531
  year: 1995
  ident: bib001
  publication-title: SIAM Review
– reference: K.O. Geddes, W.W. Zheng, Exploiting fast hardware floating point in high precision computation, in: Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation. Philadelphia, PA, USA, 2003, pp. 111–118
– volume: 35
  start-page: 817
  year: 1980
  ident: bib039
  publication-title: Math. Comput.
– volume: 20
  start-page: 23
  year: 1983
  ident: bib046
  publication-title: SIAM J. Numer. Anal.
– volume: 40
  start-page: 2219
  year: 2003
  ident: bib015
  publication-title: SIAM J. Numer. Anal.
– year: 1997
  ident: bib002
  article-title: Applied Numerical Linear Algebra
– reference: Y. Saad, A flexible inner–outer preconditioned GMRES algorithm, Technical Report 91-279, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, 1991
– volume: 177
  start-page: 347
  year: 2005
  ident: bib016
  publication-title: J. Comput. Appl. Math.
– volume: 18
  start-page: 140
  year: 1997
  ident: bib038
  publication-title: SIAM J. Matrix Anal. Appl.
– year: 2003
  ident: bib009
  article-title: Iterative Methods for Sparse Linear Systems
– volume: 23
  start-page: 15
  year: 2001
  ident: bib034
  publication-title: SIAM J. Matrix Anal. Appl.
– volume: 61
  start-page: 189
  year: 1995
  ident: bib017
  publication-title: J. Comput. Appl. Math.
– volume: 19
  start-page: 1371
  year: 2007
  ident: bib029
  publication-title: Concurrency Computat.: Pract. Exper.
– volume: 16
  start-page: 1
  year: 1990
  ident: bib021
  publication-title: ACM Trans. Math. Software
– volume: 25
  start-page: 1
  year: 1999
  ident: bib036
  publication-title: ACM Trans. Math. Software
– reference: M. Arioli, I.S. Duff, Using FGMRES to obtain backward stability in mixed precision, in: Technical Report RAL-TR-2008-006, Rutherford Appleton Laboratory, 2008
– year: 2002
  ident: bib040
  article-title: Accuracy and Stability of Numerical Algorithms
– year: 1995
  ident: bib042
  article-title: Numerical Linear Algebra and Applications
– reference: J.W. Demmel, Y. Hida, X.S. Li, E.J. Riedy, Extra-precise iterative refinement for overdetermined least squares problems, Technical Report EECS-2007-77, UC Berkeley, 2007, Also LAPACK Working Note 188
– volume: 13
  start-page: 815
  year: 1992
  ident: bib010
  publication-title: SIAM J. Sci. Stat. Comput.
– volume: 10
  start-page: 165
  year: 1989
  ident: bib019
  publication-title: SIAM J. Matrix Anal. Appl.
– volume: 30
  start-page: 196
  year: 2004
  ident: bib037
  publication-title: ACM Trans. Math. Software
– year: 1996
  ident: bib045
  article-title: Numerical Methods for Least Squares Problems
– volume: 32
  start-page: 325
  year: 2006
  ident: 10.1016/j.cpc.2008.11.005_bib006
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/1141885.1141894
– year: 1997
  ident: 10.1016/j.cpc.2008.11.005_bib002
– volume: 16
  start-page: 1
  year: 1990
  ident: 10.1016/j.cpc.2008.11.005_bib021
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/77626.79170
– volume: 19
  start-page: 1
  year: 2008
  ident: 10.1016/j.cpc.2008.11.005_bib028
  publication-title: IEEE Trans. Parallel Distrib. Systems
  doi: 10.1109/TPDS.2007.70813
– volume: 177
  start-page: 347
  year: 2005
  ident: 10.1016/j.cpc.2008.11.005_bib016
  publication-title: J. Comput. Appl. Math.
  doi: 10.1016/j.cam.2004.09.024
– year: 1999
  ident: 10.1016/j.cpc.2008.11.005_bib025
– volume: 21
  start-page: 457
  year: 2007
  ident: 10.1016/j.cpc.2008.11.005_bib026
  publication-title: Int. J. High Performance Comput. Appl.
  doi: 10.1177/1094342007084026
– year: 2002
  ident: 10.1016/j.cpc.2008.11.005_bib040
– volume: 1
  start-page: 369
  year: 1994
  ident: 10.1016/j.cpc.2008.11.005_bib018
  publication-title: Numer. Linear Algebra Appl.
  doi: 10.1002/nla.1680010404
– volume: 20
  start-page: 23
  year: 1983
  ident: 10.1016/j.cpc.2008.11.005_bib046
  publication-title: SIAM J. Numer. Anal.
  doi: 10.1137/0720002
– ident: 10.1016/j.cpc.2008.11.005_bib041
– volume: 34
  start-page: 17
  year: 2008
  ident: 10.1016/j.cpc.2008.11.005_bib032
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/1377596.1377597
– volume: 30
  start-page: 196
  year: 2004
  ident: 10.1016/j.cpc.2008.11.005_bib037
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/992200.992206
– year: 1963
  ident: 10.1016/j.cpc.2008.11.005_bib003
– volume: 4
  start-page: 712
  year: 1983
  ident: 10.1016/j.cpc.2008.11.005_bib047
  publication-title: SIAM J. Scientific Statist. Comput.
  doi: 10.1137/0904049
– volume: 14
  start-page: 1
  year: 1988
  ident: 10.1016/j.cpc.2008.11.005_bib023
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/42288.42291
– volume: 184
  start-page: 501
  year: 2000
  ident: 10.1016/j.cpc.2008.11.005_bib033
  publication-title: Comput. Methods Appl. Mech. Eng.
  doi: 10.1016/S0045-7825(99)00242-X
– volume: 18
  start-page: 140
  year: 1997
  ident: 10.1016/j.cpc.2008.11.005_bib038
  publication-title: SIAM J. Matrix Anal. Appl.
  doi: 10.1137/S0895479894246905
– volume: 5
  start-page: 308
  year: 1979
  ident: 10.1016/j.cpc.2008.11.005_bib024
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/355841.355847
– volume: 9
  start-page: 302
  year: 1983
  ident: 10.1016/j.cpc.2008.11.005_bib030
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/356044.356047
– volume: 21
  start-page: 1305
  year: 2000
  ident: 10.1016/j.cpc.2008.11.005_bib012
  publication-title: SIAM J. Scientific Comput.
  doi: 10.1137/S1064827597323415
– year: 1995
  ident: 10.1016/j.cpc.2008.11.005_bib042
– volume: 32
  start-page: 136
  year: 2006
  ident: 10.1016/j.cpc.2008.11.005_bib035
  publication-title: Parallel Comput.
  doi: 10.1016/j.parco.2005.07.004
– ident: 10.1016/j.cpc.2008.11.005_bib043
  doi: 10.1145/860854.860886
– volume: 1
  start-page: 10
  year: 1987
  ident: 10.1016/j.cpc.2008.11.005_bib031
  publication-title: Intern. J. Supercomput. Appl.
  doi: 10.1177/109434208700100403
– volume: 6
  start-page: 405
  year: 1964
  ident: 10.1016/j.cpc.2008.11.005_bib007
  publication-title: Numer. Math.
  doi: 10.1007/BF01386090
– ident: 10.1016/j.cpc.2008.11.005_bib014
– year: 2003
  ident: 10.1016/j.cpc.2008.11.005_bib009
– volume: 16
  start-page: 18
  year: 1990
  ident: 10.1016/j.cpc.2008.11.005_bib020
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/77626.77627
– volume: 12
  start-page: 625
  year: 1991
  ident: 10.1016/j.cpc.2008.11.005_bib011
  publication-title: SIAM J. Matrix Anal. Appl.
  doi: 10.1137/0612048
– volume: 35
  start-page: 817
  year: 1980
  ident: 10.1016/j.cpc.2008.11.005_bib039
  publication-title: Math. Comput.
  doi: 10.1090/S0025-5718-1980-0572859-4
– year: 1994
  ident: 10.1016/j.cpc.2008.11.005_bib008
– volume: 13
  start-page: 815
  year: 1992
  ident: 10.1016/j.cpc.2008.11.005_bib010
  publication-title: SIAM J. Sci. Stat. Comput.
  doi: 10.1137/0913048
– volume: 22
  start-page: 1444
  year: 2000
  ident: 10.1016/j.cpc.2008.11.005_bib013
  publication-title: SIAM J. Scientific Comput.
  doi: 10.1137/S1064827599362314
– volume: 40
  start-page: 2219
  year: 2003
  ident: 10.1016/j.cpc.2008.11.005_bib015
  publication-title: SIAM J. Numer. Anal.
  doi: 10.1137/S0036142902401074
– year: 1973
  ident: 10.1016/j.cpc.2008.11.005_bib005
– volume: 25
  start-page: 1
  year: 1999
  ident: 10.1016/j.cpc.2008.11.005_bib036
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/305658.287640
– ident: 10.1016/j.cpc.2008.11.005_bib044
– volume: 14
  start-page: 316
  year: 1967
  ident: 10.1016/j.cpc.2008.11.005_bib004
  publication-title: J. ACM
  doi: 10.1145/321386.321394
– ident: 10.1016/j.cpc.2008.11.005_bib027
  doi: 10.1109/SC.2006.30
– volume: 61
  start-page: 189
  year: 1995
  ident: 10.1016/j.cpc.2008.11.005_bib017
  publication-title: J. Comput. Appl. Math.
  doi: 10.1016/0377-0427(94)00067-B
– volume: 37
  start-page: 531
  year: 1995
  ident: 10.1016/j.cpc.2008.11.005_bib001
  publication-title: SIAM Review
  doi: 10.1137/1037125
– volume: 23
  start-page: 15
  year: 2001
  ident: 10.1016/j.cpc.2008.11.005_bib034
  publication-title: SIAM J. Matrix Anal. Appl.
  doi: 10.1137/S0895479899358194
– volume: 19
  start-page: 1371
  year: 2007
  ident: 10.1016/j.cpc.2008.11.005_bib029
  publication-title: Concurrency Computat.: Pract. Exper.
  doi: 10.1002/cpe.1164
– volume: 14
  start-page: 18
  year: 1988
  ident: 10.1016/j.cpc.2008.11.005_bib022
  publication-title: ACM Trans. Math. Software
  doi: 10.1145/42288.42292
– volume: 10
  start-page: 165
  year: 1989
  ident: 10.1016/j.cpc.2008.11.005_bib019
  publication-title: SIAM J. Matrix Anal. Appl.
  doi: 10.1137/0610013
StartPage 2526
SubjectTerms Computer Science
Iterative refinement
Mathematical Software
Mixed precision
Numerical Analysis
Numerical linear algebra
Title Accelerating scientific computations with mixed precision algorithms
URI https://dx.doi.org/10.1016/j.cpc.2008.11.005
https://www.proquest.com/docview/34960479
https://hal.science/hal-02420940
Volume 180