Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components

Bibliographic Details
Published in: Computational Mathematics and Modeling, Vol. 32, no. 4, pp. 438-452
Main Authors: Minin, Iu. B., Matveev, S. A., Fedorov, M. V., Zacharov, I. E., Rykovanov, S. G.
Format: Journal Article
Language:English
Published: New York Springer US 01.10.2021
Springer Nature B.V
ISSN:1046-283X, 1573-837X
Abstract The Generalized Minimal Residual Method (GMRES) has been benchmarked on many types of GPUs for solving linear systems with dense and sparse matrices. However, there are still no benchmarks of GMRES implementations that compare the Tesla V100 with the GTX 1080 Ti, or that target Toeplitz-like matrices. The software introduced here consists of a Python module and a C++ library that make it possible to manage streams for concurrent computation of separate linear systems on one or more GPUs. The GMRES solver is parallelized to run on an NVIDIA GPGPU accelerator. The parallelization efficiency is explored when GMRES is applied to linear systems arising from the Helmholtz equation, based on the Green’s Function Integral Equation Method (GFIEM) for computing the electric field distribution in the design domain. The proposed implementation showed a maximal speedup of 55 (t̄ = 0.017 s) for 1024 × 1024 dense Toeplitz matrices on the GTX 1080 Ti and of 125 (t̄ = 0.77 s) for 8192 × 8192 dense Toeplitz matrices on the Tesla V100, with both sets of matrices generated from GFIEM. The 1024 × 1024 resolution provides an accuracy of 6.1%, which can be acceptable according to tests and demonstrations on gradient computations and topology optimization. This opens up possibilities for robust topology optimization of passive photonic integrated components, with the advantage, e.g., of designing photonic components faster and more accurately on a PC without a supercomputer.
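To make the abstract's central idea concrete, the following is a minimal CPU-only sketch of GMRES applied to a dense Toeplitz system. It is not the authors' CUDA implementation and does not reproduce the API of their Python module or C++ library. The function names `toeplitz_matvec` and `gmres` are hypothetical, the example is real-valued (GFIEM systems are generally complex-valued), and the matrix-vector product uses a plain O(n²) loop where a production solver would more likely use an FFT-based product. The sketch only illustrates that a Toeplitz matrix can be stored as its first column and first row, and that GMRES needs nothing from the matrix except a matrix-vector product.

```python
import math


def toeplitz_matvec(col, row, x):
    """Return T @ x for the Toeplitz matrix with first column `col` and
    first row `row` (col[0] == row[0]); T itself is never formed."""
    n = len(x)
    return [
        sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
        for i in range(n)
    ]


def gmres(matvec, b, tol=1e-10, max_iter=None):
    """Unrestarted GMRES with a zero initial guess (Arnoldi process with
    modified Gram-Schmidt and Givens rotations). Assumes a nonsingular
    real matrix. Returns (x, estimated relative residual)."""
    n = len(b)
    m = max_iter or n
    beta = math.sqrt(sum(v * v for v in b))
    if beta == 0.0:
        return [0.0] * n, 0.0
    V = [[v / beta for v in b]]  # orthonormal Krylov basis vectors
    H = []                       # column k holds k+1 entries of the rotated (triangular) Hessenberg matrix
    cs, sn = [], []              # Givens rotation coefficients
    g = [beta]                   # rotated right-hand side of the least-squares problem
    for k in range(m):
        w = matvec(V[k])
        h = []
        for v in V:              # orthogonalize w against the current basis
            hij = sum(a * c for a, c in zip(w, v))
            h.append(hij)
            w = [a - hij * c for a, c in zip(w, v)]
        h_next = math.sqrt(sum(a * a for a in w))
        for i in range(k):       # apply the previous rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[k], h_next)
        c, s = h[k] / r, h_next / r   # new rotation eliminates h_next
        cs.append(c)
        sn.append(s)
        h[k] = r
        H.append(h)
        g.append(-s * g[k])
        g[k] = c * g[k]
        if abs(g[k + 1]) <= tol * beta:
            break
        V.append([a / h_next for a in w])
    j = len(H)
    y = [0.0] * j
    for i in range(j - 1, -1, -1):    # back substitution on the triangular system
        acc = g[i] - sum(H[cidx][i] * y[cidx] for cidx in range(i + 1, j))
        y[i] = acc / H[i][i]
    x = [sum(y[cidx] * V[cidx][i] for cidx in range(j)) for i in range(n)]
    return x, abs(g[j]) / beta


if __name__ == "__main__":
    # Illustrative 4 x 4 diagonally dominant Toeplitz system (made-up values).
    col = [4.0, 1.0, 0.5, 0.25]
    row = [4.0, -1.0, 0.3, 0.1]
    b = toeplitz_matvec(col, row, [1.0, 2.0, 3.0, 4.0])
    x, rel_res = gmres(lambda v: toeplitz_matvec(col, row, v), b)
    print(x, rel_res)
```

Because the solver touches the matrix only through `matvec`, the same loop structure would apply if that callback were replaced by a faster or GPU-resident product, which is the kind of substitution the abstract's parallelization targets.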
Author Minin, Iu. B.
Zacharov, I. E.
Matveev, S. A.
Fedorov, M. V.
Rykovanov, S. G.
Author_xml – sequence: 1
  givenname: Iu. B.
  surname: Minin
  fullname: Minin, Iu. B.
  organization: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Fryazino Branch of Kotel’nikov Institute of Radio-Engineering and Electronics of Russian Academy of Sciences
– sequence: 2
  givenname: S. A.
  surname: Matveev
  fullname: Matveev, S. A.
  organization: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences
– sequence: 3
  givenname: M. V.
  surname: Fedorov
  fullname: Fedorov, M. V.
  organization: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Sirius University of Science and Technology
– sequence: 4
  givenname: I. E.
  surname: Zacharov
  fullname: Zacharov, I. E.
  organization: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology
– sequence: 5
  givenname: S. G.
  surname: Rykovanov
  fullname: Rykovanov, S. G.
  organization: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology
CitedBy_id crossref_primary_10_3390_en17081883
crossref_primary_10_1134_S106422692310011X
crossref_primary_10_1177_17483026231184168
Cites_doi 10.1109/EnT-MIPT.2018.00040
10.1137/S0895479803437803
10.1016/S0167-8191(97)00004-5
10.1137/0907058
10.1007/978-3-642-33078-0_31
10.1145/1089014.1089021
10.1016/S0024-3795(00)00064-1
10.3389/fbuil.2018.00069
10.1115/1.4005491
10.1002/(SICI)1097-0207(19980830)42:8<1441::AID-NME428>3.0.CO;2-C
10.1515/nanoph-2019-0308
10.1109/IPDPS.2014.48
10.1016/j.laa.2005.03.040
10.2172/1614847
10.1364/JOSAA.21.002223
10.1007/s11227-012-0825-3
10.1515/eng-2019-0059
10.1109/ISQED.2012.6187484
10.1137/10078356X
10.1201/9781420049961
10.1007/BF01732607
10.1137/12086563X
10.1080/01468039308204219
10.1002/num.20484
10.1016/S0167-8191(98)00084-2
10.1364/JOSAA.13.002441
ContentType Journal Article
Copyright Springer Science+Business Media, LLC, part of Springer Nature 2022
DOI 10.1007/s10598-022-09545-2
Discipline Mathematics
Computer Science
EISSN 1573-837X
EndPage 452
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords Python module
C++
performance
GMRES
Helmholtz equation
Toeplitz-like matrix
topology optimization
parallelization
GPU
Language English
PageCount 15
PublicationCentury 2000
PublicationDate 20211000
PublicationDateYYYYMMDD 2021-10-01
PublicationDate_xml – month: 10
  year: 2021
  text: 20211000
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle Computational mathematics and modeling
PublicationTitleAbbrev Comput Math Model
PublicationYear 2021
Publisher Springer US
Springer Nature B.V
Publisher_xml – name: Springer US
– name: Springer Nature B.V
StartPage 438
SubjectTerms Applications of Mathematics
Benchmarks
Computational Mathematics and Numerical Analysis
Electric fields
Green's functions
Hankel matrices
Helmholtz equations
Integral equations
Linear systems
Mathematical analysis
Mathematical Modeling and Industrial Mathematics
Mathematics
Mathematics and Statistics
Optimization
Photonics
Solvers
Sparse matrices
Topology optimization
Title Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components
URI https://link.springer.com/article/10.1007/s10598-022-09545-2
https://www.proquest.com/docview/2663485529
Volume 32