Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components
| Published in: | Computational Mathematics and Modeling, Vol. 32, No. 4, pp. 438-452 |
|---|---|
| Main authors: | Minin, Iu. B.; Matveev, S. A.; Fedorov, M. V.; Zacharov, I. E.; Rykovanov, S. G. |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US; Springer Nature B.V, 1 October 2021 |
| ISSN: | 1046-283X, 1573-837X |
| Online access: | https://link.springer.com/article/10.1007/s10598-022-09545-2 |
| Abstract | The Generalized Minimal Residual Method (GMRES) has been benchmarked on many types of GPUs for solving linear systems with dense and sparse matrices. However, there are still no benchmarks of GMRES implementations that compare the Tesla V100 with the GTX 1080 Ti, or that cover Toeplitz-like matrices. The introduced software consists of a Python module and a C++ library that make it possible to manage streams for concurrent computation of separate linear systems on one or more GPUs. The GMRES solver is parallelized to run on an NVIDIA GPGPU accelerator. The parallelization efficiency is explored by applying GMRES to linear systems (Helmholtz equation) that arise from the Green’s Function Integral Equation Method (GFIEM) for computing the electric field distribution in the design domain. The proposed implementation showed a maximal speedup of 55 ($\bar{t} = 0.017$ s) and of 125 ($\bar{t} = 0.77$ s) for 1024 × 1024 (on GTX 1080 Ti) and 8192 × 8192 (on Tesla V100) dense Toeplitz matrices generated from GFIEM. The 1024 × 1024 resolution provides an accuracy of 6.1%, which can be acceptable according to tests and demonstrations on gradient computations and topology optimization. This opens up possibilities for robust topology optimization of passive photonic integrated components, with the advantage, e.g., of faster and more accurate design of photonic components on a PC without a supercomputer. |
|---|---|
| Author | Minin, Iu. B.; Matveev, S. A.; Fedorov, M. V.; Zacharov, I. E.; Rykovanov, S. G. |
| Author affiliations | (1) Minin, Iu. B.: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Fryazino Branch of Kotel’nikov Institute of Radio-Engineering and Electronics of Russian Academy of Sciences. (2) Matveev, S. A.: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences. (3) Fedorov, M. V.: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Sirius University of Science and Technology. (4) Zacharov, I. E.: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology. (5) Rykovanov, S. G.: Skoltech Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology. |
| Copyright | Springer Science+Business Media, LLC, part of Springer Nature 2022 |
| DOI | 10.1007/s10598-022-09545-2 |
| Discipline | Mathematics; Computer Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Python module; C++; performance; GMRES; Helmholtz equation; Toeplitz-like matrix; topology optimization; parallelization; GPU |
| StartPage | 438 |
| SubjectTerms | Applications of Mathematics; Benchmarks; Computational Mathematics and Numerical Analysis; Electric fields; Green's functions; Hankel matrices; Helmholtz equations; Integral equations; Linear systems; Mathematical analysis; Mathematical Modeling and Industrial Mathematics; Mathematics; Mathematics and Statistics; Optimization; Photonics; Solvers; Sparse matrices; Topology optimization |
| Title | Benchmarks of Cuda-Based GMRES Solver for Toeplitz and Hankel Matrices and Applications to Topology Optimization of Photonic Components |
| URI | https://link.springer.com/article/10.1007/s10598-022-09545-2 https://www.proquest.com/docview/2663485529 |
| Volume | 32 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Benchmarks+of+Cuda-Based+GMRES+Solver+for+Toeplitz+and+Hankel+Matrices+and+Applications+to+Topology+Optimization+of+Photonic+Components&rft.jtitle=Computational+mathematics+and+modeling&rft.au=Minin%2C+Iu.+B.&rft.au=Matveev%2C+S.+A.&rft.au=Fedorov%2C+M.+V.&rft.au=Zacharov%2C+I.+E.&rft.date=2021-10-01&rft.issn=1046-283X&rft.eissn=1573-837X&rft.volume=32&rft.issue=4&rft.spage=438&rft.epage=452&rft_id=info:doi/10.1007%2Fs10598-022-09545-2&rft.externalDBID=n%2Fa&rft.externalDocID=10_1007_s10598_022_09545_2 |