Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time–space decomposition

• A GPU implementation of the swept time–space decomposition rule is presented.
• Three versions of the scheme are considered.
• The shared-memory implementation outperforms the other versions.
• The best swept scheme outperforms the classic method by 2–9 times.
The expedient design of precision components...

Detailed description

Bibliographic details
Published in:	Journal of Computational Physics, vol. 357, pp. 338–352
Main authors:	Magee, Daniel J.; Niemeyer, Kyle E.
Format:	Journal Article
Language:	English
Published:	Cambridge: Elsevier Inc; Elsevier Science Ltd, 15 March 2018
Subjects:	Aerospace engineering; Aerospace industry; Aircraft components; Communication; Communication-avoiding algorithms; Computational fluid dynamics; Computational physics; Computer simulation; Decomposition; Domain decomposition; Exhausting; Finite difference method; GPU computing; High-performance computing; Influence; Memory; Parallel processing; Partial differential equations; Solvers; Studies
ISSN:	0021-9991, 1090-2716
Online access:	Full text

Abstract	The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution that is difficult to achieve in reasonable amounts of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time–space decomposition rule reduces communication between sub-domains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It shows significant improvement in the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups, over a range of problem sizes, of 2–9× compared with simple GPU versions and 7–300× compared with parallel CPU versions. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2–1.9× worse than a standard implementation for all problem sizes.
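The idea of "exhausting the domain of influence before communicating boundary values" can be illustrated with a small serial sketch. This is not the authors' CUDA implementation: it assumes a one-dimensional heat equation with a forward-Euler, three-point stencil on a periodic domain, and all names (`stencil`, `classic`, `swept_cycle`, `swept`, the diffusion number `r`, the block width `block`) are hypothetical. Each block first advances a shrinking triangle of points using only its own data. After a single exchange of triangle edges, the gaps between neighboring triangles are filled in. For simplicity the sketch reassembles a full row after every cycle instead of chaining diamonds as a production swept scheme might.

```python
import math


def stencil(row, i, r):
    """One explicit heat-equation update at point i (periodic domain)."""
    n = len(row)
    return row[i] + r * (row[(i - 1) % n] - 2.0 * row[i] + row[(i + 1) % n])


def classic(u0, r, steps):
    """Reference scheme: every time step needs neighbor data from all blocks."""
    u = list(u0)
    for _ in range(steps):
        u = [stencil(u, i, r) for i in range(len(u))]
    return u


def swept_cycle(u0, r, block):
    """Advance block // 2 time steps with one exchange between two phases."""
    n, half = len(u0), block // 2
    if block % 2 or n % block:
        raise ValueError("block must be even and must divide the grid size")
    # NaN marks space-time points that have not been computed yet.
    levels = [list(u0)] + [[math.nan] * n for _ in range(half)]

    # Phase 1 (up-triangles): each block works only on its own points and
    # loses one point per side per time level.
    for start in range(0, n, block):
        for t in range(1, half):
            for i in range(start + t, start + block - t):
                levels[t][i] = stencil(levels[t - 1], i, r)

    # Phase 2 (down-triangles): centered on each block edge, fill the gap
    # between neighboring triangles; the gap gains one point per side per level.
    for edge in range(0, n, block):
        for t in range(1, half + 1):
            for i in range(edge - t, edge + t):
                levels[t][i % n] = stencil(levels[t - 1], i % n, r)

    return levels[half]


def swept(u0, r, block, cycles):
    """Repeat swept cycles; total time steps = cycles * block // 2."""
    u = list(u0)
    for _ in range(cycles):
        u = swept_cycle(u, r, block)
    return u
```

Under these assumptions, `swept(u0, r, block, cycles)` should reproduce `classic(u0, r, cycles * block // 2)`, because both apply the same stencil to the same inputs and only the order of evaluation differs. In the sketch, blocks exchange data once per `block // 2` time steps rather than once per step, which is the communication saving the abstract describes.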
Author	Magee, Daniel J. Niemeyer, Kyle E.
Author_xml	1. Magee, Daniel J. (ORCID: 0000-0001-9152-3656); 2. Niemeyer, Kyle E. (ORCID: 0000-0003-4425-7097; email: kyle.niemeyer@oregonstate.edu)
CitedBy_id	crossref_primary_10_1007_s11227_020_03340_9 crossref_primary_10_1007_s40314_020_01357_7 crossref_primary_10_1016_j_camwa_2022_08_015 crossref_primary_10_3390_mca26030052
Cites_doi	10.1016/j.jpdc.2012.04.003 10.1137/140991133 10.1007/s11227-013-1015-7 10.1016/j.cpc.2014.07.011 10.1109/JPROC.2008.917757 10.1016/j.cpc.2011.05.002 10.1016/j.jcp.2015.11.026 10.1002/pamm.201410456 10.1016/j.procs.2012.04.003 10.1145/1022594.1022596
ContentType	Journal Article
Copyright	2017 Elsevier Inc. Copyright Elsevier Science Ltd. Mar 15, 2018
DOI	10.1016/j.jcp.2017.12.028
DatabaseName	CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Solid State and Superconductivity Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional
DatabaseTitle	CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Solid State and Superconductivity Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences
EISSN	1090-2716
EndPage	352
ExternalDocumentID	10_1016_j_jcp_2017_12_028 S0021999117309221
ISICitedReferencesCount	6
ISSN	0021-9991
IsPeerReviewed	true
IsScholarly	true
Keywords	Partial differential equations Computational fluid dynamics High-performance computing Communication-avoiding algorithms Domain decomposition GPU computing
Language	English
ORCID	0000-0001-9152-3656 0000-0003-4425-7097
PageCount	15
PublicationCentury	2000
PublicationDate	2018-03-15
PublicationDecade	2010
PublicationPlace	Cambridge
PublicationPlace_xml	– name: Cambridge
PublicationTitle	Journal of computational physics
PublicationYear	2018
Publisher	Elsevier Inc Elsevier Science Ltd
Publisher_xml	– name: Elsevier Inc – name: Elsevier Science Ltd
StartPage	338
URI	https://dx.doi.org/10.1016/j.jcp.2017.12.028 https://www.proquest.com/docview/2030210037
Volume	357