Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time–space decomposition

• A GPU implementation of the swept time–space decomposition rule is presented.
• Three versions of the scheme are considered.
• The shared-memory implementation outperforms the other versions.
• The best swept scheme outperforms the classic method by 2–9 times.

Bibliographic details
Published in: Journal of Computational Physics, Vol. 357, pp. 338–352
Main authors: Magee, Daniel J.; Niemeyer, Kyle E.
Format: Journal Article
Language: English
Published: Cambridge: Elsevier Inc, 15 March 2018
Elsevier Science Ltd
ISSN: 0021-9991, 1090-2716
Abstract The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations at a level of resolution that is difficult to achieve in a reasonable amount of time, even with effectively parallelized solvers. Although the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time–space decomposition rule reduces communication between sub-domains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It significantly reduces the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups of 2–9× over simple GPU versions and 7–300× over parallel CPU versions across a range of problem sizes. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2–1.9× worse than a standard implementation for all problem sizes.
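The idea of "exhausting the domain of influence before communicating boundary values" can be illustrated with a small sketch. The code below is not the authors' GPU implementation; it is a plain Python illustration under stated assumptions: an explicit three-point (FTCS) heat-equation update on a periodic grid, and the hypothetical helper names `heat_step`, `classic`, and `swept_triangle`. It shows that a sub-domain can advance several time levels using only its own data, at the cost of losing one point at each edge per step, and that the values it produces match those of a classic scheme that exchanges boundary values at every step.

```python
def heat_step(u, r):
    """One explicit (FTCS) step on a periodic 1D grid; r = alpha*dt/dx**2 (assumed)."""
    n = len(u)
    return [u[i] + r * (u[i - 1] - 2 * u[i] + u[(i + 1) % n]) for i in range(n)]


def classic(u, r, steps):
    """Classic decomposition: every time step needs fresh neighbor values."""
    for _ in range(steps):
        u = heat_step(u, r)
    return u


def swept_triangle(block, r):
    """Advance one sub-domain as far as its own data allows, with no communication.

    With a three-point stencil, each new time level is one point shorter at
    either edge, so the computed values form a triangle in the time-space plane.
    Returns the list of levels; level k covers block indices k .. len(block)-1-k.
    """
    levels = [list(block)]
    while len(levels[-1]) > 2:
        prev = levels[-1]
        levels.append([prev[j] + r * (prev[j - 1] - 2 * prev[j] + prev[j + 1])
                       for j in range(1, len(prev) - 1)])
    return levels


if __name__ == "__main__":
    u0 = [float((i * i) % 7) for i in range(16)]   # arbitrary initial data
    r = 0.25                                       # stable for FTCS (r <= 0.5)
    levels = swept_triangle(u0[4:12], r)           # one 8-point sub-domain
    for k, level in enumerate(levels):
        reference = classic(u0, r, k)[4 + k:12 - k]
        print(k, len(level), max(abs(a - b) for a, b in zip(level, reference)))
```

In this sketch an 8-point block advances three time levels before any boundary exchange is needed, whereas the classic scheme would exchange boundary values three times over the same interval. A complete swept scheme must also fill the gaps between neighboring triangles after communication, which is omitted here.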
Authors: Magee, Daniel J. (ORCID: 0000-0001-9152-3656)
Niemeyer, Kyle E. (ORCID: 0000-0003-4425-7097; email: kyle.niemeyer@oregonstate.edu)
Copyright 2017 Elsevier Inc.
Copyright Elsevier Science Ltd. Mar 15, 2018
DOI 10.1016/j.jcp.2017.12.028
Discipline Applied Sciences
IsPeerReviewed true
IsScholarly true
Keywords Partial differential equations
Computational fluid dynamics
High-performance computing
Communication-avoiding algorithms
Domain decomposition
GPU computing
SubjectTerms Aerospace engineering
Aerospace industry
Aircraft components
Communication
Communication-avoiding algorithms
Computational fluid dynamics
Computational physics
Computer simulation
Decomposition
Domain decomposition
Exhausting
Finite difference method
GPU computing
High-performance computing
Influence
Memory
Parallel processing
Partial differential equations
Solvers
Studies