Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid

Bibliographic details
Published in: Proceedings of ACM on programming languages Vol. 7; No. OOPSLA2; pp. 686 - 715
Main authors: Cao, Huanqi, Tang, Shizhi, Zhu, Qianchao, Yu, Bowen, Chen, Wenguang
Format: Journal Article
Language: English
Published: New York, NY, USA ACM 16.10.2023
ISSN: 2475-1421
Online access: Full text
Abstract: Partial differential equation (PDE) solvers are used extensively across scientific and engineering fields. However, achieving high performance and scalability often requires intricate, low-level programming, particularly when exploiting the deterministic sparsity patterns of structured grids. In this paper, we propose a domain-specific language (DSL), Mat2Stencil, together with its compiler, for PDE solvers on structured grids. Mat2Stencil introduces a structured sparse matrix abstraction that supports modular, flexible, and easy-to-use expression of a broad spectrum of solvers, including components such as Jacobi or Gauss-Seidel preconditioners, incomplete LU or Cholesky decompositions, and multigrid methods built upon them. Our DSL compiler then generates matrix-free code consisting of generalized stencils through multi-stage programming. The generated code allows spatial loop-carried dependences in the form of quasi-affine loops, in addition to the embarrassingly parallel spatial loops of Jacobi-style stencils. We further propose a novel automatic parallelization technique for the spatially dependent loops, which provides compile-time deterministic task partitioning for threading, calculates the necessary inter-thread synchronization automatically, and generates an efficient multi-threaded implementation with fine-grained synchronization. We implement 4 benchmark programs: 3 are the pseudo-applications in the NAS Parallel Benchmarks, written with 6.3% of the lines of code, and 1 is matrix-free High Performance Conjugate Gradients, written with 16.4% of the lines of code. We achieve up to 1.67× and on average 1.03× the performance of manual implementations.
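The two ideas in the abstract, a matrix view that lowers to matrix-free stencil code and a Gauss-Seidel sweep with spatial loop-carried dependence, can be illustrated with a small sketch. This is not Mat2Stencil syntax or its generated code. It is a plain Python illustration that assumes a 5-point Laplacian on an n x n grid with zero boundary values, and every name in it (`OFFSETS`, `build_dense_matrix`, `apply_stencil`, `gauss_seidel_wavefront`, and so on) is hypothetical. The wavefront ordering shown is one well-known way to expose parallelism in such sweeps and is not claimed to be the partitioning the paper's compiler uses.

```python
# Hypothetical illustration; none of these names come from Mat2Stencil.

# 5-point Laplacian: (row offset, column offset) -> coefficient.
OFFSETS = {(0, 0): 4.0, (-1, 0): -1.0, (0, -1): -1.0, (0, 1): -1.0, (1, 0): -1.0}


def idx(i, j, n):
    """Flatten grid point (i, j) of an n x n grid to a vector index."""
    return i * n + j


def neighbours(i, j, n):
    """Yield (flat index, coefficient) for the in-grid stencil points of (i, j)."""
    for (di, dj), coeff in OFFSETS.items():
        ii, jj = i + di, j + dj
        if 0 <= ii < n and 0 <= jj < n:
            yield idx(ii, jj, n), coeff


def build_dense_matrix(n):
    """Matrix view: materialise the operator as an (n*n) x (n*n) matrix."""
    a = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            for col, coeff in neighbours(i, j, n):
                a[idx(i, j, n)][col] = coeff
    return a


def matvec(a, x):
    """Ordinary matrix-vector product on the materialised matrix."""
    return [sum(r * v for r, v in zip(row, x)) for row in a]


def apply_stencil(x, n):
    """Matrix-free view: the same product without storing any matrix."""
    return [sum(coeff * x[col] for col, coeff in neighbours(i, j, n))
            for i in range(n) for j in range(n)]


def gs_update(x, b, n, i, j):
    """Gauss-Seidel update of one point, reading the current contents of x."""
    row = idx(i, j, n)
    acc = b[row]
    for col, coeff in neighbours(i, j, n):
        if col != row:
            acc -= coeff * x[col]
    x[row] = acc / OFFSETS[(0, 0)]


def gauss_seidel_sweep(x, b, n):
    """In-place lexicographic sweep: point (i, j) reads the already updated
    values at (i-1, j) and (i, j-1), a spatial loop-carried dependence."""
    for i in range(n):
        for j in range(n):
            gs_update(x, b, n, i, j)


def gauss_seidel_wavefront(x, b, n):
    """Same sweep ordered by anti-diagonals i + j = d. Points on one diagonal
    do not read each other, so the inner loop could run in parallel."""
    for d in range(2 * n - 1):
        for i in range(max(0, d - n + 1), min(n, d + 1)):
            gs_update(x, b, n, i, d - i)
```

For this stencil the wavefront order produces the same values as the lexicographic sweep, because each point still sees updated values from diagonal d-1 and old values from diagonal d+1.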
ArticleNumber 246
Author Tang, Shizhi
Yu, Bowen
Zhu, Qianchao
Cao, Huanqi
Chen, Wenguang
Author_xml – sequence: 1
  givenname: Huanqi
  orcidid: 0000-0002-3870-106X
  surname: Cao
  fullname: Cao, Huanqi
  email: caohq18@mails.tsinghua.edu.cn
  organization: Tsinghua University, Beijing, China
– sequence: 2
  givenname: Shizhi
  orcidid: 0000-0002-6543-0859
  surname: Tang
  fullname: Tang, Shizhi
  email: tsz19@mails.tsinghua.edu.cn
  organization: Tsinghua University, Beijing, China
– sequence: 3
  givenname: Qianchao
  orcidid: 0009-0001-5021-2912
  surname: Zhu
  fullname: Zhu, Qianchao
  email: dysania@pku.edu.cn
  organization: Peking University, Beijing, China
– sequence: 4
  givenname: Bowen
  orcidid: 0000-0001-5537-8244
  surname: Yu
  fullname: Yu, Bowen
  email: yubowen@tsinghua.edu.cn
  organization: Tsinghua University, Beijing, China
– sequence: 5
  givenname: Wenguang
  orcidid: 0000-0002-4281-1018
  surname: Chen
  fullname: Chen, Wenguang
  email: cwg@tsinghua.edu.cn
  organization: Tsinghua University, Beijing, China / Pengcheng Laboratory, Shenzhen, China
ContentType Journal Article
Copyright Owner/Author
Copyright_xml – notice: Owner/Author
DOI 10.1145/3622822
Discipline Computer Science
EISSN 2475-1421
EndPage 715
ExternalDocumentID 10_1145_3622822
3622822
ISICitedReferencesCount 3
ISSN 2475-1421
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue OOPSLA2
Keywords domain-specific language
polyhedral compilation
multi-stage programming
structured grid
performance optimization
compiler
finite difference method
stencil
Language English
License This work is licensed under a Creative Commons Attribution 4.0 International License.
LinkModel OpenURL
ORCID 0000-0002-6543-0859
0009-0001-5021-2912
0000-0002-3870-106X
0000-0002-4281-1018
0000-0001-5537-8244
OpenAccessLink https://dl.acm.org/doi/10.1145/3622822
PageCount 30
PublicationCentury 2000
PublicationDate 2023-10-16
PublicationDateYYYYMMDD 2023-10-16
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-10-16
  day: 16
PublicationDecade 2020
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationTitle Proceedings of ACM on programming languages
PublicationTitleAbbrev ACM PACMPL
PublicationYear 2023
Publisher ACM
Publisher_xml – name: ACM
– reference: Intel. 2023. Intel oneAPI Math Kernel Library. https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html
– reference: Xiaoye S. Li and Meiyue Shao. 2011. A Supernodal Approach to Incomplete LU Factorization with Partial Pivoting. ACM Trans. Math. Softw., 37, 4 (2011), 43:1–43:20. https://doi.org/10.1145/1916461.1916467 10.1145/1916461.1916467
– reference: M. Louboutin, M. Lange, F. Luporini, N. Kukreja, P. A. Witte, F. J. Herrmann, P. Velesko, and G. J. Gorman. 2019. Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration. Geoscientific Model Development, 12, 3 (2019), 1165–1187. https://doi.org/10.5194/gmd-12-1165-2019 10.5194/gmd-12-1165-2019
– reference: Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, Rajiv Gupta and Saman P. Amarasinghe (Eds.). ACM, 101–113. https://doi.org/10.1145/1375581.1375595 10.1145/1375581.1375595
– reference: Huanqi Cao. 2023. Artifact of Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid. https://doi.org/10.5281/zenodo.8149701 10.5281/zenodo.8149701
– reference: Amir Shaikhha, Yannis Klonatos, and Christoph Koch. 2018. Building Efficient Query Engines in a High-Level Language. ACM Trans. Database Syst., 43, 1 (2018), 4:1–4:45. https://doi.org/10.1145/3183653 10.1145/3183653
– reference: X. Huang, X. Huang, D. Wang, Q. Wu, Y. Li, S. Zhang, Y. Chen, M. Wang, Y. Gao, Q. Tang, Y. Chen, Z. Fang, Z. Song, and G. Yang. 2019. OpenArray v1.0: a simple operator library for the decoupling of ocean modeling and parallel computing. Geoscientific Model Development, 12, 11 (2019), 4729–4749. https://doi.org/10.5194/gmd-12-4729-2019 10.5194/gmd-12-4729-2019
– reference: Christian Lengauer, Sven Apel, Matthias Bolten, Shigeru Chiba, Ulrich Rüde, Jürgen Teich, Armin Größ linger, Frank Hannig, Harald Köstler, Lisa Claus, Alexander Grebhahn, Stefan Groth, Stefan Kronawitter, Sebastian Kuckuk, Hannah Rittich, Christian Schmitt, and Jonas Schmitt. 2020. ExaStencils: Advanced Multigrid Solver Generation. In Software for Exascale Computing - SPPEXA 2016-2019, Hans-Joachim Bungartz, Severin Reiz, Benjamin Uekermann, Philipp Neumann, and Wolfgang E. Nagel (Eds.) (Lecture Notes in Computational Science and Engineering, Vol. 136). Springer, 405–452. https://doi.org/10.1007/978-3-030-47956-5_14 10.1007/978-3-030-47956-5_14
– reference: Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, and Katherine A. Yelick. 2008. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2008, November 15-21, 2008, Austin, Texas, USA. IEEE/ACM, 4. https://doi.org/10.1109/SC.2008.5222004 10.1109/SC.2008.5222004
– reference: Sven Verdoolaege and Gerda Janssens. 2017. Scheduling for PPCG. https://doi.org/10.13140/RG.2.2.28998.68169 10.13140/RG.2.2.28998.68169
– reference: Paul Feautrier. 1992. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. Int. J. Parallel Program., 21, 5 (1992), 313–347. https://doi.org/10.1007/BF01407835 10.1007/BF01407835
– reference: Uday Bondhugula, Vinayaka Bandishti, and Irshad Pananilath. 2017. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations. IEEE Trans. Parallel Distributed Syst., 28, 5 (2017), 1285–1298. https://doi.org/10.1109/TPDS.2016.2615094 10.1109/TPDS.2016.2615094
StartPage 686
SubjectTerms Computing methodologies -- Parallel programming languages
Computing methodologies -- Shared memory algorithms
Software and its engineering -- Domain specific languages
Software and its engineering -- Source code generation
Title Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid
URI https://dl.acm.org/doi/10.1145/3622822
Volume 7