Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid

Bibliographic Details

Published in:	Proceedings of the ACM on Programming Languages, Vol. 7, No. OOPSLA2, pp. 686–715
Main authors:	Cao, Huanqi; Tang, Shizhi; Zhu, Qianchao; Yu, Bowen; Chen, Wenguang
Format:	Journal Article
Language:	English
Published:	New York, NY, USA: ACM, October 16, 2023
Subjects:	Computing methodologies; Domain specific languages; Parallel programming languages; Shared memory algorithms; Software and its engineering; Source code generation; domain-specific language; polyhedral compilation; multi-stage programming; structured grid; performance optimization; compiler; finite difference method; stencil
ISSN:	2475-1421
Online access:	Full text

Abstract	Partial differential equation (PDE) solvers are extensively used across numerous scientific and engineering fields. However, achieving high performance and scalability often requires intricate, low-level programming, particularly when exploiting the deterministic sparsity patterns of structured grids. In this paper, we propose Mat2Stencil, a domain-specific language (DSL) with its compiler, for PDE solvers on structured grids. Mat2Stencil introduces a structured sparse matrix abstraction that enables modular, flexible, and easy-to-use expression of a broad spectrum of solvers, encompassing components such as Jacobi or Gauss-Seidel preconditioners, incomplete LU or Cholesky decompositions, and multigrid methods built upon them. The DSL compiler then generates matrix-free code consisting of generalized stencils through multi-stage programming. In addition to Jacobi-style stencils, which are embarrassingly parallel along the spatial dimensions, the generated code allows spatial loop-carried dependences in the form of quasi-affine loops. We further propose a novel automatic parallelization technique for these spatially dependent loops, which provides compile-time deterministic task partitioning for threading, automatically calculates the necessary inter-thread synchronization, and generates an efficient multi-threaded implementation with fine-grained synchronization. We implement 4 benchmark programs: 3 pseudo-applications from the NAS Parallel Benchmarks, using 6.3% of the lines of code, and a matrix-free High Performance Conjugate Gradients (HPCG), using 16.4% of the lines of code. Compared to manual implementations, they achieve up to 1.67× and on average 1.03× the performance.
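The abstract contrasts Jacobi-style stencils, whose spatial loops are embarrassingly parallel, with solvers such as Gauss-Seidel that carry dependences across spatial loops. The sketch below is not Mat2Stencil code and does not reflect its API; it is a minimal, hypothetical Python illustration, assuming a 2-D 5-point Poisson operator on an n × n grid with zero Dirichlet boundaries. All function names (`apply_operator`, `jacobi_sweep`, `gauss_seidel_sweep`, `gauss_seidel_wavefront`, and so on) are invented for this example. The anti-diagonal (wavefront) reordering is a generic textbook technique shown only to make the dependence structure visible; it is not presented as the paper's parallelization scheme.

```python
# Hypothetical illustration, not Mat2Stencil code: a 2-D 5-point Poisson
# operator (4 on the diagonal, -1 for each neighbor) on an n x n grid with
# zero Dirichlet boundaries, stored as nested Python lists.
from typing import List

Grid = List[List[float]]


def zeros(n: int) -> Grid:
    return [[0.0] * n for _ in range(n)]


def get(x: Grid, i: int, j: int) -> float:
    """Return x[i][j], treating points outside the grid as the zero boundary."""
    n = len(x)
    return x[i][j] if 0 <= i < n and 0 <= j < n else 0.0


def neighbor_sum(x: Grid, i: int, j: int) -> float:
    return get(x, i - 1, j) + get(x, i + 1, j) + get(x, i, j - 1) + get(x, i, j + 1)


def apply_operator(x: Grid) -> Grid:
    """Matrix-free y = A x: the matrix is never stored, only its stencil."""
    n = len(x)
    return [[4.0 * x[i][j] - neighbor_sum(x, i, j) for j in range(n)] for i in range(n)]


def jacobi_sweep(x: Grid, b: Grid) -> Grid:
    """Jacobi: every point reads only the previous iterate, so all (i, j)
    updates are independent (embarrassingly parallel)."""
    n = len(x)
    return [[(b[i][j] + neighbor_sum(x, i, j)) / 4.0 for j in range(n)] for i in range(n)]


def gauss_seidel_sweep(x: Grid, b: Grid) -> None:
    """Lexicographic Gauss-Seidel, in place: x[i-1][j] and x[i][j-1] were
    already updated in this sweep, a loop-carried dependence on both loops."""
    n = len(x)
    for i in range(n):
        for j in range(n):
            x[i][j] = (b[i][j] + neighbor_sum(x, i, j)) / 4.0


def gauss_seidel_wavefront(x: Grid, b: Grid) -> None:
    """The same sweep reordered by anti-diagonals k = i + j. Points on one
    diagonal do not depend on each other, so a diagonal could be split across
    threads, with a synchronization point between consecutive diagonals."""
    n = len(x)
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(n, k + 1)):
            j = k - i
            x[i][j] = (b[i][j] + neighbor_sum(x, i, j)) / 4.0


def residual_norm(x: Grid, b: Grid) -> float:
    """Euclidean norm of b - A x."""
    ax = apply_operator(x)
    n = len(x)
    return sum((b[i][j] - ax[i][j]) ** 2 for i in range(n) for j in range(n)) ** 0.5


if __name__ == "__main__":
    n = 8
    b = [[1.0] * n for _ in range(n)]
    xj, xg, xw = zeros(n), zeros(n), zeros(n)
    for _ in range(50):
        xj = jacobi_sweep(xj, b)
        gauss_seidel_sweep(xg, b)
        gauss_seidel_wavefront(xw, b)
    print("Jacobi residual:", residual_norm(xj, b))
    print("Gauss-Seidel residual:", residual_norm(xg, b))
    print("Wavefront matches lexicographic order:", xg == xw)
```

Because the wavefront version updates every point from exactly the same operands as the lexicographic sweep (the lower and left neighbors already updated, the upper and right neighbors not yet), it produces identical values; only the order of mutually independent updates changes, which is what would allow per-diagonal threading with a barrier between diagonals.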
ArticleNumber	246
Author	Tang, Shizhi; Yu, Bowen; Zhu, Qianchao; Cao, Huanqi; Chen, Wenguang
Author_xml	– sequence: 1; givenname: Huanqi; orcidid: 0000-0002-3870-106X; surname: Cao; fullname: Cao, Huanqi; email: caohq18@mails.tsinghua.edu.cn; organization: Tsinghua University, Beijing, China
– sequence: 2; givenname: Shizhi; orcidid: 0000-0002-6543-0859; surname: Tang; fullname: Tang, Shizhi; email: tsz19@mails.tsinghua.edu.cn; organization: Tsinghua University, Beijing, China
– sequence: 3; givenname: Qianchao; orcidid: 0009-0001-5021-2912; surname: Zhu; fullname: Zhu, Qianchao; email: dysania@pku.edu.cn; organization: Peking University, Beijing, China
– sequence: 4; givenname: Bowen; orcidid: 0000-0001-5537-8244; surname: Yu; fullname: Yu, Bowen; email: yubowen@tsinghua.edu.cn; organization: Tsinghua University, Beijing, China
– sequence: 5; givenname: Wenguang; orcidid: 0000-0002-4281-1018; surname: Chen; fullname: Chen, Wenguang; email: cwg@tsinghua.edu.cn; organization: Tsinghua University, Beijing, China / Pengcheng Laboratory, Shenzhen, China
CitedBy_id	crossref_primary_10_3390_sym16020181
Cites_doi	10.1137/140968896 10.1145/2184319.2184345 10.1137/1.9780898717938 10.1007/BF01407835 10.1109/CGO.2019.8661197 10.1007/978-3-540-78800-3_24 10.1145/1989493.1989508 10.1109/IPDPSW.2017.89 10.1109/SC.2016.57 10.1007/978-3-030-47956-5_14 10.1145/1916461.1916467 10.1145/3183653 10.1038/s41592-019-0686-2 10.1177/1094342020959423 10.1007/s10766-007-0034-5 10.5194/gmd-12-1165-2019 10.1145/2517208.2517228 10.1177/1094342015593158 10.1145/1178597.1178605 10.1145/3519939.3523448 10.1109/TPDS.2016.2615094 10.1177/109434209100500306 10.1145/1375581.1375595 10.13140/RG.2.2.28998.68169 10.1145/3458817.3476158 10.1145/3314221.3314615 10.1145/1250734.1250761 10.1145/7902.7904 10.1007/978-3-642-15582-6_49 10.5194/gmd-12-4729-2019 10.1109/SC.2008.5222004 10.1109/SC.2010.2 10.5281/zenodo.8149701 10.1137/1.9780898718003 10.1109/SC.2016.5 10.1145/331532.331562 10.1145/2584665 10.1145/2896389 10.1145/3579990.3580006 10.1201/b10376-8 10.1111/j.1365-2478.1983.tb01060.x 10.1145/3278122.3278139
ContentType	Journal Article
Copyright	Owner/Author
Copyright_xml	– notice: Owner/Author
DBID	AAYXX CITATION
DOI	10.1145/3622822
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList	CrossRef
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	2475-1421
EndPage	715
ExternalDocumentID	10_1145_3622822 3622822
GroupedDBID	AAKMM AAYFX ACM ADPZR AIKLT ALMA_UNASSIGNED_HOLDINGS GUFHI LHSKQ M~E OK1 ROL AAYXX AEFXT AEJOY AKRVB CITATION
ID	FETCH-LOGICAL-a277t-53a79959a8071d6522dde1b1acf5a2f8e2cc957b0f425602a57553caaf4d7e433
ISICitedReferencesCount	3
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001087279100026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	2475-1421
IngestDate	Sun Nov 09 14:46:52 EST 2025; Tue Nov 18 21:53:15 EST 2025; Fri Feb 21 01:29:13 EST 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	OOPSLA2
Keywords	domain-specific language; polyhedral compilation; multi-stage programming; structured grid; performance optimization; compiler; finite difference method; stencil
Language	English
License	This work is licensed under a Creative Commons Attribution 4.0 International License.
LinkModel	OpenURL
ORCID	0000-0002-6543-0859 0009-0001-5021-2912 0000-0002-3870-106X 0000-0002-4281-1018 0000-0001-5537-8244
OpenAccessLink	https://dl.acm.org/doi/10.1145/3622822
PageCount	30
ParticipantIDs	crossref_citationtrail_10_1145_3622822 crossref_primary_10_1145_3622822 acm_primary_3622822
PublicationCentury	2000
PublicationDate	2023-10-16
PublicationDateYYYYMMDD	2023-10-16
PublicationDate_xml	– month: 10 year: 2023 text: 2023-10-16 day: 16
PublicationDecade	2020
PublicationPlace	New York, NY, USA
PublicationPlace_xml	– name: New York, NY, USA
PublicationTitle	Proceedings of the ACM on Programming Languages
PublicationTitleAbbrev	ACM PACMPL
PublicationYear	2023
Publisher	ACM
Publisher_xml	– name: ACM
https://doi.org/10.1007/978-3-030-47956-5_14 10.1007/978-3-030-47956-5_14 – reference: Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, and Katherine A. Yelick. 2008. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2008, November 15-21, 2008, Austin, Texas, USA. IEEE/ACM, 4. https://doi.org/10.1109/SC.2008.5222004 10.1109/SC.2008.5222004 – reference: Sven Verdoolaege and Gerda Janssens. 2017. Scheduling for PPCG. https://doi.org/10.13140/RG.2.2.28998.68169 10.13140/RG.2.2.28998.68169 – reference: Paul Feautrier. 1992. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. Int. J. Parallel Program., 21, 5 (1992), 313–347. https://doi.org/10.1007/BF01407835 10.1007/BF01407835 – reference: Uday Bondhugula, Vinayaka Bandishti, and Irshad Pananilath. 2017. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations. IEEE Trans. Parallel Distributed Syst., 28, 5 (2017), 1285–1298. https://doi.org/10.1109/TPDS.2016.2615094 10.1109/TPDS.2016.2615094 – ident: e_1_2_1_9_1 doi: 10.1137/140968896 – ident: e_1_2_1_32_1 doi: 10.1145/2184319.2184345 – ident: e_1_2_1_36_1 doi: 10.1137/1.9780898717938 – ident: e_1_2_1_15_1 doi: 10.1007/BF01407835 – ident: e_1_2_1_3_1 doi: 10.1109/CGO.2019.8661197 – ident: e_1_2_1_11_1 doi: 10.1007/978-3-540-78800-3_24 – ident: e_1_2_1_16_1 – ident: e_1_2_1_41_1 doi: 10.1145/1989493.1989508 – ident: e_1_2_1_48_1 doi: 10.1109/IPDPSW.2017.89 – ident: e_1_2_1_25_1 doi: 10.1109/SC.2016.57 – ident: e_1_2_1_21_1 doi: 10.1007/978-3-030-47956-5_14 – ident: e_1_2_1_22_1 doi: 10.1145/1916461.1916467 – ident: e_1_2_1_34_1 doi: 10.1145/3183653 – ident: e_1_2_1_44_1 doi: 10.1038/s41592-019-0686-2 – volume-title: Optimizing Compilers for Modern Architectures: A Dependence-based Approach. 
Morgan Kaufmann ident: e_1_2_1_1_1 – ident: e_1_2_1_30_1 doi: 10.1177/1094342020959423 – ident: e_1_2_1_46_1 doi: 10.1007/s10766-007-0034-5 – ident: e_1_2_1_23_1 doi: 10.5194/gmd-12-1165-2019 – ident: e_1_2_1_18_1 – ident: e_1_2_1_27_1 doi: 10.1145/2517208.2517228 – ident: e_1_2_1_13_1 doi: 10.1177/1094342015593158 – ident: e_1_2_1_19_1 doi: 10.1145/1178597.1178605 – ident: e_1_2_1_40_1 doi: 10.1145/3519939.3523448 – ident: e_1_2_1_6_1 doi: 10.1109/TPDS.2016.2615094 – ident: e_1_2_1_12_1 – ident: e_1_2_1_4_1 doi: 10.1177/109434209100500306 – ident: e_1_2_1_7_1 doi: 10.1145/1375581.1375595 – ident: e_1_2_1_43_1 doi: 10.13140/RG.2.2.28998.68169 – ident: e_1_2_1_49_1 doi: 10.1145/3458817.3476158 – ident: e_1_2_1_2_1 doi: 10.1145/3314221.3314615 – ident: e_1_2_1_20_1 doi: 10.1145/1250734.1250761 – ident: e_1_2_1_29_1 doi: 10.1145/7902.7904 – ident: e_1_2_1_42_1 doi: 10.1007/978-3-642-15582-6_49 – ident: e_1_2_1_17_1 doi: 10.5194/gmd-12-4729-2019 – ident: e_1_2_1_10_1 doi: 10.1109/SC.2008.5222004 – ident: e_1_2_1_26_1 doi: 10.1109/SC.2010.2 – ident: e_1_2_1_8_1 doi: 10.5281/zenodo.8149701 – ident: e_1_2_1_33_1 doi: 10.1137/1.9780898718003 – ident: e_1_2_1_47_1 doi: 10.1109/SC.2016.5 – ident: e_1_2_1_31_1 doi: 10.1145/331532.331562 – ident: e_1_2_1_38_1 doi: 10.1145/2584665 – ident: e_1_2_1_5_1 doi: 10.1145/2896389 – volume-title: Jimy Dudhia, O. Gill, Zhiquan Liu, Judith Berner, Wei Wang, G. Powers, Greg Duda, Dale M. Barker, and Xiangyu Huang. year: 2019 ident: e_1_2_1_35_1 – ident: e_1_2_1_14_1 doi: 10.1145/3579990.3580006 – ident: e_1_2_1_45_1 doi: 10.1201/b10376-8 – ident: e_1_2_1_24_1 doi: 10.1111/j.1365-2478.1983.tb01060.x – volume-title: Multi-Stage Programming: Its Theory and Applications. Ph. D. Dissertation ident: e_1_2_1_39_1 – ident: e_1_2_1_37_1 doi: 10.1145/3278122.3278139
StartPage	686
SubjectTerms	Computing methodologies; Domain specific languages; Parallel programming languages; Shared memory algorithms; Software and its engineering; Source code generation
SubjectTermsDisplay	Computing methodologies -- Parallel programming languages; Computing methodologies -- Shared memory algorithms; Software and its engineering -- Domain specific languages; Software and its engineering -- Source code generation
Title	Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid
URI	https://dl.acm.org/doi/10.1145/3622822
Volume	7