Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs
Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In th...
| Published in: | Proceedings of ACM on programming languages, Volume 9, Issue PLDI, pp. 2106-2130 |
|---|---|
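| Main authors: | Alonso Rodríguez-Iglesias, Santoshkumar T. Tongli, Emily Tucker, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño |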
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York, NY, USA: ACM, 10 June 2025 |
| Subjects: | |
| ISSN: | 2475-1421 |
| Online access: | Get full text |
| Abstract | Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In this work we present UZP, a new sparse format based on polyhedral sets of integer points. UZP is a flexible format that subsumes CSR, COO, DIA, BCSR, etc., by raising them to a common mathematical abstraction: a union of integer polyhedra, each intersected with an affine lattice. We present a modular approach to building and optimizing UZP: it captures equivalence classes for the sparse structure, enabling the tuning of the representation for target-specific and application-specific performance considerations. UZP is built from any input sparse structure using integer coordinates, and is interoperable with existing software using CSR and COO data layout. We provide detailed performance evaluation of UZP on 200+ matrices from SuiteSparse, demonstrating how simple and mostly unoptimized generic executors for UZP can already achieve solid performance by exploiting Z-polyhedra structures. |
|---|---|
| ArticleNumber | 232 |
| Author | Tucker, Emily; Touriño, Juan; Pouchet, Louis-Noël; Rodríguez-Iglesias, Alonso; Tongli, Santoshkumar T.; Rodríguez, Gabriel |
| Author_xml | 1. Alonso Rodríguez-Iglesias (ORCID 0000-0002-5982-9118; alonso.rodriguez@udc.es), Universidade da Coruña, A Coruña, Spain; 2. Santoshkumar T. Tongli (ORCID 0009-0005-3147-2179; Santoshkumar.T@colostate.edu), Colorado State University, Fort Collins, USA; 3. Emily Tucker (ORCID 0009-0002-1447-5683; Emily.Tucker@colostate.edu), Colorado State University, Fort Collins, USA; 4. Louis-Noël Pouchet (ORCID 0000-0001-5103-3097; pouchet@colostate.edu), Colorado State University, Fort Collins, USA; 5. Gabriel Rodríguez (ORCID 0000-0002-0338-3655; gabriel.rodriguez@udc.es), Universidade da Coruña, A Coruña, Spain; 6. Juan Touriño (ORCID 0000-0001-9670-1933; juan.tourino@udc.es), Universidade da Coruña, A Coruña, Spain |
| ContentType | Journal Article |
| Copyright | Owner/Author |
| DOI | 10.1145/3729335 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 2130 |
| ExternalDocumentID | 10_1145_3729335 3729335 |
| GrantInformation_xml | MICIU (grants PID2022-136435NB-I00, FPU2022/01651); US National Science Foundation (grant 2009020) |
| ISSN | 2475-1421 |
| IngestDate | Sat Nov 29 07:43:35 EST 2025 Mon Aug 18 16:40:35 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | PLDI |
| Keywords | SIMD vectorization; code generation; polyhedral compilation; sparse format; sparse linear algebra |
| Language | English |
| License | This work is licensed under Creative Commons Attribution International 4.0. |
| LinkModel | OpenURL |
| ORCID | 0000-0002-0338-3655 0009-0002-1447-5683 0000-0001-9670-1933 0009-0005-3147-2179 0000-0001-5103-3097 0000-0002-5982-9118 |
| OpenAccessLink | https://dl.acm.org/doi/10.1145/3729335 |
| PageCount | 25 |
| PublicationCentury | 2000 |
| PublicationDate | 20250610 2025-06-10 |
| PublicationDateYYYYMMDD | 2025-06-10 |
| PublicationDecade | 2020 |
| PublicationPlace | New York, NY, USA |
| PublicationTitle | Proceedings of ACM on programming languages |
| PublicationTitleAbbrev | ACM PACMPL |
| PublicationYear | 2025 |
| Publisher | ACM |
| SSID | ssj0001934839 |
| Score | 2.2942047 |
| SourceID | crossref acm |
| SourceType | Index Database Publisher |
| StartPage | 2106 |
| SubjectTerms | General and reference; Theory of computation; Vector / streaming algorithms |
| SubjectTermsDisplay | General and reference -- Performance; Theory of computation -- Vector / streaming algorithms |
| Title | Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs |
| URI | https://dl.acm.org/doi/10.1145/3729335 |
| Volume | 9 |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2475-1421 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001934839 issn: 2475-1421 databaseCode: M~E dateStart: 20170101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Modular+Construction+and+Optimization+of+the+UZP+Sparse+Format+for+SpMV+on+CPUs&rft.jtitle=Proceedings+of+ACM+on+programming+languages&rft.au=Rodr%C3%ADguez-Iglesias%2C+Alonso&rft.au=Tongli%2C+Santoshkumar+T.&rft.au=Tucker%2C+Emily&rft.au=Pouchet%2C+Louis-No%C3%ABl&rft.date=2025-06-10&rft.pub=ACM&rft.eissn=2475-1421&rft.volume=9&rft.issue=PLDI&rft.spage=2106&rft.epage=2130&rft_id=info:doi/10.1145%2F3729335&rft.externalDocID=3729335 |