Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs

Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In th...

Bibliographic details
Published in:	Proceedings of the ACM on Programming Languages, Volume 9, Issue PLDI, pp. 2106-2130
Main authors:	Rodríguez-Iglesias, Alonso, Tongli, Santoshkumar T., Tucker, Emily, Pouchet, Louis-Noël, Rodríguez, Gabriel, Touriño, Juan
Format:	Journal Article
Language:	English
Publication details:	New York, NY, USA: ACM, 10 June 2025
Subjects:	General and reference Theory of computation Vector / streaming algorithms SIMD vectorization code generation polyhedral compilation sparse format sparse linear algebra
ISSN:	2475-1421

Abstract	Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In this work we present UZP, a new sparse format based on polyhedral sets of integer points. UZP is a flexible format that subsumes CSR, COO, DIA, BCSR, etc., by raising them to a common mathematical abstraction: a union of integer polyhedra, each intersected with an affine lattice. We present a modular approach to building and optimizing UZP: it captures equivalence classes for the sparse structure, enabling the tuning of the representation for target-specific and application-specific performance considerations. UZP is built from any input sparse structure using integer coordinates, and is interoperable with existing software using CSR and COO data layout. We provide detailed performance evaluation of UZP on 200+ matrices from SuiteSparse, demonstrating how simple and mostly unoptimized generic executors for UZP can already achieve solid performance by exploiting Z-polyhedra structures.
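The abstract describes the format as a union of integer point sets, each a polyhedron intersected with an affine lattice. The following Python sketch illustrates that idea in its simplest form and is not the UZP layout or executor from the paper: every shape is a one-dimensional strided segment, that is, the points (row + k*d_row, col + k*d_col) for 0 <= k < count. The names `Segment`, `spmv_segments`, and `coo_to_segments`, and the field layout, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical shape: points (row + k*d_row, col + k*d_col), 0 <= k < count."""
    row: int
    col: int
    d_row: int
    d_col: int
    count: int
    offset: int  # index of the segment's first value in the flat value array


def spmv_segments(segments: Sequence[Segment], values: Sequence[float],
                  x: Sequence[float], n_rows: int) -> List[float]:
    """Compute y = A @ x, where A is the union of the given segments."""
    y = [0.0] * n_rows
    for seg in segments:
        # Coordinates are recomputed from (origin, stride), so no per-nonzero
        # index arrays are read inside a segment.
        for k in range(seg.count):
            r = seg.row + k * seg.d_row
            c = seg.col + k * seg.d_col
            y[r] += values[seg.offset + k] * x[c]
    return y


def coo_to_segments(rows: Sequence[int], cols: Sequence[int]) -> List[Segment]:
    """COO as the degenerate case: one single-point segment per nonzero."""
    return [Segment(r, c, 0, 0, 1, i) for i, (r, c) in enumerate(zip(rows, cols))]


# Example 4x4 matrix:
#   [1 0 0 7]
#   [0 2 0 0]
#   [0 0 3 0]
#   [5 0 6 4]
segments = [
    Segment(row=0, col=0, d_row=1, d_col=1, count=4, offset=0),  # main diagonal
    Segment(row=3, col=0, d_row=0, d_col=2, count=2, offset=4),  # stride-2 run in row 3
    Segment(row=0, col=3, d_row=0, d_col=0, count=1, offset=6),  # isolated point
]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_segments(segments, values, x, n_rows=4)
print(y)  # [29.0, 4.0, 9.0, 39.0]

# The same matrix given as plain COO triples yields the same product.
coo_rows = [0, 0, 1, 2, 3, 3, 3]
coo_cols = [0, 3, 1, 2, 0, 2, 3]
coo_vals = [1.0, 7.0, 2.0, 3.0, 5.0, 6.0, 4.0]
y_coo = spmv_segments(coo_to_segments(coo_rows, coo_cols), coo_vals, x, n_rows=4)
```

In this toy encoding, three segments describe seven nonzeros, while the COO view needs seven single-point segments. That is the kind of trade-off a tunable representation can exploit, although the shapes and metadata in the actual format may differ.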
ArticleNumber	232
Author_xml	1. Rodríguez-Iglesias, Alonso (Universidade da Coruña, A Coruña, Spain; alonso.rodriguez@udc.es; ORCID 0000-0002-5982-9118)
	2. Tongli, Santoshkumar T. (Colorado State University, Fort Collins, USA; Santoshkumar.T@colostate.edu; ORCID 0009-0005-3147-2179)
	3. Tucker, Emily (Colorado State University, Fort Collins, USA; Emily.Tucker@colostate.edu; ORCID 0009-0002-1447-5683)
	4. Pouchet, Louis-Noël (Colorado State University, Fort Collins, USA; pouchet@colostate.edu; ORCID 0000-0001-5103-3097)
	5. Rodríguez, Gabriel (Universidade da Coruña, A Coruña, Spain; gabriel.rodriguez@udc.es; ORCID 0000-0002-0338-3655)
	6. Touriño, Juan (Universidade da Coruña, A Coruña, Spain; juan.tourino@udc.es; ORCID 0000-0001-9670-1933)
Copyright	Owner/Author
DOI	10.1145/3729335
Discipline	Computer Science
GrantInformation_xml	MICIU (grants PID2022-136435NB-I00, FPU2022/01651); US National Science Foundation (grant 2009020)
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	SIMD vectorization code generation polyhedral compilation sparse format sparse linear algebra
License	This work is licensed under Creative Commons Attribution International 4.0.
OpenAccessLink	https://dl.acm.org/doi/10.1145/3729335
PageCount	25
PublicationDate	2025-06-10
PublicationPlace	New York, NY, USA
PublicationTitle	Proceedings of the ACM on Programming Languages
PublicationTitleAbbrev	ACM PACMPL
Publisher	ACM
SourceID	crossref acm
SourceType	Index Database Publisher
StartPage	2106
SubjectTerms	General and reference; Theory of computation; Vector / streaming algorithms
SubjectTermsDisplay	General and reference -- Performance; Theory of computation -- Vector / streaming algorithms
Title	Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs
URI	https://dl.acm.org/doi/10.1145/3729335
Volume	9