Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs

Detailed bibliography
Published in: Proceedings of the ACM on Programming Languages, Vol. 9, Issue PLDI, pp. 2106–2130
Main authors: Rodríguez-Iglesias, Alonso; Tongli, Santoshkumar T.; Tucker, Emily; Pouchet, Louis-Noël; Rodríguez, Gabriel; Touriño, Juan
Format: Journal Article
Language: English
Publication details: New York, NY, USA: ACM, 10 June 2025
ISSN: 2475-1421
Abstract Sparse data structures are ubiquitous in modern computing, and numerous formats have been designed to represent them. These formats may exploit specific sparsity patterns, aiming to achieve higher performance for key numerical computations than more general-purpose formats such as CSR and COO. In this work we present UZP, a new sparse format based on polyhedral sets of integer points. UZP is a flexible format that subsumes CSR, COO, DIA, BCSR, etc., by raising them to a common mathematical abstraction: a union of integer polyhedra, each intersected with an affine lattice. We present a modular approach to building and optimizing UZP: it captures equivalence classes for the sparse structure, enabling the tuning of the representation for target-specific and application-specific performance considerations. UZP is built from any input sparse structure using integer coordinates, and is interoperable with existing software using CSR and COO data layout. We provide detailed performance evaluation of UZP on 200+ matrices from SuiteSparse, demonstrating how simple and mostly unoptimized generic executors for UZP can already achieve solid performance by exploiting Z-polyhedra structures.
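As an illustration of the abstract's central idea, the sketch below (Python 3.8+, hypothetical, not taken from the paper or its repository) models a sparse matrix as a union of pieces, each an affine image of a box of lattice coordinates. This is a deliberately restricted special case of a Z-polyhedron; the abstract describes UZP as using general integer polyhedra intersected with affine lattices. The names `ZPiece`, `to_coo`, `uzp_spmv`, and `example_pieces` are assumptions made for illustration. One piece covers a full diagonal (DIA-like), one a dense 2x2 block (BCSR-like), one a stride-2 run down a column (the lattice part), and one a single entry (COO-like). The executor walks each piece's points in order while streaming its values.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Tuple


@dataclass
class ZPiece:
    """Hypothetical, simplified Z-polyhedral piece (not the UZP data structure).

    Its integer points are origin + sum_k t_k * generators[k] for
    0 <= t_k < extents[k]: a box in lattice coordinates mapped affinely
    into (row, col) space. Values are stored in enumeration order.
    """
    origin: Tuple[int, int]
    generators: List[Tuple[int, int]]
    extents: List[int]
    values: List[float]

    def __post_init__(self) -> None:
        if len(self.generators) != len(self.extents):
            raise ValueError("one extent per generator is required")
        if math.prod(self.extents) != len(self.values):
            raise ValueError("need exactly one value per enumerated point")

    def points(self) -> Iterator[Tuple[int, int]]:
        # Lexicographic order over lattice coordinates t = (t_0, t_1, ...).
        for t in product(*(range(n) for n in self.extents)):
            row = self.origin[0] + sum(tk * g[0] for tk, g in zip(t, self.generators))
            col = self.origin[1] + sum(tk * g[1] for tk, g in zip(t, self.generators))
            yield row, col


def to_coo(pieces: List[ZPiece]) -> Tuple[List[int], List[int], List[float]]:
    """Expand every piece back into explicit COO triplets."""
    rows, cols, vals = [], [], []
    for piece in pieces:
        for (r, c), v in zip(piece.points(), piece.values):
            rows.append(r)
            cols.append(c)
            vals.append(v)
    return rows, cols, vals


def uzp_spmv(pieces: List[ZPiece], x: List[float], n_rows: int) -> List[float]:
    """y = A x, assuming A is the union of disjoint pieces."""
    y = [0.0] * n_rows
    for piece in pieces:
        # Coordinates come from the piece's affine map, not from index arrays.
        for (r, c), v in zip(piece.points(), piece.values):
            y[r] += v * x[c]
    return y


def example_pieces() -> List[ZPiece]:
    return [
        ZPiece((0, 0), [(1, 1)], [6], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),  # main diagonal
        ZPiece((0, 4), [(1, 0), (0, 1)], [2, 2], [7.0, 8.0, 9.0, 10.0]),  # 2x2 block
        ZPiece((2, 0), [(2, 0)], [2], [11.0, 12.0]),  # rows 2 and 4 of column 0
        ZPiece((5, 1), [], [], [13.0]),  # single entry at (5, 1)
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(uzp_spmv(example_pieces(), x, 6))  # [84.0, 109.0, 20.0, 16.0, 37.0, 62.0]
```

Each piece stores an origin, a few generator vectors, and extents instead of one row and one column index per nonzero; that per-piece regularity is the kind of structure a Z-polyhedral format can exploit, while `to_coo` shows the expansion back to a COO layout. The sketch says nothing about how UZP actually discovers its pieces or how its executors are generated and vectorized.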
ArticleNumber 232
Authors and affiliations
1. Alonso Rodríguez-Iglesias, Universidade da Coruña, A Coruña, Spain (ORCID 0000-0002-5982-9118; alonso.rodriguez@udc.es)
2. Santoshkumar T. Tongli, Colorado State University, Fort Collins, USA (ORCID 0009-0005-3147-2179; Santoshkumar.T@colostate.edu)
3. Emily Tucker, Colorado State University, Fort Collins, USA (ORCID 0009-0002-1447-5683; Emily.Tucker@colostate.edu)
4. Louis-Noël Pouchet, Colorado State University, Fort Collins, USA (ORCID 0000-0001-5103-3097; pouchet@colostate.edu)
5. Gabriel Rodríguez, Universidade da Coruña, A Coruña, Spain (ORCID 0000-0002-0338-3655; gabriel.rodriguez@udc.es)
6. Juan Touriño, Universidade da Coruña, A Coruña, Spain (ORCID 0000-0001-9670-1933; juan.tourino@udc.es)
Cites_doi 10.1145/1583991.1584053
10.1145/2833179.2833183
10.1109/SUPERC.1994.344269
10.1145/2049662.2049663
10.1145/2838734
10.1109/SC41404.2022.00071
10.1145/3559009.3569668
10.1109/SC.2018.00065
10.1145/2751205.2751209
10.1145/224170.224420
10.1007/978-3-642-37658-0_5
10.1109/JPROC.2018.2857721
10.1145/2751205.2751244
10.1145/3276493
10.1145/3168818
10.1145/3520484
10.5281/zenodo.15240673
10.1145/1183401.1183444
10.1109/SC.2002.10025
10.1007/978-3-642-15582-6_49
10.1145/1654059.1654078
10.1109/SC.2016.40
10.1145/3126908.3126936
10.1016/0743-7315(90)90129-D
10.1145/169627.169752
10.1145/1375581.1375595
10.1109/TC.2018.2853747
10.1145/3314221.3314615
10.1007/3-540-57502-2_42
10.1145/3581784.3607097
10.1109/CGO51591.2021.9370308
10.48550/arXiv.2105.04937
10.1145/1837853.1693471
10.1145/2854038.2854056
10.1145/223428.207157
10.1109/IPDPS.2008.4536313
10.1145/3133901
10.1109/SC41404.2022.00037
10.1145/3293883.3295712
10.1145/1362622.1362674
10.1145/3295500.3356216
10.1145/1229428.1229478
10.1109/PACT.2004.1342537
10.1145/2688500.2688515
10.1145/3591302
ContentType Journal Article
Copyright Owner/Author
DOI 10.1145/3729335
Discipline Computer Science
EISSN 2475-1421
EndPage 2130
GrantInformation MICIU: grants PID2022-136435NB-I00 and FPU2022/01651; US National Science Foundation: grant 2009020
ISSN 2475-1421
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue PLDI
Keywords SIMD vectorization
code generation
polyhedral compilation
sparse format
sparse linear algebra
Language English
License This work is licensed under a Creative Commons Attribution 4.0 International License.
OpenAccessLink https://dl.acm.org/doi/10.1145/3729335
PageCount 25
PublicationDate 2025-06-10
PublicationPlace New York, NY, USA
PublicationTitle Proceedings of the ACM on Programming Languages
PublicationTitleAbbrev ACM PACMPL
PublicationYear 2025
Publisher ACM
References
S. Sharma, R. Ponnusamy, B. Moon, Y.-S. Hwang, R. Das, and J. Saltz. 1994. Run-time and Compile-time Support for Adaptive Irregular Problems. In ACM/IEEE Conference on Supercomputing, SC. Washington, DC, USA. 97–106. https://doi.org/10.1109/SUPERC.1994.344269
Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A Practical Automatic Polyhedral Program Optimization System. In 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI. Tucson, AZ, USA. 101–113. https://doi.org/10.1145/1375581.1375595
M.M. Strout, G. George, and C. Olschanowsky. 2012. Set and Relation Manipulation for the Sparse Polyhedral Framework. In 25th International Workshop on Languages and Compilers for Parallel Computing, LCPC. Tokyo, Japan. 61–75. https://doi.org/10.1007/978-3-642-37658-0_5
A. Sukumaran-Rajam and P. Clauss. 2016. The Polyhedral Model of Nonlinear Loops. ACM Trans. Archit. Code Optim., 12, 4 (2016). https://doi.org/10.1145/2838734
J. Saltz, K. Crowley, R. Mirchandaney, and H. Berryman. 1990. Run-time Scheduling and Execution of Loops on Message Passing Machines. J. Parallel Distrib. Comput., 8, 4 (1990), 303–312. https://doi.org/10.1016/0743-7315(90)90129-D
Zhen Du, Jiajia Li, Yinshan Wang, Xueqi Li, Guangming Tan, and Ninghui Sun. 2022. AlphaSparse: Generating High Performance SpMV Codes Directly from Sparse Matrices. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Dallas, TX, USA. https://doi.org/10.1109/SC41404.2022.00071
L.-N. Pouchet. 2011. PolyBench: The Polyhedral Benchmarking suite, version PolyBench/C 4.2.1. http://polybench.sf.net
Aydin Buluç and John R. Gilbert. 2008. On the representation and multiplication of hypersparse matrices. In 2008 IEEE International Symposium on Parallel and Distributed Processing, IPDPS. Miami, FL, USA. https://doi.org/10.1109/IPDPS.2008.4536313
R.W. Vuduc. 2004. Automatic Performance Tuning of Sparse Matrix Kernels. Ph.D. Dissertation. University of California.
Shaden Smith and George Karypis. 2015. Tensor-matrix products with a compressed sparse tensor. In 5th Workshop on Irregular Applications: Architectures and Algorithms, IA³. Austin, TX, USA. https://doi.org/10.1145/2833179.2833183
Weifeng Liu and Brian Vinter. 2015. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. In 29th ACM on International Conference on Supercomputing, ICS. Newport Beach, CA, USA. 339–350. https://doi.org/10.1145/2751205.2751208
Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, and Lixin Zhang. 2018. CVR: efficient vectorization of SpMV on x86 processors. In International Symposium on Code Generation and Optimization, CGO. Vienna, Austria. 149–162. https://doi.org/10.1145/3168818
Lucas Wilkinson, Kazem Cheshmi, and Maryam Mehri Dehnavi. 2023. Register Tiling for Unstructured Sparsity in Neural Network Inference. Proceedings of the ACM on Programming Languages, 7, PLDI (2023), 1995–2020. https://doi.org/10.1145/3591302
K. Cheshmi, S. Kamil, M. M. Strout, and M. M. Dehnavi. 2017. Sympiler: transforming sparse matrix codes by decoupling symbolic analysis. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Denver, CO, USA. https://doi.org/10.1145/3126908.3126936
Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, and Juan Touriño. 2022. Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations. In International Conference on Parallel Architectures and Compilation Techniques, PACT. Chicago, IL, USA. 160–171. ISBN 9781450398688. https://doi.org/10.1145/3559009.3569668
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38, 1 (2011). https://doi.org/10.1145/2049662.2049663
A. LaMielle and M. Strout. 2010. Enabling Code Generation within the Sparse Polyhedral Framework. Colorado State University. https://www.cs.colostate.edu/TechReports/Reports/2010/tr10-102.pdf
Philipp Herholz, Xuan Tang, Teseo Schneider, Shoaib Kamil, Daniele Panozzo, and Olga Sorkine-Hornung. 2022. Sparsity-Specific Code Optimization using Expression Trees. ACM Trans. Graph., 41, 5 (2022). ISSN 0730-0301. https://doi.org/10.1145/3520484
R. Das, P. Havlak, J. Saltz, and K. Kennedy. 1995. Index Array Flattening Through Program Transformation. In ACM/IEEE Supercomputing Conference, SC. San Diego, CA, USA. https://doi.org/10.1145/224170.224420
M. Ravishankar, R. Dathathri, V. Elango, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. 2015. Distributed Memory Code Generation for Mixed Irregular/Regular Computations. In 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP. San Francisco, CA, USA. 65–75. https://doi.org/10.1145/2688500.2688515
Naser Sedaghati, Te Mu, Louis-Noël Pouchet, Srinivasan Parthasarathy, and P. Sadayappan. 2015. Automatic selection of sparse matrix representation on GPUs. In 29th ACM on International Conference on Supercomputing, ICS. Newport Beach, CA, USA. 99–108. https://doi.org/10.1145/2751205.2751244
A. Venkat, M.S. Mohammadi, J. Park, H. Rong, R. Barik, M.M. Strout, and M. Hall. 2016. Automating Wavefront Parallelization for Sparse Matrix Computations. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Salt Lake City, UT, USA. https://doi.org/10.1109/SC.2016.40
Sven Verdoolaege. 2010. ISL: An integer set library for the polyhedral model. In 3rd International Congress on Mathematical Software, ICMS. Kobe, Japan. 299–302. https://doi.org/10.1007/978-3-642-15582-6_49
Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization, CGO. Seoul, South Korea. 2–14. https://doi.org/10.1109/CGO51591.2021.9370308
S. Chou, F. Kjolstad, and S. Amarasinghe. 2018. Format abstraction for sparse tensor algebra compilers. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018). https://doi.org/10.1145/3276493
G. Rodríguez, M. T. Kandemir, and J. Touriño. 2019. Affine Modeling of Program Traces. IEEE Trans. Comput., 68, 2 (2019), 294–300. https://doi.org/10.1109/TC.2018.2853747
Takeshi Fukaya, Koki Ishida, Akie Miura, Takeshi Iwashita, and Hiroshi Nakashima. 2021. Accelerating the SpMV kernel on standard CPUs by exploiting the partially diagonal structures. arXiv preprint arXiv:2105.04937. https://doi.org/10.48550/arXiv.2105.04937
Gautam Gupta and Sanjay Rajopadhye. 2007. The Z-Polyhedral Model. In 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP. San Jose, CA, USA. 237–248. https://doi.org/10.1145/1229428.1229478
L.-N. Pouchet, G. Rodriguez, and colleagues. 2025. The UZP sparse format. https://github.com/UDC-GAC/uzp-sparse-format
N. Bell and M. Garland. 2009. Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. In ACM/IEEE Conference on High Performance Computing, SC. Portland, OR, USA. https://doi.org/10.1145/1654059.1654078
L.-N. Pouchet, G. Rodriguez, and colleagues. 2025. Artifact for PLDI’25 Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs. https://doi.org/10.5281/zenodo.15240673
Travis Augustine, Janarthanan Sarma, Louis-Noël Pouchet, and Gabriel Rodríguez. 2019. Generating Piecewise-Regular Code from Irregular Structures. In 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI. Phoenix, AZ, USA. 625–639. ISBN 9781450367127. https://doi.org/10.1145/3314221.3314615
K. Cheshmi, S. Kamil, M. M. Strout, and M. M. Dehnavi. 2018. ParSy: inspection and transformation of sparse matrix computations for parallelism. In International Conference for High Performance Computing, Networking, Storage, and Analysis, SC. Dallas, TX, USA. https://doi.org/10.1109/SC.2018.00065
S. Williams, L. Oliker, R.W. Vuduc, J. Shalf, K.A. Yelick, and J. Demmel. 2009. Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms. Parallel Comput., 35, 3 (2009), 178–194. https://doi.org/10.1145/1362622.1362674
Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The sparse polyhedral framework: Composing compiler-generated inspector-executor code. Proc. IEEE, 106, 11 (2018), 1921–1934. https://doi.org/10.1109/JPROC.2018.2857721
G. Agrawal, J. Saltz, and R. Das. 1995. Interprocedural Partial Redundancy Elimination and its Application to Distributed Memory Compilation. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI. La Jolla, CA, USA. 258–269. https://doi.org/10.1145/223428.207157
R. Ponnusamy, J.H. Saltz, and A.N. Choudhary. 1993. Runtime Compilation Techniques for Data Partitioning and Communication Schedule Reuse. In ACM/IEEE Conference on Supercomputing, SC. Portland, OR, USA. 361–370. https://doi.org/10.1145/169627.169752
J.W. Choi, A. Singh, and R.W. Vuduc. 2010. Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs. In 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP. Bangalore, India. 115–126. https://doi.org/10.1145/1837853.1693471
F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe. 2017. The tensor algebra compiler. Proceedings of the ACM on Programming Languages, 1, OOPSLA (2017). https://doi.org/10.1145/3133901
C. Bastoul. 2004. Code Generation in the Polyhedral Model Is Easier Than You Think. In 13th International Conference on Parallel Architectures and Compilation Techniques, PACT. Antibes, France. 7–16. https://doi.org/10.1109/PACT.2004.1342537
Kazem Cheshmi, Michelle Strout, and Maryam Mehri Dehnavi. 2023. Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Denver, CO, USA. ISBN 9798400701092. https://doi.org/10.1145/3581784.3607097
Jeremiah Willcock and Andrew Lumsdaine. 2006. Accelerating sparse matrix computations via data compression. In 20th Annual International Conference on Supercomputing, ICS. Cairns, QLD, Australia. 307–316. ISBN 1595932828. https://doi.org/10.1145/1183401.1183444
G. Rodríguez, J. M. Andión, M. T. Kandemir, and J. Touriño. 2016. Trace-based Affine Reconstruction of Codes. In 14th International Symposium on Code Generation and Optimization, CGO. Barcelona, Spain. 139–149. https://doi.org/10.1145/2854038.2854056
Kazem Cheshmi. 2023. Partially Strided Codelet GitHub repository. https://github.com/sparse-specialize/partially-strided-codelet (commit c03d0593411c8afc9c6861de152695c453358a04)
R. von Hanxleden, K. Kennedy, C. Koelbel, R. Das, and J. Saltz. 1992. Compiler analysis for irregular problems in Fortran D. In 6th International Workshop on Languages and Compilers for Parallel Computing, LCPC. New Haven, CT, USA. 97–111. https://doi.org/10.1007/3-540-57502-2_42
Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prasant Singh Rawat, Sriram Krishnamoorthy, and P. Sadayappan. 2019. An efficient mixed-mode representation of sparse tensors. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Denver, CO, USA. https://doi.org/10.1145/3295500.3356216
Rich Vuduc, James W Demmel, Katherine A Yelick, Shoaib Kamil, Rajesh Nishtala, and Benjamin Lee. 2002. Performance optimizations and bounds for sparse matrix-vector multiply. In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, SC. Baltimore, MD, USA. https://doi.org/10.1109/SC.2002.10025
Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, and P. Sadayappan. 2019. Adaptive sparse tiling for sparse matrix multiplication. In 24th Symposium on Principles and Practice of Parallel Programming, PPoPP. Washington, DC, USA. 300–314. https://doi.org/10.1145/3293883.3295712
Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, and Charles E. Leiserson. 2009. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In 21st Annual Symposium on Parallelism in Algorithms and Architectures, SPAA. Calgary, AB, Canada. 233–244. https://doi.org/10.1145/1583991.1584053
Kazem Cheshmi, Zachary Cetinic, and Maryam Mehri Dehnavi. 2022. Vectorizing sparse matrix computations with partially-strided codelets. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Dallas, TX, USA. https://doi.org/10.1109/SC41404.2022.00037
StartPage 2106
SubjectTermsDisplay General and reference -- Performance
Theory of computation -- Vector / streaming algorithms
Title Modular Construction and Optimization of the UZP Sparse Format for SpMV on CPUs
URI https://dl.acm.org/doi/10.1145/3729335
Volume 9