Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
| Title: | Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM. |
|---|---|
| Authors: | ALAEJOS, GUILLERMO, CASTELLÓ, ADRIÁN, ALONSO-JORDÁ, PEDRO, IGUAL, FRANCISCO D., MARTÍNEZ, HÉCTOR, QUINTANA-ORTÍ, ENRIQUE S. |
| Source: | ACM Transactions on Mathematical Software; Mar 2024, Vol. 50, Issue 1, pp. 1-34 (34 pages) |
| Subject Terms: | MATRIX multiplications, LINEAR algebra, FOOTPRINTS, MAINTAINABILITY (Engineering) |
| Abstract: | We explore the use of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS, and OpenBLAS, to obtain high-performance blocked formulations of the general matrix multiplication (gemm). In addition, we fully automate the generation process by also leveraging the Apache TVM framework to derive a full range of processor-specific micro-kernels for gemm. This is in contrast with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture in assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for gemm (1) improves portability and maintainability and, more broadly, streamlines the software life cycle; (2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par with (or even superior to, for specific matrix shapes) that of hand-tuned libraries; and (3) features a small memory footprint. [ABSTRACT FROM AUTHOR] |
| Database: | Complementary Index |