Key Operator Vectorization for LeNet and ResNet Based on Buddy Compiler.

Saved in:
Detailed bibliography
Title: Key Operator Vectorization for LeNet and ResNet Based on Buddy Compiler.
Authors: Chen, Juncheng; Chen, Weiwei; Cai, Zhi
Source: Applied Sciences (2076-3417); Sep 2025, Vol. 15, Issue 17, p9523, 17p
Subjects: DEEP learning; COMPILERS (Computer programs); PREDICTION models; MATHEMATICAL optimization
Abstract: Deep learning has emerged as a prominent focus in both academia and industry, with a wide range of models being applied across diverse domains. Fast and efficient model inference is essential for the practical deployment of deep learning models. Under specific hardware constraints, accelerating inference remains a key research challenge. Common techniques for model acceleration include quantization, pruning, and vectorization. Whereas quantization and pruning primarily reduce model precision or complexity to enhance efficiency, this paper concentrates on vectorization, a technique that accelerates models by increasing the parallelism of operator execution. Based on the open-source Buddy-MLIR project, this work implements vectorization optimizations for Matmul, Conv2d, and Max Pooling operations to improve inference performance. These optimizations are designed as compiler passes and integrated into the Buddy-MLIR framework, offering a general solution for vectorizing such operators. Two optimization approaches are proposed: general vectorization and adaptive vectorization. Compared to the standard MLIR lowering pipeline and the fully optimized LLVM backend, the proposed general and adaptive vectorization methods reduce the inference latency of LeNet-5 by 26.7% and 37.3%, respectively. For the more complex ResNet-18 model, these methods achieve latency reductions of 79.9% and 82.6%, respectively. [ABSTRACT FROM AUTHOR]
Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract.
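The abstract frames vectorization as accelerating operators by increasing the data parallelism of their execution, with Matmul among the operators vectorized. As a rough illustration of that idea only (a minimal sketch in C, not the paper's Buddy-MLIR passes, which operate on MLIR IR rather than hand-written intrinsics), the program below contrasts a scalar matmul inner loop with an explicitly vectorized AVX2/FMA version; the matrix size N, the function names, and the 8-lane vector width are assumptions made for this example.

```c
/* Illustrative sketch only (not code from the paper): contrasting a scalar
 * matmul inner loop with an explicitly vectorized one using AVX2/FMA
 * intrinsics, to show the kind of data parallelism that vectorizing
 * compiler passes expose automatically. */
#include <immintrin.h>
#include <stdio.h>

#define N 64  /* assumed square matrix size, a multiple of 8 */

/* Scalar reference: C[i][j] += A[i][k] * B[k][j]. */
static void matmul_scalar(const float *A, const float *B, float *C) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* Vectorized: each fused multiply-add updates 8 floats of a row of C. */
static void matmul_avx2(const float *A, const float *B, float *C) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            __m256 a = _mm256_set1_ps(A[i * N + k]); /* broadcast A[i][k] */
            for (int j = 0; j < N; j += 8) {
                __m256 b = _mm256_loadu_ps(&B[k * N + j]);
                __m256 c = _mm256_loadu_ps(&C[i * N + j]);
                c = _mm256_fmadd_ps(a, b, c);        /* c += a * b */
                _mm256_storeu_ps(&C[i * N + j], c);
            }
        }
}

int main(void) {
    /* static arrays are zero-initialized, so C1/C2 start as zero. */
    static float A[N * N], B[N * N], C1[N * N], C2[N * N];
    for (int i = 0; i < N * N; i++) {
        A[i] = (float)(i % 7);
        B[i] = (float)(i % 5);
    }
    matmul_scalar(A, B, C1);
    matmul_avx2(A, B, C2);
    /* With small integer-valued inputs both kernels are exact, so the
     * difference should be 0. */
    float maxdiff = 0.0f;
    for (int i = 0; i < N * N; i++) {
        float d = C1[i] - C2[i];
        if (d < 0) d = -d;
        if (d > maxdiff) maxdiff = d;
    }
    printf("max abs difference: %g\n", maxdiff);
    return 0;
}
```

Compiled with, e.g., `cc -O2 -mavx2 -mfma vec_matmul.c`, each `_mm256_fmadd_ps` computes eight output elements at once; this per-operator data parallelism is what the paper's general and adaptive vectorization passes aim to exploit automatically at the compiler level.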
Database: Complementary Index
Description
ISSN: 2076-3417
DOI: 10.3390/app15179523