Design and Optimization of LLVM Compiler for Domestic High Performance Accelerator

Bibliographic Details
Published in: Ji suan ji gong cheng, Vol. 50, No. 4, pp. 321-331
Main Authors: SONG Qiang, TANG Junlong, CHEN Zhaoyun, SHI Yang, TAN Qixuan, XIAO Ziyang, ZOU Wanghui
Format: Journal Article
Language: Chinese, English
Published: Editorial Office of Computer Engineering, 15 April 2024
ISSN: 1000-3428
Description
Summary: The National University of Defense Technology has independently developed a high-performance accelerator built on an on-chip heterogeneous fusion architecture that combines a Central Processing Unit (CPU) with a General Purpose Digital Signal Processor (GPDSP). The GPDSP, with its Very Long Instruction Word (VLIW) + Single Instruction, Multiple Data (SIMD) vectorization structure, is the acceleration core that provides the main support for peak performance. However, mainstream compilers cannot adequately support high-performance accelerators in three areas: instruction layout for data-intensive computation, static allocation of hardware execution units to instructions, and GPDSP-specific vector instructions. In this study, based on the Low Level Virtual Machine (LLVM) compilation framework, the PERP method, the Ant Colony Optimization (ACO) algorithm, and the structural characteristics of the GPDSP are combined to optimize the cost model in the pre-RA-sched stage, and the instruction scheduling module is designed to be register-pressure aware. The study also proposes an instruction scheduling strategy for the post-RA-sched stage that supports static functional unit allocation; a conflict detection mechanism guarantees correct functional unit allocation and provides a software basis for the parallel execution of instructions. Furthermore, a rich and regular set of vector instruction interfaces is encapsulated in the backend to support the GPDSP vector instructions. The experimental results demonstrate that the proposed LLVM optimization method supports the GPDSP well in terms of both functionality and performance. Specifically, the average overall speedup ratio is 4.539 on the GCC testsuite, 4.49 on the SPEC CPU 2017 floating-point tests, and 3.24 on the SPEC CPU 2017 integer tests. Additionally, vector programs that use the vector interfaces achieve an average performance improvement ratio of 97.1%.
DOI: 10.19678/j.issn.1000-3428.0067000
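
Illustrative note: the summary mentions static functional unit allocation guarded by a conflict detection mechanism. The Python sketch below shows one generic way such a check can work when packing instructions into VLIW bundles. It is not the paper's algorithm and not LLVM code. The unit names (LS0, MAC0, ALU0), the Instr type, the allocate_units function, and the one-cycle latency are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    units: tuple      # functional units this instruction may use (hypothetical names)
    deps: tuple = ()  # names of instructions whose results it consumes


def allocate_units(instrs):
    """Greedily pack instructions into bundles, binding each to one functional unit.

    Assumptions: `instrs` is already in a valid dependency order, and every
    instruction has a latency of one bundle (one cycle).
    """
    bundles = []  # each bundle maps functional unit -> instruction name
    placed = {}   # instruction name -> index of the bundle it was placed in
    for ins in instrs:
        # Earliest legal bundle: strictly after the bundle of every producer.
        idx = max((placed[d] + 1 for d in ins.deps), default=0)
        while True:
            if idx == len(bundles):
                bundles.append({})
            # Conflict detection: a unit already bound in this bundle is unavailable.
            free = [u for u in ins.units if u not in bundles[idx]]
            if free:
                bundles[idx][free[0]] = ins.name
                placed[ins.name] = idx
                break
            idx += 1  # every candidate unit is busy, so try the next bundle
    return bundles


program = [
    Instr("load_a", ("LS0", "LS1")),
    Instr("load_b", ("LS0", "LS1")),
    Instr("load_c", ("LS0", "LS1")),
    Instr("mul", ("MAC0",), deps=("load_a", "load_b")),
    Instr("add", ("ALU0", "MAC0"), deps=("mul", "load_c")),
]
bundles = allocate_units(program)
for cycle, bundle in enumerate(bundles):
    print(cycle, bundle)
```

In this toy program the third load cannot share a bundle with the first two because both load/store units are already bound, so the conflict check pushes it into the next bundle. A production scheduler would additionally model real instruction latencies and register pressure, which this sketch deliberately omits.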