Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads

Computer vision (CV) algorithms have been extensively used for a myriad of applications nowadays. As the multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performances of CV algorithm...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Journal of computer science and technology Ročník 38; číslo 4; s. 807 - 820
Hlavní autoři: Li, Ruo-Shi, Peng, Ping, Shao, Zhi-Yuan, Jin, Hai, Zheng, Ran
Médium: Journal Article
Jazyk:angličtina
Vydáno: Singapore Springer Nature Singapore 01.07.2023
Springer
Springer Nature B.V
Témata:
ISSN:1000-9000, 1860-4749
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Computer vision (CV) algorithms have been extensively used for a myriad of applications nowadays. As the multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performances of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, capable of conducting the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. By our examinations, we show that compared with the baseline OpenCV implementation using scalar instructions, the equivalent implementations using the RV-V (version 0.8) can reduce the instruction count of the same CV algorithm up to 24x, when processing the same input images. Whereas, the actual performances improvement measured by the cycle counts is highly related with the specific implementation of the underlying RV-V co-processor. In our evaluation, by using the vector co-processor (with eight execution lanes) of Xuantie C906, vector-version CV algorithms averagely exhibit up to 2.98x performances speedups compared with their scalar counterparts.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1000-9000
1860-4749
DOI:10.1007/s11390-023-1266-6