Coarse-Grained Task Parallelization by Dynamic Profiling for Heterogeneous SoC-Based Embedded System.

Detailed bibliography
Title: Coarse-Grained Task Parallelization by Dynamic Profiling for Heterogeneous SoC-Based Embedded System.
Authors: Chang, Liangliang; Gener, Serhan; Mack, Joshua; Suluhan, Hasan Umut; Akoglu, Ali; Chakrabarti, Chaitali
Source: ACM Transactions on Embedded Computing Systems; Jan 2025, Vol. 24, Issue 1, pp. 1-32 (32 pages)
Subjects: TELECOMMUNICATION systems, PARALLEL processing, HARDWARE, SCHEDULING
Abstract: In this study, we introduce a methodology for automatically transforming user applications written in C/C++ to a parallel representation consisting of coarse-grained tasks based on dynamic profiling. Such a parallel representation is suitable for mapping applications onto heterogeneous SoCs. We present our approach for instrumenting the user application binary during the compilation process with parallel primitives that enable the runtime system to schedule and execute independent computation-intensive coarse-grained tasks concurrently. We use the proposed compilation and code transformation methodology to retarget each application for execution on a heterogeneous SoC composed of processor cores and accelerators. We demonstrate the capabilities of our integrated compile time and runtime flow through task-level parallelization and functionally correct execution of real-world applications in the communication systems and radar processing domains. We demonstrate the functionality of our integrated system by executing six distinct applications with different degrees of parallelism on four different platforms: an eight-core general-purpose processor, a heterogeneous SoC simulator, and two heterogeneous SoCs utilizing the Xilinx Zynq UltraScale+ FPGA and the Nvidia Jetson AGX board. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without requiring users to become hardware or parallel programming experts. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
ISSN: 1539-9087
DOI: 10.1145/3704635