Performance Characterization of Popular DNN Models on Out-of-Order CPUs

Bibliographic details
Published in: 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 199-210
Main authors: Prieto, Pablo; Abad, Pablo; Gregorio, Jose Angel; Puente, Valentin
Format: Conference paper
Language: English
Published: IEEE, 21 October 2023
Description
Summary: DNN popularity, which is driving advances in a growing number of fields, has increased the amount of computing resources running this kind of application at an unprecedented rate. Specialized hardware, such as GPUs or ASIC-based accelerators, has been the preferred platform to run these applications. However, the ubiquity of DNN models is rapidly extending the presence of this software to general-purpose CPUs. For this reason, there is a pressing need to understand the main features of state-of-the-art DNN models in order to adapt CPU microarchitecture accordingly. In this paper we investigated a representative set of DNN models and, based on data collected from real hardware, evaluated how efficiently they utilize the underlying system. We analyzed overall system performance, as well as the amount of vectorization provided by CPU-optimized frameworks. We quantified the performance loss caused by the processor backend, and the contribution of the memory hierarchy and functional units to it. We compared the backend utilization of DNN applications to that of popular benchmarks such as SPEC CPU2017 and found that DNN applications make less balanced use of the elements that make up the processor microarchitecture. Although many workloads seem to be constrained by functional unit availability, in a significant group of applications we found a non-negligible impact of the memory hierarchy on performance.
DOI:10.1109/PACT58117.2023.00025