ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA - an asynchronous reconfigurable accelerator ring architecture as a potential scenari...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on parallel and distributed systems Ročník 32; číslo 12; s. 2880 - 2892
Hlavní autoři: Tan, Cheng, Xie, Chenhao, Geng, Tong, Marquez, Andres, Tumeo, Antonino, Barker, Kevin, Li, Ang
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.12.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:1045-9219, 1558-2183
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA - an asynchronous reconfigurable accelerator ring architecture as a potential scenario on how the future HPC and data centers will be like. Despite using the coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself, but also the ensemble of a new architecture and programming model that enables asynchronous tasking across a cluster of reconfigurable nodes, so as to bring specialized computation to the data rather than the reverse. We presume distributed data storage without asserting any prior knowledge on the data distribution. Hardware specialization occurs at runtime when a task finds the majority of data it requires are available at the present node. In other words, we dynamically generate specialized CGRA accelerators where the data reside. The asynchronous tasking for bringing computation to data is achieved by circulating the task token, which describes the dataflow graphs to be executed for a task, among the CGRA cluster connected by a fast ring network. Evaluations on a set of HPC and data-driven applications across different domains show that ARENA can provide better parallel scalability with reduced data movement (53.9 percent). Compared with contemporary compute-centric parallel models, ARENA can bring on average 4.37× speedup. The synthesized CGRAs and their task-dispatchers only occupy 2.93mm<inline-formula><tex-math notation="LaTeX">^2</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="tan-ieq1-3081074.gif"/> </inline-formula> chip area under 45nm process technology and can run at 800MHz with on average 759.8mW power consumption. ARENA also supports the concurrent execution of multi-applications, offering ideal architectural support for future high-performance parallel computing and data analytics systems.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
AC05-76RL01830; 66150
PNNL-SA-152862
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2021.3081074