ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing
The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA - an asynchronous reconfigurable accelerator ring architecture as a potential scenari...
Uloženo v:
| Vydáno v: | IEEE transactions on parallel and distributed systems Ročník 32; číslo 12; s. 2880 - 2892 |
|---|---|
| Hlavní autoři: | , , , , , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
New York
IEEE
01.12.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Témata: | |
| ISSN: | 1045-9219, 1558-2183 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this article, we propose ARENA - an asynchronous reconfigurable accelerator ring architecture as a potential scenario on how the future HPC and data centers will be like. Despite using the coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself, but also the ensemble of a new architecture and programming model that enables asynchronous tasking across a cluster of reconfigurable nodes, so as to bring specialized computation to the data rather than the reverse. We presume distributed data storage without asserting any prior knowledge on the data distribution. Hardware specialization occurs at runtime when a task finds the majority of data it requires are available at the present node. In other words, we dynamically generate specialized CGRA accelerators where the data reside. The asynchronous tasking for bringing computation to data is achieved by circulating the task token, which describes the dataflow graphs to be executed for a task, among the CGRA cluster connected by a fast ring network. Evaluations on a set of HPC and data-driven applications across different domains show that ARENA can provide better parallel scalability with reduced data movement (53.9 percent). Compared with contemporary compute-centric parallel models, ARENA can bring on average 4.37× speedup. The synthesized CGRAs and their task-dispatchers only occupy 2.93mm<inline-formula><tex-math notation="LaTeX">^2</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="tan-ieq1-3081074.gif"/> </inline-formula> chip area under 45nm process technology and can run at 800MHz with on average 759.8mW power consumption. ARENA also supports the concurrent execution of multi-applications, offering ideal architectural support for future high-performance parallel computing and data analytics systems. |
|---|---|
| Bibliografie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) AC05-76RL01830; 66150 PNNL-SA-152862 |
| ISSN: | 1045-9219 1558-2183 |
| DOI: | 10.1109/TPDS.2021.3081074 |