A Flexible Heterogeneous Multi-Core Architecture
Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough...
Uloženo v:
| Vydáno v: | 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007) s. 13 - 24 |
|---|---|
| Hlavní autoři: | , , , , , |
| Médium: | Konferenční příspěvek Publikace |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
01.09.2007
Institute of Electrical and Electronics Engineers (IEEE) |
| Témata: | |
| ISBN: | 0769529445, 9780769529448 |
| ISSN: | 1089-795X |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel applications as well as high performance for more general workloads. Heterogeneous architectures are a first step in this direction, but partitioning remains static and only roughly fits application requirements. This paper proposes the Flexible Heterogeneous Mul-tiCore processor (FMC), the first dynamic heterogeneous multi-core architecture capable of reconfiguring itself to fit application requirements without programmer intervention. The basic building block of this microarchitecture is a scalable, variable-size window microarchitecture that exploits the concept of Execution Locality to provide large-window capabilities. This allows to overcome the memory wall for applications with high memory-level parallelism (MLP). The microarchitecture contains a set of small and fast cache processors that execute high locality code and a network of small in-order memory engines that together exploit low locality code. Single-threaded applications can use the entire network of cores while multi-threaded applications can efficiently share the resources. The sizing of critical structures remains small enough to handle current power envelopes. In single-threaded mode this processor is able to outperform previous state-of-the-art high-performance processor research by 12% on SpecFP. We show how in a quad- threaded/quad-core environment the processor outperforms a statically allocated configuration in both throughput and harmonic mean, two commonly used metrics to evaluate SMTperformance, by around 2-4%. This is achieved while using a very simple sharing algorithm. |
|---|---|
| Bibliografie: | SourceType-Conference Papers & Proceedings-1 ObjectType-Conference Paper-1 content type line 25 |
| ISBN: | 0769529445 9780769529448 |
| ISSN: | 1089-795X |
| DOI: | 10.1109/PACT.2007.4336196 |

