A Flexible Heterogeneous Multi-Core Architecture

Bibliographic Details
Published in:	16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp. 13-24
Main Authors:	Pericas, M., Cristal, A., Cazorla, F.J., Gonzalez, R., Jimenez, D.A., Valero, M.
Format:	Conference Proceeding Publication
Language:	English
Published:	IEEE 01.09.2007 Institute of Electrical and Electronics Engineers (IEEE)
Subjects:	Application software; Arquitectura de computadors; Computer architecture; Concurrent computing; Delay; Engines; Informàtica; Microarchitecture; Multi-threading; Multicore processing; Multiprocessing systems; Multiprocessors; Parallel architectures; Parallel processing; Parallel processing (Electronic computers); Processament en paral·lel (Ordinadors); Reconfigurable architectures; Simultaneous multithreading processors; Throughput; Yarn; Àrees temàtiques de la UPC
ISBN:	0769529445, 9780769529448
ISSN:	1089-795X

Description
Summary:	Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge, as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel applications as well as high performance for more general workloads. Heterogeneous architectures are a first step in this direction, but partitioning remains static and only roughly fits application requirements. This paper proposes the Flexible Heterogeneous MultiCore processor (FMC), the first dynamic heterogeneous multi-core architecture capable of reconfiguring itself to fit application requirements without programmer intervention. The basic building block of this microarchitecture is a scalable, variable-size window microarchitecture that exploits the concept of Execution Locality to provide large-window capabilities. This allows the processor to overcome the memory wall for applications with high memory-level parallelism (MLP). The microarchitecture contains a set of small and fast cache processors that execute high-locality code and a network of small in-order memory engines that together exploit low-locality code. Single-threaded applications can use the entire network of cores, while multi-threaded applications can efficiently share the resources. The sizing of critical structures remains small enough to handle current power envelopes. In single-threaded mode, this processor is able to outperform previous state-of-the-art high-performance processor research by 12% on SpecFP. We show how, in a quad-threaded/quad-core environment, the processor outperforms a statically allocated configuration in both throughput and harmonic mean, two commonly used metrics to evaluate SMT performance, by around 2-4%. This is achieved while using a very simple sharing algorithm.
DOI:	10.1109/PACT.2007.4336196
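
The summary names throughput and harmonic mean as the two metrics used for the quad-threaded comparison but does not define them. The sketch below uses definitions that are common in SMT evaluation and are assumed here, not taken from this record: throughput as the sum of per-thread IPC, and harmonic mean as the harmonic mean of each thread's IPC relative to its IPC when running alone. The function names and the IPC values are hypothetical and for illustration only.

```python
from statistics import harmonic_mean


def throughput(ipc_shared):
    """Sum of per-thread IPC while the threads share the processor (assumed definition)."""
    return sum(ipc_shared)


def hmean_relative_ipc(ipc_shared, ipc_alone):
    """Harmonic mean of per-thread relative IPC (shared IPC / stand-alone IPC).

    Assumed definition: it penalises configurations that starve one thread,
    which raw throughput does not.
    """
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one stand-alone IPC per thread")
    relative = [shared / alone for shared, alone in zip(ipc_shared, ipc_alone)]
    return harmonic_mean(relative)


# Hypothetical four-thread workload (made-up numbers, not results from the paper).
ipc_alone = [2.0, 1.0, 4.0, 2.0]
ipc_shared = [1.0, 0.5, 1.0, 1.0]

print(throughput(ipc_shared))                     # 3.5
print(hmean_relative_ipc(ipc_shared, ipc_alone))  # approximately 0.4
```

In this made-up workload the third thread keeps only a quarter of its stand-alone IPC, which pulls the harmonic mean down to about 0.4 even though the other threads each retain half. That sensitivity to the slowest thread is why the two metrics are usually reported together.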