Apple-CORE: Microgrids of SVP Cores -- Flexible, General-Purpose, Fine-Grained Hardware Concurrency Management

Bibliographic details
Published in: 2012 15th Euromicro Conference on Digital System Design, pp. 501-508
Main authors: Poss, R., Lankamp, M., Qiang Yang, Jian Fu, van Tol, M. W., Jesshope, C.
Format: Conference paper
Language: English
Published: IEEE, 1 September 2012
ISBN: 1467324981, 9781467324984
Description
Summary: To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. The corresponding hardware implementation provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional "accelerator" approach, Microgrids are intended to be used as components in distributed systems on chip that consider both clusters of small cores and optional larger cores optimized towards sequential performance as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate operations with irregular long latencies, a scale-invariant programming model, a distributed vision of the chip's structure, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This paper describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. The reference chip parameters include 128 cores, a 4 MB on-chip distributed cache network and four DDR3-1600 memory channels. This paper presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilization of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.
DOI: 10.1109/DSD.2012.25