Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Detailed Bibliography
Published in: 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pp. 423-432
Main Authors: Qureshi, Moinuddin K.; Patt, Yale N.
Format: Conference Paper
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 09.12.2006
Series: ACM Conferences
ISBN: 0769527329, 9780769527321
ISSN: 1072-4451
Description
Summary: This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resources to the application that has a low demand. However, a higher demand for cache resources does not always correlate with a higher performance from additional cache resources. It is beneficial for performance to invest cache resources in the application that benefits more from the cache resources rather than in the application that has more demand for the cache resources. This paper proposes utility-based cache partitioning (UCP), a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources. The proposed mechanism monitors each application at runtime using a novel, cost-effective hardware circuit that requires less than 2kB of storage. The information collected by the monitoring circuits is used by a partitioning algorithm to decide the amount of cache resources allocated to each application. Our evaluation, with 20 multiprogrammed workloads, shows that UCP improves performance of a dual-core system by up to 23% and on average 11% over LRU-based cache partitioning.
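
The partitioning step described in the summary can be sketched in software. The Python snippet below is a hypothetical illustration, not the paper's UCP hardware or its exact partitioning algorithm: it assumes per-way hit counts of the kind a UMON-style monitor would collect for each application and exhaustively searches the dual-core way allocations for the one that maximizes total hits (equivalently, minimizes misses). The function name best_partition and the sample counter values are invented for illustration.

    # Hypothetical sketch of utility-based way partitioning for two applications.
    # hits_x[w] = hits application X would obtain with (w + 1) cache ways,
    # as a monitoring circuit in the style described by the paper might report.
    from typing import List, Tuple

    def best_partition(hits_a: List[int], hits_b: List[int]) -> Tuple[int, int]:
        """Return (ways_for_a, ways_for_b) maximizing combined hits,
        with every way assigned and each application given at least one way."""
        total_ways = len(hits_a)
        assert len(hits_b) == total_ways, "both monitors must cover all ways"
        best = (1, total_ways - 1)
        best_hits = hits_a[0] + hits_b[total_ways - 2]
        for ways_a in range(1, total_ways):
            ways_b = total_ways - ways_a
            combined = hits_a[ways_a - 1] + hits_b[ways_b - 1]
            if combined > best_hits:
                best_hits = combined
                best = (ways_a, ways_b)
        return best

    if __name__ == "__main__":
        # Made-up monitor readings for a 16-way cache: application A saturates
        # after a few ways, while application B keeps benefiting from more ways.
        hits_a = [50, 70, 75, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76]
        hits_b = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100, 104, 107, 109, 110, 111]
        print(best_partition(hits_a, hits_b))  # -> (3, 13): A gets few ways, B gets the rest

Exhaustive search over all splits is inexpensive for the dual-core case the paper evaluates; in hardware, the decision would be driven by the monitoring circuits' counters rather than precomputed arrays.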
DOI: 10.1109/MICRO.2006.49