IMP: Indirect memory prefetcher

Bibliographic Details
Published in: 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 178-190
Main authors: Yu, Xiangyao; Hughes, Christopher J.; Satish, Nadathur; Devadas, Srinivas
Format: Conference paper
Language: English
Published: ACM, 01.12.2015
Description
Summary: Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns. A majority of these irregular accesses come from indirect patterns of the form A[B[j]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3×) compared to a baseline 64-core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see another 9.4% speedup on average (up to 46.6%).
ISSN: 2379-3155
DOI: 10.1145/2830772.2830807
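To make the A[B[j]] access pattern from the abstract concrete, here is a minimal software sketch, not taken from the paper: a CSR sparse matrix-vector multiply in C, where the load x[col_idx[k]] is exactly such an indirect access. The __builtin_prefetch call (a GCC/Clang builtin) is only a rough software analogue of what IMP does in hardware, and the look-ahead distance PREFETCH_DIST is an assumed, tunable parameter.

#include <stddef.h>

#define PREFETCH_DIST 16  /* assumed look-ahead distance; purely illustrative */

/* CSR sparse matrix-vector multiply: y = M * x.
 * The load x[col_idx[k]] is an instance of the indirect A[B[j]]
 * pattern described in the abstract. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,   /* row i spans [row_ptr[i], row_ptr[i+1]) */
              const size_t *col_idx,   /* plays the role of B[] in A[B[j]] */
              const double *vals,
              const double *x,         /* plays the role of A[] in A[B[j]] */
              double *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        double acc = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Software analogue of an indirect prefetch: read the index a
             * few iterations ahead and prefetch the element it points to. */
            if (k + PREFETCH_DIST < row_ptr[i + 1])
                __builtin_prefetch(&x[col_idx[k + PREFETCH_DIST]], 0, 0);
            acc += vals[k] * x[col_idx[k]];
        }
        y[i] = acc;
    }
}

The hardware prefetcher proposed in the paper detects this kind of indexed access stream automatically, so no source-level hints of this sort are required there; the sketch only illustrates the access pattern being targeted.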