A New Formulation of Neural Data Prefetching

Detailed bibliography
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1173-1187
Main authors: Duong, Quang; Jain, Akanksha; Lin, Calvin
Format: Conference paper
Language: English
Publication details: IEEE, 29 June 2024
Online access: Get full text
Description
Summary: Temporal data prefetchers have the potential to produce significant performance gains by prefetching irregular data streams. Recent work has introduced a neural model for temporal prefetching that outperforms practical table-based temporal prefetchers, but the large storage and latency costs, along with the inability to generalize to memory addresses outside of the training dataset, prevent such a neural network from seeing any practical use in hardware. In this paper, we reformulate the temporal prefetching prediction problem so that neural solutions to it are more amenable to hardware deployment. Our key insight is that while temporal prefetchers typically assume that each address can be followed by any possible successor, there are empirically only a few successors for each address. Utilizing this insight, we introduce a new abstraction of memory addresses, and we show how this abstraction enables the design of a much more efficient neural prefetcher. Our new prefetcher, Twilight, improves upon the previous state-of-the-art neural prefetcher, Voyager, in multiple dimensions: It reduces latency by 988×, shrinks storage by 10.8×, achieves 4% more speedup on a mix of irregular SPEC 2006, SPEC 2017, and GAP benchmarks, and is capable of predicting new temporal correlations not present in the training data. Twilight outperforms idealized versions of the non-neural temporal prefetchers STMS by 12.2% and Domino by 8.5%. While Twilight is still not practical, T-LITE, a slimmed-down version of Twilight that can prefetch across different program runs, further reduces latency and storage (1421× faster and 142× smaller than Voyager), matches Voyager's performance, and outperforms the practical non-neural Triage prefetcher by 5.9%.
DOI: 10.1109/ISCA59077.2024.00088
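
To make the abstract's key insight concrete, the sketch below is a minimal, hedged Python illustration of a successor table that keeps only a small, fixed number of candidate successors per address and predicts among them, rather than treating every address in memory as a possible successor. The class name FewSuccessorTable, the candidate budget K, and the frequency-based eviction policy are assumptions made purely for illustration; they are not the address abstraction or neural architecture actually used by Twilight or T-LITE.

```python
from collections import defaultdict, Counter

# Illustrative sketch only: each address is empirically followed by just a few
# distinct successors, so prediction can be a choice among a handful of
# candidates instead of a choice over the full address space.

K = 4  # assumed per-address candidate budget (illustrative, not from the paper)

class FewSuccessorTable:
    def __init__(self, k=K):
        self.k = k
        # Maps an address to a Counter of the next addresses observed after it.
        self.successors = defaultdict(Counter)

    def train(self, addr, next_addr):
        counts = self.successors[addr]
        counts[next_addr] += 1
        # Keep only the k most frequent candidates per address.
        if len(counts) > self.k:
            for victim, _ in counts.most_common()[self.k:]:
                del counts[victim]

    def predict(self, addr):
        counts = self.successors.get(addr)
        if not counts:
            return None
        # Predict the most frequently observed successor of this address.
        return counts.most_common(1)[0][0]

# Tiny usage example on a toy irregular access trace.
trace = [0x10, 0x80, 0x10, 0x80, 0x10, 0x40, 0x80, 0x20]
table = FewSuccessorTable()
for cur, nxt in zip(trace, trace[1:]):
    table.train(cur, nxt)
print(hex(table.predict(0x10)))  # most frequent observed successor of 0x10
```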