FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs

Bibliographic details
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 501-515
Main authors: Whangbo, Joonho; Lim, Edwin; Zhang, Chengyi Lux; Anderson, Kevin; Gonzalez, Abraham; Gupta, Raghav; Krishnakumar, Nivedha; Karandikar, Sagar; Nikolic, Borivoje; Shao, Yakun Sophia; Asanovic, Krste
Format: Conference paper
Language: English
Published: IEEE, 29 June 2024
Description
Summary: Pre-silicon validation and end-to-end system evaluation are integral parts of hardware development as they provide architects with insights about the complex interactions between various hardware components, system software, and application code. Although this process can be accelerated using FPGAs as a simulation host, existing platforms fall short when the resource requirements of a custom hardware design exceed a single FPGA. We present FireAxe, an open-source FPGA-accelerated RTL simulation platform that supports push-button user-guided partitioning across multiple FPGAs, using a compiler called FireRipper. Given a partition point, FireRipper automatically maps a monolithic RTL design onto multiple FPGAs while providing hardware designers quick feedback about the partition interface and expected simulation performance. Furthermore, FireRipper enables users to choose between an exact-mode which provides cycle-exact results with RTL-level fidelity, or a fast-mode that improves simulation rate while sacrificing fidelity only at the partition boundary. Built on FireSim, FireAxe preserves the ability to elastically scale simulations from on-premises FPGAs to cloud FPGAs. For example, pulling out a core from a system-on-chip (SoC) onto a separate FPGA, we achieve simulation rates of 1.6 MHz using on-premises FPGAs connected by direct-attach cables and 1 MHz on AWS F1 FPGAs using peer-to-peer PCIe. To show FireAxe's ability to enable pre-silicon performance validation at unprecedented scale, we show several case studies. First, we replicate full-stack system-level effects such as latency spikes from garbage collection in a Golang application on an SoC containing 4 out-of-order (OoO) cores. We also boot Linux on, to our knowledge, the largest OoO core ever cycle-exactly simulated in academia. Lastly, we simulate a system-on-chip containing 24 OoO cores mapped onto five datacenter-class FPGAs. We discover an RTL bug when trying to run Linux user-space applications that did not appear with less substantial software stacks. This was discovered in less than 2 hours using FireAxe and would have taken weeks in a commercial software RTL simulator.
DOI: 10.1109/ISCA59077.2024.00044