The Dataflow Abstract Machine Simulator Framework
The growing interest in novel dataflow architectures and streaming execution paradigms has created the need for a simulator optimized for modeling dataflow systems. To fill this need, we present three new techniques that make it feasible to simulate complex systems consisting of thousands of compone...
Saved in:
| Published in: | 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) pp. 532 - 547 |
|---|---|
| Main Authors: | , , , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: |
IEEE
29.06.2024
|
| Subjects: | |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The growing interest in novel dataflow architectures and streaming execution paradigms has created the need for a simulator optimized for modeling dataflow systems. To fill this need, we present three new techniques that make it feasible to simulate complex systems consisting of thousands of components. First, we introduce an interface based on Communicating Sequential Processes which allows users to simultaneously describe functional and timing characteristics. Second, we introduce a scalable point-to-point synchronization scheme that avoids global synchronization. Finally, we demonstrate a technique to exploit slack in the simulated system, such as FIFOs, to increase simulation parallelism. We implement these techniques in the Dataflow Machine (DAM), a parallel simulator framework for dataflow systems. We demonstrate the benefits of using DAM by highlighting three case studies using the framework. First, we use DAM directly as an exploration tool for streaming algorithms on dataflow hardware. We simulate two different implementations of the attention algorithm used in large language models, and use DAM to show that the second implementation only requires a constant amount of local memory. Second, we re-implement a simulator for a sparse tensor algebra accelerator, resulting in 57 \% less code and a simulation speedup of up to four orders of magnitude. Finally, we demonstrate a general technique for time-multiplexing real hardware to simulate multiple virtual copies of the hardware using DAM. |
|---|---|
| DOI: | 10.1109/ISCA59077.2024.00046 |