FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems
| Title: | FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems |
|---|---|
| Authors: | Wang, Tinglue; Li, Yiming; Tang, Wei; Guan, Jiapeng; Guo, Zhenghui; Jiang, Renshuang; Wei, Ran; Li, Jing; Jiang, Zhe |
| Source: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
| Publication Status: | Preprint |
| Publisher Information: | IEEE, 2025. |
| Publication Year: | 2025 |
| Subject Terms: | FOS: Computer and information sciences, Computer Science - Distributed, Parallel, and Cluster Computing, Hardware Architecture (cs.AR), Distributed, Parallel, and Cluster Computing (cs.DC), Computer Science - Hardware Architecture |
| Description: | Reliability and real-time responsiveness in safety-critical systems have traditionally been achieved using error detection mechanisms, such as LockStep, which require pre-configured checker cores, strict synchronisation between main and checker cores, static error detection regions, or limited preemption capabilities. However, these core-bound hardware mechanisms often lead to significant resource over-provisioning and diminished real-time responsiveness, particularly in modern systems where tasks with varying reliability requirements are consolidated on shared processors to improve efficiency, reduce costs, and save power. To address these challenges, this work presents FlexStep, a systematic solution that integrates hardware and software across the SoC, ISA, and OS scheduling layers. FlexStep features a novel microarchitecture that supports dynamic core configuration and asynchronous, preemptive error detection. The FlexStep architecture naturally allows for flexible task scheduling and error detection, enabling new scheduling algorithms that enhance both resource efficiency and real-time schedulability. We publicly release FlexStep's source code at https://anonymous.4open.science/r/FlexStep-DAC25-7B0C. |
| Document Type: | Article |
| DOI: | 10.1109/dac63849.2025.11132561 |
| DOI (arXiv): | 10.48550/arxiv.2503.13848 |
| Access URL: | http://arxiv.org/abs/2503.13848 |
| Rights: | STM Policy #29 arXiv Non-Exclusive Distribution |
| Accession Number: | edsair.doi.dedup.....7fa79fc8c53b4dde414d917365864329 |
| Database: | OpenAIRE |