Scheduling Weakly Consistent C Concurrency for Reconfigurable Hardware
| Published in: | IEEE Transactions on Computers, vol. 67, no. 7, pp. 992-1006 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: IEEE, 1 July 2018. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subject: | |
| ISSN: | 0018-9340, 1557-9956 |
| Summary: | Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion but via fine-grained atomic operations ('atomics'), have been shown empirically to be the fastest class of multi-threaded algorithms in the realm of conventional processors. This article explores how these algorithms can be compiled from C to reconfigurable hardware via high-level synthesis (HLS). We focus on the scheduling problem, in which software instructions are assigned to hardware clock cycles. We first show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction reorderings that, though sound in a single-threaded context, demonstrably cause erroneous results when synthesising multi-threaded programs. We then show that correct behaviour can be restored by imposing additional intra-thread constraints among the memory operations. In addition, we show that we can support the pipelining of loops containing atomics by injecting further inter-iteration constraints. We implement our approach on two constraint-based scheduling HLS tools: LegUp 4.0 and LegUp 5.1. We extend both tools to support two memory models that are capable of synthesising atomics correctly. The first memory model supports only sequentially consistent (SC) atomics, and the second supports weakly consistent ('weak') atomics as defined by the 2011 revision of the C standard. Weak atomics necessitate fewer constraints than SC atomics, but suffice for many multi-threaded algorithms. We confirm, via automatic model-checking, that we correctly implement the semantics in accordance with the C standard. A case study on a circular buffer suggests that, on average, circuits synthesised from programs that schedule atomics correctly can be 6× faster than an existing lock-based implementation of atomics, that weak atomics can yield a further 1.3× speedup, and that pipelining can yield a further 1.3× speedup. |
|---|---|
| DOI: | 10.1109/TC.2017.2786249 |
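
To make the summary's distinction between plain memory operations and weak atomics concrete, the following is a minimal illustrative sketch of a single-producer, single-consumer circular buffer written with C11 release/acquire atomics. It is not the article's benchmark code: the names `ring_t`, `ring_init`, `ring_push`, `ring_pop` and the capacity `RING_CAPACITY` are all hypothetical. The sketch assumes exactly one producer thread and one consumer thread. The point it illustrates is that the plain write to `slots` must stay ordered before the release store to `tail`; a scheduler that swapped the two would still be correct for a single thread, but could let the consumer read a slot that has not been written yet.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; kept a power of two so that indexing stays
   consistent when the unsigned counters wrap around. */
#define RING_CAPACITY 8u

typedef struct {
    int slots[RING_CAPACITY];  /* plain (non-atomic) payload storage */
    atomic_size_t head;        /* next slot to read; stored only by the consumer */
    atomic_size_t tail;        /* next slot to write; stored only by the producer */
} ring_t;

static void ring_init(ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false when the buffer is full. */
static bool ring_push(ring_t *r, int value) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire pairs with the consumer's release store to head, so the
       slot being reused has already been read by the consumer. */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY) {
        return false;
    }
    r->slots[tail % RING_CAPACITY] = value;  /* plain write of the payload */
    /* Release: the payload write above must not be moved after this store. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the buffer is empty. */
static bool ring_pop(ring_t *r, int *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire pairs with the producer's release store to tail, so the
       payload read below observes the producer's write. */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = r->slots[head % RING_CAPACITY];   /* plain read of the payload */
    /* Release: the payload read above must not be moved after this store. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Replacing every `memory_order_*` argument with `memory_order_seq_cst` would give the SC variant of the same buffer; the release/acquire version asks for fewer ordering guarantees, which is the kind of slack the summary attributes to weak atomics.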