Scheduling Weakly Consistent C Concurrency for Reconfigurable Hardware

Bibliographic details
Published in:	IEEE Transactions on Computers, Vol. 67, No. 7, pp. 992-1006
Main authors:	Ramanathan, Nadesh; Wickerson, John; Constantinides, George A.
Medium:	Journal Article
Language:	English
Publication details:	New York: IEEE, 1 July 2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; atomic operations; Concurrency; FPGA; Hardware; High level synthesis; HLS; Instruction sets; Iterative methods; lock-free algorithms; Optimization; Pipeline processing; Programming; Reconfigurable hardware; Scheduling; Semantics
ISSN:	0018-9340, 1557-9956

Description
Abstract:	Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion but via fine-grained atomic operations ('atomics'), have been shown empirically to be the fastest class of multi-threaded algorithms in the realm of conventional processors. This article explores how these algorithms can be compiled from C to reconfigurable hardware via high-level synthesis (HLS). We focus on the scheduling problem, in which software instructions are assigned to hardware clock cycles. We first show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction reorderings that, though sound in a single-threaded context, demonstrably cause erroneous results when synthesising multi-threaded programs. We then show that correct behaviour can be restored by imposing additional intra-thread constraints among the memory operations. In addition, we show that we can support the pipelining of loops containing atomics by injecting further inter-iteration constraints. We implement our approach on two constraint-based scheduling HLS tools: LegUp 4.0 and LegUp 5.1. We extend both tools to support two memory models that are capable of synthesising atomics correctly. The first memory model only supports sequentially consistent (SC) atomics and the second supports weakly consistent ('weak') atomics as defined by the 2011 revision of the C standard. Weak atomics necessitate fewer constraints than SC atomics, but suffice for many multi-threaded algorithms. We confirm, via automatic model-checking, that we correctly implement the semantics in accordance with the C standard. A case study on a circular buffer suggests that on average circuits synthesised from programs that schedule atomics correctly can be 6× faster than an existing lock-based implementation of atomics, that weak atomics can yield a further 1.3× speedup, and that pipelining can yield a further 1.3× speedup.
DOI:	10.1109/TC.2017.2786249
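The abstract's case study is a circular buffer synchronised with C11 weak atomics. As an illustration only, here is a minimal sketch, assuming a single-producer/single-consumer ring buffer of `int`s written with `<stdatomic.h>` acquire/release operations. The names `ring_buffer`, `rb_init`, `rb_push`, `rb_pop` and `RB_CAPACITY` are hypothetical, and this is not the authors' benchmark code. It shows the kind of ordering an HLS scheduler must preserve: the plain write to a slot must not be moved after the release store to `tail`, and the read of a slot must not be moved before the acquire load that observed it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; a power of two lets indices wrap with a bit mask. */
#define RB_CAPACITY 8u
static_assert((RB_CAPACITY & (RB_CAPACITY - 1u)) == 0u,
              "RB_CAPACITY must be a power of two");

/* Hypothetical single-producer/single-consumer buffer (illustrative sketch). */
typedef struct {
    int data[RB_CAPACITY];
    atomic_size_t head; /* count of items consumed; written only by the consumer */
    atomic_size_t tail; /* count of items produced; written only by the producer */
} ring_buffer;

void rb_init(ring_buffer *rb) {
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer side: returns false if the buffer is full. */
bool rb_push(ring_buffer *rb, int value) {
    /* Only the producer writes tail, so a relaxed load of it is enough. */
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    /* Acquire pairs with the consumer's release store to head: once a slot is
       seen as freed, the consumer's read of that slot has already happened. */
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (tail - head == RB_CAPACITY)
        return false; /* full (unsigned subtraction tolerates counter wrap) */
    rb->data[tail & (RB_CAPACITY - 1u)] = value; /* plain, non-atomic write */
    /* Release: the data write above may not be reordered after this store. */
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
bool rb_pop(ring_buffer *rb, int *out) {
    /* Only the consumer writes head, so a relaxed load of it is enough. */
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    /* Acquire pairs with the producer's release store to tail, making the
       slot's contents visible before it is read below. */
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (tail == head)
        return false; /* empty */
    *out = rb->data[head & (RB_CAPACITY - 1u)]; /* plain, non-atomic read */
    /* Release: the read above completes before the producer may reuse the slot. */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}
```

Replacing every `memory_order_*` argument with `memory_order_seq_cst` would give a sequentially consistent variant of the same buffer. According to the abstract, weak atomics such as these acquire/release operations impose fewer scheduling constraints than SC atomics, and in the authors' case study they yielded a further speedup.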