Probabilistic and temporal failure detectors for solving distributed problems

Bibliographic details
Published in:	Journal of Parallel and Distributed Computing, Volume 158, pp. 1-15
Main authors:	Guerraoui, Rachid; Kozhaya, David; Pignolet, Yvonne-Anne
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, 1 December 2021
Subjects:	Consensus; Failure detectors; Message loss; Modular algorithms; Probabilistic links
ISSN:	0743-7315, 1096-0848

Description
Summary:	Failure detectors (FDs) are celebrated for their modularity in solving distributed problems. Algorithms are constructed using FD building blocks. Synchrony assumptions to implement FDs are studied separately and are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they may hold only probabilistically and temporarily. This paper studies FDs in a realistic system N, where asynchrony is inflicted by probabilistic synchronous communication. We first address a problem with ⋄S, the weakest FD to solve consensus: an implementation of “consensus with probability 1” is possible in N without randomness in the algorithm, while an implementation of “⋄S with probability 1” is impossible in N. We introduce ⋄S⁎, a new FD with probabilistic and temporal accuracy. We prove that ⋄S⁎ (i) is implementable in N and (ii) can replace ⋄S, in several existing deterministic consensus algorithms that use ⋄S, to yield an algorithm that solves “consensus with probability 1”. We extend our results to other FD classes, e.g., ⋄P, and to a larger set of problems (beyond consensus), which we call decisive problems.
• We propose a way to preserve the usefulness of failure detectors (FDs) as software building blocks in probabilistically synchronous systems such as N.
• We define ⋄S⁎, a probabilistic FD whose accuracy is ensured for arbitrarily long finite periods and that can be implemented in systems such as N.
• We present an optimal ⋄S algorithm, which achieves in the best case the lowest communication overhead (C-1) compared to all known ⋄S algorithms.
• We extend our FD definitions to other FD classes and to other distributed computing problems besides consensus, which we call decisive problems.
• We encapsulate the randomization of probabilistic links in the very abstraction of FD, without affecting deterministic algorithms built on top.
DOI:	10.1016/j.jpdc.2021.07.017