Nicaea: A Byzantine Fault Tolerant Consensus Under Unpredictable Message Delivery Failures for Parallel and Distributed Computing

Bibliographic details
Published in: IEEE Transactions on Computers, Vol. 74, No. 3, pp. 915-928
Main authors: Jing, Guanlin, Zou, Yifei, Xu, Minghui, Zhang, Yanqiang, Yu, Dongxiao, Shan, Zhiguang, Cheng, Xiuzhen, Ranjan, Rajiv
Format: Journal Article
Language: English
Published: IEEE, 1 March 2025
ISSN:0018-9340, 1557-9956
Description
Summary: Byzantine fault-tolerant (BFT) consensus is a critical problem in parallel and distributed computing systems, particularly with potential adversaries. Most prior work on BFT consensus assumes reliable message delivery and tolerates arbitrary failures of up to $\frac{n}{3}$ nodes out of $n$ total nodes. However, many systems face unpredictable message delivery failures. This paper investigates the impact of unpredictable message delivery failures on the BFT consensus problem. We propose Nicaea, a novel protocol enabling consensus among loyal nodes when the number of Byzantine nodes is below a new threshold, given by: $\frac{(2-\rho)(1-\rho)^{2n-2}-1}{(2-\rho)(1-\rho)^{2n-2}+1}\,n$, where $\rho$ denotes the message failure rate. Theoretical proofs and experimental results validate Nicaea's Byzantine resilience. Our findings reveal a fundamental trade-off: as message delivery instability increases, a system's tolerance to Byzantine failures decreases. The well-known $\frac{n}{3}$ threshold under reliable message delivery is a special case of our generalized threshold when $\rho=0$. To the best of our knowledge, this work presents the first quantitative characterization of unpredictable message delivery failures' impact on Byzantine fault tolerance in parallel and distributed computing.
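The threshold in the summary can be evaluated numerically to see the stated trade-off. The following is a minimal sketch, not code from the paper: the function name `byzantine_threshold` and the sample values of $n$ and $\rho$ are illustrative assumptions, and the result is left as a real number because the summary does not say how the bound is rounded.

```python
def byzantine_threshold(n: int, rho: float) -> float:
    """Evaluate the threshold formula quoted in the summary.

    n   -- total number of nodes
    rho -- message failure rate, assumed here to lie in [0, 1]

    Hypothetical helper for illustration only; it returns the raw value of
    ((2 - rho)(1 - rho)^(2n - 2) - 1) / ((2 - rho)(1 - rho)^(2n - 2) + 1) * n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Shared term that appears in both numerator and denominator.
    x = (2.0 - rho) * (1.0 - rho) ** (2 * n - 2)
    return (x - 1.0) / (x + 1.0) * n


if __name__ == "__main__":
    # Sample values chosen for illustration; they are not taken from the paper.
    for rho in (0.0, 0.001, 0.01, 0.05):
        print(f"n=10, rho={rho}: threshold = {byzantine_threshold(10, rho):.3f}")
```

With $\rho=0$ the shared term equals 2, so the expression reduces to $\frac{2-1}{2+1}\,n=\frac{n}{3}$, matching the special case named in the summary. For larger $\rho$ or larger $n$ the shared term shrinks, and the value falls below $\frac{n}{3}$ and can become negative, which under this formula means no Byzantine nodes are tolerated.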
DOI:10.1109/TC.2024.3506856